Why ChatGPT is read-heavy
Each prompt triggers:
- Tokenization (read model vocab)
- Forward pass through billions of parameters
- Retrieval from KV cache
- Sampling
This is compute-heavy reading of model weights, not writing data.
The model weights (hundreds of GBs in large models) are:
- Loaded into GPU memory
- Read repeatedly during inference
- Rarely modified
There is no write to the model during inference.
Writes are limited to:
- Logging conversation history
- Analytics
- Abuse monitoring
Compared to inference traffic, this write volume is small.
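A toy sketch of the point (my own illustration, not OpenAI's code): per chat turn, reads of frozen model parameters vastly outnumber durable writes. The weight array and token counts here are made-up stand-ins.

```python
# Toy read-vs-write tally for one simulated chat turn. WEIGHTS stands in
# for model parameters: loaded once, read on every decoding step, never
# modified. The only write is appending the finished reply to history.
import random

random.seed(0)

WEIGHTS = [random.random() for _ in range(1000)]  # loaded once, read-only

def generate_reply(prompt_tokens, max_new_tokens=50):
    reads, writes = 0, 0
    generated = []
    for _ in range(max_new_tokens):
        _ = sum(WEIGHTS)               # "forward pass": re-read every weight
        reads += len(WEIGHTS)
        generated.append(random.randrange(100))  # sampling a next token
    history = list(prompt_tokens) + generated
    writes += 1                        # one append of the conversation log
    return history, reads, writes

_, reads, writes = generate_reply([1, 2, 3])
print(reads, writes)  # prints: 50000 1
```

The ratio only gets more lopsided with real models, where each step touches billions of parameters and the write is still one small row of conversation history.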
Infrastructure stories like this matter because they expose where AI economics become real. User growth looks exciting, but the harder question is where sustainable margins come from when model and serving costs are this high. In my own market analysis work, the winners were usually companies with distribution plus disciplined infrastructure spend, not just better demos.
I dug into those dynamics with February data on who is actually capturing value in the agent wave: https://thoughts.jock.pl/p/ai-agent-landscape-feb-2026-data Scale is impressive, but efficiency will decide who survives.
Scaling to crazy numbers is never just servers; it's the boring stuff too: support, trust, policy, and how fast you can fix breakage when millions hit the same edge case. What I worry about is success hiding the cracks. If you grow faster than your reliability and safety muscle, the public only learns after a big mess. The real moat now is not features, it's uptime, tight feedback loops, and owning mistakes fast.
Same question as the other comment. I don't understand how ChatGPT is read-heavy.
How is ChatGPT read-heavy? Interactions with ChatGPT are conversations. Unless we assume users will go back to read their old chats, it cannot be read-heavy. Is there any data supporting this claim?
In comparison, social media sites are read-heavy: it is generally understood that many more people consume than contribute. ChatGPT is not similar.
Curious to know the reasoning.
It is read-heavy. Fetching data chunks and assembling them into a reply is a typical read. It doesn't update the database frequently or in a complicated way, so it's not write-heavy.
This was a great read. I am curious: with all of those read replicas, are you using cascading replication? I didn't see it in the write-up.
Streaming from the writer to intermediary replicas that then handle the streaming load out to the replicas that are actually being accessed? It's supposed to reduce the streaming load from the writer further, so I'm curious if your team tried it and it just wasn't effective?
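For context, the topology I mean would look roughly like this in PostgreSQL, which has supported cascading replication since 9.2 (host name invented; I don't know what database or replication setup the team actually runs):

```
# postgresql.conf on a leaf replica: follow an intermediate standby
# instead of the writer, so the writer streams WAL to only a few
# intermediaries, which fan it out to the replicas serving reads.
primary_conninfo = 'host=intermediate-standby-1 port=5432 user=replicator'
hot_standby = on
```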
@devanshusave Like others have said, ChatGPT is write-heavy! All those long responses have to be saved somewhere. Sure, initially they can be saved in a NoSQL DB to avoid clogging the single write instance. Everything that has been written can then be served through a cache layer or an in-memory store.