Discussion about this post

Devanshu:

Why ChatGPT is read-heavy

Each prompt triggers:

- Tokenization (read model vocab)

- Forward pass through billions of parameters

- Retrieval from KV cache

- Sampling

This is compute-heavy reading of model weights, not writing data.
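The four steps above can be sketched as a toy serving loop. Everything here is hypothetical (a three-word vocabulary standing in for a real tokenizer, a 3x3 list standing in for billions of parameters), but it shows the shape of the claim: the weights are only ever read, and the KV cache is the sole per-request mutable state.

```python
VOCAB = {"hello": 0, "world": 1, "<eos>": 2}  # hypothetical tiny vocabulary
WEIGHTS = [
    [0.1, 0.7, 0.2],
    [0.6, 0.1, 0.3],
    [0.3, 0.3, 0.4],
]  # stand-in for billions of parameters; read-only during serving

def tokenize(prompt):
    # 1. Tokenization: a read-only lookup into the model vocabulary.
    return [VOCAB[w] for w in prompt.split()]

def forward(token_id, kv_cache):
    # 2. Forward pass: reads WEIGHTS, never writes them.
    # 3. KV cache: the only mutable state, scoped to this one request.
    kv_cache.append(token_id)
    return WEIGHTS[token_id]

def sample(logits):
    # 4. Sampling (greedy here, to keep the sketch deterministic).
    return max(range(len(logits)), key=lambda i: logits[i])

def answer_one_prompt(prompt):
    kv_cache = []  # lives only for the duration of this request
    for tid in tokenize(prompt):
        logits = forward(tid, kv_cache)
    return sample(logits), kv_cache
```

After any number of calls to `answer_one_prompt`, `WEIGHTS` is byte-for-byte unchanged, which is the point of the comment: the hot path reads shared state and writes only request-local state.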

The model weights (hundreds of GBs in large models) are:

- Loaded into GPU memory

- Read repeatedly during inference

- Rarely modified

There is no write to the model during inference.
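One common way serving stacks enforce "read repeatedly, never modified" is to map weight files read-only, letting the OS page cache share one copy across worker processes. A minimal sketch with a throwaway 4 KB file standing in for a weight shard (the file name is made up):

```python
import mmap
import os
import tempfile

# Create a stand-in "weight shard" file (hypothetical name and contents).
path = os.path.join(tempfile.mkdtemp(), "model.bin")
with open(path, "wb") as f:
    f.write(b"\x00" * 4096)

with open(path, "rb") as f:
    # ACCESS_READ: any attempt to write through this mapping raises TypeError,
    # so inference code physically cannot mutate the weights.
    weights = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    first_byte = weights[0]  # reads are cheap and shareable
```

Real GPU serving copies the mapped weights into device memory once at startup; the read-only discipline is the same.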

Writes are limited to:

- Logging conversation history

- Analytics

- Abuse monitoring

Compared to inference traffic, this write volume is small.
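A back-of-envelope calculation makes "small" concrete. Every number below is an assumption for illustration, not a measurement: a dense 70B-parameter model in fp16, 500 generated tokens per response, and roughly 4 KB of conversation log written per request.

```python
# All figures are assumptions (dense 70B model, fp16, 500 tokens, ~4 KB log).
params = 70e9
bytes_per_param = 2                   # fp16
tokens = 500

# Each generated token touches every weight in a dense model.
weight_bytes_read = params * bytes_per_param * tokens
log_bytes_written = 4 * 1024          # conversation log entry

ratio = weight_bytes_read / log_bytes_written
print(f"read/write ratio ~ {ratio:.0e}")
```

Under these assumptions the reads outweigh the writes by roughly ten orders of magnitude; sparse (MoE) models shrink the per-token reads, but nowhere near enough to change the conclusion.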

Pawel Jozefiak:

Infrastructure stories like this matter because they expose where AI economics become real. User growth looks exciting, but the harder question is where sustainable margins come from when model and serving costs are this high. In my own market analysis work, the winners were usually companies with distribution plus disciplined infrastructure spend, not just better demos.

I dug into those dynamics with February data on who is actually capturing value in the agent wave: https://thoughts.jock.pl/p/ai-agent-landscape-feb-2026-data Scale is impressive, but efficiency will decide who survives.
