Discussion about this post

Scenarica

The billion predictions per second is impressive, but it's actually the easier constraint to solve: that's parallelism. The harder constraint is the 100ms budget for any single prediction, because latency is an architecture problem and you can't parallelise your way out of it.

Everything downstream, the retrieval/ranking split and the compute graph separation between GPU and CPU, is shaped by that single number. The system's architecture is really a latency budget expressed as infrastructure.
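To make the throughput-versus-latency distinction concrete, here is a minimal sketch. The stage names and millisecond figures are hypothetical, chosen only for illustration, and are not taken from the post. The only number from the discussion is the 100ms budget. It assumes stages run serially and each replica handles one request at a time.

```python
# Hypothetical per-stage latencies in milliseconds (illustrative numbers only).
STAGE_MS = {"retrieval": 20, "feature_fetch": 15, "ranking": 40, "network": 10}
BUDGET_MS = 100  # the per-prediction budget mentioned in the comment


def request_latency_ms(stages):
    """Serial stages add up: one request's latency is the sum of its stages."""
    return sum(stages.values())


def throughput_per_s(stages, replicas):
    """Assuming one in-flight request per replica, throughput scales with replicas."""
    return replicas * 1000 / request_latency_ms(stages)


latency = request_latency_ms(STAGE_MS)
for replicas in (1, 1_000, 100_000):
    # Adding replicas raises throughput, but the latency column never changes.
    print(f"replicas={replicas:>7}  latency_ms={latency}  "
          f"throughput_per_s={throughput_per_s(STAGE_MS, replicas):,.0f}")

print("within budget:", latency <= BUDGET_MS)
```

Under these assumptions, more machines only move the throughput figure. Getting a single request under budget means changing the stages themselves, for example by narrowing the candidate set before the expensive ranking step.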

Rebellio

Loved the point about train-serve skew - it's one of those problems that sounds mundane until you realise it's quietly responsible for half the "why is our model underperforming in production" conversations that happen in ML teams worldwide.

What's interesting to me is how this infrastructure story maps onto Snap's broader business survival arc.

People forget that Snap was left for dead twice - first when Instagram cloned Stories in 2016 and stripped Snap of what felt like its core identity, then again when TikTok redefined what a feed could even be. Both times, the obituaries were written. Both times, Snap responded not by out-featuring the competition but by out-engineering them - AR, ephemeral identity and a recommendation engine that can rival what Meta is running.

When your ranking system can predict conversion probability accurately enough to win auctions against Google and Meta inventory, you've crossed a threshold, which explains why the revenue story has stabilised despite the platform being written off repeatedly.

The shift from a messaging app to a full ML-powered content platform wasn't a pivot, but a quiet infrastructure rebuild that a lot of us (myself included) missed - thank you for putting this into perspective so coherently.
