7 Comments
ToxSec

“Reddit’s migration is a good example of how large-scale infrastructure changes do not have to be dramatic, high-risk events.”

Really great takeaway for me. Love seeing how these dreaded events can be handled so well. Nice post, thanks!

cch

The migration reminds us of the "strangler pattern".

cch

Would love to see how they migrated the operator from the forked one to the standard one. Did they deploy both for a period of time, or undeploy the fork first?

Keshore

Migrating with MirrorMaker would be too resource-intensive at this scale. This is a clever design.

Kate Johnson

Migrating petabyte-scale data while staying live is basically performing open-heart surgery while the patient is running a marathon. I've always been skeptical of "zero-downtime" claims because the reality is usually a mess of edge cases and TTL headaches. It actually reminds me of the physical version of this—trying to do a structural overhaul on a building while people are still living in it. We’ve been looking at some of the project logs over at https://qualityrenovation.com just to see how they handle the sequencing of high-stakes onsite work without killing the "uptime" for the residents. There’s a weirdly similar logic between managing a physical job site and managing a data migration; if your staging isn’t perfect, the whole thing collapses the moment you cut over. Does Reddit have a public post-mortem on the specific consistency issues they hit during the final sync?

Aravind Karthik

This migration is incredibly simple with Cluster Linking if you decide to move to Confluent Kafka. A precondition is that Kafka is being used with a short retention period.

Rakia Ben Sassi

Here’s the thing nobody tells you when you graduate from “I deploy to a VPS” to “I’m cloud-native now”:

Kubernetes is not a more reliable version of your old server. It’s a fundamentally different relationship with reliability. And if you approach it the same way, your pods will keep dying and you’ll keep losing sleep.

Let’s talk about it.

https://rakiabensassi.substack.com/p/the-kubernetes-mortality-rate-everything?utm_campaign=post-expanded-share&utm_medium=web