This week's newsletter features a chapter from one of my favorite books, Explain the Cloud Like I’m 10.
I am fascinated by our guest author Todd Hoff’s ability to distill complex topics into simpler discussions. That ability is one of the inspirations behind the ByteByteGo newsletter.
He’s been a programmer for over 30 years, having worked in Silicon Valley his entire career. He worked with various companies, including NEC, System Industries, IBM, Sun Microsystems, and Yahoo, to name a few.
He also enjoyed teaching database courses and Perl programming during his tenure at UCSC. He ran a blog called HighScalability, covering the cloud space in detail.
For those interested in a deeper dive into other subjects, the e-book is available here.
Netflix seems so simple. Press play, and video magically appears. Easy, right? Not so much.
You might expect Netflix to serve video using AWS. Press play in a Netflix application and video stored in S3 would be streamed directly to your device from S3 over the internet.
An utterly sensible approach…for a much smaller service.
But that’s not how Netflix works at all. It’s far more complicated and exciting than you might imagine.
To see why, let’s look at some impressive Netflix statistics for 2022.
Netflix has more than 221.64 million subscribers.
Netflix operates in more than 190 countries.
Netflix has nearly $7.87 billion in revenue per quarter.
Netflix.com generated over 765 million global visits.
Netflix is used by 62% of US households.
Netflix is used by 23% of US adults on a daily basis.
Netflix is responsible for 9.39% of global internet traffic.
Netflix plays more than 6 billion hours of video per month.
Stranger Things 4 crossed 1 billion hours viewed.
Netflix budgeted about $17 billion for creating new content from 2021 through 2023.
What have we learned?
Netflix is huge. They’re global, have a lot of members, play a lot of videos, and have a lot of money.
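To get a feel for what these numbers mean per member, here is a rough back-of-the-envelope calculation. It uses only two of the figures listed above (subscribers and hours played per month). Both are quoted as "more than" values, and the 30-day month is an assumption made for illustration, so treat the result as an estimate rather than an official Netflix metric.

```python
# Back-of-the-envelope arithmetic using the 2022 figures quoted above.
# Both inputs are "more than" values, so the results are only rough.

SUBSCRIBERS = 221.64e6     # subscribers (from the list above)
HOURS_PER_MONTH = 6e9      # hours of video played per month (from the list above)
DAYS_PER_MONTH = 30        # assumption: a 30-day month


def hours_per_subscriber(hours: float, subscribers: float) -> float:
    """Average hours of video played per subscriber."""
    return hours / subscribers


monthly = hours_per_subscriber(HOURS_PER_MONTH, SUBSCRIBERS)
daily = monthly / DAYS_PER_MONTH

print(f"{monthly:.1f} hours per subscriber per month")
print(f"{daily:.2f} hours per subscriber per day")
```

Under these assumptions, the average works out to roughly 27 hours of video per subscriber per month, or a little under an hour a day.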
Another relevant factoid is that Netflix is subscription-based. Members pay Netflix monthly and can cancel at any time. When you press play to chill on Netflix, it had better work. Unhappy members unsubscribe.
We’re Going Deep
Netflix is a terrific example of all the ideas we’ve talked about, which is why this chapter goes into a lot more detail than the other cloud services we’ve covered.
One big reason for diving deeper into Netflix is that it makes much more information available than other companies do.
Netflix holds communication as a central cultural value. Netflix more than lives up to its standards.
I’d like to thank Netflix for being so open about its architecture. Over the years, Netflix has given hundreds of talks and written hundreds of articles on the inner workings of how they operate. The whole industry is better for it.
Another reason for going into so much detail on Netflix is that Netflix is just plain fascinating. Most of us have used Netflix at one time or another. Who wouldn’t love peeking behind the curtain to see what makes Netflix tick?
Netflix operates in two clouds: AWS and Open Connect
How does Netflix keep its members happy? With the cloud, of course. Actually, Netflix uses two different clouds: AWS and Open Connect.
Both clouds must work together seamlessly to deliver endless hours of customer-pleasing video.
The three parts of Netflix: client, backend, and content delivery network (CDN)
You can think of Netflix as being divided into three parts: the client, the backend, and the content delivery network (CDN).
The client is the user interface on any device used to browse and play Netflix videos. It could be an app on your iPhone, a website on your desktop computer, or even an app on your smart TV. Netflix controls every client for each and every device.
Everything that happens before you hit play happens in the backend, which runs in AWS. That includes things like preparing all new incoming videos and handling requests from all apps, websites, TVs, and other devices.
Open Connect handles everything that happens after you hit play. Open Connect is Netflix’s custom global content delivery network (CDN). Open Connect stores Netflix videos in different locations worldwide. When you press play, the video streams from Open Connect to your device and is displayed by the client. Don’t worry; we’ll talk more about what a CDN is a little later.
Interestingly, at Netflix, they don’t say “hit play on a video”; they say “click start on a title.” Every industry has its lingo.
By controlling all three areas (client, backend, and CDN), Netflix has achieved complete vertical integration.
Netflix controls your video viewing experience from beginning to end. That’s why it just works when you click play from anywhere in the world. You reliably get the content you want to watch when you want to watch it.
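The three-part split described above can be sketched as a toy model. This is only an illustration of the division of labor, not Netflix’s actual code or API: the class names (`Backend`, `CdnLocation`, `Client`), the method names, and the idea that the backend hands the client the name of a CDN location are all assumptions made for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Backend:
    """Stands in for the AWS side: everything that happens before play."""
    # Hypothetical lookup table: title -> name of a CDN location holding it.
    locations: dict[str, str] = field(default_factory=dict)

    def start_title(self, title: str) -> str:
        # Tell the client which CDN location to stream from (assumed behavior).
        return self.locations[title]


@dataclass
class CdnLocation:
    """Stands in for one Open Connect location: everything after play."""
    name: str
    videos: dict[str, list[bytes]] = field(default_factory=dict)

    def stream(self, title: str):
        # Yield the stored video one chunk at a time.
        yield from self.videos[title]


@dataclass
class Client:
    """Stands in for the app on a phone, browser, or TV."""
    backend: Backend
    cdn: dict[str, CdnLocation]

    def play(self, title: str) -> bytes:
        location_name = self.backend.start_title(title)  # before play: backend
        location = self.cdn[location_name]               # after play: CDN
        return b"".join(location.stream(title))


# Usage: one title stored in one (made-up) CDN location.
backend = Backend(locations={"Stranger Things 4": "cdn-nearby"})
cdn = {
    "cdn-nearby": CdnLocation(
        name="cdn-nearby",
        videos={"Stranger Things 4": [b"chunk-1", b"chunk-2"]},
    )
}
client = Client(backend=backend, cdn=cdn)
video = client.play("Stranger Things 4")
```

The point of the sketch is that the client never pulls video bytes from the backend. The backend only answers the question of where to stream from, and the bytes themselves come from the CDN.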
Let’s see how Netflix makes that happen.
In 2008 Netflix Started Moving to AWS
Netflix launched in 1998. At first, they rented DVDs through the US Postal Service.
But Netflix saw the future as on-demand streaming video.
In 2007 Netflix introduced its streaming video-on-demand service that allowed subscribers to stream television series and films via the Netflix website on personal computers or the Netflix software on a variety of supported platforms, including smartphones and tablets, digital media players, video game consoles, and smart TVs.
On a personal note, it might seem obvious that streaming video-on-demand was the future. And it was. I worked at a couple of startups that tried to make a video-on-demand product. They failed.
Netflix succeeded. Netflix executed well, but they were also late to the game, which helped them. By 2007 the internet was fast and cheap enough to support streaming video services. That was never the case before. The addition of fast, low-cost mobile bandwidth and the introduction of powerful mobile devices like smartphones and tablets have made it easier and cheaper for anyone to stream video at any time from anywhere. Timing is everything.
Netflix Began by Running their Own Data Centers
EC2 was just starting in 2007, about the same time Netflix's streaming service started. There was no way Netflix could have launched using EC2.
Netflix built two data centers located right next to each other. They experienced all the problems we talked about in earlier chapters.
Building out a data center is a lot of work. Ordering equipment takes a long time. Installing and getting all the equipment working takes a long time. And as soon as they got everything working, they would run out of capacity, and the whole process had to start over again.
The long lead times for equipment forced Netflix to adopt what is known as a vertical scaling strategy. Netflix built big programs that ran on big computers. This approach is called building a monolith. One program did everything.
The problem is that it’s tough to make a reliable monolith when you’re growing fast, like Netflix was. And Netflix’s monolith wasn’t reliable.
A Service Outage Caused Netflix to Move to AWS
For three days in August 2008, Netflix could not ship DVDs because of corruption in their database. This was unacceptable. Netflix had to do something.
The experience of building data centers had taught Netflix an important lesson—they weren’t good at building data centers.
What Netflix was good at was delivering videos to its members. Netflix would rather concentrate on getting better at delivering video than on getting better at building data centers. Building data centers was not a competitive advantage for Netflix; delivering video is.
At that time, Netflix decided to move to AWS. AWS was just getting established, so selecting AWS was a bold move.
Netflix moved to AWS because it wanted a more reliable infrastructure. Netflix wanted to remove any single point of failure from its system. AWS offered highly reliable databases, storage, and redundant data centers. Netflix wanted cloud computing, so it wouldn’t have to build big unreliable monoliths. Netflix wanted to become a global service without building its own data centers. None of these capabilities were available in its old data centers, and they never would be.
One reason Netflix gave for choosing AWS was that it didn’t want to do any undifferentiated heavy lifting. Undifferentiated heavy lifting refers to the things that have to be done but don’t provide any advantage to the core business of providing a quality video-watching experience. AWS does all the undifferentiated heavy lifting for Netflix. This lets Netflixians focus on delivering business value.
It took more than seven years for Netflix to finish moving from its data centers to AWS. During that period, Netflix grew its number of streaming customers eightfold. Netflix now runs on several hundred thousand EC2 instances.
Netflix is More Reliable in AWS
It’s not like Netflix never experiences downtime on AWS, but on the whole, its service is much more reliable than it was before.
You don’t see complaints like this very often anymore:
Or this: