This week’s system design refresher:
Evolution of the Netflix API Architecture (YouTube video)
Microservice Best Practices
Code Review Pyramid
Kubernetes Periodic Table
10 Principles for Building Resilient Payment Systems by Shopify
Evolution of the Netflix API Architecture
The Netflix API architecture went through 4 main stages.
Monolith
Direct access
Gateway aggregation layer
Federated gateway
We explain the evolution in a 4-minute video.
9 best practices for developing microservices
When developing microservices, we should follow these best practices:
Use separate data storage for each microservice
Keep code at a similar level of maturity
Separate build for each microservice
Assign each microservice a single responsibility
Deploy into containers
Design stateless services
Adopt domain-driven design
Design micro frontends
Orchestrate microservices
Over to you - what else should be included?
The Code Review Pyramid
Over to you - any other tips for effective code review?
Kubernetes Periodic Table
A comprehensive visual guide that demystifies the key building blocks of this powerful container orchestration platform.
This Kubernetes Periodic Table sheds light on the 120 crucial components that make up the Kubernetes ecosystem.
Whether you're a developer, system administrator, or cloud enthusiast, this handy resource will help you navigate the complex Kubernetes landscape.
Guest post by Govardhana Miriyala Kannaiah
10 principles for building resilient payment systems (by Shopify)
Shopify shares some valuable tips for building resilient payment systems.
Lower the timeouts, and let the service fail early
The default timeout is 60 seconds. Based on Shopify's experience, a read timeout of 5 seconds and a write timeout of 1 second are decent settings.
Install circuit breakers
Shopify developed Semian, a Ruby library that protects Net::HTTP, MySQL, Redis, and gRPC services with circuit breakers.
Capacity management
By Little's Law, throughput equals the number of requests in the system divided by the average time to process one. If 50 requests are in our queue and each takes an average of 100 milliseconds to process, our throughput is 50 / 0.1 s = 500 requests per second.
Add monitoring and alerting
Google's Site Reliability Engineering (SRE) book lists four golden signals that a user-facing system should be monitored for: latency, traffic, errors, and saturation.
Implement structured logging
Shopify stores logs in a centralized place and makes them easily searchable.
Use idempotency keys
Attach an idempotency key to each payment request so that a retried request is not processed twice. Use a Universally Unique Lexicographically Sortable Identifier (ULID) for these keys instead of a random version 4 UUID.
Be consistent with reconciliation
Store reconciliation breaks (mismatches between Shopify's records and those of its financial partners) in the database.
Incorporate load testing
Shopify regularly simulates large-volume flash sales to get benchmark results.
Get on top of incident management
Each incident channel has three roles: Incident Manager On Call (IMOC), Support Response Manager (SRM), and service owners.
Organize incident retrospectives
For each incident, Shopify asks three questions: What exactly happened? What incorrect assumptions did we hold about our systems? What can we do to prevent this from happening?
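To make the "Install circuit breakers" principle concrete, here is a minimal Python sketch of the circuit-breaker pattern. It is not Semian's API (Semian is a Ruby library); the names `CircuitBreaker`, `CircuitOpenError`, `error_threshold`, and `reset_timeout` are hypothetical. After a run of consecutive failures the breaker opens and rejects calls immediately, which complements low timeouts: an unhealthy dependency fails fast instead of tying up workers.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Minimal circuit breaker (hypothetical sketch, not Semian's API).

    closed    -> calls pass through; consecutive failures are counted
    open      -> calls fail immediately until reset_timeout seconds elapse
    half-open -> the next call is a trial that closes or reopens the circuit
    """

    def __init__(self, error_threshold=3, reset_timeout=10.0, clock=time.monotonic):
        self.error_threshold = error_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock  # injectable so the behavior can be tested without sleeping
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    @property
    def state(self):
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"
        return "open"

    def call(self, fn, *args, **kwargs):
        if self.state == "open":
            # Fail fast: do not even try the unhealthy dependency.
            raise CircuitOpenError("circuit is open")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call (half-open) or too many failures (closed) opens the circuit.
            if self.opened_at is not None or self.failures >= self.error_threshold:
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result


if __name__ == "__main__":
    fake_now = [0.0]
    breaker = CircuitBreaker(error_threshold=2, reset_timeout=5.0, clock=lambda: fake_now[0])

    def flaky_gateway():
        raise TimeoutError("payment gateway timed out")

    for _ in range(2):
        try:
            breaker.call(flaky_gateway)
        except TimeoutError:
            pass
    print(breaker.state)  # open: further calls raise CircuitOpenError right away

    fake_now[0] = 5.0  # reset_timeout has elapsed
    print(breaker.state)  # half-open
    print(breaker.call(lambda: "ok"), breaker.state)  # ok closed
```

The sketch is single-threaded for clarity; a production breaker also needs locking, and libraries such as Semian add features like bulkheading on top of this basic state machine.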
Reference:
shopify.engineering/building-resilient-payment-systems
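The "Use idempotency keys" principle can be illustrated with a short Python sketch, under stated assumptions. `new_ulid` is a hand-rolled generator that follows the ULID layout (a 48-bit millisecond timestamp followed by 80 random bits, encoded as 26 Crockford Base32 characters), and `PaymentProcessor`, with its in-memory dict, is a hypothetical stand-in for a durable key store. Because a ULID begins with its timestamp, keys created later sort after earlier ones, which keeps inserts into a database index roughly sequential; a random version 4 UUID has no such ordering.

```python
import os
import time

# Crockford's Base32 alphabet (no I, L, O, U); its ASCII order matches numeric order.
CROCKFORD32 = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def new_ulid(timestamp_ms=None):
    """Return a 26-character ULID-style string: 48-bit ms timestamp + 80 random bits.

    Hand-rolled sketch: it skips the spec's optional monotonic mode, so two IDs
    created in the same millisecond are not guaranteed to sort in creation order.
    """
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    value = (timestamp_ms << 80) | int.from_bytes(os.urandom(10), "big")
    chars = []
    for _ in range(26):  # 26 chars x 5 bits covers the 128-bit value
        chars.append(CROCKFORD32[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))


class PaymentProcessor:
    """Hypothetical payment handler that honors idempotency keys."""

    def __init__(self):
        self._results = {}  # key -> first result; a real system needs a durable, shared store
        self.charge_count = 0

    def charge(self, idempotency_key, amount_cents):
        if idempotency_key in self._results:
            # A retry of a request we already handled: replay the stored result, do not charge twice.
            return self._results[idempotency_key]
        self.charge_count += 1
        result = {"status": "charged", "amount_cents": amount_cents}
        self._results[idempotency_key] = result
        return result


if __name__ == "__main__":
    processor = PaymentProcessor()
    key = new_ulid()
    first = processor.charge(key, 2500)
    retry = processor.charge(key, 2500)  # e.g. the client timed out and retried
    print(first == retry, processor.charge_count)  # True 1
    print(new_ulid(1_000) < new_ulid(2_000))  # True: earlier timestamps sort first
```

The client generates the key once per logical payment and reuses it on every retry; the server's job is only to recognize a key it has already seen and return the original outcome.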
When it comes to "9 best practices for developing microservices", I'd add:
0. Do I really need microservices to solve my problem and can I start with a modular monolith instead?
About the "9 best practices for developing microservices" and the Single Responsibility principle
Sometimes you have to apply it at the package level instead of the microservice level. Why? Communication between services takes time, at least a few milliseconds, so it can increase your latency.