There are two common ways to implement the read replica pattern:
1. Embed the routing logic in the application code (explained in the last post).
2. Use database middleware.
We focus on option 2 here. The middleware provides transparent routing between the application and database servers. We can customize the routing logic based on difficult rules such as user, schema, statement, etc.
The diagram below illustrates the setup:
1. When Alice places an order on amazon, the request is sent to Order Service.
2. Order Service does not directly interact with the database. Instead, it sends database queries to the database middleware.
3. The database middleware routes writes to the primary database. Data is replicated to two replicas.
4. Alice views the order details (read). The request is sent through the middleware.
5. Alice views the recent order history (read). The request is sent through the middleware.
The database middleware acts as a proxy between the application and databases. It uses standard MySQL network protocol for communication.
Pros:
- Simplified application code. The application doesn’t need to be aware of the database topology and manage access to the database directly.
- Better compatibility. The middleware uses the MySQL network protocol. Any MySQL compatible client can connect to the middleware easily. This makes database migration easier.
Cons:
- Increased system complexity. A database middleware is a complex system. Since all database queries go through the middleware, it usually requires a high availability setup to avoid a single point of failure.
- Additional middleware layer means additional network latency. Therefore, this layer requires excellent performance.
Real-world example from Shopify: https://lnkd.in/gdazbPWC
If you enjoyed this post, you might like our system design interview books as well.
SDI-vol1: https://amzn.to/3tK0qQn
SDI-vol2: https://amzn.to/37ZisW9
Not sure if I missed something here :
1. When we say that "Data is replicated to two replicas. " are we saying sync or async way. If its async how can we guarantee that during we are not providing the stale data
After all the biggest problem is read repair latency.
Are there any middleware for postgresql? It will be nice if the middleware can do the heavy lifting of routing "read your own write" by understanding the client's DAO call i.e
1. If the client say inserts and reads what it inserted in a transaction, the middleware should know this complete transaction request and call only the primary
2. If it is just read it should call replicas (even better will be if it calls replicas that has caught up with the primary at least for that log in Primary)