Read Replicas Explained: Scaling Database Reads Without Sharding
How read replicas work — replication lag, consistency trade-offs, routing strategies, and when to use replicas vs caching or sharding for read scaling.
Read Replicas
A read replica is a copy of a primary database that serves read-only queries, distributing read load across multiple servers while the primary handles all writes.
What It Really Means
Most applications are read-heavy. A typical web application performs 10-100 reads for every write. When the primary database becomes a bottleneck, the first scaling strategy is not sharding — it is adding read replicas. A single primary handles all writes and replicates changes to one or more replicas that serve read queries.
Read replicas solve a specific problem: read throughput. They do not increase write capacity (all writes still go to the primary). They do not increase storage capacity (each replica stores the full dataset). For those problems, you need database partitioning.
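The limit on write capacity can be made concrete with some rough arithmetic. The sketch below is a toy model, not a sizing tool: it assumes every replica must replay the full write stream, that a replayed write costs about as much as a read, and that `node_capacity_qps` (a hypothetical parameter) is a single number describing what one server can do.

```python
import math


def replicas_needed(read_qps: float, write_qps: float, node_capacity_qps: float) -> int:
    """Estimate how many replicas are needed to serve read_qps (toy model).

    Assumption: each replica applies every write, so only the capacity left
    over after replaying writes is available for reads.
    """
    spare_per_replica = node_capacity_qps - write_qps
    if spare_per_replica <= 0:
        # Writes alone saturate a node, so no number of replicas helps.
        raise ValueError("write load exceeds single-node capacity; partition instead")
    return math.ceil(read_qps / spare_per_replica)


print(replicas_needed(read_qps=20_000, write_qps=1_000, node_capacity_qps=5_000))
print(replicas_needed(read_qps=20_000, write_qps=4_000, node_capacity_qps=5_000))
```

Under these assumptions, the same read load needs four times as many replicas once writes grow from 1,000 to 4,000 per second, which is why a rising write rate eventually pushes a system toward partitioning.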
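The fundamental trade-off is replication lag. Changes written to the primary take time to propagate to replicas. During this window, replicas serve stale data. This is a form of eventual consistency: acceptable for most read paths, but a problem when a user must immediately see their own write or when a decision depends on the latest value (for example, checking a balance before a debit).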
How It Works in Practice
Replication Methods
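Synchronous replication: The primary waits for the replica to confirm the write before acknowledging it to the client. An acknowledged write is never missing from the replica, but every write pays the network round-trip penalty. Whether a read on the replica sees the write immediately depends on what "confirm" means: if the replica confirms on receipt rather than on apply, reads can still be briefly stale.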
Asynchronous replication: The primary acknowledges the write immediately and replicates in the background. Lower write latency, but replicas may be milliseconds to seconds behind.
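Semi-synchronous replication: The primary waits for at least one replica to acknowledge receipt of the change before acknowledging the commit to the client; the remaining replicas catch up asynchronously. Balances durability and write latency.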
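The latency difference between the three methods can be illustrated with a toy model. This is a sketch under stated assumptions, not a description of any particular database: `commit_latency_ms` and its parameters are hypothetical names, synchronous mode is assumed to wait for every listed replica, and semi-synchronous mode for the fastest one.

```python
from enum import Enum


class Mode(Enum):
    SYNC = "sync"
    SEMI_SYNC = "semi_sync"
    ASYNC = "async"


def commit_latency_ms(mode: Mode, local_commit_ms: float, replica_rtts_ms: list[float]) -> float:
    """Latency the client observes for one write (toy model)."""
    if mode is Mode.ASYNC:
        # The primary acknowledges as soon as its own commit is durable.
        return local_commit_ms
    if not replica_rtts_ms:
        raise ValueError("sync and semi-sync modes need at least one replica")
    if mode is Mode.SYNC:
        # Assumed: wait for all replicas, so the slowest round trip decides.
        return local_commit_ms + max(replica_rtts_ms)
    # Semi-sync: the first acknowledgement is enough.
    return local_commit_ms + min(replica_rtts_ms)


rtts = [1.0, 40.0]  # one replica in the same zone, one in another region
for mode in Mode:
    print(mode.value, commit_latency_ms(mode, 2.0, rtts))
```

With one nearby and one distant replica, the model shows why semi-synchronous setups are popular: the write only waits for the nearby replica, while the distant one still receives every change.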
Read-After-Write Consistency
The most common problem with read replicas: a user writes data and immediately reads it back, but the read hits a stale replica.
Solutions:
- Route read-after-write to primary: After a write, read from primary for a short window (e.g., 5 seconds)
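- Session stickiness: Route all reads for a user session to the same replica, so the session never sees data move backwards in time. On its own this does not guarantee that the replica already has the session's latest write, so it is usually combined with one of the other two approaches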
- Version tracking: Include a version token in the write response; reject reads from replicas behind that version
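The version-tracking approach can be sketched as follows. This is a minimal illustration with hypothetical names (`Replica`, `applied_version`, `route_read`); it assumes the write response carries an integer version and that each replica can report the highest version it has applied. In PostgreSQL the version would typically be a WAL position.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Replica:
    name: str
    applied_version: int  # highest write version this replica has applied


def pick_replica(replicas: list[Replica], min_version: int) -> Optional[Replica]:
    """Return the first replica that has applied at least min_version, else None."""
    for replica in replicas:
        if replica.applied_version >= min_version:
            return replica
    return None


def route_read(replicas: list[Replica], session_version: int) -> str:
    """Choose a server for a read; fall back to the primary if all replicas are behind."""
    chosen = pick_replica(replicas, session_version)
    return chosen.name if chosen is not None else "primary"


replicas = [Replica("replica-1", 100), Replica("replica-2", 105)]
print(route_read(replicas, 103))  # replica-1 is behind version 103, replica-2 is not
print(route_read(replicas, 110))  # no replica has caught up yet
```

Falling back to the primary keeps the read correct at the cost of extra primary load; waiting briefly for a replica to catch up is the other common choice.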
Implementation
PostgreSQL streaming replication setup:
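A minimal sketch of such a setup, assuming PostgreSQL 12 or later. The role name `replicator`, the host `primary.internal`, the subnet, and the data directory are placeholders chosen for illustration, not values from any real deployment.

```conf
# postgresql.conf on the primary
wal_level = replica        # write enough WAL for a standby to replay (the default)
max_wal_senders = 5        # upper bound on concurrent replication connections

# pg_hba.conf on the primary: let the replication role connect from the replica subnet
host  replication  replicator  10.0.0.0/24  scram-sha-256

# On the replica host: clone the primary. -R writes primary_conninfo and
# creates standby.signal so the server starts as a standby.
#   pg_basebackup -h primary.internal -U replicator -D /var/lib/postgresql/data -R -X stream

# postgresql.conf on the replica
hot_standby = on           # accept read-only queries while replaying WAL (the default)
```

Once the replica is running, each connected standby should appear as a row in the `pg_stat_replication` view on the primary.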
Application-level read/write routing:
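A minimal sketch of a router, assuming statements can be classified by their first keyword and that servers are identified by name. `ReadWriteRouter`, `pin_seconds`, and the injected `clock` are hypothetical names for this example. Classifying by the leading `SELECT` is deliberately crude: statements such as `SELECT ... FOR UPDATE` or functions with side effects would need to go to the primary in a real system.

```python
import itertools
import time
from typing import Callable


class ReadWriteRouter:
    """Send writes to the primary and reads to replicas in round-robin order.

    After a session writes, its reads are pinned to the primary for
    pin_seconds so that the session can read its own writes.
    """

    def __init__(self, primary: str, replicas: list[str], pin_seconds: float = 5.0,
                 clock: Callable[[], float] = time.monotonic):
        self.primary = primary
        self._replicas = itertools.cycle(replicas) if replicas else None
        self.pin_seconds = pin_seconds
        self._clock = clock
        self._last_write: dict[str, float] = {}  # session id -> time of last write

    def route(self, session_id: str, sql: str) -> str:
        """Return the name of the server that should run this statement."""
        is_read = sql.lstrip().lower().startswith("select")
        now = self._clock()
        if not is_read:
            self._last_write[session_id] = now
            return self.primary
        last = self._last_write.get(session_id)
        recently_wrote = last is not None and now - last < self.pin_seconds
        if self._replicas is None or recently_wrote:
            return self.primary
        return next(self._replicas)


router = ReadWriteRouter("primary", ["replica-1", "replica-2"])
print(router.route("alice", "SELECT * FROM posts"))
print(router.route("alice", "UPDATE users SET name = 'A' WHERE id = 1"))
print(router.route("alice", "SELECT name FROM users WHERE id = 1"))
```

In the usage above, the final read goes to the primary because the same session wrote moments earlier. Other sessions keep reading from replicas throughout.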
Trade-offs
Benefits:
- Scale read throughput roughly linearly by adding replicas, until replication overhead on the primary becomes the limit
- Improve read latency by placing replicas closer to users (geo-replicas)
- Offload analytics queries from the primary
- Provide high availability (promote replica to primary on failure)
Costs:
- Replication lag introduces eventual consistency
- Each replica consumes the same storage as the primary
- Write capacity does not increase
- Operational complexity of managing multiple database instances
When to use read replicas:
- Read/write ratio is 10:1 or higher
- Read throughput, or read latency under load, is the bottleneck, not write throughput
- You need geographic read distribution
- Analytics queries are slowing down OLTP workloads
When replicas are not enough:
- Write throughput exceeds single-node capacity (need partitioning)
- Data volume exceeds single-node storage (need partitioning)
- You need strong consistency on every read (use primary only or synchronous replication)
Common Misconceptions
- "Read replicas are always eventually consistent": Synchronous replication that waits for the replica to apply each write gives strongly consistent reads, at the cost of write latency. The consistency model depends on your replication configuration.
- "Adding more replicas always improves performance": Each replica adds replication overhead to the primary. Beyond roughly 5-10 directly attached replicas, the primary may become bottlenecked on replication I/O. Cascading replication, where replicas feed other replicas, moves part of that load off the primary.
- "Replicas provide automatic failover": Read replicas and failover replicas serve different purposes. A read replica may not be configured for promotion, and something still has to detect the failure and redirect clients. Use a proper HA setup (e.g., Patroni for PostgreSQL).
- "Replication lag is always small": Under heavy write load, during large or long-running transactions on the primary, or while long-running queries on a replica hold up replay, replication lag can grow to minutes. Monitor it continuously.
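Lag monitoring can be sketched by comparing log positions. The example below assumes PostgreSQL-style LSNs, which are written as two hexadecimal numbers separated by a slash; in PostgreSQL 10 and later they can be read with `pg_current_wal_lsn()` on the primary and `pg_last_wal_replay_lsn()` on a replica. The helper names and the threshold are hypothetical, and the result is lag in bytes of WAL, not in seconds.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert an LSN such as '16/B374D848' to an absolute byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lagging_replicas(primary_lsn: str, replay_lsns: dict[str, str],
                     max_lag_bytes: int) -> dict[str, int]:
    """Return {replica name: lag in bytes} for replicas beyond max_lag_bytes."""
    head = lsn_to_int(primary_lsn)
    lags = {name: head - lsn_to_int(lsn) for name, lsn in replay_lsns.items()}
    return {name: lag for name, lag in lags.items() if lag > max_lag_bytes}


behind = lagging_replicas(
    primary_lsn="0/2000",
    replay_lsns={"replica-1": "0/1FF0", "replica-2": "0/1000"},
    max_lag_bytes=1024,
)
print(behind)
```

A router can use the same check to take a badly lagging replica out of rotation until it catches up.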
How This Appears in Interviews
- "How do you scale reads for a social media platform?" — Read replicas for timeline reads, cache layer for hot content, primary for writes.
- "A user updates their profile but sees old data" — Classic replication lag problem. Explain read-after-write consistency solutions.
- "Your primary database just died. What happens?" — Explain replica promotion, potential data loss with async replication, and split-brain prevention.
- "How do you handle analytics queries slowing down production?" — Dedicated read replica for analytics with acceptable replication lag.
Related Concepts
- Database Partitioning — when read replicas are not enough, shard for write scaling
- Connection Pooling — efficiently manage connections to multiple replicas
- BASE Properties — the consistency model read replicas operate under
- Change Data Capture — an alternative to replication for syncing data downstream
- Write-Ahead Logging — the mechanism that makes streaming replication possible
- System Design Interview Guide