How read replicas work — replication lag, consistency trade-offs, routing strategies, and when to use replicas vs caching or sharding for read scaling.

Read Replicas

A read replica is a copy of a primary database that serves read-only queries, distributing read load across multiple servers while the primary handles all writes.

What It Really Means

Most applications are read-heavy: many web applications perform on the order of 10 to 100 reads for every write. When the primary database becomes a bottleneck, the first scaling strategy is usually not sharding but adding read replicas. The primary keeps handling all writes and replicates each change to one or more replicas, which serve read queries.

Read replicas solve a specific problem: read throughput. They do not increase write capacity (all writes still go to the primary). They do not increase storage capacity (each replica stores the full dataset). For those problems, you need database partitioning.

The fundamental trade-off is replication lag. Changes written to the primary take time to propagate to replicas. During this window, replicas serve stale data. This is a form of eventual consistency — acceptable for most read paths but problematic for others.
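
Replication lag can be measured rather than guessed. As one illustration, PostgreSQL reports write-ahead log positions (LSNs) as two hexadecimal numbers separated by a slash, and the difference between the primary's current position and a replica's replay position is the lag in bytes. The sketch below does only that arithmetic in plain Python. The function names, replica names, and threshold are hypothetical; in a real deployment the LSN strings would come from queries such as pg_current_wal_lsn() on the primary and pg_last_wal_replay_lsn() on each replica.

```python
def parse_lsn(text):
    """Convert a PostgreSQL LSN such as '16/B374D848' to a byte position."""
    high, low = text.split("/")
    # The part before the slash is the upper 32 bits, the part after is the lower 32.
    return (int(high, 16) << 32) | int(low, 16)


def replay_lag_bytes(primary_lsn, replica_replay_lsn):
    """Bytes of WAL the replica has not replayed yet (never negative)."""
    return max(0, parse_lsn(primary_lsn) - parse_lsn(replica_replay_lsn))


def lagging_replicas(primary_lsn, replay_lsns, threshold_bytes):
    """Return the names of replicas whose replay lag exceeds threshold_bytes."""
    return sorted(
        name
        for name, lsn in replay_lsns.items()
        if replay_lag_bytes(primary_lsn, lsn) > threshold_bytes
    )
```

A monitoring job could call lagging_replicas on a schedule and temporarily remove the returned replicas from the read pool. Lag in bytes says how much work is outstanding, not how many seconds of data are stale, so the threshold is a judgment call for each workload.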

How It Works in Practice

Replication Methods

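Synchronous replication: The primary waits for the replica to confirm the write before acknowledging it to the client. An acknowledged write is never missing from the replica, but every write pays the network round-trip penalty. Whether the write is also immediately visible to reads on the replica depends on what "confirm" means: in PostgreSQL, for example, synchronous_commit = remote_apply waits until the replica has applied the change, while the default setting (on) waits only until the replica has flushed it to disk.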

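Asynchronous replication: The primary acknowledges the write immediately and replicates in the background. Write latency is lower, but replicas are typically milliseconds to seconds behind (more under heavy load), and an acknowledged write can be lost if the primary fails before the write has been replicated.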

Semi-synchronous replication: The primary waits for at least one replica to acknowledge the write before confirming it to the client; the remaining replicas catch up asynchronously. This balances durability against write latency.
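
The three modes differ mainly in how many replica acknowledgements the primary waits for before answering the client. A toy in-memory model makes the difference visible. Everything here is an assumption for illustration and not any database's API: the Primary and Replica classes, the mode names, and the simplification that an acknowledgement means the change has already been applied.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy replica: entries it has applied plus entries received but not yet applied."""
    applied: list = field(default_factory=list)
    pending: list = field(default_factory=list)

    def receive(self, entry):
        self.pending.append(entry)

    def apply_pending(self):
        self.applied.extend(self.pending)
        self.pending.clear()


class Primary:
    def __init__(self, replicas, mode="async"):
        self.log = []
        self.replicas = replicas
        self.mode = mode  # "sync", "semi_sync", or "async"

    def write(self, entry):
        """Log the entry, ship it to every replica, and return how many acks were awaited."""
        self.log.append(entry)
        for replica in self.replicas:
            replica.receive(entry)
        if self.mode == "sync":
            waited = self.replicas       # wait for every replica
        elif self.mode == "semi_sync":
            waited = self.replicas[:1]   # wait for one replica only
        else:
            waited = []                  # async: wait for nobody
        for replica in waited:
            replica.apply_pending()      # toy assumption: ack means applied
        return len(waited)


def lag(primary, replica):
    """Number of log entries the replica has not applied yet."""
    return len(primary.log) - len(replica.applied)
```

With two replicas in "semi_sync" mode, one write leaves the first replica at a lag of 0 and the second at a lag of 1 until its apply_pending() runs. That gap is the window in which a read can return stale data.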

Read-After-Write Consistency

The most common problem with read replicas: a user writes data and immediately reads it back, but the read hits a stale replica.

Solutions:

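Route read-after-write to primary: After a write, serve that user's reads from the primary for a short window (e.g., 5 seconds) that comfortably exceeds typical replication lag.
Session stickiness: Route all reads for a user session to the same replica, so the user never sees data move backward in time. On its own this does not guarantee that the replica already has the user's latest write.
Version tracking: Include a version token in the write response; reject reads from replicas that are behind that version.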
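
The version-tracking approach can be sketched with a toy key-value store. The class and function names are hypothetical, and the replicas are plain in-memory objects rather than database connections. The sketch also makes one choice the list above does not specify: when no replica has reached the requested version, the read falls back to the primary instead of failing.

```python
class VersionedStore:
    """Toy key-value node that remembers the version of the last change it applied."""

    def __init__(self):
        self.data = {}
        self.version = 0

    def apply(self, key, value, version):
        self.data[key] = value
        self.version = version


def write(primary, key, value):
    """Write to the primary and return a version token for the caller to keep."""
    token = primary.version + 1
    primary.apply(key, value, token)
    return token


def read(key, replicas, primary, min_version=0):
    """Read from the first replica that has reached min_version, else from the primary."""
    for replica in replicas:
        if replica.version >= min_version:
            return replica.data.get(key)
    # Every replica is behind the caller's token: fall back to the primary.
    return primary.data.get(key)
```

The client stores the token returned by write (for example in its session) and passes it as min_version on later reads. Reads that carry no token may still be answered by a stale replica, which is acceptable for paths that do not need read-after-write consistency.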

Implementation

PostgreSQL streaming replication setup:

sql

Application-level read/write routing:

python
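
A minimal sketch of application-level routing, assuming the simplest read-after-write policy: writes go to the primary, reads rotate across replicas, and a user who has just written is pinned to the primary for a short window. The ReadWriteRouter class, its method names, and the 5-second default are illustrative assumptions. The "connections" are whatever objects the caller passes in, so the sketch is independent of any particular database driver.

```python
import itertools
import time


class ReadWriteRouter:
    """Sends writes to the primary and spreads reads across replicas.

    After a user writes, that user's reads are pinned to the primary for
    pin_seconds so they do not hit a replica that has not caught up yet.
    """

    def __init__(self, primary, replicas, pin_seconds=5.0, clock=time.monotonic):
        self.primary = primary
        # Round-robin iterator over replicas; None when there are no replicas.
        self._replicas = itertools.cycle(replicas) if replicas else None
        self.pin_seconds = pin_seconds
        self._clock = clock
        self._last_write = {}  # user_id -> time of that user's most recent write

    def for_write(self, user_id):
        """Record the write time and return the primary connection."""
        self._last_write[user_id] = self._clock()
        return self.primary

    def for_read(self, user_id):
        """Return the primary inside the pin window, otherwise the next replica."""
        last = self._last_write.get(user_id)
        recently_wrote = last is not None and self._clock() - last < self.pin_seconds
        if recently_wrote or self._replicas is None:
            return self.primary
        return next(self._replicas)
```

The clock parameter exists so the pin window can be tested without sleeping. In a long-running service the _last_write dictionary would need expiry, and with several application servers the write timestamps would have to live somewhere shared, such as the user's session.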

Trade-offs

Benefits:

Scale read throughput roughly linearly by adding replicas, until replication overhead on the primary becomes significant
Improve read latency by placing replicas closer to users (geo-replicas)
Offload analytics queries from the primary
Provide high availability (promote replica to primary on failure)

Costs:

Replication lag introduces eventual consistency
Each replica consumes the same storage as the primary
Write capacity does not increase
Operational complexity of managing multiple database instances

When to use read replicas:

Read/write ratio is 10:1 or higher
Read throughput or read latency is the bottleneck, not write capacity
You need geographic read distribution
Analytics queries are slowing down OLTP workloads

When replicas are not enough:

Write throughput exceeds single-node capacity (need partitioning)
Data volume exceeds single-node storage (need partitioning)
You need strong consistency on every read (read from the primary only, or use synchronous replication that waits for replicas to apply each write)

Common Misconceptions

"Read replicas are always eventually consistent" — Synchronous replication that waits for replicas to apply each write can provide strongly consistent reads, at the cost of write latency. The consistency model depends on your replication configuration.
"Adding more replicas always improves performance" — Each replica adds replication overhead to the primary. Beyond roughly 5-10 replicas (the exact number depends on write volume and hardware), the primary may become bottlenecked on replication I/O.
"Replicas provide automatic failover" — Read replicas and failover replicas serve different purposes. A read replica may not be configured for promotion. Use a proper HA setup (e.g., Patroni for PostgreSQL).
"Replication lag is always small" — Under heavy write load or during long-running transactions on the primary, replication lag can grow to minutes. Monitor it continuously.

How This Appears in Interviews

"How do you scale reads for a social media platform?" — Read replicas for timeline reads, cache layer for hot content, primary for writes.
"A user updates their profile but sees old data" — Classic replication lag problem. Explain read-after-write consistency solutions.
"Your primary database just died. What happens?" — Explain replica promotion, potential data loss with async replication, and split-brain prevention.
"How do you handle analytics queries slowing down production?" — Dedicated read replica for analytics with acceptable replication lag.

Related Concepts

Database Partitioning — when read replicas are not enough, shard for write scaling
Connection Pooling — efficiently manage connections to multiple replicas
BASE Properties — the consistency model read replicas operate under
Change Data Capture — an alternative to replication for syncing data downstream
Write-Ahead Logging — the mechanism that makes streaming replication possible