MVCC Explained: Multi-Version Concurrency Control in Databases

How MVCC lets databases handle concurrent reads and writes without locking — version chains, snapshot isolation, vacuum, and performance implications.

mvccconcurrency-controldatabasesisolationpostgresql

Multi-Version Concurrency Control (MVCC)

MVCC is a concurrency control method where the database maintains multiple versions of each row, allowing readers to see a consistent snapshot without blocking writers, and writers to proceed without blocking readers.

What It Really Means

Without MVCC, databases face a fundamental conflict: readers and writers need access to the same data simultaneously. The simplest solution is locking — a writer locks the row, blocking all readers until the write completes. This works but destroys performance under concurrent workloads.

MVCC eliminates this conflict by keeping old versions of rows alongside new ones. When a transaction updates a row, the database does not overwrite the old version. Instead, it creates a new version. Concurrent readers continue seeing the old version (the one that was committed when their transaction started), while the writer works on the new version. Readers never block writers. Writers never block readers.

PostgreSQL, MySQL InnoDB, Oracle, and SQL Server all implement MVCC, though each does it differently. PostgreSQL stores old versions in the same table (requiring VACUUM to clean up). MySQL InnoDB stores old versions in a separate undo log. Understanding these differences matters for performance tuning.

How It Works in Practice

PostgreSQL MVCC Example

Version Visibility Rules (PostgreSQL)

Each tuple has two hidden columns:

  • xmin: The transaction ID that created this version
  • xmax: The transaction ID that deleted/replaced this version (infinity if current)

A tuple is visible to transaction T if:

  1. xmin is committed AND xmin was committed before T's snapshot
  2. xmax is not set, OR xmax is not committed, OR xmax committed after T's snapshot

Snapshot Isolation

When a transaction begins (or when each statement executes, depending on isolation level), the database takes a snapshot — a record of which transactions are committed at that moment. The transaction only sees data from committed transactions in its snapshot.

Implementation

Understanding MVCC in PostgreSQL queries:

sql

VACUUM — cleaning up old versions:

sql

MySQL InnoDB vs PostgreSQL MVCC

AspectPostgreSQLMySQL InnoDB
Old versions storedIn-place (heap tuple)Undo log (separate)
Cleanup mechanismVACUUM (external)Purge thread (automatic)
Bloat riskYes (if VACUUM falls behind)Lower (undo log is separate)
Read performanceMay read dead tuplesFollows undo chain
Index handlingHOT updates avoid index bloatIndex points to latest version

Trade-offs

Benefits:

  • Readers never block writers (high concurrency)
  • Writers never block readers (consistent reads)
  • Snapshot isolation prevents most concurrency anomalies
  • No read locks needed (lower overhead than lock-based systems)

Costs:

  • Storage overhead from multiple versions (table bloat in PostgreSQL)
  • VACUUM overhead in PostgreSQL (CPU, I/O, potential table locks)
  • Long-running transactions prevent old versions from being cleaned up
  • Write conflicts still require resolution (serialization failures)

Critical operational concern — VACUUM: In PostgreSQL, the single most common production issue with MVCC is VACUUM falling behind. If a long-running transaction holds a snapshot, VACUUM cannot clean up any versions created after that snapshot. The table grows unbounded (bloat), queries slow down, and eventually disk fills up.

Monitor these metrics:

  • Dead tuple ratio per table
  • Oldest running transaction age
  • Autovacuum run frequency and duration

Common Misconceptions

  • "MVCC eliminates all concurrency problems" — MVCC prevents dirty reads and non-repeatable reads, but write-write conflicts (two transactions updating the same row) still require conflict resolution. Under Serializable isolation, one transaction aborts.
  • "VACUUM is optional" — In PostgreSQL, VACUUM is essential. Without it, table bloat grows unbounded, query performance degrades, and you risk transaction ID wraparound (a catastrophic failure mode).
  • "Read-only queries have zero overhead" — Read-only queries in MVCC must check visibility rules for every tuple. In heavily updated tables with many dead tuples, this overhead is significant.
  • "MVCC means no locking at all" — MVCC eliminates read locks but writes still acquire row-level locks to prevent concurrent modifications to the same row.
  • "All MVCC implementations are the same" — PostgreSQL and MySQL InnoDB implement MVCC very differently. PostgreSQL's in-place versioning requires VACUUM. InnoDB's undo-log approach avoids table bloat.

How This Appears in Interviews

  1. "How does PostgreSQL handle concurrent reads and writes?" — MVCC with tuple versioning, xmin/xmax, snapshot isolation, and VACUUM.
  2. "Why is your PostgreSQL table growing even though row count is stable?" — Table bloat from dead tuples. VACUUM is not keeping up, possibly due to long-running transactions.
  3. "Explain the difference between Read Committed and Repeatable Read" — Read Committed takes a new snapshot per statement. Repeatable Read takes one snapshot at transaction start.
  4. "Compare optimistic and pessimistic concurrency control" — MVCC is an optimistic approach (proceed without locks, detect conflicts at commit). Compare with pessimistic locking.

Related Concepts

GO DEEPER

Learn from senior engineers in our 12-week cohort

Our Advanced System Design cohort covers this and 11 other deep-dive topics with live sessions, assignments, and expert feedback.