System Design: Ticketmaster (High-Demand Event Ticketing)
Design a Ticketmaster-scale high-demand event ticketing system handling flash sales and celebrity concert drops with millions of concurrent users. Covers queue management, idempotency, optimistic locking, and preventing double-booking.
Requirements
Functional Requirements:
- Browse events, venues, and seat maps with real-time availability
- Virtual waiting queue for high-demand sales with fair position assignment
- Reserve specific seats or best-available seats for a time-limited hold period (10 minutes)
- Complete purchase with payment processing; issue tickets as QR codes or mobile passes
- Prevent double-booking: a seat can only be sold to one buyer
- Support ticket transfer, resale, and fraud detection (scalper bot prevention)
Non-Functional Requirements:
- Handle 500k concurrent users at sale start for major events
- Seat reservation must be atomic — no two users can reserve the same seat simultaneously
- 99.99% availability; a ticket system outage during a sale is a major business failure
- Reservation holds must expire reliably after 10 minutes to return inventory to the pool
- Idempotent payment: network retries must never result in double charges
Scale Estimation
For a Taylor Swift stadium sale: 70,000 seats, 2M users attempting to buy simultaneously. The queue must drain at the rate tickets become available (some seats are blocked for other allocations). At an average of 4 tickets per order, a 70k-seat venue yields ~17,500 orders. Payment processing: 17,500 successful transactions over ~20 minutes, roughly 15/second. Read traffic: with 500k concurrent users on the event detail page, the availability API sees ~50k requests/second.
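The back-of-envelope numbers above can be checked in a few lines (a sketch; the ~20-minute sale window and a 10-second availability polling interval per user are assumptions consistent with the estimates):

```python
# Back-of-envelope check of the scale estimates.
seats = 70_000
avg_tickets_per_order = 4
orders = seats // avg_tickets_per_order              # 17,500 orders

sale_window_min = 20
payments_per_sec = orders / (sale_window_min * 60)   # ~14.6 payments/sec

concurrent_users = 500_000
poll_interval_sec = 10                               # assumed client refresh rate
availability_rps = concurrent_users / poll_interval_sec

print(orders, round(payments_per_sec, 1), int(availability_rps))
# 17500 14.6 50000
```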
High-Level Architecture
The architecture centers on three systems: a Virtual Queue, a Seat Reservation Engine, and a Payment Service. When a high-demand sale opens, users are placed in a virtual queue rather than hitting the inventory directly. The queue assigns each user a random position (preventing first-come advantages based on network speed), and users are admitted in batches as the system can handle checkout sessions. This transforms a thundering herd into a manageable stream.
Once admitted from the queue, users interact with the Seat Reservation Engine. Seats have states: available, held, and sold. A user selecting a seat triggers an optimistic lock claim: the system performs an atomic compare-and-swap — if the seat is still available, it transitions to held and records the hold expiry. If the seat was taken between the user viewing availability and attempting to claim it, the user receives a conflict response and must select again. Holds expire via a scheduled TTL mechanism, returning seats to available.
Payment is handled by a dedicated Payment Service that wraps the payment processor (Stripe/Adyen). The service is idempotent — each checkout session has a unique order_id used as an idempotency key. If the payment request is retried (network timeout, client retry), the same order_id returns the original result without charging again. On successful payment, the seat transitions from held to sold atomically.
Core Components
Virtual Queue Service
Manages admission for high-demand sales. At sale start, all incoming users are assigned a random queue position. The queue service drains at a configured rate (e.g., 1,000 users/minute admitted), issuing time-limited admission tokens to each batch. Users without an admission token are redirected to a queue status page showing estimated wait time. Bot detection runs at queue entry: behavioral analysis, browser fingerprinting, and CAPTCHA challenges throttle scalper bots. The queue itself is backed by Redis Sorted Sets, scored by a random value computed at entry time.
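A minimal in-memory sketch of the random-position queue and batch admission (in production the heap below would be a Redis Sorted Set: ZADD with a random score at entry, ZPOPMIN to drain; the batch size and token format here are illustrative):

```python
import heapq
import random
import secrets

class VirtualQueue:
    """In-memory stand-in for the Redis Sorted Set queue: users get a
    random score at entry, and batches are admitted lowest-score first."""

    def __init__(self):
        self._heap = []  # (random_score, user_id)

    def join(self, user_id: str) -> None:
        # Random score = fair position regardless of arrival order,
        # so a faster connection confers no advantage.
        heapq.heappush(self._heap, (random.random(), user_id))

    def admit_batch(self, size: int) -> dict:
        """Drain up to `size` users, issuing admission tokens
        (time-limited in a real deployment)."""
        admitted = {}
        for _ in range(min(size, len(self._heap))):
            _, user = heapq.heappop(self._heap)
            admitted[user] = secrets.token_urlsafe(16)
        return admitted

q = VirtualQueue()
for uid in ["alice", "bob", "carol", "dave"]:
    q.join(uid)
batch = q.admit_batch(2)
print(len(batch), len(q._heap))  # 2 admitted, 2 still waiting
```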
Seat Reservation & Inventory Engine
The core of the system. Seat state is stored in Redis using a per-seat key with state and hold metadata. Reservation is a Lua script executed atomically in Redis: if seat.status == 'available' then set seat.status = 'held', seat.user_id = X, seat.expires_at = now + 600; return OK else return CONFLICT end. This atomic script prevents race conditions without distributed locks. Seat expiry is managed by a Redis keyspace expiry notification or a dedicated expiry worker scanning hold records. Confirmed sales are written durably to PostgreSQL; Redis is the performance layer.
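The atomic reservation step can be sketched in plain Python (an in-memory stand-in for the Lua script; in a real deployment this body runs as a single Redis EVAL, so Redis serializes it and no other client can interleave between the check and the write):

```python
import time

seats: dict = {}      # seat_id -> {"status", "held_by", "held_until"}
HOLD_SECONDS = 600    # 10-minute hold, per the requirements

def reserve(seat_id: str, user_id: str, now: float) -> str:
    """Compare-and-swap claim: transition available -> held, or reject."""
    seat = seats.setdefault(seat_id, {"status": "available"})
    # Lazily expire a stale hold before checking availability.
    if seat["status"] == "held" and seat["held_until"] <= now:
        seat["status"] = "available"
    if seat["status"] != "available":
        return "CONFLICT"
    seat.update(status="held", held_by=user_id, held_until=now + HOLD_SECONDS)
    return "OK"

t = time.time()
print(reserve("A-12", "alice", t))      # OK
print(reserve("A-12", "bob", t))        # CONFLICT: alice holds it
print(reserve("A-12", "bob", t + 601))  # OK: alice's hold has expired
```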
Payment Service (Idempotent)
Each order has a UUID generated at session start. The Payment Service uses this UUID as the idempotency key with the payment provider. Before calling the payment API, it checks an orders table for an existing record with this order_id — if found and in terminal state (paid/failed), it returns the cached result without calling the external API. If the record is in processing state (a prior call is in-flight), it returns a retry-after response. This three-state idempotency check handles all network failure and retry scenarios without double charges.
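The three-state check can be sketched as follows (`charge_fn` is a hypothetical stand-in for the payment-provider call; a real service would also persist the processing record transactionally before calling out):

```python
from enum import Enum

class OrderState(Enum):
    PROCESSING = "processing"
    PAID = "paid"
    FAILED = "failed"

orders: dict = {}  # order_id -> {"state", "result"}

def pay(order_id: str, charge_fn) -> dict:
    """Three-state idempotency check around an external charge call."""
    rec = orders.get(order_id)
    if rec is not None:
        if rec["state"] in (OrderState.PAID, OrderState.FAILED):
            return rec["result"]          # terminal: replay cached result
        return {"status": "retry_after"}  # in-flight: tell client to wait
    orders[order_id] = {"state": OrderState.PROCESSING, "result": None}
    result = charge_fn(order_id)          # idempotency key = order_id
    state = OrderState.PAID if result["status"] == "paid" else OrderState.FAILED
    orders[order_id] = {"state": state, "result": result}
    return result

calls = []
def fake_charge(oid):
    calls.append(oid)
    return {"status": "paid", "charge_id": "ch_1"}

r1 = pay("order-42", fake_charge)
r2 = pay("order-42", fake_charge)  # client retry: cached, no second charge
print(r1["status"], r2["status"], len(calls))  # paid paid 1
```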
Database Design
Seats are the central entity. PostgreSQL stores the durable seat catalog: seat_id, event_id, section, row, number, tier, price. Redis stores the live mutable state: seat:{seat_id} → {status, held_by, held_until, order_id}. On successful sale, a sold_seats record is written to Postgres and the Redis key is updated to sold. If Redis fails, the system falls back to Postgres with row-level locks — slower but correct.
Orders table: order_id UUID PK, user_id, event_id, seat_ids[], status ENUM(pending, processing, paid, failed, refunded), idempotency_key, created_at, paid_at. The idempotency_key has a unique index. An order_items table links orders to individual seats. Ticket QR codes are generated post-payment by a Ticket Service and stored in S3, with the S3 URL in the tickets table.
API Design
POST /api/v1/queue/join?event_id={id} — joins the virtual queue; returns {queue_position, estimated_wait_minutes, queue_token} (queue_token valid for 30 minutes once position is reached).
POST /api/v1/seats/reserve — body: {seat_ids[], event_id, queue_token}; atomically reserves seats; returns {order_id, held_until} or {error: "SEAT_TAKEN", available_alternatives[]}.
POST /api/v1/orders/{orderId}/pay — initiates payment; idempotent on order_id; returns {status, ticket_ids} on success.
DELETE /api/v1/orders/{orderId}/hold — explicitly releases a hold before expiry.
Scaling & Bottlenecks
The seat reservation engine is the most critical bottleneck. With 500k users competing for 70k seats, the Redis cluster for seat state must sustain ~200k operations/second at sale start. Redis Cluster with 16 shards (key by seat_id) distributes this load. The Lua script approach avoids multi-key transactions that would violate Redis Cluster sharding. Each shard handles ~12.5k ops/second — well within Redis's single-node capacity of 100k+ ops/second.
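Keying by seat_id works because every operation for one seat lands on one shard, so the Lua script touches a single node. A simplified sketch of the mapping (Redis Cluster actually routes via CRC16 of the key mod 16384 hash slots; the hash here is an illustrative stand-in):

```python
import hashlib

NUM_SHARDS = 16

def shard_for(seat_id: str) -> int:
    # Stable hash of the key -> shard index; same seat always
    # routes to the same shard, so no cross-shard transaction is needed.
    digest = hashlib.sha256(seat_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS

ids = [f"sec{s}-row{r}-seat{n}"
       for s in range(4) for r in range(5) for n in range(10)]
loads = [0] * NUM_SHARDS
for sid in ids:
    loads[shard_for(sid)] += 1
print(sum(loads), shard_for("A-12") == shard_for("A-12"))  # 200 True
```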
Hold expiry is the second reliability concern. If the expiry worker crashes, holds never release and inventory is stranded. Two mechanisms defend against this: Redis key TTL automatically expires the key (triggering a Lua cleanup script), plus a reconciliation job in PostgreSQL that scans for holds older than 12 minutes and forcibly returns them to available. Belt-and-suspenders expiry ensures inventory is never permanently locked by a crashed client.
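The PostgreSQL safety net can be sketched as a simple scan (an in-memory stand-in; a real job would run this as an UPDATE over the holds table on a schedule):

```python
import time

RECONCILE_AFTER = 720  # 12 minutes, per the text (> the 10-minute hold)

def reconcile(holds: list, now: float) -> list:
    """Force-release any hold older than 12 minutes, even if the
    Redis TTL path failed to fire (e.g. expiry worker crashed)."""
    released = []
    for h in holds:
        if h["status"] == "held" and now - h["held_at"] > RECONCILE_AFTER:
            h["status"] = "available"
            released.append(h["seat_id"])
    return released

now = time.time()
holds = [
    {"seat_id": "A-1", "status": "held", "held_at": now - 900},  # stale
    {"seat_id": "A-2", "status": "held", "held_at": now - 60},   # fresh
]
print(reconcile(holds, now))  # ['A-1']
```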
Key Trade-offs
- Virtual queue vs. direct sale: A virtual queue prevents server overload and bots gaining advantage from faster connections, but adds friction for legitimate users; skipping the queue is simpler to implement but creates a thundering herd that can take down the system.
- Optimistic locking vs. pessimistic locking: Redis atomic scripts (optimistic) scale to hundreds of thousands of operations per second; database row locks (pessimistic) are simpler but become a bottleneck at scale — a single popular seat section can serialize all reservation attempts.
- Hold duration trade-off: 10-minute holds give buyers time to complete checkout but strand inventory when users abandon sessions; shorter holds (3 minutes) return inventory faster but frustrate users with slow payment flows.
- Redis-first vs. database-first inventory: Redis-first gives sub-millisecond reservation checks but requires a carefully designed durability fallback; database-first is simpler and durable but 10-100x slower under high concurrency.