System Design: Seat Selection & Reservation
Design a seat selection and reservation system for airlines or event venues — covering real-time seat map rendering, concurrent selection conflicts, and distributed locking strategies.
Requirements
Functional Requirements:
- Display interactive seat map showing available, occupied, and held seats in real time
- Users can select, hold, and confirm seat reservations within a booking flow
- Seat hold expires automatically after 10 minutes if not confirmed
- Support bulk selection (multiple seats for group travel)
- Airlines define seat classes with different pricing (Basic Economy, Economy, Business)
- Suggest adjacent seats for groups and enforce accessibility seat restrictions
Non-Functional Requirements:
- Seat map reflects current availability within 5 seconds of any change
- Concurrent seat selection handled without double-booking (strong consistency)
- Support 100,000 simultaneous seat selection sessions across 10,000 active flights
- Seat hold creation and release must be idempotent
- 99.99%+ of reservations must complete without conflict — a double-booked seat creates operational and reputational damage
Scale Estimation
10,000 active flights × 200 seats/flight = 2 million trackable seat states. Seat map requests: 10 million/day (travelers checking seats) = 116/second average, 1,000/second peak. Seat state changes (holds, confirmations, releases): 5 million holds/day + 3 million confirmations/day + 2 million releases = 10 million events/day = 116/second average. Each event requires a consistent atomic update and a broadcast to all clients viewing that flight's seat map. The fanout challenge: for a popular flight with 1,000 users viewing the seat map simultaneously, a single seat hold generates 1,000 push updates.
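The figures above can be reproduced with quick back-of-envelope arithmetic (all inputs are the article's own estimates):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

flights, seats_per_flight = 10_000, 200
seat_states = flights * seats_per_flight                    # 2,000,000 trackable states

seatmap_requests_per_day = 10_000_000
avg_read_qps = seatmap_requests_per_day / SECONDS_PER_DAY   # ~116/s average

# Holds + confirmations + releases
events_per_day = 5_000_000 + 3_000_000 + 2_000_000
avg_write_qps = events_per_day / SECONDS_PER_DAY            # ~116/s average

# Fanout: one seat hold on a flight with 1,000 live viewers means 1,000 pushes.
viewers = 1_000
pushes_per_event = viewers

print(seat_states, round(avg_read_qps), round(avg_write_qps), pushes_per_event)
```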
High-Level Architecture
The seat selection system combines a strongly consistent reservation store with a real-time seat map broadcast layer. The two planes must be carefully separated: the consistency plane (where seats are actually allocated) must never allow double-booking; the broadcast plane (where seat map updates are pushed to clients) tolerates up to 5-second staleness.
The Seat Inventory Service is the consistency authority. It stores seat states in Redis with distributed locking: each seat is identified by (flight_id, seat_number) and its state is one of AVAILABLE, HELD (with hold_id, user_id, expires_at), or CONFIRMED (with booking_id). Atomic state transitions use Redis Lua scripts (SET if state == expected) to prevent race conditions. Hold expiry is managed by per-hold TTL keys plus a background cleanup job.
The Seat Map Broadcast Service subscribes to seat state changes via Redis pub/sub and pushes updates to clients viewing each flight's seat map via WebSocket. Clients render the seat map locally and apply incremental updates, avoiding full seat map refreshes on each change.
Core Components
Seat State Store
Seat states are stored in Redis hashes: HSET flight:{flight_id}:seats → {seat_num: state_json}. The state_json includes: status (AVAILABLE/HELD/CONFIRMED), hold_id, user_id, expires_at, class, price. Seat holds use a Lua script for atomicity: EVAL script 1 flight:{flight_id}:seats seat_num hold_id user_id ttl_seconds — the script checks that the seat is AVAILABLE, writes the HELD state with an expires_at timestamp, and returns success/failure. This single-round-trip atomic check-and-set prevents concurrent double-holds. Hold expiry: individual hash fields cannot carry TTLs (before Redis 7.4's HEXPIRE), so each hold also writes a companion key hold:{hold_id} with a TTL, and the Lua script treats a HELD seat whose expires_at has passed as AVAILABLE; a separate scheduler reconciles between Redis and PostgreSQL for audit logging.
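The check-and-set semantics of the hold script can be sketched in plain Python. This is an in-memory stand-in, not the production Lua script: the `threading.Lock` plays the role of Redis's single-threaded script execution, and names like `SeatStore` and `hold_seat` are illustrative.

```python
import time
import threading

class SeatStore:
    """In-memory sketch of the atomic hold/confirm state machine."""

    def __init__(self):
        self._seats = {}               # (flight_id, seat_num) -> state dict
        self._lock = threading.Lock()  # stands in for Redis single-threaded EVAL

    def hold_seat(self, flight_id, seat_num, hold_id, user_id, ttl_seconds=600):
        key = (flight_id, seat_num)
        now = time.time()
        with self._lock:  # whole check-and-set is one atomic step
            state = self._seats.get(key, {"status": "AVAILABLE"})
            live_hold = state["status"] == "HELD" and state["expires_at"] > now
            if state["status"] == "CONFIRMED" or live_hold:
                # Retrying the same hold_id is a no-op success (idempotency).
                return state.get("hold_id") == hold_id
            # Expired holds fall through here and are overwritten.
            self._seats[key] = {
                "status": "HELD", "hold_id": hold_id,
                "user_id": user_id, "expires_at": now + ttl_seconds,
            }
            return True

    def confirm(self, flight_id, seat_num, hold_id, booking_id):
        key = (flight_id, seat_num)
        with self._lock:
            state = self._seats.get(key)
            if not state or state["status"] != "HELD" or state.get("hold_id") != hold_id:
                return False
            self._seats[key] = {"status": "CONFIRMED", "booking_id": booking_id}
            return True

store = SeatStore()
print(store.hold_seat("UA123", "3B", "h1", "alice"))  # True
print(store.hold_seat("UA123", "3B", "h2", "bob"))    # False: already held
print(store.hold_seat("UA123", "3B", "h1", "alice"))  # True: idempotent retry
print(store.confirm("UA123", "3B", "h1", "bk-42"))    # True
```

Note that a second hold attempt with a different hold_id fails, while a retry of the same hold_id succeeds without changing state — the idempotency the non-functional requirements call for.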
Distributed Lock for Group Selection
Selecting multiple adjacent seats (e.g., 3B, 3C, 3D together) requires holding all three atomically. A multi-key Lua script acquires holds on all seats in a single atomic operation: if any seat is unavailable, the script returns failure without holding any of them (all-or-nothing semantics). This prevents partial group holds that leave groups split. For cross-flight or cross-class selections (rare), a Redlock (distributed lock across 3 Redis nodes) is used, with a 5-second lock timeout.
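The all-or-nothing semantics follow a check-then-write pattern inside a single atomic step. A minimal in-memory sketch (again standing in for a multi-key Lua script; names are illustrative):

```python
import threading

_lock = threading.Lock()

def hold_group(seats, seat_nums, hold_id, user_id):
    """Hold every seat in seat_nums, or none of them (all-or-nothing)."""
    with _lock:  # one atomic step, like a multi-key EVAL
        # Phase 1: check — fail fast before mutating anything.
        for num in seat_nums:
            if seats.get(num, {"status": "AVAILABLE"})["status"] != "AVAILABLE":
                return False
        # Phase 2: write — only reached if every seat was free.
        for num in seat_nums:
            seats[num] = {"status": "HELD", "hold_id": hold_id, "user_id": user_id}
        return True

seats = {"3C": {"status": "HELD", "hold_id": "other", "user_id": "bob"}}
print(hold_group(seats, ["3B", "3C", "3D"], "h1", "alice"))  # False: 3C taken
print("3B" in seats)                                         # False: no partial hold
print(hold_group(seats, ["4B", "4C", "4D"], "h1", "alice"))  # True
```

Because the check phase completes before any write, a failed group request leaves no partial holds to clean up.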
Real-Time Seat Map Updates
When a seat state changes (hold, confirm, release), the Seat State Service publishes an event to a Redis pub/sub channel: channel = flight:{flight_id}:updates, message = {seat_num, new_status, timestamp}. WebSocket servers subscribe to the channels for flights their connected clients are viewing. On receiving an update, the WebSocket server pushes a small JSON delta to relevant clients: {"seat": "3B", "status": "held"}. Clients update only the changed seat indicator, not the full map. For flights with >1,000 concurrent viewers, events are batched into 200ms windows before broadcast to prevent thundering-herd rendering.
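The coalescing behavior of the 200ms batching window can be sketched as follows. Timer wiring is omitted — in practice `flush` would run on a 200ms tick per flight — and the class and method names are illustrative:

```python
from collections import defaultdict

class UpdateBatcher:
    """Coalesce per-seat deltas per flight; latest status wins within a window."""

    def __init__(self):
        self._pending = defaultdict(dict)   # flight_id -> {seat_num: status}

    def on_event(self, flight_id, seat_num, status):
        # A later event for the same seat overwrites the earlier one,
        # so clients never render a stale intermediate state.
        self._pending[flight_id][seat_num] = status

    def flush(self, flight_id):
        """Return one broadcast payload for the window, emptying the buffer."""
        deltas = self._pending.pop(flight_id, {})
        return [{"seat": s, "status": st} for s, st in sorted(deltas.items())]

b = UpdateBatcher()
b.on_event("UA123", "3B", "held")
b.on_event("UA123", "3C", "held")
b.on_event("UA123", "3B", "available")   # hold released within the same window
print(b.flush("UA123"))
# [{'seat': '3B', 'status': 'available'}, {'seat': '3C', 'status': 'held'}]
print(b.flush("UA123"))                  # [] — buffer empty after a flush
```

Coalescing also means a seat that is held and released inside one window generates a single delta, not two, which is what keeps a busy flight at roughly five broadcasts per second.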
Database Design
Seat states: Redis is the primary source of truth for active flights (roughly the last 24 hours). PostgreSQL (persistent ledger): seat_reservations table (reservation_id, flight_id, seat_num, booking_id, passenger_id, status, created_at, updated_at) with a partial unique index on (flight_id, seat_num) WHERE status = 'CONFIRMED'. PostgreSQL is written to asynchronously on state transitions; it serves check-in systems, aircraft reconfiguration queries, and post-flight reporting. The partial unique index provides an additional safety net against the rare case where Redis consistency is violated. Flight seat maps (layout definitions) also live in PostgreSQL: (aircraft_type, seat_num, class, row, column, features).
API Design
- GET /v1/flights/{flight_id}/seatmap — Returns full seat map with current availability (from Redis); client renders interactive map
- POST /v1/flights/{flight_id}/holds — User requests seat hold: accepts seat_nums array, user_id; Lua script atomically holds seats; returns hold_id with 10-minute TTL
- POST /v1/holds/{hold_id}/confirm — Booking service calls this after payment to convert hold to confirmed reservation; writes to PostgreSQL
- WebSocket /ws/flights/{flight_id}/seatmap — Persistent connection; server pushes incremental seat state updates in real time
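The idempotency requirement on hold creation can be met with a client-supplied idempotency key, so a retried POST to the holds endpoint replays the original result instead of grabbing extra seats. A minimal sketch, with hypothetical names (`create_hold`, `try_hold`) and transport/auth omitted:

```python
import uuid

_holds_by_idempotency_key = {}   # idempotency_key -> cached response dict

def create_hold(flight_id, seat_nums, user_id, idempotency_key, try_hold):
    """Handler sketch for POST /v1/flights/{flight_id}/holds.

    try_hold(flight_id, seat_nums, hold_id, user_id) -> bool is the
    atomic seat-store call (the Lua script in the real system).
    """
    cached = _holds_by_idempotency_key.get(idempotency_key)
    if cached is not None:
        return cached                       # replayed request: no store call
    hold_id = str(uuid.uuid4())
    ok = try_hold(flight_id, seat_nums, hold_id, user_id)
    response = {"hold_id": hold_id if ok else None,
                "status": "held" if ok else "conflict",
                "ttl_seconds": 600}
    if ok:
        _holds_by_idempotency_key[idempotency_key] = response
    return response

calls = []
def fake_try_hold(flight_id, seat_nums, hold_id, user_id):
    calls.append(hold_id)
    return True

first = create_hold("UA123", ["3B"], "alice", "key-1", fake_try_hold)
retry = create_hold("UA123", ["3B"], "alice", "key-1", fake_try_hold)
print(first == retry, len(calls))  # True 1 — the retry never hit the seat store
```

In production the key-to-response map would live in Redis with a TTL matching the hold, not in process memory.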
Scaling & Bottlenecks
Redis is the consistency bottleneck. For 10,000 active flights, the seat state is ~2 million keys (200 seats × 10,000 flights). Redis single-threaded command processing can handle ~100,000 operations/second — well above the 116 operations/second average. Peak (coordinated release events during popular flight booking opens, e.g., new routes) can spike to 10,000 operations/second — still within a single Redis node's capacity. A Redis Cluster sharded by flight_id adds headroom for 10× growth.
The WebSocket fanout for popular flights (1,000 concurrent viewers × state change per seat selection) generates 1,000 push events per selection event. Batching (200ms window) reduces this to ~5 broadcast events per second for a flight with constant selection activity, each containing multiple seat updates. WebSocket servers are stateless (connected to the same Redis pub/sub); horizontal scaling adds capacity without rebalancing.
Key Trade-offs
- Redis as source of truth vs. PostgreSQL — Redis provides the latency and atomicity required for hold/confirm operations; PostgreSQL provides durability and complex query support; the dual-write pattern adds complexity but is necessary
- Hold duration (10 minutes) — too short frustrates users who need time to decide; too long blocks inventory from other buyers; 10 minutes is airline industry standard for seat selection during booking
- Adjacent seat grouping algorithm — finding N adjacent available seats is a graph search problem on the cabin layout graph; for N≤4 a BFS is sufficient; larger groups may require backtracking
- Consistency vs. availability during Redis failover — if the Redis primary fails, seat selection is unavailable until failover completes (~30 seconds with Redis Sentinel); this is acceptable for consistency; the alternative (serving from a stale replica) risks double-booking
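The adjacent-seat trade-off above reduces, in the common case, to finding N contiguous available seats within one row. A sketch under simplifying assumptions (seats in each row are listed in physical order; aisle gaps would split a row into separate segments; function and variable names are illustrative):

```python
def find_adjacent(rows, available, n):
    """Find n contiguous available seats in one row, or None.

    rows:      {row_number: [seat_nums in physical order]}
    available: set of currently available seat numbers
    """
    for row_number, seat_nums in sorted(rows.items()):
        run = []
        for seat in seat_nums:
            if seat in available:
                run.append(seat)
                if len(run) == n:
                    return run
            else:
                run = []   # a taken seat breaks the contiguous run
    return None

rows = {3: ["3A", "3B", "3C", "3D"], 4: ["4A", "4B", "4C", "4D"]}
free = {"3A", "3C", "3D", "4B", "4C", "4D"}
print(find_adjacent(rows, free, 3))  # ['4B', '4C', '4D'] — 3B blocks row 3
```

This linear scan covers same-row groups; the BFS/backtracking mentioned above is needed only when a group may span rows or wrap around cabin features.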