System Design: Hotel Booking System

Requirements

Functional Requirements:

Guests search for hotels by location, dates, guest count, and preferences
System displays available room types with rates for the requested dates
Guests book rooms with instant or request-based confirmation
Cancellation and modification with policy-based refund rules
Hotels manage inventory: block rooms, update rates, close date ranges
Loyalty program: earn and redeem points per stay

Non-Functional Requirements:

Inventory availability check under 100ms for a single hotel
No double bookings — availability must be strongly consistent
Support 500,000 hotels; 5 million rooms; 2 million reservations/day
Rate and availability updates from hotels applied within 30 seconds
99.99% uptime for reservation creation path

Scale Estimation

5 million rooms × 365 nights = 1.825 billion room-night slots. At any time, ~30% are occupied = 1.5 million active stays. Daily reservations: 2 million × ~3 room-nights average = 6 million room-night bookings/day = 69/second average, 500/second peak. Search load: 100 million hotel searches/day = 1,157/second average, 10,000/second peak. Inventory updates from PMSes: 500,000 hotels × 10 updates/day = 5 million updates/day = 58/second average, 1,000/second peak during post-midnight PMS batch cycles.

High-Level Architecture

The hotel booking system is split between a hotel-facing Property Management Interface (inventory and rate management) and a guest-facing Reservation System (search, book, manage).

The Inventory Engine maintains the canonical availability and rate store. Hotels connect via a Channel Management API (direct API, or via intermediaries like SiteMinder) to push rate and availability updates. The engine applies these to its inventory model: a room-type-level calendar of (rooms_available, rate_by_plan) per night.

The Search Service queries an Elasticsearch index for hotel discovery and the Inventory Engine for availability and rates. The Reservation Service handles booking transactions with ACID guarantees on the inventory count: it uses MySQL row-level locking (SELECT FOR UPDATE) to atomically decrement available room count for the requested nights.

Core Components

Inventory Engine

Room availability is modeled as a table in MySQL: (hotel_id, room_type_id, date, rooms_available, rates JSONB). The rates JSONB column stores all rate plan prices for that day. Availability update from a hotel: UPDATE ... SET rooms_available = ? WHERE hotel_id = ? AND room_type_id = ? AND date BETWEEN ? AND ?. The engine performs these updates within a transaction to ensure atomicity across multi-night date ranges. Change events are published to Kafka for downstream indexing and analytics.

Reservation Transaction Service

Booking a room requires atomically decrementing rooms_available for every night of the stay. This is a multi-row transaction in MySQL. The locking strategy: SELECT FOR UPDATE on all relevant (hotel_id, room_type_id, date) rows, check all nights have rooms_available ≥ rooms_requested, decrement, insert reservation record — all within a single transaction. If any night has insufficient inventory, the transaction rolls back and the guest sees an "unavailable" error. This prevents overbooking at the database level without requiring distributed coordination.

Rate Plan Engine

Hotels offer multiple rate plans per room: Flexible (fully refundable), Semi-Flex (partial refund), Non-Refundable (no refund, cheapest), Early Bird (book 30+ days in advance, non-refundable). The Rate Plan Engine evaluates cancellation policies at booking time and stores the applicable policy snapshot in the reservation record — changes to a hotel's policy after booking do not affect existing reservations. Refund calculations at cancellation time use the stored snapshot, not the current policy.

Database Design

Inventory in MySQL (high-consistency, ACID): (hotel_id, room_type_id, date, rooms_available, rooms_total). Composite primary key (hotel_id, room_type_id, date) with row-level locking for booking transactions. Partitioned by date month for query performance. Reservations in MySQL: (reservation_id, hotel_id, room_type_id, guest_id, check_in, check_out, rooms_booked, total_price, policy_snapshot JSONB, status). Hotels and rooms in PostgreSQL: (hotel_id, name, location_geom, star_rating, amenities JSONB). Elasticsearch for hotel search documents (synced via Debezium CDC from PostgreSQL). Reviews in Cassandra.

API Design

GET /v1/hotels/search?location={}&checkin={}&checkout={}&rooms={}&adults={} — Returns hotels with available room types and prices for the dates
POST /v1/reservations — Creates reservation: SELECT FOR UPDATE on inventory, charge card, return reservation_id and confirmation number
GET /v1/reservations/{reservation_id} — Returns reservation details, hotel contact, and cancellation policy
DELETE /v1/reservations/{reservation_id} — Cancels reservation; applies cancellation policy; processes refund; increments rooms_available

Scaling & Bottlenecks

MySQL inventory locking is the booking throughput bottleneck. At 500 bookings/second with 3-night average stays, that's 1,500 row locks/second. MySQL InnoDB handles this with multi-version concurrency control — readers don't block writers. The critical section (SELECT FOR UPDATE through commit) is ~5ms, giving a theoretical throughput of 200 concurrent bookings per MySQL shard. With 10 shards (each holding a subset of hotels by hotel_id hash), the system handles 2,000 bookings/second — comfortably above the 500/second peak.

Search scales via Elasticsearch horizontally. Availability enrichment (adding real-time room counts to search results) is the bottleneck: for top 50 results per search, 10,000 parallel availability lookups are needed at peak. A batched Inventory Engine API (single call returning availability for 50 hotels) reduces this to 200 batch calls/second, well within MySQL read replica capacity.

Key Trade-offs

MySQL for inventory vs. distributed store — MySQL provides the ACID guarantees needed for overbooking prevention; a distributed store (DynamoDB, Cassandra) lacks row-level locking across multiple date rows without expensive coordination protocols
Snapshot cancellation policy — storing policy snapshot at booking time prevents retroactive policy changes from affecting guests; hotels dislike this (can't update policies for booked guests) but it's legally and reputationally essential
Inventory over-allocation — some hotels deliberately over-allocate (sell 105 rooms for 100 available) expecting some cancellations; the system must support configurable overbooking buffers while still preventing true sell-outs
Multi-channel inventory sync — the same hotel sells on Booking.com, Expedia, and directly; a channel manager maintains a pooled inventory count shared across channels; last-availability races are handled by the inventory engine's SELECT FOR UPDATE