System Design: Travel Itinerary Planner
Design a travel itinerary planning application covering collaborative editing, attraction sequencing optimization, offline sync, and third-party integration.
Requirements
Functional Requirements:
- Users create multi-day trip itineraries with days, activities, and logistics
- Collaborative editing: multiple travelers can edit the same itinerary simultaneously
- Add activities from a place database (restaurants, museums, attractions) or custom entries
- Automatic travel time calculation between consecutive activities
- Conflict detection: overlapping activity times flagged to users
- Offline mode: view and edit itinerary without internet connectivity, sync on reconnect
Non-Functional Requirements:
- Collaborative edits appear to other users within 2 seconds
- Offline edits sync without data loss on reconnect
- Support 50 million users; 10 million collaborative itineraries
- Place database: 200 million points of interest globally
- 99.9% availability; travelers access itineraries in locations with poor connectivity
Scale Estimation
- 10 million collaborative itineraries × 3 collaborators on average = 30 million user-itinerary relationships; peak concurrent editing sessions: 500,000
- Edit events: ~20 edits/day per collaborative itinerary across all collaborators = 200 million edit events/day ≈ 2,315/second
- Place search queries (finding attractions to add): 50 million/day ≈ 579/second
- Itinerary reads (viewing while traveling): 200 million/day ≈ 2,315/second, a read-heavy mobile workload during travel
- Places database: 200 million records × 500 bytes average = 100 GB, which fits in a single Elasticsearch cluster
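As a sanity check, the arithmetic behind these estimates:

```python
# Back-of-the-envelope arithmetic behind the estimates above.
SECONDS_PER_DAY = 86_400

relationships = 10_000_000 * 3                      # 30M user-itinerary rows
edits_per_sec = 10_000_000 * 20 / SECONDS_PER_DAY   # 200M/day ~= 2,315/s
search_qps    = 50_000_000 / SECONDS_PER_DAY        # ~579/s
read_qps      = 200_000_000 / SECONDS_PER_DAY       # ~2,315/s
places_gb     = 200_000_000 * 500 / 1e9             # 100 GB in Elasticsearch

print(f"{edits_per_sec:.0f} edits/s, {search_qps:.0f} search QPS, "
      f"{read_qps:.0f} reads/s, {places_gb:.0f} GB of places")
```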
High-Level Architecture
The itinerary planner is a collaborative document editor built around CRDTs (Conflict-free Replicated Data Types) to enable offline editing without coordination. The itinerary data structure is modeled as a CRDT (specifically a sequence CRDT for ordered activity lists within days), enabling concurrent edits from multiple devices to be merged without conflicts.
The Itinerary Service maintains the canonical server-side CRDT state in Redis (for active itineraries) backed by PostgreSQL (for persistence). Client apps maintain a local CRDT replica in SQLite (mobile) or IndexedDB (web). When online, clients sync state via a WebSocket connection to the Itinerary Sync Service. When offline, edits are applied locally to the CRDT and queued for sync. On reconnect, the sync service merges the client's CRDT with the server's and broadcasts the merged state to all collaborators.
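To make the offline-first flow concrete, here is a minimal sketch of the client-side edit path, assuming hypothetical `crdt` and `op_queue` wrappers over the SQLite or IndexedDB replica and a `ws` handle to the sync service; none of these are real library APIs:

```python
# Sketch: apply edits locally first, queue them durably, flush when connected.
import json

class ItineraryClient:
    def __init__(self, crdt, op_queue, ws=None):
        self.crdt = crdt          # local CRDT replica (SQLite / IndexedDB)
        self.queue = op_queue     # durable queue of not-yet-synced operations
        self.ws = ws              # WebSocket to the Itinerary Sync Service

    def apply_edit(self, op):
        self.crdt.apply(op)       # optimistic local apply; UI updates instantly
        self.queue.append(op)     # persisted so edits survive app restarts
        if self.ws and self.ws.connected:
            self.flush()

    def flush(self):
        # On reconnect: upload queued ops, then adopt the server's merged state.
        self.ws.send(json.dumps({"ops": [op.to_dict() for op in self.queue.all()]}))
        merged = json.loads(self.ws.recv())
        self.crdt.merge(merged["state"])  # CRDT merge is commutative, idempotent
        self.queue.clear()
```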
Core Components
CRDT-Based Collaborative Editing
The itinerary is structured as a hierarchical CRDT: an outer G-Set of days (grow-only: a day is removed only via an explicit delete marker, never by physical deletion), each day containing a sequence CRDT (RGA — Replicated Growable Array) of activities, with concurrent inserts ordered deterministically by timestamp. Activity properties (title, time, notes) are Last-Writer-Wins registers. Concurrent adds by two users are resolved by CRDT merge semantics; the result includes both activities in deterministic order. Activity deletions use tombstones. The CRDT state for a typical itinerary (7 days, 5 activities/day) fits in ~20 KB — easily synced and stored.
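A simplified sketch of these merge rules, covering the LWW registers and delete-wins tombstones (RGA positional ordering is omitted); all names are illustrative:

```python
from dataclasses import dataclass

@dataclass
class LWWRegister:
    value: object
    ts: tuple  # (lamport_timestamp, device_id) gives a total order

    def merge(self, other):
        # Higher timestamp wins; device_id breaks ties deterministically,
        # so every replica converges to the same value.
        return self if self.ts >= other.ts else other

@dataclass
class Activity:
    id: str                # unique (device_id, counter) identifier
    title: LWWRegister
    start_time: LWWRegister
    deleted: bool = False  # tombstone: kept so concurrent edits can't resurrect it

def merge_day(a: dict, b: dict) -> dict:
    """Merge two replicas of one day's activity set (keyed by activity id)."""
    merged = {}
    for aid in a.keys() | b.keys():
        if aid in a and aid in b:
            x, y = a[aid], b[aid]
            merged[aid] = Activity(
                id=aid,
                title=x.title.merge(y.title),
                start_time=x.start_time.merge(y.start_time),
                deleted=x.deleted or y.deleted,   # delete wins (tombstone)
            )
        else:
            merged[aid] = a.get(aid) or b.get(aid)  # concurrent add: keep both
    return merged
```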
Place Search & Integration Service
Places are indexed in Elasticsearch: (place_id, name, category, location_geom, rating, hours, photos, typical_duration). A geo + text search returns relevant places for a destination query. When a user adds a place to the itinerary, the Place Service fetches rich details (opening hours, typical visit duration, ticket prices) from the Foursquare or Google Places API and caches them in Redis (TTL = 24 hours). Travel time between consecutive activities is computed via the Google Maps Distance Matrix API and cached per (origin, destination) pair.
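The geo + text query might look like this, assuming the official `elasticsearch` Python client and the field names above:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder endpoint

def search_places(q: str, lat: float, lng: float, category: str | None = None):
    # Text relevance on the name, hard filters on distance and category.
    filters = [{"geo_distance": {"distance": "15km",
                                 "location_geom": {"lat": lat, "lon": lng}}}]
    if category:
        filters.append({"term": {"category": category}})
    return es.search(
        index="places",
        query={"bool": {
            "must": [{"match": {"name": q}}],
            "filter": filters,
        }},
        sort=[{"rating": "desc"}],  # surface highly rated places first
        size=20,
    )
```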
Offline Sync Engine
Mobile apps use SQLite to store the CRDT state locally. The Sync Engine tracks a vector clock for each device: (device_id → last_applied_operation_id). On reconnect, the client sends its vector clock to the Sync Service. The server computes the delta (operations the client is missing) and the client uploads its local operations (not yet seen by the server). The CRDT merge is computed on the server, the merged state is sent back to the client, and all connected collaborators receive a broadcast of the merged state. Vector clock comparison ensures no operations are lost or applied twice.
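A sketch of the server-side reconnect handling, assuming the per-itinerary op log is keyed by device and per-device sequence number (the quantities the vector clock tracks); the structures are illustrative:

```python
def compute_delta(server_log: dict, client_clock: dict) -> list:
    """Return ops the client hasn't seen: anything past its per-device cursor."""
    missing = []
    for device_id, ops in server_log.items():   # ops ordered by sequence number
        seen_up_to = client_clock.get(device_id, 0)
        missing.extend(op for op in ops if op["seq"] > seen_up_to)
    return missing

def handle_reconnect(server_log, server_state, client_clock, client_ops):
    delta = compute_delta(server_log, client_clock)   # ops to send down
    for op in client_ops:                             # ops the server hasn't seen
        device, seq = op["device_id"], op["seq"]
        log = server_log.setdefault(device, [])
        if not log or seq > log[-1]["seq"]:           # idempotent: skip duplicates
            log.append(op)
            server_state.apply(op)
    return delta, server_state                        # broadcast merged state
```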
Database Design
Itinerary CRDT state in PostgreSQL (itinerary_id, crdt_state JSONB, version, updated_at) — JSONB allows efficient partial updates via jsonb_set. Active itinerary states also cached in Redis (itinerary:{itinerary_id} → serialized CRDT bytes) for sub-millisecond sync reads. Activity details in PostgreSQL: (activity_id, itinerary_id, day, crdt_position, place_id, custom_title, start_time, duration, notes). Places in Elasticsearch (200M records across 5 shards). User-itinerary relationships and permissions in PostgreSQL: (user_id, itinerary_id, role OWNER|EDITOR|VIEWER).
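A sketch of the core tables and a `jsonb_set` partial update, assuming psycopg2; the `itinerary_members` table name, JSONB path, and UUID values are illustrative:

```python
import psycopg2

DDL = """
CREATE TABLE IF NOT EXISTS itineraries (
    itinerary_id UUID PRIMARY KEY,
    crdt_state   JSONB NOT NULL,
    version      BIGINT NOT NULL DEFAULT 0,
    updated_at   TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE TABLE IF NOT EXISTS itinerary_members (
    user_id      UUID NOT NULL,
    itinerary_id UUID NOT NULL REFERENCES itineraries(itinerary_id),
    role         TEXT NOT NULL CHECK (role IN ('OWNER', 'EDITOR', 'VIEWER')),
    PRIMARY KEY (user_id, itinerary_id)
);
"""

# Partial update of one activity's notes inside the JSONB document,
# bumping the version for optimistic concurrency.
UPDATE_NOTES = """
UPDATE itineraries
SET crdt_state = jsonb_set(crdt_state, %s, to_jsonb(%s::text)),
    version = version + 1,
    updated_at = now()
WHERE itinerary_id = %s;
"""

with psycopg2.connect("dbname=travel") as conn, conn.cursor() as cur:
    cur.execute(DDL)
    cur.execute(UPDATE_NOTES, (['days', '0', 'activities', '2', 'notes'],
                               'Book tickets in advance',
                               '00000000-0000-0000-0000-000000000042'))
```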
API Design
- POST /v1/itineraries — Creates new itinerary; returns itinerary_id and initial CRDT state
- WebSocket /ws/itineraries/{itinerary_id}/sync — Persistent connection for collaborative editing; the client sends its operation log and the server broadcasts the merged state (illustrative message shapes follow this list)
- GET /v1/places/search?q={}&lat={}&lng={}&category={} — Searches place database for activities to add to itinerary
- POST /v1/itineraries/{itinerary_id}/activities — Adds an activity (from place search or custom); triggers travel time recalculation for adjacent activities
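Illustrative message shapes for the sync WebSocket; the field names are assumptions, not a published protocol:

```python
from typing import TypedDict

class ClientOpBatch(TypedDict):
    type: str                     # "ops"
    device_id: str
    vector_clock: dict[str, int]  # device_id -> last applied sequence number
    ops: list[dict]               # CRDT operations produced since last sync

class ServerBroadcast(TypedDict):
    type: str                     # "merged_state"
    itinerary_id: str
    version: int
    state: dict                   # serialized CRDT after merging all ops
```

The client sends a `ClientOpBatch` on connect and after each local edit; the server replies to the sender and fans out a `ServerBroadcast` to every connected collaborator.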
Scaling & Bottlenecks
The WebSocket sync service is the connection bottleneck: 500,000 concurrent editing sessions × an average of 3 connected devices per itinerary = 1.5 million WebSocket connections. Horizontal scaling with consistent hashing (itinerary_id → server pod) routes all collaborators for an itinerary to the same pod, enabling direct in-memory broadcast; Redis pub/sub serves only as a fallback for cross-pod broadcasts when a collaborator reconnects to a different pod.
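A minimal consistent-hash ring for the itinerary-to-pod routing; virtual nodes smooth the key distribution, and `bisect` finds the owning pod in O(log n):

```python
import bisect
import hashlib

class HashRing:
    def __init__(self, pods: list[str], vnodes: int = 100):
        # Each pod appears `vnodes` times on the ring under distinct labels.
        self.ring = sorted(
            (self._hash(f"{pod}#{i}"), pod)
            for pod in pods for i in range(vnodes)
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def pod_for(self, itinerary_id: str) -> str:
        # First ring position clockwise of the key's hash, wrapping at the end.
        i = bisect.bisect(self.keys, self._hash(itinerary_id)) % len(self.ring)
        return self.ring[i][1]

ring = HashRing(["sync-pod-1", "sync-pod-2", "sync-pod-3"])
print(ring.pod_for("itin-42"))  # every collaborator routes to the same pod
```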
Elasticsearch place search at 579 QPS is well within a small cluster's capacity (5 shards × ~500 QPS each). The real bottleneck is travel-time computation: if each activity addition triggered a Google Maps Distance Matrix API call, peak edit rates imply up to 2,315 calls/second, far exceeding Maps API quotas. Mitigations: batch Matrix API calls (up to 25 origins × 25 destinations per call), cache aggressively (Redis, TTL 7 days per origin-destination pair), and pre-compute a travel-time index for popular city-center locations.
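A cache-aside sketch of the travel-time lookup, assuming redis-py; `fetch_matrix` stands in for a batched Distance Matrix request and is not a real client API:

```python
import redis

r = redis.Redis()        # shared cache; default localhost for the sketch
TTL = 7 * 24 * 3600      # 7 days, matching the cache policy above

def travel_times(pairs, fetch_matrix):
    """Resolve (origin, dest) pairs from cache; batch the misses into one call.

    `fetch_matrix(origins, destinations)` stands in for a Distance Matrix
    request; a production version chunks misses into <= 25x25 requests.
    """
    results, misses = {}, []
    for o, d in pairs:
        cached = r.get(f"tt:{o}:{d}")
        if cached is not None:
            results[(o, d)] = int(cached)      # cache hit: no API call
        else:
            misses.append((o, d))
    if misses:
        origins = sorted({o for o, _ in misses})
        dests = sorted({d for _, d in misses})
        matrix = fetch_matrix(origins=origins, destinations=dests)
        for o, d in misses:
            secs = matrix[o][d]                # travel time in seconds
            r.setex(f"tt:{o}:{d}", TTL, secs)  # amortize one call across pairs
            results[(o, d)] = secs
    return results
```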
Key Trade-offs
- CRDT vs. OT (Operational Transformation) — both enable collaborative editing; CRDT is simpler to implement correctly for merge semantics; OT (used by Google Docs) has lower memory overhead but requires a central transform server; CRDT's decentralized merge is better suited for offline-first mobile
- CRDT memory overhead — CRDTs grow over time (tombstones accumulate); periodic compaction (removing tombstones after all devices have applied deletions) is required; the compaction job runs weekly for inactive itineraries
- Conflict detection (time overlaps) — detecting scheduling conflicts is a separate layer on top of the CRDT: the merge accepts all edits without blocking, and a conflict detection service flags overlaps to users as informational warnings rather than blocking errors (a sketch follows this list)
- Third-party place API dependency — Google Places API has usage costs and rate limits; a hybrid approach (own crawled data for common destinations, Google API for long-tail) reduces dependency and cost
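A sketch of the overlap check from the conflict-detection trade-off, run over a day's merged activity list; the tuple shape is illustrative:

```python
from datetime import datetime, timedelta

def find_overlaps(activities):
    """activities: (title, start: datetime, duration: timedelta) tuples."""
    ordered = sorted(activities, key=lambda a: a[1])   # sort by start time
    warnings, prev_title, prev_end = [], None, None
    for title, start, dur in ordered:
        if prev_end is not None and start < prev_end:  # starts before the
            warnings.append(f"'{title}' overlaps '{prev_title}'")  # latest end
        if prev_end is None or start + dur > prev_end:
            prev_title, prev_end = title, start + dur  # track running max end
    return warnings

day = [
    ("Louvre", datetime(2025, 6, 1, 10, 0), timedelta(hours=3)),
    ("Lunch at bistro", datetime(2025, 6, 1, 12, 30), timedelta(hours=1)),
]
print(find_overlaps(day))  # ["'Lunch at bistro' overlaps 'Louvre'"]
```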