System Design: Uber Eats
A comprehensive system design of Uber Eats covering restaurant discovery, real-time order management, delivery logistics, and dynamic pricing at a scale of 100M+ monthly users.
Requirements
Functional Requirements:
- Users can browse nearby restaurants, view menus, and place orders
- Real-time order tracking from preparation through delivery
- Drivers receive delivery requests and navigate to pickup and drop-off locations
- Restaurants manage incoming orders, update menu availability, and set preparation times
- Dynamic pricing and promotions engine adjusts delivery fees based on demand
- Ratings and reviews for restaurants, menu items, and drivers
Non-Functional Requirements:
- 100M MAU, 15M orders/day at peak; sub-second restaurant search results
- Order placement to driver assignment within 30 seconds
- 99.99% availability for ordering flow; 99.9% for tracking
- Strong consistency for payments and order state; eventual consistency for ratings and search index
- Global deployment across 6,000+ cities with regional data residency
Scale Estimation
At 15M orders per day on peak days, that works out to an average of 174 orders/sec sustained and roughly 1,200 orders/sec during the dinner rush. Each order triggers 20-30 state transitions (placed, confirmed, preparing, ready, picked up, en route, delivered, plus intermediate and retry events), producing ~450M state events/day. Driver location updates stream at 4-second intervals from 5M active drivers, i.e. 1.25M location updates/sec during peak. Restaurant search requires indexing 900K+ restaurants with menus averaging 80 items each — roughly 72M menu items with real-time availability status. Payment processing handles $50B+ GMV annually, requiring PCI-DSS compliant infrastructure.
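These figures can be sanity-checked with plain arithmetic over the numbers quoted above:

```python
# Back-of-envelope check of the scale estimates quoted above.
SECONDS_PER_DAY = 86_400

orders_per_day = 15_000_000
sustained_ops = orders_per_day / SECONDS_PER_DAY   # ~174 orders/sec
peak_ops = sustained_ops * 7                       # ~1,200 orders/sec (7x dinner spike)

state_events = orders_per_day * 30                 # ~450M events/day at 30 transitions/order
location_ups = 5_000_000 / 4                       # 1.25M updates/sec at 4 s intervals
menu_items = 900_000 * 80                          # 72M items across 900K restaurants

print(f"{sustained_ops:.0f} orders/s sustained, {peak_ops:.0f} orders/s peak")
print(f"{state_events/1e6:.0f}M state events/day, {location_ups/1e6:.2f}M loc updates/s")
print(f"{menu_items/1e6:.0f}M menu items indexed")
```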
High-Level Architecture
Uber Eats builds on Uber's existing transportation platform, reusing core services for mapping, driver management, and payments. The architecture follows a microservices model with an API Gateway (Envoy-based) routing requests to domain services. The ordering flow involves: Discovery Service (restaurant search powered by Elasticsearch with geo-filtering) → Cart Service (manages cart state in Redis) → Order Service (orchestrates the order lifecycle as a state machine in PostgreSQL) → Payment Service (handles authorization, capture, and refunds via Stripe/Braintree) → Dispatch Service (matches orders to nearby drivers using the same matching algorithm as Uber rides).
The real-time layer is critical. A WebSocket Gateway maintains persistent connections with all active users, drivers, and restaurant tablets. Order state changes flow through Kafka topics, and a Notification Fanout Service pushes updates to connected clients via WebSocket or APNS/FCM for backgrounded apps. The Dispatch Service runs a bipartite matching algorithm every 2 seconds, optimizing for delivery time, driver proximity, and batching potential (combining multiple pickups at the same restaurant or nearby drop-offs into a single driver trip).
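A minimal sketch of the fanout path, assuming the kafka-python client and an in-process map from user_id to live connections; the topic name, event fields, and helpers are illustrative, not Uber's actual internals:

```python
import json
from typing import Protocol

from kafka import KafkaConsumer  # pip install kafka-python

class WebSocketLike(Protocol):
    def send(self, data: str) -> None: ...

# user_id -> live connection, maintained by the WebSocket Gateway on
# connect/disconnect (illustrative; a real gateway shards this map).
connections: dict[str, WebSocketLike] = {}

def enqueue_push_notification(event: dict) -> None:
    """Fallback for backgrounded apps: hand off to APNS/FCM (stubbed)."""
    print("push:", event["order_id"], event["status"])

consumer = KafkaConsumer(
    "order-state-changes",                  # hypothetical topic name
    bootstrap_servers="kafka:9092",
    group_id="notification-fanout",
    value_deserializer=lambda b: json.loads(b),
)

for record in consumer:
    event = record.value
    ws = connections.get(event["user_id"])
    if ws is not None:
        ws.send(json.dumps(event))          # live client: push over the socket
    else:
        enqueue_push_notification(event)    # otherwise fall back to push
```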
Restaurant-side infrastructure includes a Tablet Management Service that pushes orders to restaurant tablets via a persistent MQTT connection (reliable on poor WiFi). Restaurants confirm orders and update prep times, which flow back through the Order Service to update ETAs shown to the customer.
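A sketch of the tablet push with paho-mqtt (1.x-style constructor); the broker host and topic layout are assumptions:

```python
import json

import paho.mqtt.client as mqtt  # pip install "paho-mqtt<2" (1.x constructor below)

client = mqtt.Client(client_id="order-push-worker")
client.connect("emqx.internal", 1883)      # hypothetical EMQX broker endpoint
client.loop_start()

order = {
    "order_id": "o_123",
    "status": "SENT_TO_RESTAURANT",
    "items": [{"item_id": "m_42", "qty": 2}],
}

# QoS 1 (at-least-once) suits flaky kitchen WiFi: the broker retries until
# the tablet acks, and the tablet deduplicates on order_id.
info = client.publish("restaurants/r_789/orders", json.dumps(order), qos=1)
info.wait_for_publish()
```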
Core Components
Discovery & Search Service
Restaurant discovery uses Elasticsearch with geo-distance scoring, filtering by cuisine type, price range, delivery time estimate, and current availability (open/closed, paused for high volume). The search index is updated in near-real-time via Kafka consumers watching for menu changes, restaurant status updates, and dynamic pricing adjustments. Results are personalized using a lightweight ranking model that considers the user's order history, time of day, and trending restaurants in their area. A Redis cache layer stores pre-computed restaurant cards for popular geographic cells with a 60-second TTL.
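A sketch of what the geo-filtered query could look like with the Elasticsearch Python client; the index name, fields, and coordinates are illustrative:

```python
from elasticsearch import Elasticsearch  # pip install elasticsearch

es = Elasticsearch("http://localhost:9200")
here = {"lat": 37.7749, "lon": -122.4194}

resp = es.search(
    index="restaurants",
    query={"bool": {"filter": [
        {"term": {"cuisine": "thai"}},
        {"term": {"is_open": True}},                       # availability filter
        {"geo_distance": {"distance": "5km", "location": here}},
    ]}},
    sort=[{"_geo_distance": {"location": here, "order": "asc", "unit": "km"}}],
    size=20,
)
for hit in resp["hits"]["hits"]:
    print(hit["_source"]["name"], f'{hit["sort"][0]:.2f} km')
```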
Order Orchestration Service
The Order Service implements a finite state machine with 12 states (CREATED, PAYMENT_AUTHORIZED, SENT_TO_RESTAURANT, RESTAURANT_CONFIRMED, PREPARING, READY_FOR_PICKUP, DRIVER_ASSIGNED, DRIVER_AT_RESTAURANT, PICKED_UP, EN_ROUTE, ARRIVED, DELIVERED) and guarded transitions. Each transition emits a Kafka event consumed by downstream services (ETA calculation, notifications, analytics). The state is persisted in PostgreSQL with optimistic locking to prevent concurrent state corruption. Idempotency keys on all transitions ensure safe retries. A saga pattern coordinates the distributed transaction across Payment, Restaurant, and Dispatch services with compensating actions for cancellations.
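A condensed, in-memory sketch of the guarded state machine with idempotent transitions; the transition table is an assumption about legal orderings (cancellation edges omitted), and production code would do this as a versioned UPDATE plus a post-commit Kafka event:

```python
from dataclasses import dataclass, field

# Assumed legal transitions; driver assignment may overlap preparation.
TRANSITIONS: dict[str, set[str]] = {
    "CREATED": {"PAYMENT_AUTHORIZED"},
    "PAYMENT_AUTHORIZED": {"SENT_TO_RESTAURANT"},
    "SENT_TO_RESTAURANT": {"RESTAURANT_CONFIRMED"},
    "RESTAURANT_CONFIRMED": {"PREPARING"},
    "PREPARING": {"READY_FOR_PICKUP", "DRIVER_ASSIGNED"},
    "DRIVER_ASSIGNED": {"DRIVER_AT_RESTAURANT", "READY_FOR_PICKUP"},
    "READY_FOR_PICKUP": {"DRIVER_ASSIGNED", "DRIVER_AT_RESTAURANT"},
    "DRIVER_AT_RESTAURANT": {"PICKED_UP"},
    "PICKED_UP": {"EN_ROUTE"},
    "EN_ROUTE": {"ARRIVED"},
    "ARRIVED": {"DELIVERED"},
    "DELIVERED": set(),
}

@dataclass
class Order:
    order_id: str
    status: str = "CREATED"
    version: int = 0                              # optimistic-lock column
    seen_keys: set[str] = field(default_factory=set)

def transition(order: Order, new_status: str, idempotency_key: str) -> bool:
    """Apply a guarded transition; replays of the same key are no-ops."""
    if idempotency_key in order.seen_keys:
        return False                              # safe retry: already applied
    if new_status not in TRANSITIONS[order.status]:
        raise ValueError(f"illegal transition {order.status} -> {new_status}")
    order.status, order.version = new_status, order.version + 1
    order.seen_keys.add(idempotency_key)
    return True
```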
Dispatch & Matching Service
The Dispatch Service solves a constrained optimization problem every dispatch cycle (2 seconds): given N pending orders and M available drivers, find the assignment that minimizes total delivery time while respecting constraints (max 2 active deliveries per driver, driver preferences, vehicle type requirements). This is modeled as a minimum-cost bipartite matching problem solved using the Hungarian algorithm with heuristic pruning for scalability. Driver locations are indexed in a geospatial data structure (S2 cells at level 16, roughly 150m resolution) stored in Redis for O(1) proximity queries.
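SciPy's linear_sum_assignment solves exactly this minimum-cost bipartite matching (it implements a Hungarian-style algorithm); the costs below are toy ETA estimates, and constraint violations are priced out with a large constant:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment  # pip install scipy

# cost[i][j] = estimated seconds for driver j to complete order i
# (drive to restaurant + wait + drive to drop-off). Toy values.
cost = np.array([
    [420.0, 660.0, 900.0],   # order 0
    [300.0, 480.0, 720.0],   # order 1
    [810.0, 540.0, 360.0],   # order 2
])

INFEASIBLE = 1e9
cost[1, 2] = INFEASIBLE      # e.g. driver 2 already has 2 active deliveries

order_idx, driver_idx = linear_sum_assignment(cost)   # global optimum, not greedy
for o, d in zip(order_idx, driver_idx):
    if cost[o, d] < INFEASIBLE:
        print(f"order {o} -> driver {d} (eta {cost[o, d]:.0f}s)")
```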
Database Design
The Order table in PostgreSQL is sharded by order_id (Snowflake ID encoding timestamp + region + sequence). Columns include order_id, user_id, restaurant_id, driver_id, status, items (JSONB), subtotal, delivery_fee, tip, total, created_at, and updated_at. An Order Events table stores the full state machine history for audit and debugging. Restaurant data lives in a separate PostgreSQL cluster sharded by restaurant_id, with a Menu Items table containing item_id, restaurant_id, name, description, price, category, availability_status, and dietary_tags.
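A sketch of a Snowflake-style encoder for order_id along the lines described; the exact bit split (41-bit timestamp, 10-bit region, 12-bit sequence) and the custom epoch are assumptions:

```python
import time

EPOCH_MS = 1_500_000_000_000  # assumed custom epoch (ms)

def make_order_id(region_id: int, sequence: int) -> int:
    """41-bit ms timestamp | 10-bit region | 12-bit per-ms sequence."""
    ts = int(time.time() * 1000) - EPOCH_MS
    return (ts << 22) | ((region_id & 0x3FF) << 12) | (sequence & 0xFFF)

def decode_order_id(order_id: int) -> tuple[int, int, int]:
    """Recover (unix_ms, region_id, sequence) from an order_id."""
    return ((order_id >> 22) + EPOCH_MS, (order_id >> 12) & 0x3FF, order_id & 0xFFF)

oid = make_order_id(region_id=42, sequence=7)
print(oid, decode_order_id(oid))
```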
Driver location data uses a time-series approach: current locations are stored in Redis (geospatial index) for real-time queries, while historical breadcrumbs are written to a ClickHouse cluster for ETA model training and route analysis. Payment records are stored in a PCI-compliant isolated PostgreSQL cluster with field-level encryption for card data and tokenization via a vault service.
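A sketch of the S2-cell bucketing for driver locations, assuming the s2sphere library and plain Redis sets keyed by cell id; key names are illustrative, and expiry plus removal from the previous cell are omitted:

```python
import redis      # pip install redis
import s2sphere   # pip install s2sphere

r = redis.Redis()
LEVEL = 16  # ~150 m cells

def cell_id(lat: float, lng: float) -> int:
    """Level-16 S2 cell containing the point."""
    return s2sphere.CellId.from_lat_lng(
        s2sphere.LatLng.from_degrees(lat, lng)).parent(LEVEL).id()

def update_driver_location(driver_id: str, lat: float, lng: float) -> None:
    r.sadd(f"drivers:cell:{cell_id(lat, lng)}", driver_id)

def drivers_near(lat: float, lng: float) -> set[bytes]:
    """Union the point's cell with its neighbors: a constant-size lookup."""
    center = s2sphere.CellId.from_lat_lng(
        s2sphere.LatLng.from_degrees(lat, lng)).parent(LEVEL)
    cells = [center] + center.get_all_neighbors(LEVEL)
    return set().union(*(r.smembers(f"drivers:cell:{c.id()}") for c in cells))
```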
API Design
- GET /api/v1/restaurants?lat={lat}&lng={lng}&cuisine={type}&sort=delivery_time&limit=20 — Discover nearby restaurants with filtering and sorting
- POST /api/v1/orders — Place an order; body contains restaurant_id, items array, delivery_address, payment_method_id; returns order_id and initial ETA
- GET /api/v1/orders/{order_id}/tracking — SSE endpoint streaming real-time order status, driver location, and ETA updates
- POST /api/v1/restaurants/{restaurant_id}/orders/{order_id}/confirm — Restaurant confirms the order and sets prep_time_minutes
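For concreteness, a hypothetical client call against the order placement endpoint; the host, headers, and response field names are assumptions:

```python
import requests  # pip install requests

resp = requests.post(
    "https://api.example.com/api/v1/orders",          # placeholder host
    json={
        "restaurant_id": "r_789",
        "items": [{"item_id": "m_42", "quantity": 2}],
        "delivery_address": {"lat": 37.7790, "lng": -122.4150},
        "payment_method_id": "pm_abc",
    },
    headers={
        "Authorization": "Bearer <token>",
        "Idempotency-Key": "3f2c9a",                  # safe client-side retries
    },
    timeout=5,
)
resp.raise_for_status()
order = resp.json()
print(order["order_id"], order.get("eta_minutes"))    # field names assumed
```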
Scaling & Bottlenecks
The dinner rush creates a 7x traffic spike over baseline within a 2-hour window. The Dispatch Service is the primary bottleneck during peak — solving the matching problem across thousands of orders and drivers requires careful partitioning by geographic region (each city is an independent dispatch domain). Auto-scaling of dispatch workers is triggered by queue depth on the pending-orders Kafka topic. The WebSocket Gateway fleet scales horizontally with sticky sessions pinned by user_id; a connection migration protocol handles graceful rebalancing during deployments.
Database hot spots occur on popular restaurants during peak hours — a single restaurant might receive 200 orders in 10 minutes, all writing to the same restaurant shard. This is mitigated by buffering order confirmations in Redis and batch-flushing to PostgreSQL every 500ms. The Elasticsearch cluster uses time-based index rotation (daily indices for order history, a single hot index for active restaurants) with dedicated hot/warm/cold node tiers.
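A sketch of the write-buffering mitigation, assuming redis-py and a stubbed batch insert; the default pipeline wraps the read-and-trim in MULTI/EXEC, so concurrent producers don't lose entries:

```python
import json
import time

import redis  # pip install redis

r = redis.Redis()
BUFFER_KEY = "orders:pending-writes"  # illustrative key

def enqueue_order_write(order: dict) -> None:
    """Hot path: append to Redis instead of hitting the hot restaurant shard."""
    r.rpush(BUFFER_KEY, json.dumps(order))

def batch_insert_into_postgres(rows: list[dict]) -> None:
    """One multi-row INSERT per flush (stubbed here)."""
    print(f"flushing {len(rows)} rows")

def flush_loop(batch_size: int = 500, interval_s: float = 0.5) -> None:
    """Background worker: drain up to batch_size entries every 500 ms."""
    while True:
        pipe = r.pipeline()                      # transactional by default
        pipe.lrange(BUFFER_KEY, 0, batch_size - 1)
        pipe.ltrim(BUFFER_KEY, batch_size, -1)
        rows, _ = pipe.execute()
        if rows:
            batch_insert_into_postgres([json.loads(x) for x in rows])
        time.sleep(interval_s)
```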
Key Trade-offs
- Batched dispatch cycles (2s) over instant matching: Batching allows global optimization across all pending orders rather than greedy first-come assignment, improving average delivery time by 15% at the cost of a 2-second delay in driver assignment
- JSONB for order items over normalized tables: Storing the full item list as JSONB in the Orders table avoids expensive joins during the hot read path (order tracking) but sacrifices query flexibility for analytics — solved by streaming to ClickHouse
- Saga pattern over 2PC for distributed transactions: Sagas with compensating actions (refund on cancel) provide better availability than two-phase commit across Payment, Restaurant, and Dispatch services — the trade-off is temporary inconsistency windows
- MQTT for restaurant tablets over WebSocket: MQTT's QoS levels and smaller protocol overhead handle the unreliable WiFi in restaurant kitchens better than WebSocket, but require maintaining a separate broker infrastructure (EMQX cluster)