
System Design: Last-Mile Delivery System

Design a last-mile delivery system for e-commerce and grocery platforms — covering dynamic dispatch, driver tracking, proof of delivery, and failed delivery handling.


Requirements

Functional Requirements:

  • Assign delivery orders to drivers and optimize delivery sequence
  • Customers can track their driver in real time on a map with a live ETA
  • Driver app provides turn-by-turn navigation and delivery instructions
  • Capture proof of delivery: signature, photo, or contactless drop confirmation
  • Handle failed deliveries: customer not home, access issues, damaged packages
  • Customers can update delivery instructions or reschedule in real time

Non-Functional Requirements:

  • Live driver tracking updates reach customer app within 5 seconds
  • Route re-optimization when new orders are added completes in under 3 seconds
  • Support 1 million daily deliveries across all operational markets
  • Proof-of-delivery images uploaded and confirmed within 30 seconds
  • 99.9% availability — delivery failures have direct customer satisfaction impact

Scale Estimation

1 million deliveries/day across 50,000 active drivers; average 20 deliveries per driver per day over roughly 8 hours of driving. GPS updates every 10 seconds per driver = 5,000 location events/second. Customer tracking: ~3 sessions per delivery (customers check about 3 times) = 3 million sessions/day = ~35 sessions/second on average, ~500/second at peak delivery windows (11 AM–2 PM, 5–8 PM). Proof-of-delivery photos: 1 million/day × ~500 KB average = 500 GB/day uploaded to S3.
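
To make the arithmetic easy to check, here is a minimal back-of-envelope sketch in Python (the constants are the assumptions stated above):

    # Back-of-envelope numbers from the estimates above.
    DELIVERIES_PER_DAY = 1_000_000
    DRIVERS = 50_000
    GPS_INTERVAL_S = 10            # one ping per driver every 10 seconds
    SESSIONS_PER_DELIVERY = 3      # customer checks tracking ~3 times
    POD_PHOTO_KB = 500

    location_events_per_s = DRIVERS / GPS_INTERVAL_S                # 5,000/s
    sessions_per_day = DELIVERIES_PER_DAY * SESSIONS_PER_DELIVERY   # 3,000,000
    avg_sessions_per_s = sessions_per_day / 86_400                  # ~35/s
    pod_gb_per_day = DELIVERIES_PER_DAY * POD_PHOTO_KB / 1_000_000  # ~500 GB

    print(location_events_per_s, avg_sessions_per_s, pod_gb_per_day)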

High-Level Architecture

The last-mile system coordinates two user-facing apps (driver app and customer tracking) with a back-end orchestration layer that handles order assignment, routing, and status management.

Orders arrive from the Order Management System (OMS) throughout the day. A Wave Planning Service batches orders by geographic zone and time window, then calls the Route Optimization Engine to create optimized routes for each driver. Routes are pushed to driver apps via the Driver Service. As drivers make deliveries, the system processes scan events, location updates, and proof-of-delivery uploads, propagating status changes to customers in real time.

Customers access a lightweight tracking web app (or mobile deep link) that connects via WebSocket to the Customer Tracking Service. This service subscribes to driver location events and status changes filtered by the customer's active orders, pushing updates to the customer's session.
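
A minimal sketch of the subscribe-and-push side, assuming redis-py's asyncio pub/sub and the websockets library; the channel naming and the "client sends its order_id first" handshake are assumptions, not from the source:

    import asyncio
    import redis.asyncio as redis   # assumes redis-py >= 4.2
    import websockets               # assumes websockets >= 11 (single-arg handler)

    r = redis.Redis()

    async def track(websocket):
        # Hypothetical handshake: the tracking page sends its order_id as text.
        order_id = (await websocket.recv()).strip()
        pubsub = r.pubsub()
        await pubsub.subscribe(f"order:{order_id}")
        try:
            async for msg in pubsub.listen():
                if msg["type"] == "message":
                    # Relay {driver_id, lat, lng, eta} JSON straight to the customer.
                    await websocket.send(msg["data"].decode())
        finally:
            await pubsub.unsubscribe(f"order:{order_id}")

    async def main():
        async with websockets.serve(track, "0.0.0.0", 8080):
            await asyncio.Future()  # run forever

    asyncio.run(main())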

Core Components

Order Wave Planning

The Wave Planner runs at configurable intervals (typically every 30 minutes for same-day orders, twice daily for next-day) and groups unassigned orders into waves based on delivery zones and time commitments. It considers driver capacity, scheduled breaks, and order time windows. After grouping, it submits batches to the Route Optimizer (OR-Tools based VRP solver) and assigns resulting routes to drivers, pushing assignments to driver apps via Firebase Cloud Messaging.
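
A minimal sketch of the optimizer call, assuming OR-Tools' routing solver; the distance matrix, driver count, and depot index are placeholders, and a real wave would add time-window and capacity dimensions:

    from ortools.constraint_solver import pywrapcp, routing_enums_pb2

    def plan_wave(dist_matrix, num_drivers, depot=0):
        """Solve a basic VRP for one wave; dist_matrix[i][j] is travel time in seconds."""
        manager = pywrapcp.RoutingIndexManager(len(dist_matrix), num_drivers, depot)
        routing = pywrapcp.RoutingModel(manager)

        def transit(i, j):
            return dist_matrix[manager.IndexToNode(i)][manager.IndexToNode(j)]

        idx = routing.RegisterTransitCallback(transit)
        routing.SetArcCostEvaluatorOfAllVehicles(idx)

        params = pywrapcp.DefaultRoutingSearchParameters()
        params.first_solution_strategy = (
            routing_enums_pb2.FirstSolutionStrategy.PATH_CHEAPEST_ARC)
        params.time_limit.FromSeconds(3)  # matches the 3-second re-optimization budget

        solution = routing.SolveWithParameters(params)
        if solution is None:
            return []
        routes = []
        for v in range(num_drivers):
            node, stops = routing.Start(v), []
            while not routing.IsEnd(node):
                stops.append(manager.IndexToNode(node))
                node = solution.Value(routing.NextVar(node))
            routes.append(stops)
        return routes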

Real-Time Driver Tracking

Driver apps send GPS pings every 10 seconds via a persistent connection to the Location Gateway (WebSocket over HTTP/2). Pings land in Kafka partitioned by driver_id. The Tracking Processor consumes pings, snaps them to the road network for smoother customer-facing display, computes updated ETA for the next stop using the routing engine, and publishes (driver_id, location, eta_next_stop) to a Redis pub/sub channel per order_id. The Customer Tracking Service subscribes to order-specific channels and pushes updates to connected customer WebSocket sessions.
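
The publish side of this pipeline might look like the sketch below, assuming the kafka-python and redis-py clients; the map-matching, ETA, and order-lookup calls are stubbed placeholders for the real services:

    import json
    import redis
    from kafka import KafkaConsumer  # assumes the kafka-python client

    r = redis.Redis()

    def snap_to_road(lat, lng):              # placeholder for the map-matching service
        return lat, lng

    def eta_next_stop(driver_id, lat, lng):  # placeholder for the routing engine call
        return 300                           # seconds

    def active_orders_for(driver_id):        # placeholder for route-state lookup
        return [o.decode() for o in r.smembers(f"driver_orders:{driver_id}")]

    consumer = KafkaConsumer("driver-pings", group_id="tracking-processor",
                             value_deserializer=lambda b: json.loads(b))

    for ping in consumer:
        p = ping.value                             # {driver_id, lat, lng, ts}
        lat, lng = snap_to_road(p["lat"], p["lng"])
        eta = eta_next_stop(p["driver_id"], lat, lng)
        for order_id in active_orders_for(p["driver_id"]):
            r.publish(f"order:{order_id}", json.dumps(
                {"driver_id": p["driver_id"], "lat": lat, "lng": lng, "eta": eta}))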

Proof of Delivery Service

Drivers capture PoD via the app: a photo, a signature drawn on the touchscreen, or a contactless QR code scan (the customer shows a QR code on their phone). Photos are uploaded via multipart HTTP POST to the PoD Service, which validates the EXIF metadata (timestamp, plus GPS coordinates that must fall within ±200 m of the delivery address) and stores the image in S3. The S3 key and metadata are written to Cassandra. A Lambda function runs OCR on signatures to produce a searchable text representation. For tamper-proofing, a PoD hash (SHA-256 of the image bytes plus metadata) is written to the delivery record in PostgreSQL.
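
A minimal sketch of the GPS check and tamper-proof hash (the metadata shape is an assumption; the 200 m threshold follows the description above):

    import hashlib
    import json
    import math

    def haversine_m(lat1, lng1, lat2, lng2):
        """Great-circle distance in meters."""
        rlat1, rlat2 = math.radians(lat1), math.radians(lat2)
        dlat, dlng = math.radians(lat2 - lat1), math.radians(lng2 - lng1)
        a = (math.sin(dlat / 2) ** 2
             + math.cos(rlat1) * math.cos(rlat2) * math.sin(dlng / 2) ** 2)
        return 6_371_000 * 2 * math.asin(math.sqrt(a))

    def validate_and_hash(image_bytes, meta, delivery_lat, delivery_lng, max_m=200):
        """Reject PoD captured too far from the address; return the hash stored in PostgreSQL."""
        if haversine_m(meta["lat"], meta["lng"], delivery_lat, delivery_lng) > max_m:
            raise ValueError("PoD GPS coordinates are more than 200 m from the delivery address")
        digest = hashlib.sha256(image_bytes + json.dumps(meta, sort_keys=True).encode())
        return digest.hexdigest()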

Database Design

  • Delivery orders in PostgreSQL: (order_id, customer_id, driver_id, address, time_window, status, sequence_number, planned_eta, actual_eta, pod_url); indexed on (driver_id, status) for driver app queries and (customer_id, status) for tracking queries
  • Driver location history in TimescaleDB: (driver_id, timestamp, lat, lng, speed), 7-day retention (legal discovery window)
  • Active driver positions in a Redis hash: driver:{driver_id} → {lat, lng, heading, eta_next_stop, current_stop_index}
  • PoD metadata in Cassandra: partition key = order_id; stores pod_type, pod_url, captured_at, coordinates, hash
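
As an illustration, the live-position upsert into the Redis hash above might look like this (the 60-second TTL is an assumption, not from the source):

    import redis

    r = redis.Redis()

    def update_position(driver_id, lat, lng, heading, eta_s, stop_idx):
        """Upsert the driver's live position; one hash per driver as described above."""
        key = f"driver:{driver_id}"
        r.hset(key, mapping={
            "lat": lat, "lng": lng, "heading": heading,
            "eta_next_stop": eta_s, "current_stop_index": stop_idx,
        })
        r.expire(key, 60)  # assumed TTL so stale positions age out if pings stop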

API Design

  • GET /v1/track/{order_id} — Customer tracking endpoint; upgrades to WebSocket and streams driver location + ETA updates; falls back to 30-second polling if WebSocket unavailable
  • POST /v1/deliveries/{order_id}/pod — Driver app submits proof of delivery (multipart with photo/signature + metadata); returns confirmation within 5 seconds
  • PATCH /v1/deliveries/{order_id}/instructions — Customer updates delivery instructions (leave at door, with neighbor, etc.) mid-route; triggers driver app notification (see the request sketch after this list)
  • POST /v1/deliveries/{order_id}/failed — Driver reports failed attempt with reason code (not home, wrong address, refused); triggers customer notification and rescheduling flow
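
For example, a customer-side call to the instructions endpoint might look like the following (the host, order ID, payload fields, and auth scheme are placeholders):

    import requests

    resp = requests.patch(
        "https://api.example.com/v1/deliveries/ord_8842/instructions",
        headers={"Authorization": "Bearer <customer-token>"},
        json={"instructions": "Leave at door, gate code #4321"},
        timeout=5,
    )
    resp.raise_for_status()  # success means the driver app gets notified mid-route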

Scaling & Bottlenecks

The customer tracking WebSocket service is the connection-count bottleneck: at ~500 new tracking sessions/second at peak and an average 15-minute session, there are ~450,000 concurrent WebSocket connections. This is handled by a horizontally scaled tracking server fleet (100 pods × 4,500 connections each) with Redis pub/sub as the message bus. Sticky load balancing (consistent hashing by order_id) ensures all sessions for an order land on the same pod, so each pod holds one Redis subscription per order rather than per session, reducing Redis round-trips.
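
A toy consistent-hash ring illustrating the order_id → pod mapping (pod names and virtual-node count are illustrative; a production deployment would typically use the load balancer's built-in consistent hashing):

    import bisect
    import hashlib

    class Ring:
        def __init__(self, pods, vnodes=100):
            # Each pod gets `vnodes` points on the ring to even out load.
            self.ring = sorted((self._h(f"{p}#{i}"), p)
                               for p in pods for i in range(vnodes))
            self.keys = [k for k, _ in self.ring]

        @staticmethod
        def _h(s):
            return int(hashlib.md5(s.encode()).hexdigest(), 16)

        def pod_for(self, order_id):
            i = bisect.bisect(self.keys, self._h(order_id)) % len(self.keys)
            return self.ring[i][1]

    ring = Ring([f"tracking-{n}" for n in range(100)])
    print(ring.pod_for("ord_8842"))  # every session for this order lands on one pod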

Proof-of-delivery photo uploads (500 KB × 1 million/day = 500 GB/day, ~6 MB/second on average and several times that during peak delivery windows) are the bandwidth bottleneck. Drivers upload directly to S3 via pre-signed URLs generated by the PoD Service, bypassing application servers entirely; the PoD Service only handles metadata validation and record creation.
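
A sketch of the pre-signed upload flow using boto3 (the bucket name, key scheme, and expiry are assumptions):

    import boto3  # assumes standard AWS credentials in the environment

    s3 = boto3.client("s3")

    def presign_pod_upload(order_id, expires_s=300):
        """PoD Service issues a short-lived URL; the driver app PUTs the photo straight to S3."""
        key = f"pod/{order_id}.jpg"
        url = s3.generate_presigned_url(
            "put_object",
            Params={"Bucket": "pod-photos", "Key": key, "ContentType": "image/jpeg"},
            ExpiresIn=expires_s,
        )
        return key, url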

Key Trade-offs

  • Live tracking vs. privacy — sharing driver's exact real-time position with customers raises driver privacy concerns; the platform shows only approximate position (snapped to road, smoothed) rather than raw GPS
  • Route optimization frequency — re-optimizing every new order maximizes efficiency but confuses drivers by resequencing stops; re-optimization is batched every 30 minutes and avoids changing stops the driver is already en-route to
  • Contactless PoD vs. signature — COVID accelerated contactless delivery adoption; signature PoD provides stronger legal proof but requires physical proximity; most platforms now default to photo PoD with GPS validation
  • Same-day window promises vs. operational reality — tight delivery windows (2-hour slots) increase customer satisfaction but make route optimization harder; wider windows allow better batching and lower cost-per-delivery
