System Design: EV Charging Network
Design the backend for an EV charging network covering charger availability, session management, payment processing, load balancing, and smart grid integration.
Requirements
Functional Requirements:
- Drivers can find nearby available chargers and check real-time availability
- Drivers initiate, monitor, and stop charging sessions via app or RFID card
- Platform manages billing: per-kWh or per-minute pricing with session receipts
- Charger operators (CPOs) manage their stations remotely: restart, update pricing, view logs
- Smart charging: load balance across chargers at a station to stay within utility demand limits
- Reservation system: reserve a charger up to 30 minutes in advance
Non-Functional Requirements:
- Charger status updated within 10 seconds of state change (available → occupied)
- Session start command delivered to charger within 3 seconds of authorization
- Support 500,000 chargers globally, 50,000 concurrent sessions
- 99.9% uptime for session management — a network outage strands drivers
- Payment processing complies with PCI-DSS Level 1
Scale Estimation
500,000 chargers globally, each sending a heartbeat every 30 seconds = 16,667 heartbeats/second. 50,000 concurrent sessions each sending meter values (kWh consumed, current, voltage) every 30 seconds = 1,667 meter events/second. Session events (start, stop, status changes): ~100,000 transactions/day = ~1.2/second average. The main driver-facing read load is charger search: 5 million app opens/day searching for nearby chargers = ~58 queries/second on average, with higher peaks. On a highway road trip, a driver searches along the route roughly once every 20 miles, so a 100-mile trip generates about 5 queries.
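The rates above are simple division; a short Python sketch makes the arithmetic explicit and easy to rerun with different inputs. All inputs are the figures stated in this section, and the results are daily averages, so peak-hour load would need a separate multiplier.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(sources: int, interval_s: float) -> float:
    """Rate when each of `sources` emits one message every `interval_s` seconds."""
    return sources / interval_s


def daily_average(per_day: int) -> float:
    """Average per-second rate for a daily volume (no peak-hour adjustment)."""
    return per_day / SECONDS_PER_DAY


heartbeats = per_second(500_000, 30)       # ~16,667/s
meter_values = per_second(50_000, 30)      # ~1,667/s
session_events = daily_average(100_000)    # ~1.2/s
searches = daily_average(5_000_000)        # ~58/s
trip_searches = 100 // 20                  # one search per 20 miles on a 100-mile trip

for name, rate in [("heartbeats", heartbeats), ("meter values", meter_values),
                   ("session events", session_events), ("searches", searches)]:
    print(f"{name:>14}: {rate:>10,.1f}/s")
print(f"heartbeats per server (100 servers): {heartbeats / 100:,.0f}/s")
```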
High-Level Architecture
The charging network platform uses the OCPP (Open Charge Point Protocol) standard for communicating with chargers. The architecture separates the charger-facing Central System (OCPP server) from the driver-facing API and the CPO management portal.
Chargers connect to the Central System over WebSocket using either OCPP 1.6J or OCPP 2.0.1. Both versions use the same OCPP-J RPC framing (JSON arrays for calls, results, and errors), which is OCPP's own format rather than JSON-RPC 2.0. The Central System maintains a persistent connection with each charger, processes heartbeats, meter values, and status notifications, and sends commands (RemoteStartTransaction, ChangeAvailability) back to chargers; in OCPP 2.0.1 the remote start command is named RequestStartTransaction. Charger state is stored in Redis (current status, active session) and persisted to PostgreSQL for history.
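For reference, OCPP-J frames every message as a JSON array whose first element is the message type: 2 (CALL), 3 (CALLRESULT), or 4 (CALLERROR). The Python sketch below builds and parses these frames; the helper names are illustrative, and a production Central System would also validate each payload against the OCPP JSON schema for its action.

```python
from __future__ import annotations

import json
import uuid
from typing import Any

# OCPP-J message type IDs, shared by OCPP 1.6J and 2.0.1.
CALL, CALLRESULT, CALLERROR = 2, 3, 4


def make_call(action: str, payload: dict[str, Any], message_id: str | None = None) -> str:
    """Request frame: [2, uniqueId, action, payload]."""
    return json.dumps([CALL, message_id or str(uuid.uuid4()), action, payload])


def make_result(message_id: str, payload: dict[str, Any]) -> str:
    """Response frame: [3, uniqueId, payload]; uniqueId echoes the request."""
    return json.dumps([CALLRESULT, message_id, payload])


def parse_frame(raw: str) -> dict[str, Any]:
    """Decode any OCPP-J frame into a dict keyed by its role."""
    msg = json.loads(raw)
    kind = msg[0]
    if kind == CALL:
        _, message_id, action, payload = msg
        return {"type": "call", "id": message_id, "action": action, "payload": payload}
    if kind == CALLRESULT:
        _, message_id, payload = msg
        return {"type": "result", "id": message_id, "payload": payload}
    if kind == CALLERROR:
        _, message_id, code, description, details = msg
        return {"type": "error", "id": message_id, "code": code,
                "description": description, "details": details}
    raise ValueError(f"unknown OCPP-J message type: {kind!r}")


# OCPP 1.6 RemoteStartTransaction request and the charger's acceptance.
frame = make_call("RemoteStartTransaction", {"connectorId": 1, "idTag": "DRIVER-TAG-1"}, message_id="42")
print(frame)
print(parse_frame('[3, "42", {"status": "Accepted"}]'))
```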
When a driver initiates a charge via the app, the Session Service authenticates the driver, checks their payment method, and calls the Central System to send a RemoteStartTransaction command to the target charger. The charger first answers that command with Accepted or Rejected; once charging actually begins, it sends its own StartTransaction request, which confirms the session. Session data streams through Kafka for billing, analytics, and grid management.
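The two-step handshake can be sketched with asyncio. This is a minimal illustration, not the platform's implementation: `send_remote_start` and the `start_transaction` future are hypothetical hooks into the Central System, the 3-second timeout comes from the command-delivery requirement, and the 120-second plug-in timeout is an assumed value.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable

COMMAND_TIMEOUT_S = 3.0    # command must reach the charger within 3 seconds
PLUG_IN_TIMEOUT_S = 120.0  # hypothetical: how long to wait for StartTransaction


async def start_session(
    send_remote_start: Callable[[], Awaitable[str]],
    start_transaction: asyncio.Future[int],
) -> int:
    """Send RemoteStartTransaction, then wait for the charger's StartTransaction.

    send_remote_start: hypothetical hook returning the charger's reply status.
    start_transaction: future resolved with the transactionId when the charger's
        StartTransaction request arrives at the Central System.
    """
    status = await asyncio.wait_for(send_remote_start(), timeout=COMMAND_TIMEOUT_S)
    if status != "Accepted":
        raise RuntimeError(f"RemoteStartTransaction {status}")
    return await asyncio.wait_for(start_transaction, timeout=PLUG_IN_TIMEOUT_S)


async def demo() -> int:
    loop = asyncio.get_running_loop()
    started: asyncio.Future[int] = loop.create_future()

    async def fake_remote_start() -> str:
        # Simulated charger: accepts now, reports StartTransaction 10 ms later.
        loop.call_later(0.01, started.set_result, 1001)
        return "Accepted"

    return await start_session(fake_remote_start, started)


transaction_id = asyncio.run(demo())
print(transaction_id)
```

Separating the two timeouts keeps the 3-second requirement measurable on its own, independent of how long the driver takes to plug in.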
Core Components
OCPP Central System
Built on a horizontally scaled WebSocket server fleet. Each server handles ~5,000 charger connections (500,000 total / 100 servers). Chargers are assigned to servers by consistent hashing on charger_id, so a charger reconnects to the same server after a dropped connection, and adding or removing a server remaps only a small fraction of chargers. The Central System processes OCPP messages (StatusNotification, MeterValues, BootNotification, StartTransaction, StopTransaction) and forwards events to Kafka. Commands from the application layer are routed to the appropriate Central System server via a Redis routing table (charger_id → server_id).
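A consistent-hash ring with virtual nodes is one common way to implement the charger_id → server assignment. The sketch below is generic (SHA-1 ring points, 100 virtual nodes per server, and the server names are all assumptions); it demonstrates the property that matters here: removing a server moves only the chargers that were on it.

```python
from __future__ import annotations

import bisect
import hashlib


class HashRing:
    """Consistent-hash ring with virtual nodes (a generic sketch, not a specific library)."""

    def __init__(self, servers: list[str], vnodes: int = 100) -> None:
        self._vnodes = vnodes
        self._ring: list[tuple[int, str]] = []  # sorted (ring point, server) pairs
        for server in servers:
            self.add(server)

    @staticmethod
    def _point(key: str) -> int:
        # First 8 bytes of SHA-1 as an unsigned 64-bit ring position.
        return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")

    def add(self, server: str) -> None:
        for i in range(self._vnodes):
            bisect.insort(self._ring, (self._point(f"{server}#{i}"), server))

    def remove(self, server: str) -> None:
        self._ring = [entry for entry in self._ring if entry[1] != server]

    def server_for(self, charger_id: str) -> str:
        # First ring point at or after the charger's hash, wrapping at the end.
        idx = bisect.bisect_left(self._ring, (self._point(charger_id), ""))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing([f"ocpp-{n:02d}" for n in range(100)])
chargers = [f"CHG-{n}" for n in range(10_000)]
before = {c: ring.server_for(c) for c in chargers}
ring.remove("ocpp-07")
moved = [c for c in chargers if ring.server_for(c) != before[c]]
print(f"{len(moved)} of {len(chargers)} chargers moved")  # only ocpp-07's chargers
```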
Session Management Service
Manages the lifecycle of charging sessions: INITIATED → ACTIVE → STOPPING → COMPLETED. On session start, it creates a session record in PostgreSQL, opens a real-time meter reading window in Redis (to track kWh consumed for live billing display), and subscribes to MeterValues events from Kafka for the session's charger. On session stop (driver request or charger-initiated), it reads final kWh from StopTransaction and calls the Billing Service. Idempotency keys prevent duplicate session creation on network retries.
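The lifecycle and the idempotency check can be sketched as below. The in-memory dict stands in for PostgreSQL; in a real deployment the idempotency key would more likely be a unique-constrained column, so two concurrent retries cannot both insert a row. Class and method names are hypothetical.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass
from enum import Enum


class SessionState(Enum):
    INITIATED = "INITIATED"
    ACTIVE = "ACTIVE"
    STOPPING = "STOPPING"
    COMPLETED = "COMPLETED"


# Legal next states for INITIATED -> ACTIVE -> STOPPING -> COMPLETED.
NEXT = {
    SessionState.INITIATED: {SessionState.ACTIVE},
    SessionState.ACTIVE: {SessionState.STOPPING},
    SessionState.STOPPING: {SessionState.COMPLETED},
    SessionState.COMPLETED: set(),
}


@dataclass
class Session:
    session_id: str
    charger_id: str
    driver_id: str
    state: SessionState = SessionState.INITIATED

    def advance(self, new_state: SessionState) -> None:
        """Move to `new_state`, rejecting skipped or backward transitions."""
        if new_state not in NEXT[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {new_state.name}")
        self.state = new_state


class SessionStore:
    """In-memory stand-in for the sessions table plus an idempotency-key index."""

    def __init__(self) -> None:
        self._by_key: dict[str, Session] = {}

    def create(self, idempotency_key: str, charger_id: str, driver_id: str) -> Session:
        # A retried request with the same key gets the original session back.
        if idempotency_key in self._by_key:
            return self._by_key[idempotency_key]
        session = Session(str(uuid.uuid4()), charger_id, driver_id)
        self._by_key[idempotency_key] = session
        return session
```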
Smart Charging / Load Management
Each charging station has a maximum power budget (e.g., 200 kW for a 10-port station). The Load Manager monitors active sessions at each station via Redis and enforces the power budget by sending OCPP SetChargingProfile commands to individual chargers, dynamically limiting their current output. When a new session starts at a near-capacity station, the Load Manager redistributes power budgets across all active chargers. This integrates with utility demand response programs: if the grid operator signals a peak event, the platform reduces all station outputs by 20% automatically.
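One simple fair-share policy for the Load Manager is water-filling: split the station budget equally, cap any charger that cannot use its full share at its own maximum, and redistribute the leftover. The function below is a sketch under that assumption (the policy, names, and example caps are illustrative). The per-charger results in kW would then be sent as SetChargingProfile limits, which OCPP accepts in amps or watts via chargingRateUnit.

```python
from __future__ import annotations


def allocate_power(budget_kw: float, max_kw: dict[str, float],
                   dr_reduction: float = 0.0) -> dict[str, float]:
    """Water-filling split of a station budget across active chargers.

    max_kw: per-charger maximum draw (e.g. hardware or vehicle limit).
    dr_reduction: fraction cut during a demand-response event (e.g. 0.20).
    """
    remaining = budget_kw * (1.0 - dr_reduction)
    alloc: dict[str, float] = {}
    pending = sorted(max_kw, key=max_kw.get)  # smallest caps first
    while pending:
        share = remaining / len(pending)
        smallest = pending[0]
        if max_kw[smallest] <= share:
            # Can't use a full share: give it its cap, redistribute the rest.
            alloc[smallest] = max_kw[smallest]
            remaining -= max_kw[smallest]
            pending.pop(0)
        else:
            # Every remaining charger can absorb an equal share.
            for charger in pending:
                alloc[charger] = share
            break
    return alloc


caps = {"A": 50.0, "B": 150.0, "C": 150.0, "D": 11.0}
print(allocate_power(200.0, caps))                     # normal operation
print(allocate_power(200.0, caps, dr_reduction=0.20))  # grid peak event
```

Equal sharing is only one policy; a priority-based scheme (for example, favoring sessions close to their departure time) would replace the equal `share` with weighted shares.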
Database Design
Charger registry in PostgreSQL: (charger_id, station_id, cpo_id, model, connector_types, max_power_kw, location_geom, address, network_id). Spatial index on location_geom for proximity searches. Charger real-time state in Redis hash (charger:{charger_id} → {status, connector_statuses, active_session_id, last_heartbeat}). Sessions in PostgreSQL: (session_id, charger_id, driver_id, start_time, end_time, kwh_delivered, total_cost, payment_status). Sessions table is partitioned by start_time month. Historical meter readings in TimescaleDB for energy analytics and billing verification.
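Monthly partitioning of the sessions table maps naturally onto PostgreSQL declarative range partitions. The helper below is a hypothetical sketch that generates the DDL for the partition covering a given date. It assumes the parent was created with `PARTITION BY RANGE (start_time)` and that a scheduled job creates each month's partition before the month begins; the lower bound is inclusive and the upper bound exclusive.

```python
from datetime import date


def month_partition_ddl(parent: str, day: date) -> str:
    """DDL for the monthly range partition of `parent` that contains `day`."""
    start = day.replace(day=1)
    if start.month == 12:
        end = date(start.year + 1, 1, 1)
    else:
        end = date(start.year, start.month + 1, 1)
    name = f"{parent}_{start:%Y_%m}"  # e.g. sessions_2024_12
    return (
        f"CREATE TABLE IF NOT EXISTS {name} PARTITION OF {parent} "
        f"FOR VALUES FROM ('{start.isoformat()}') TO ('{end.isoformat()}');"
    )


print(month_partition_ddl("sessions", date(2024, 12, 15)))
```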
API Design
- GET /v1/chargers/nearby?lat={}&lng={}&radius={} — Returns a list of nearby chargers with real-time availability, connector types, pricing, driving distance, and ETA
- POST /v1/sessions — Driver starts session: accepts charger_id, connector_id, payment_method_id; returns session_id and initiates OCPP RemoteStartTransaction
- DELETE /v1/sessions/{session_id} — Driver stops session; sends OCPP RemoteStopTransaction and returns the latest kWh reading and a cost estimate (final figures are settled when the charger's StopTransaction arrives)
- GET /v1/sessions/{session_id}/live — SSE stream of real-time kWh delivered, current cost, and charger status during active session
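The live endpoint's wire format is plain Server-Sent Events: optional `id:` and `event:` fields, a `data:` line, and a blank line terminating each message. This Python helper sketches one meter update; the event name and payload fields are assumptions. Including an `id` lets a reconnecting browser EventSource send `Last-Event-ID`, so the server can resume from the latest reading.

```python
from __future__ import annotations

import json


def sse_event(event: str, data: dict, event_id: int | None = None) -> str:
    """Format one Server-Sent Events message; a blank line ends the message."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    lines.append(f"event: {event}")
    # Compact JSON never contains raw newlines, so one data line suffices.
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"


print(sse_event("meter", {"kwh": 12.5, "cost": 4.38, "status": "Charging"}, event_id=7), end="")
```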
Scaling & Bottlenecks
The OCPP Central System is the connection-count bottleneck: 500,000 persistent WebSocket connections require dedicated connection management. Each server in the fleet is optimized for connection density (Netty-based, 64 GB RAM, 8 cores). Connection state (charger_id → server mapping) in Redis is the coordination point — it must be highly available (Redis Sentinel or Cluster). Heartbeat processing at 16,667/second is easily handled across 100 servers (167/server/second).
Proximity search for nearby chargers is the user-facing query bottleneck. PostGIS with a spatial index can serve on the order of 10,000 geo queries/second on a read replica, depending on hardware and search radius. For higher load, charger locations are pre-loaded into Redis GEO (GEOSEARCH with BYRADIUS, which supersedes the deprecated GEORADIUS) to serve the hot proximity search path without hitting PostgreSQL.
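In Redis the hot path would be `GEOADD` to load locations (longitude first) and `GEOSEARCH key FROMLONLAT <lng> <lat> BYRADIUS <r> km ASC WITHDIST` to query. The pure-Python sketch below only illustrates the result semantics (great-circle distance filter, nearest first, joined with live status); Redis itself uses geohash cells rather than a linear scan. The `Available` status string follows OCPP 1.6 naming, and the function names are hypothetical.

```python
from __future__ import annotations

import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lng2 - lng1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby_available(locations: dict[str, tuple[float, float]],
                     statuses: dict[str, str],
                     lat: float, lng: float, radius_km: float) -> list[tuple[str, float]]:
    """Available chargers within radius_km, nearest first, as (charger_id, km)."""
    hits = []
    for charger_id, (c_lat, c_lng) in locations.items():
        if statuses.get(charger_id) != "Available":
            continue  # real-time state would come from the charger:{id} hash
        distance = haversine_km(lat, lng, c_lat, c_lng)
        if distance <= radius_km:
            hits.append((charger_id, distance))
    hits.sort(key=lambda hit: hit[1])
    return hits


locations = {"A": (37.7749, -122.4194), "B": (37.7849, -122.4194), "C": (37.8749, -122.4194)}
statuses = {"A": "Available", "B": "Available", "C": "Available"}
print(nearby_available(locations, statuses, 37.7749, -122.4194, radius_km=5))
```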
Key Trade-offs
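- OCPP 1.6 vs 2.0.1 — OCPP 1.6 already has a Smart Charging profile (SetChargingProfile), but 2.0.1 extends it (including ISO 15118 integration) and builds in stronger security (security profiles with TLS and client certificates, signed firmware updates); most deployed chargers still use 1.6J, so the platform must support both protocols in parallel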
- Reservation vs. no reservation — reservations improve driver confidence on long trips but reduce utilization (reserved-but-unused slots); a 15-minute hold with auto-release balances confidence and utilization
- Per-kWh vs. per-minute pricing — per-kWh is fairest (you pay for energy consumed) but requires certified revenue-grade meters; per-minute is simpler to implement and audit; many jurisdictions now require per-kWh pricing for public chargers
- Online-mandatory sessions vs. offline fallback — requiring constant connectivity for billing ensures no fraudulent free charges but strands drivers in poor-coverage areas; OCPP supports local authorization lists for offline operation with reconciliation on reconnect
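The pricing trade-off is easiest to see on a single session. The sketch below uses `Decimal` to avoid binary floating-point rounding in money; the prices are hypothetical and the half-up rounding to the cent is an assumed policy. It converts OCPP 1.6 `meterStart`/`meterStop` values, which are reported in Wh, to kWh.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def kwh_delivered(meter_start_wh: int, meter_stop_wh: int) -> Decimal:
    """OCPP 1.6 StartTransaction/StopTransaction report meter values in Wh."""
    return Decimal(meter_stop_wh - meter_start_wh) / Decimal(1000)


def session_cost(kwh: Decimal, minutes: Decimal,
                 per_kwh: Decimal = Decimal("0"),
                 per_minute: Decimal = Decimal("0")) -> Decimal:
    """Energy charge plus time charge, rounded half-up to the cent."""
    return (kwh * per_kwh + minutes * per_minute).quantize(CENT, rounding=ROUND_HALF_UP)


energy = kwh_delivered(1_000_000, 1_042_700)  # 42.7 kWh
print(session_cost(energy, Decimal(35), per_kwh=Decimal("0.48")))     # energy-based
print(session_cost(energy, Decimal(35), per_minute=Decimal("0.32")))  # time-based
```

The same 35-minute session is priced very differently under the two tariffs, and the per-minute bill does not change if the vehicle draws less power, which is the fairness concern behind per-kWh pricing.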