System Design: Inventory Management System
System design of a distributed inventory management system with optimistic concurrency control, distributed locking, and real-time stock synchronization across warehouses.
Requirements
Functional Requirements:
- Track stock levels across multiple warehouses and channels
- Reserve inventory during checkout with automatic release on timeout
- Support stock movements: receive, transfer, adjust, return
- Real-time stock synchronization across sales channels (web, mobile, POS, marketplace)
- Low-stock alerts and automatic reorder point triggers
- Batch stock updates for warehouse receiving and cycle counts
Non-Functional Requirements:
- Handle 50,000 concurrent inventory operations/sec during flash sales
- Zero overselling — stock must never go negative for committed orders
- Reservation hold with configurable TTL (default 10 minutes)
- 99.999% availability — inventory downtime blocks checkout, and inaccuracy directly causes lost revenue or overselling
- Strong consistency for stock decrements; eventual consistency for analytics views
Scale Estimation
- Records: 10M SKUs across 50 warehouses = 500M location-level stock records
- Record size: sku_id, warehouse_id, available_qty, reserved_qty, total_qty ≈ 100 bytes → ~50GB total, small enough to keep the hot path in memory
- Write QPS: 5,000 stock operations/sec average, spiking to 50,000/sec during flash sales
- Read QPS: 20,000 stock lookups/sec (cart validation, product-page availability)
- Reservations: 2,000 reservations/sec with a 10-minute TTL ≈ 1.2M active reservations at peak
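The arithmetic behind these estimates can be reproduced in a few lines; the figures below are exactly the ones stated above.

```python
# Back-of-envelope check of the capacity estimates above.

skus = 10_000_000
warehouses = 50
record_bytes = 100  # sku_id, warehouse_id, available_qty, reserved_qty, total_qty

records = skus * warehouses                      # location-level stock records
total_gb = records * record_bytes / 1e9          # raw data volume

reservations_per_sec = 2_000
ttl_seconds = 10 * 60
active_reservations = reservations_per_sec * ttl_seconds  # steady-state holds

print(records)              # 500000000
print(total_gb)             # 50.0
print(active_reservations)  # 1200000
```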
High-Level Architecture
The Inventory Service uses a dual-storage architecture: PostgreSQL as the authoritative source of truth with Redis as a high-performance caching and reservation layer. For read operations (stock checks), the system reads from Redis with fallback to PostgreSQL. For write operations (stock decrements on order confirmation), the system uses PostgreSQL with optimistic concurrency control (OCC) — each stock record has a version column, and updates use WHERE version = expected_version. On version conflict (concurrent update), the operation retries with the new version.
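The read path described above (Redis first, PostgreSQL on a miss) can be sketched as follows. This is a minimal model, not the service's real client code: `redis_cache` and `postgres` are plain dicts standing in for the actual stores, and the cache-repopulation policy is an assumption.

```python
# Sketch of the read path: try the cache first, fall back to the source
# of truth on a miss. Both stores are modeled as in-memory dicts here.

redis_cache = {}                                # hot stock snapshot (stand-in)
postgres = {("SKU-1", 7): {"available": 150, "version": 3}}  # source of truth

def get_stock(sku, warehouse):
    key = (sku, warehouse)
    row = redis_cache.get(key)
    if row is None:                             # cache miss: read authoritative row
        row = postgres[key]
        redis_cache[key] = row                  # repopulate the cache for next time
    return row

print(get_stock("SKU-1", 7)["available"])       # 150, served via PostgreSQL fallback
print(("SKU-1", 7) in redis_cache)              # True, now cached
```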
Reservations (soft holds during checkout) are managed entirely in Redis for speed. A reservation creates entries in two Redis structures: (1) a hash decrementing available stock inv:{sku}:{warehouse} and (2) a sorted set reservations:{sku}:{warehouse} with TTL-scored entries for automatic expiration. A background Reservation Sweeper process monitors expired reservations and releases stock back.
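A minimal sketch of the Reservation Sweeper, under the assumption that the sorted set is modeled as a dict of reservation_id → (expiry timestamp, quantity) and the stock hash as a plain dict; all names here are illustrative, not the real service API.

```python
import time

# In-memory stand-ins for the Redis hash and sorted set described above.
stock = {"available": 140, "reserved": 10}
reservations = {
    "res-1": (time.time() - 5, 4),    # expired: should be released
    "res-2": (time.time() + 600, 6),  # still inside its TTL
}

def sweep(now=None):
    """Release stock held by expired reservations back to 'available'."""
    now = time.time() if now is None else now
    for res_id, (expiry, qty) in list(reservations.items()):
        if expiry <= now:
            stock["available"] += qty
            stock["reserved"] -= qty
            del reservations[res_id]

sweep()
print(stock)                 # {'available': 144, 'reserved': 6}
print(list(reservations))    # ['res-2']
```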
For distributed locking during critical operations (stock transfers between warehouses, bulk adjustments), the system uses Redlock — a distributed lock algorithm across 5 Redis instances. The lock key is lock:inv:{sku}:{warehouse} with a 5-second TTL to prevent deadlocks.
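The majority-acquire idea behind Redlock can be sketched as below. This is a simplified model, not a production implementation: each of the 5 instances is a dict of key → (token, expiry), standing in for `SET key token NX PX ttl` against a real Redis node, and the rollback-on-failure step is shown without the clock-drift safety margin the full algorithm adds.

```python
import time, uuid

instances = [{} for _ in range(5)]   # 5 independent lock stores (stand-ins)
TTL = 5.0                            # seconds, matching the lock TTL above

def try_lock(node, key, token, now):
    held = node.get(key)
    if held is None or held[1] <= now:   # free, or previous holder expired
        node[key] = (token, now + TTL)
        return True
    return False

def acquire(key):
    token, now = str(uuid.uuid4()), time.time()
    granted = sum(try_lock(n, key, token, now) for n in instances)
    if granted >= len(instances) // 2 + 1:   # majority reached: lock is held
        return token
    for n in instances:                      # failed: undo partial acquisitions
        if n.get(key, (None,))[0] == token:
            del n[key]
    return None

token = acquire("lock:inv:SKU-1:7")
print(token is not None)                     # True: first caller gets the lock
print(acquire("lock:inv:SKU-1:7") is None)   # True: contender is rejected
```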
Core Components
Stock Reservation Engine
Reservations use Redis atomic operations to prevent race conditions. Reserving N units: a Lua script atomically checks available >= N, decrements available, increments reserved, and adds a reservation entry to the sorted set — all executed atomically as a single script, since Redis runs each Lua script without interleaving other commands. The Lua script: if tonumber(redis.call('hget', key, 'available')) >= qty then redis.call('hincrby', key, 'available', -qty); redis.call('hincrby', key, 'reserved', qty); redis.call('zadd', res_key, expiry_ts, res_id); return 1 else return 0 end. This guarantees atomicity without distributed locks for the common case.
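The inline script above, laid out readably and paired with a pure-Python model of its semantics. The redis-py `register_script` wiring shown in comments is one way such a script would be run, but it is left commented out since it needs a live server; the in-memory `reserve` below mirrors the same check-and-decrement logic.

```python
# Lua source for the atomic reserve, as described in the text.
RESERVE_LUA = """
local key, res_key = KEYS[1], KEYS[2]
local qty, expiry_ts, res_id = tonumber(ARGV[1]), ARGV[2], ARGV[3]
if tonumber(redis.call('HGET', key, 'available')) >= qty then
  redis.call('HINCRBY', key, 'available', -qty)
  redis.call('HINCRBY', key, 'reserved', qty)
  redis.call('ZADD', res_key, expiry_ts, res_id)
  return 1
end
return 0
"""
# With a live server (redis-py):
# reserve = redis.Redis().register_script(RESERVE_LUA)
# ok = reserve(keys=["inv:SKU-1:7", "res:SKU-1:7"], args=[2, expiry_ts, "res-9"])

def reserve(stock, res_set, qty, expiry_ts, res_id):
    """Same check-and-decrement semantics, modeled on in-memory dicts."""
    if stock["available"] >= qty:
        stock["available"] -= qty
        stock["reserved"] += qty
        res_set[res_id] = expiry_ts
        return 1
    return 0

stock, holds = {"available": 3, "reserved": 0}, {}
print(reserve(stock, holds, 2, 1700000000, "res-9"))   # 1: reserved
print(reserve(stock, holds, 2, 1700000000, "res-10"))  # 0: only 1 unit left
```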
Optimistic Concurrency Controller
When an order is confirmed, stock must be permanently decremented in PostgreSQL. The update uses OCC: UPDATE inventory SET available_qty = available_qty - :qty, version = version + 1 WHERE sku_id = :sku AND warehouse_id = :wh AND version = :expected_version AND available_qty >= :qty. If the WHERE clause matches 0 rows, a version conflict occurred — the service retries by re-reading the current version and re-attempting. After 3 retries, the operation fails and the order saga triggers a compensating action. This approach avoids pessimistic locks that would serialize all writes to a hot SKU.
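The OCC update-and-retry loop can be demonstrated end to end with SQLite standing in for PostgreSQL (the SQL is the same shape as the statement above; the schema here is trimmed to the columns the loop touches).

```python
import sqlite3

# SQLite stands in for PostgreSQL: the UPDATE only matches when both the
# version check and the stock check hold; rowcount 0 signals a conflict,
# which triggers a re-read of the current version and a retry (3 attempts).

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE inventory (
    sku_id TEXT, warehouse_id INT, available_qty INT, version INT,
    PRIMARY KEY (sku_id, warehouse_id))""")
db.execute("INSERT INTO inventory VALUES ('SKU-1', 7, 100, 1)")

def commit_decrement(sku, wh, qty, retries=3):
    for _ in range(retries):
        row = db.execute(
            "SELECT version FROM inventory WHERE sku_id=? AND warehouse_id=?",
            (sku, wh)).fetchone()
        cur = db.execute(
            """UPDATE inventory
               SET available_qty = available_qty - ?, version = version + 1
               WHERE sku_id=? AND warehouse_id=? AND version=?
                 AND available_qty >= ?""",
            (qty, sku, wh, row[0], qty))
        if cur.rowcount == 1:        # version matched: decrement applied
            return True
    return False                     # persistent conflict: trigger compensation

print(commit_decrement("SKU-1", 7, 10))   # True
print(db.execute("SELECT available_qty, version FROM inventory").fetchone())
# (90, 2)
```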
Multi-Channel Sync Service
Inventory is sold across multiple channels (web, app, Amazon, eBay) and must be synchronized. The Sync Service maintains a channel allocation model: total available stock is partitioned across channels with configurable ratios (e.g., 70% web, 20% Amazon, 10% eBay). Each channel has an independent stock counter. When one channel sells out its allocation, a rebalancing job redistributes unsold allocations from other channels. Channel stock updates are pushed via webhooks (for marketplace integrations) and Kafka events (for internal channels) with a target propagation latency under 5 seconds.
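The allocation model can be sketched as below. The ratios and channel names mirror the example above; the rebalancing policy shown (proportionally re-splitting whatever is unsold across all channels) is an assumption, since the text does not pin down how the rebalancing job redistributes.

```python
# Partition total available stock across channels by configured ratio,
# then re-split the unsold remainder when a channel exhausts its share.

RATIOS = {"web": 0.70, "amazon": 0.20, "ebay": 0.10}

def allocate(total):
    alloc = {ch: round(total * r) for ch, r in RATIOS.items()}
    alloc["web"] += total - sum(alloc.values())   # rounding remainder to largest channel
    return alloc

def rebalance(alloc, sold):
    """Redistribute every channel's unsold units using the same ratios."""
    remaining = sum(alloc[ch] - sold.get(ch, 0) for ch in alloc)
    return allocate(remaining)

alloc = allocate(1000)
print(alloc)                               # {'web': 700, 'amazon': 200, 'ebay': 100}
print(rebalance(alloc, {"amazon": 200}))   # {'web': 560, 'amazon': 160, 'ebay': 80}
```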
Database Design
PostgreSQL schema: inventory table (sku_id VARCHAR, warehouse_id INT, available_qty INT, reserved_qty INT, total_qty INT, reorder_point INT, version INT, updated_at TIMESTAMP, PRIMARY KEY (sku_id, warehouse_id)). stock_movements table (movement_id UUID, sku_id, source_warehouse, dest_warehouse, quantity, movement_type ENUM('receive', 'ship', 'transfer', 'adjust', 'return'), reference_id, created_at). reservations table (reservation_id UUID, sku_id, warehouse_id, quantity, order_id, expires_at, status ENUM('active', 'committed', 'released')).
Redis data model: Hash inv:{sku}:{warehouse} → {available: 150, reserved: 30, total: 180}. Sorted Set res:{sku}:{warehouse} → members are reservation_ids scored by expiry timestamp. The Redis data is rebuilt from PostgreSQL on cold start or after failover using a snapshot-and-replay approach from the stock_movements event log.
API Design
- POST /api/v1/inventory/reserve — Reserve stock; body contains [{sku_id, warehouse_id, quantity}]; returns reservation_id with TTL
- POST /api/v1/inventory/commit — Commit a reservation (order confirmed); body contains reservation_id; permanently decrements stock
- POST /api/v1/inventory/release — Release a reservation (checkout abandoned); body contains reservation_id; returns stock to available
- GET /api/v1/inventory/{sku_id}/availability — Check stock across all warehouses; returns [{warehouse_id, available_qty, estimated_ship_date}]
Scaling & Bottlenecks
Hot SKU problem: during a flash sale, a single popular SKU receives 50K concurrent decrement requests. PostgreSQL row-level locking would serialize these, creating a bottleneck. The solution: (1) Reservations are handled in Redis via atomic Lua scripts — no database contention; (2) Commits are batched — the Commit Worker aggregates multiple reservation commits for the same SKU into a single PostgreSQL update (e.g., decrement by 47 instead of 47 individual updates) processed every 100ms; (3) Stock sharding — for extremely hot SKUs, available stock is split across 10 virtual shards in Redis, with reservation requests distributed round-robin. Aggregate availability is the sum across shards.
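The batching step (2) can be sketched as follows: commits queued during the 100ms window are collapsed into one decrement per stock row before touching PostgreSQL. Function and variable names here are illustrative, not the real Commit Worker API.

```python
from collections import defaultdict

def aggregate(commits):
    """Collapse per-reservation commits into one update per (sku, warehouse)."""
    batch = defaultdict(int)
    for sku, wh, qty in commits:
        batch[(sku, wh)] += qty
    return dict(batch)

# One 100ms window: 47 single-unit commits for a hot SKU plus two others.
window = [("SKU-1", 7, 1)] * 47 + [("SKU-2", 7, 2), ("SKU-2", 7, 3)]
print(aggregate(window))
# {('SKU-1', 7): 47, ('SKU-2', 7): 5} -> 2 UPDATEs instead of 49
```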
Redis failover is the biggest availability risk. The system runs Redis Sentinel with 3 replicas. On primary failure, Sentinel promotes a replica within 5 seconds. During the failover window, the system falls back to PostgreSQL for stock reads (slightly stale but available). A reconciliation job runs after failover, comparing Redis state against PostgreSQL and correcting any drift caused by lost writes during the failover.
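The reconciliation pass can be sketched as a diff-and-overwrite, with both stores modeled as dicts of (sku, warehouse) → available quantity; the drift-report shape returned here is an assumption for illustration.

```python
# Post-failover reconciliation: compare the Redis view against PostgreSQL
# and overwrite any drifted rows with the authoritative value.

def reconcile(redis_view, pg_view):
    drift = {}
    for key, truth in pg_view.items():
        cached = redis_view.get(key)
        if cached != truth:               # lost write or stale entry
            drift[key] = (cached, truth)
            redis_view[key] = truth       # PostgreSQL wins
    return drift

redis_view = {("SKU-1", 7): 98, ("SKU-2", 7): 40}   # one lost decrement
pg_view    = {("SKU-1", 7): 95, ("SKU-2", 7): 40}

print(reconcile(redis_view, pg_view))   # {('SKU-1', 7): (98, 95)}
print(redis_view[("SKU-1", 7)])         # 95
```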
Key Trade-offs
- Redis Lua scripts over distributed locks for reservations: Atomic Lua scripts are faster and simpler than Redlock for single-key operations, but can't span multiple Redis nodes — acceptable since each SKU+warehouse maps to one key
- Optimistic concurrency over pessimistic locking: OCC allows parallel reads and only conflicts on concurrent writes to the same row — ideal for the read-heavy, write-bursty inventory workload
- Batched commits over individual writes: Aggregating commits reduces PostgreSQL write pressure by 10-50x during peaks, but introduces a 100ms delay between reservation commit and durable persistence
- Channel allocation model over global pool: Prevents one channel from draining all stock, but requires rebalancing logic and may leave unsold inventory on underperforming channels