System Design: Flash Sale System
Design of a flash sale system that handles millions of concurrent users competing for limited inventory, using Redis atomic operations, queue-based order processing, and strict overselling prevention.
Requirements
Functional Requirements:
- Schedule flash sales with start time, duration, and limited inventory (e.g., 1,000 units)
- Users must be authenticated before the sale starts (pre-registration)
- At sale start, users race to claim items — first-come-first-served
- Real-time display of remaining stock count
- Purchase confirmation with payment processing
- Anti-bot protection: CAPTCHA, device fingerprinting, rate limiting
Non-Functional Requirements:
- Handle 10M concurrent users hitting the buy button simultaneously
- Claim processing latency under 100ms
- Absolute zero overselling — if 1,000 units are available, exactly 1,000 (or fewer) orders are created
- 99.99% availability during the sale window
- Graceful degradation: static pages for non-sale traffic during the event
Scale Estimation
10M concurrent users at sale start, each making 1-3 purchase attempts: 30M requests within the first 10 seconds = 3M requests/sec. Of these, only 1,000 will succeed (0.01% success rate). The remaining 99.99% must receive a 'sold out' response quickly. Page views during the countdown: 10M users × 5 page refreshes = 50M page views in the minute before the sale. Stock count polling: 10M users polling every 2 seconds = 5M requests/sec for the stock count endpoint.
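These estimates follow from quick arithmetic (all figures taken from the paragraph above):

```python
# Back-of-envelope traffic estimates for the sale window.
users = 10_000_000
attempts_per_user = 3      # upper bound of the 1-3 purchase attempts
window_seconds = 10        # burst window right after sale start

claim_rps = users * attempts_per_user // window_seconds  # claim traffic
poll_rps = users // 2          # each user polls stock count every 2 s
page_views = users * 5         # 5 refreshes per user in the final minute
success_rate = 1_000 / (users * attempts_per_user)       # 1,000 winners
```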
High-Level Architecture
The flash sale system is designed around three key principles: (1) shed load early, (2) use Redis atomic operations for stock claiming, and (3) process orders asynchronously via queues. The architecture has three tiers.
Tier 1 — CDN and Static Layer: The sale page is served entirely from the CDN as a static HTML page with JavaScript. The countdown timer runs client-side. Stock count is served from a CDN-cached endpoint with a 1-second TTL (not real-time, but close enough for display). This tier absorbs 90% of page view traffic without hitting application servers.
Tier 2 — Claim Layer (Redis): When a user clicks 'Buy', the request passes a rate limiter (1 request per user per second, enforced by Redis SET user:{id}:claim NX EX 1) and then reaches the Claim Service. The Claim Service executes a Redis Lua script that atomically: (a) checks that stock > 0, (b) checks whether the user has already claimed (SISMEMBER flash:{sale_id}:claimed user_id), (c) decrements stock (DECR flash:{sale_id}:stock), (d) adds the user to the claimed set (SADD flash:{sale_id}:claimed user_id), (e) pushes a claim event onto a Redis list (RPUSH flash:{sale_id}:queue {user_id, sale_id, timestamp}). The entire operation is atomic and completes in under 1ms.
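The per-user rate limit rests on the expiring-key pattern: SET succeeds only if the key is absent and the key self-destructs after one second. A minimal in-memory stand-in (in production this is a single redis-py call, r.set(f"user:{uid}:claim", 1, nx=True, ex=1); the class below only mimics its allow/deny behavior with an explicit clock so it can be tested deterministically):

```python
class ClaimRateLimiter:
    """In-memory stand-in for the Redis pattern SET user:{id}:claim NX EX 1."""

    def __init__(self, window_seconds: float = 1.0):
        self.window = window_seconds
        self._last_claim = {}  # user_id -> timestamp of last accepted request

    def allow(self, user_id: str, now: float) -> bool:
        last = self._last_claim.get(user_id)
        if last is not None and now - last < self.window:
            return False  # the "key" has not expired yet: reject this attempt
        self._last_claim[user_id] = now  # equivalent to a successful SET NX EX
        return True
```

Rejected attempts never reach the Lua script, which is what keeps the claim engine's load bounded.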
Tier 3 — Order Processing Layer (Kafka): A pool of Order Workers consumes from the claim queue, processes payments, creates orders in PostgreSQL, and sends confirmation emails. This tier runs asynchronously — users see 'Processing your order' immediately after a successful claim and receive confirmation within 30 seconds.
Core Components
Redis Atomic Claim Engine
The claim engine is the heart of the flash sale system. A Lua script makes stock check + decrement + user deduplication a single atomic operation in Redis, which is what guarantees zero overselling. The script (note the 'or 0' guard, so a missing stock key reads as sold out instead of raising a runtime error): local stock = tonumber(redis.call('get', stock_key)) or 0; if stock <= 0 then return -1 end; if redis.call('sismember', claimed_key, user_id) == 1 then return -2 end; redis.call('decr', stock_key); redis.call('sadd', claimed_key, user_id); redis.call('rpush', queue_key, cjson.encode({user=user_id, ts=timestamp})); return stock - 1. Return values: -1 = sold out, -2 = already claimed, >= 0 = success with remaining stock. The script runs in under 1ms and sustains 100K+ claims/sec on a single Redis instance.
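The script's control flow and return codes can be modeled in plain Python for reasoning and testing (a single-threaded sketch: in the real system, Redis's single-threaded Lua execution supplies the atomicity that this stand-in takes for granted):

```python
import json
import time

SOLD_OUT, ALREADY_CLAIMED = -1, -2

class ClaimEngine:
    """Pure-Python model of the atomic claim script's logic."""

    def __init__(self, stock: int):
        self.stock = stock        # models flash:{sale_id}:stock
        self.claimed = set()      # models flash:{sale_id}:claimed
        self.queue = []           # models flash:{sale_id}:queue

    def claim(self, user_id: str) -> int:
        if self.stock <= 0:
            return SOLD_OUT
        if user_id in self.claimed:
            return ALREADY_CLAIMED
        self.stock -= 1
        self.claimed.add(user_id)
        self.queue.append(json.dumps({"user": user_id, "ts": time.time()}))
        return self.stock  # remaining stock, mirroring `return stock - 1`
```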
Queue-Based Order Processing
Successful claims are processed asynchronously via a Kafka-backed order pipeline. The claim event is moved from the Redis list to a Kafka topic flash-sale-orders by a bridge worker (running every 50ms, draining the Redis list in batches). Kafka consumers (20 partitions, 20 consumer instances) process each claim: (1) validate the claim against the authoritative Redis state; (2) create a payment intent with the payment provider (Stripe); (3) wait for payment confirmation (up to 30 seconds); (4) write the order to PostgreSQL; (5) emit order.confirmed event. If payment fails, the claim is rolled back: stock is re-incremented in Redis and the user is removed from the claimed set.
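One consumer iteration, including the rollback path, might look like the sketch below. The names are hypothetical: state models the Redis stock counter and claimed set, charge stands in for the payment-provider call, and orders for the PostgreSQL write. In the real system the rollback (re-increment plus removal from the claimed set) would itself run as an atomic Lua script.

```python
def process_claim(claim, state, charge, orders):
    """Charge the user and persist the order, or roll the claim back
    so the unit returns to stock for other buyers."""
    user_id = claim["user"]
    if charge(user_id):
        orders.append({"user": user_id, "status": "confirmed"})
        return True
    # Payment failed: undo the claim so another user can buy the unit.
    state["stock"] += 1
    state["claimed"].discard(user_id)
    return False
```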
Anti-Bot Protection Layer
Bot prevention is critical for flash sale fairness. The defense stack: (1) Pre-registration with email verification — only registered users can participate; (2) CAPTCHA challenge 30 seconds before sale start — solved token is required for the claim request; (3) Device fingerprinting — each device gets a unique token; multiple claims from the same fingerprint are blocked; (4) Request rate limiting — Redis-based sliding window limiter at 1 request/sec per user; (5) IP-level rate limiting — max 50 requests/sec per IP to stop bot farms; (6) Behavioral analysis — suspiciously fast claim attempts (< 200ms after sale start, which is faster than human reaction time) are deprioritized in the queue.
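Defense (4), the per-user sliding-window limiter, can be sketched in memory. A Redis-backed version would typically keep request timestamps in a sorted set (ZADD, then ZREMRANGEBYSCORE to expire old entries, then ZCARD to count); that mechanism is an assumption of this sketch, where a per-key deque plays the sorted set's role:

```python
from collections import deque

class SlidingWindowLimiter:
    """In-memory sketch of a sliding-window rate limiter."""

    def __init__(self, limit: int = 1, window: float = 1.0):
        self.limit = limit
        self.window = window
        self._hits = {}  # key -> deque of accepted-request timestamps

    def allow(self, key: str, now: float) -> bool:
        hits = self._hits.setdefault(key, deque())
        while hits and now - hits[0] >= self.window:
            hits.popleft()  # drop timestamps that fell out of the window
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

The same class covers the IP-level limit by keying on IP with limit=50.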
Database Design
Redis data model for the flash sale: String flash:{sale_id}:stock → integer stock count (initialized to sale quantity). Set flash:{sale_id}:claimed → user IDs who have successfully claimed. List flash:{sale_id}:queue → JSON-encoded claim events pending processing. Hash flash:{sale_id}:meta → {name, start_time, end_time, original_stock, product_id, price}.
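Initializing these keys before the sale opens could look like the following Redis commands (the sale_id 42 and the meta field values are illustrative, not from the text):

```text
SET  flash:42:stock 1000
DEL  flash:42:claimed flash:42:queue
HSET flash:42:meta name "Launch Drop" start_time 1735689600 end_time 1735693200 original_stock 1000 product_id 991 price 4999
```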
PostgreSQL stores the durable order records: flash_sales table (sale_id, product_id, total_stock, sold_count, start_time, end_time, status). flash_orders table (order_id, sale_id, user_id, quantity, amount, payment_status, created_at). The flash_orders table has a UNIQUE constraint on (sale_id, user_id) as a final defense against duplicate orders.
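A possible DDL for the orders table, showing the duplicate-order defense (column types are assumptions; only the columns and the UNIQUE constraint come from the text):

```sql
-- Illustrative schema; types and defaults are assumptions of this sketch.
CREATE TABLE flash_orders (
    order_id       BIGSERIAL     PRIMARY KEY,
    sale_id        BIGINT        NOT NULL,
    user_id        BIGINT        NOT NULL,
    quantity       INT           NOT NULL DEFAULT 1,
    amount         NUMERIC(10,2) NOT NULL,
    payment_status TEXT          NOT NULL,
    created_at     TIMESTAMPTZ   NOT NULL DEFAULT now(),
    UNIQUE (sale_id, user_id)  -- final defense against duplicate orders
);
```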
API Design
- POST /api/v1/flash-sales/{sale_id}/claim — Attempt to claim an item; requires auth token + CAPTCHA token; returns claim_status (success/sold_out/already_claimed) and queue_position
- GET /api/v1/flash-sales/{sale_id}/status — Get sale status (upcoming/active/ended) and remaining stock; served from CDN with 1s TTL
- GET /api/v1/flash-sales/{sale_id}/order — Check order processing status for the current user; returns order details if confirmed
- POST /api/v1/flash-sales/{sale_id}/register — Pre-register for a flash sale; required before claiming
Scaling & Bottlenecks
The 3M requests/sec at sale start would overwhelm any traditional backend. The key is aggressive load shedding: the CDN absorbs all page views and stock count polls, and the rate limiter rejects 90% of claim requests (retries, bots), leaving the Redis claim engine to process the remaining 300K valid claims/sec. Of those, only 1,000 succeed. The traffic pyramid: 3M req/sec → 300K valid claims/sec → 1,000 successful claims → 1,000 orders in the queue. The rate limiter alone cuts load by 10x; the claim engine then filters the survivors down to exactly the available stock.
Payment processing is the slowest step in the order pipeline (2-5 seconds per payment). With 1,000 orders and 20 Kafka consumers, each consumer makes 1,000 / 20 = 50 sequential payment calls; at 2-5 seconds per call, the queue drains in roughly 100-250 seconds. If faster processing is needed, the consumer count scales horizontally. Payment failures require careful handling: the Redis stock must be re-incremented atomically, and the freed stock becomes immediately available to users who received 'sold out' but are still refreshing.
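Worked through explicitly (the 2-5 second latency and 20 consumers are the figures above; the 60-second drain target is an illustrative assumption):

```python
orders = 1_000
consumers = 20
payment_seconds = (2, 5)   # per-payment latency range

per_consumer = orders // consumers                  # sequential calls each
best_case = per_consumer * payment_seconds[0]       # total drain time, fast
worst_case = per_consumer * payment_seconds[1]      # total drain time, slow

# Consumers needed to drain all orders within a 60 s target at worst-case
# latency (ceiling division):
target = 60
needed = (orders * payment_seconds[1] + target - 1) // target
```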
Key Trade-offs
- Redis Lua atomicity over distributed transactions: Single-node Redis Lua scripts provide microsecond-level atomicity for the claim operation, but limit throughput to a single Redis instance — mitigated by the fact that even a single Redis instance handles 100K+ ops/sec
- Async order processing over synchronous checkout: Users wait 30 seconds for order confirmation instead of instant checkout, but this decouples the claim speed from payment processing latency
- CDN-cached stock count (1s stale) over real-time: Prevents 5M/sec requests from hitting the application tier; users see 'approximately 47 left' which is acceptable for the urgency-driven UX
- Aggressive rate limiting risks rejecting legitimate users: 1 req/sec limit may frustrate users on poor connections who need retries, but the alternative (no rate limiting) allows bots to dominate — fairness requires this trade-off