System Design: Digital Downloads Store
System design of a digital downloads store handling secure file delivery, license management, and DRM for software, ebooks, music, and digital assets.
Requirements
Functional Requirements:
- Sellers upload digital products: software, ebooks, music, templates, courses
- Buyers purchase and download products with instant delivery
- License key generation and management (single-use, multi-device, subscription)
- Download link security: time-limited, IP-restricted, download-count-limited
- Product versioning: sellers push updates; buyers get notified and can re-download
- Refund processing with automatic license revocation
Non-Functional Requirements:
- Support 10M products with files up to 10GB each
- Download start within 2 seconds of purchase confirmation
- Peak download throughput: 100Gbps during product launches
- 99.99% availability for purchase and download paths
- Piracy mitigation: watermarking, download limits, and abuse detection
- CDN-optimized delivery with global edge coverage
Scale Estimation
10M products with an average file size of 500MB = 5PB total storage. Active products (downloaded in the last 30 days): 2M products = 1PB hot storage. Daily downloads: 2M downloads/day ≈ 23 downloads/sec on average. Peak during launches: 50K concurrent downloads. Download bandwidth: 50K concurrent × 10Mbps average throughput = 500Gbps peak. That is five times the 100Gbps target in the non-functional requirements, which corresponds to roughly 2Mbps per download at full concurrency. License validations: 5M/day for software products ≈ 58 validations/sec. Purchase transactions: 500K/day ≈ 5.8 TPS.
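The arithmetic behind these estimates can be rechecked with a few lines of Python. This is only a back-of-envelope sketch; it assumes decimal units (1PB = 10^15 bytes) and uses the input figures quoted in this section.

```python
# Back-of-envelope check of the scale estimates (decimal units assumed).
SECONDS_PER_DAY = 86_400

total_storage_pb = 10_000_000 * 500e6 / 1e15   # 10M products x 500 MB
hot_storage_pb = 2_000_000 * 500e6 / 1e15      # 2M active products x 500 MB
downloads_per_sec = 2_000_000 / SECONDS_PER_DAY
validations_per_sec = 5_000_000 / SECONDS_PER_DAY
purchases_per_sec = 500_000 / SECONDS_PER_DAY
peak_gbps = 50_000 * 10 / 1_000                # 50K downloads x 10 Mbps each

print(f"total storage: {total_storage_pb:.1f} PB")
print(f"hot storage:   {hot_storage_pb:.1f} PB")
print(f"downloads:     {downloads_per_sec:.1f}/sec")
print(f"validations:   {validations_per_sec:.1f}/sec")
print(f"purchases:     {purchases_per_sec:.1f}/sec")
print(f"peak egress:   {peak_gbps:.0f} Gbps")
```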
High-Level Architecture
The digital storefront separates the purchase flow from the delivery flow. The purchase flow: Product Page → Cart → Checkout → Payment Service (Stripe) → Order Confirmation → License Generation. The delivery flow: Download Request → Auth & Validation (check download token, count, IP) → Signed URL Generation → CDN-served file download.
Files are stored in S3 with intelligent tiering: hot products (downloaded in last 7 days) in S3 Standard, warm products in S3 Infrequent Access, cold products in S3 Glacier Instant Retrieval. A CloudFront CDN with signed URLs delivers files globally. The signed URL includes: expiry timestamp (30 minutes), IP restriction (optional), and a download counter token validated by a Lambda@Edge function.
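As a rough illustration of the tiering rule, the function below maps a product's last download time to an S3 storage class. The 7-day hot window comes from the text; the 90-day warm/cold cutoff and the function name are assumptions made for the example, since the design does not state them.

```python
from datetime import datetime, timedelta, timezone

HOT_WINDOW = timedelta(days=7)    # from the text: downloaded in the last 7 days
WARM_WINDOW = timedelta(days=90)  # assumption: the text gives no warm/cold cutoff


def storage_class_for(last_download_at: datetime, now: datetime) -> str:
    """Return the S3 storage class name for a product file."""
    age = now - last_download_at
    if age <= HOT_WINDOW:
        return "STANDARD"
    if age <= WARM_WINDOW:
        return "STANDARD_IA"
    return "GLACIER_IR"  # Glacier Instant Retrieval


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
for days in (2, 30, 400):
    print(days, storage_class_for(now - timedelta(days=days), now))
```

In practice a rule like this would more likely be expressed as lifecycle configuration or a periodic job than evaluated per request.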
The License Service generates and validates license keys. For software products, keys are generated using a deterministic algorithm (HMAC-SHA256 of product_id + user_id + salt) allowing offline validation. For subscription-based licenses, the software phones home to the License API for periodic validation.
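A minimal sketch of deterministic key generation follows. It interprets the "salt" as the HMAC secret and joins product_id and user_id with a colon; the key format (base32, four groups of eight characters) and the function names are assumptions for illustration, not part of the design above.

```python
import base64
import hashlib
import hmac


def generate_license_key(product_id: str, user_id: str, secret: bytes) -> str:
    """Derive a license key deterministically from product, user, and secret."""
    message = f"{product_id}:{user_id}".encode("utf-8")
    digest = hmac.new(secret, message, hashlib.sha256).digest()
    # Assumed format: first 20 bytes -> 32 base32 characters, grouped in 8s.
    encoded = base64.b32encode(digest[:20]).decode("ascii")
    return "-".join(encoded[i:i + 8] for i in range(0, len(encoded), 8))


def validate_license_key(key: str, product_id: str, user_id: str, secret: bytes) -> bool:
    """Recompute the key locally and compare in constant time (no API call)."""
    expected = generate_license_key(product_id, user_id, secret)
    return hmac.compare_digest(key.encode("utf-8"), expected.encode("utf-8"))


secret = b"example-secret"  # hypothetical value
key = generate_license_key("prod-42", "user-7", secret)
print(key)
print(validate_license_key(key, "prod-42", "user-7", secret))
```

Because validation recomputes the key, any client that validates offline must hold the secret, which is the exposure noted in the trade-offs.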
Core Components
Secure File Delivery Pipeline
When a purchase is confirmed, the system generates a download token: a JWT containing {order_id, product_id, user_id, max_downloads: 5, expires_at: now + 30 days}. The download URL format: https://downloads.example.com/{token}. When the user clicks the link, a Lambda@Edge function at the CDN: (1) validates the JWT signature; (2) checks expiry; (3) atomically decrements the download counter in a DynamoDB table, conditional on the remaining count being greater than zero; (4) if every check passes, generates a signed URL with a 30-minute expiry and redirects the client. The file itself is served from S3 through CloudFront; the application servers never handle file bytes.
License Key Management
The License Service supports three license types: (1) Perpetual — key generated once, valid forever, tied to a device limit (activation/deactivation API); (2) Subscription — key valid while subscription is active, validated via API call every 24 hours with a 7-day grace period for offline use; (3) Single-use — key consumed on first activation, used for serial-code style products. Keys are stored in PostgreSQL: license_keys table (key_id, order_id, product_id, user_id, license_type, max_activations, current_activations, status, created_at). Device activations are tracked in a license_activations table (activation_id, key_id, device_fingerprint, activated_at, deactivated_at).
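The activation rules for the three license types can be illustrated with a small in-memory model. This is a sketch under assumptions: the class and function names are invented for the example, device activations are held in a set rather than the license_activations table, and single-use keys are modeled as one activation that cannot be released.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

GRACE_PERIOD = timedelta(days=7)  # offline grace period for subscriptions


@dataclass
class License:
    license_type: str            # "perpetual", "subscription", or "single_use"
    max_activations: int
    status: str = "active"       # "active", "revoked", or "expired"
    devices: set = field(default_factory=set)


def activate(lic: License, device_fingerprint: str) -> bool:
    """Activate a device if the license is active and under its device limit."""
    if lic.status != "active":
        return False
    if device_fingerprint in lic.devices:
        return True  # re-activating a known device is idempotent
    if len(lic.devices) >= lic.max_activations:
        return False
    lic.devices.add(device_fingerprint)
    return True


def deactivate(lic: License, device_fingerprint: str) -> bool:
    """Free a device slot; single-use keys stay consumed."""
    if lic.license_type == "single_use" or device_fingerprint not in lic.devices:
        return False
    lic.devices.remove(device_fingerprint)
    return True


def subscription_usable_offline(last_validated_at: datetime, now: datetime) -> bool:
    """True while the last successful API validation is within the grace period."""
    return now - last_validated_at <= GRACE_PERIOD


lic = License("perpetual", max_activations=2)
print(activate(lic, "laptop"), activate(lic, "desktop"), activate(lic, "tablet"))
```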
Digital Asset Watermarking
For piracy mitigation, downloadable files can be watermarked with buyer information. PDF watermarks embed a barely visible buyer identifier on each page. Audio files use audio steganography to embed a unique buyer code. Software binaries embed the license key in a location that is hard to strip. Watermarking runs as an async pipeline: Purchase Confirmed → Kafka event → Watermark Worker (fetches original file from S3, applies buyer-specific watermark, stores watermarked copy in a per-buyer S3 prefix) → updates download URL to point to the watermarked copy. For large files, watermarking is pre-computed for popular products.
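The worker step of this pipeline might look roughly like the following. Everything here is a stand-in: a dict plays the role of S3, the event is a plain dict rather than a Kafka message, apply_watermark is injected because the real embedding depends on the file type, and deriving the buyer code with an HMAC is an assumption made for the example.

```python
import hashlib
import hmac


def buyer_code(order_id: str, user_id: str, secret: bytes) -> str:
    """Derive a short, stable per-purchase code to embed in the file (assumed scheme)."""
    digest = hmac.new(secret, f"{order_id}:{user_id}".encode(), hashlib.sha256)
    return digest.hexdigest()[:16]


def handle_purchase_confirmed(event: dict, storage: dict, apply_watermark, secret: bytes) -> str:
    """Fetch the original, watermark it, store the copy, and return its key."""
    original = storage[event["source_key"]]                 # fetch original file
    code = buyer_code(event["order_id"], event["user_id"], secret)
    marked = apply_watermark(original, code)                # file-type specific
    filename = event["source_key"].rsplit("/", 1)[-1]
    dest_key = f"{event['order_id']}/{event['product_id']}/{filename}"
    storage[dest_key] = marked                              # per-order delivery copy
    return dest_key                                         # new download target


def toy_watermark(data: bytes, code: str) -> bytes:
    # Placeholder only: a real worker would embed the code inside the PDF,
    # audio stream, or binary rather than appending it.
    return data + b"\n%buyer:" + code.encode()


storage = {"seller-1/prod-1/v1/original/book.pdf": b"%PDF-1.7 ..."}
event = {"order_id": "order-1", "product_id": "prod-1", "user_id": "user-9",
         "source_key": "seller-1/prod-1/v1/original/book.pdf"}
print(handle_purchase_confirmed(event, storage, toy_watermark, b"example-secret"))
```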
Database Design
PostgreSQL schema:
- products table (product_id UUID, seller_id, title, description, category, file_key VARCHAR (the S3 key), file_size BIGINT, version VARCHAR, license_type ENUM, price, created_at)
- downloads table (download_id UUID, order_id FK, product_id FK, user_id, download_token_hash VARCHAR, download_count INT, max_downloads INT, token_expires_at TIMESTAMP, created_at)
- license_keys table (key_id UUID, order_id FK, product_id FK, user_id, key_value VARCHAR UNIQUE, license_type, max_activations INT, current_activations INT, status ENUM('active', 'revoked', 'expired'), created_at)
DynamoDB table for download counter (low-latency atomic decrements at the CDN edge): partition key = download_token_hash, attributes = remaining_count, last_download_at, ip_addresses (list). S3 bucket structure: s3://digital-store/{seller_id}/{product_id}/{version}/original/ for source files, s3://digital-store-delivery/{order_id}/{product_id}/ for watermarked copies.
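One way to express the atomic decrement is a conditional UpdateItem request. The sketch below only builds the request parameters in the low-level DynamoDB format, so it runs without AWS access; the table name and function name are hypothetical. Passed to a boto3 DynamoDB client as `client.update_item(**params)`, the write would be rejected with a conditional check failure once remaining_count reaches zero.

```python
def build_decrement_request(table_name: str, token_hash: str,
                            client_ip: str, now_iso: str) -> dict:
    """Build UpdateItem parameters that decrement only while downloads remain."""
    return {
        "TableName": table_name,
        "Key": {"download_token_hash": {"S": token_hash}},
        # Decrement, record the time, and append the caller's IP in one write.
        "UpdateExpression": (
            "SET remaining_count = remaining_count - :one, "
            "last_download_at = :now, "
            "ip_addresses = list_append(if_not_exists(ip_addresses, :empty), :ip)"
        ),
        # The whole update is rejected if no downloads are left.
        "ConditionExpression": "remaining_count > :zero",
        "ExpressionAttributeValues": {
            ":one": {"N": "1"},
            ":zero": {"N": "0"},
            ":now": {"S": now_iso},
            ":empty": {"L": []},
            ":ip": {"L": [{"S": client_ip}]},
        },
        "ReturnValues": "UPDATED_NEW",
    }


params = build_decrement_request(
    "download_counters",          # hypothetical table name
    "3f2a9c",                     # hypothetical token hash
    "203.0.113.7",
    "2024-06-01T00:00:00Z",
)
print(params["ConditionExpression"])
```

Doing the check and the decrement in a single conditional write is what keeps two simultaneous requests on the same token from both succeeding on the last remaining download.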
API Design
- POST /api/v1/products/{product_id}/purchase: Purchase a digital product; returns order_id, download_url, and license_key (if applicable)
- GET /api/v1/downloads/{token}: Initiate download; validates token, decrements counter, redirects to the signed URL
- POST /api/v1/licenses/{key}/activate: Activate a license on a device; body contains device_fingerprint; returns activation_id
- POST /api/v1/products/{product_id}/versions: Seller uploads a new version; body contains file and changelog; triggers buyer notifications
Scaling & Bottlenecks
Peak download throughput during product launches (e.g., a popular game release) is the primary challenge. A single product may see 50K concurrent download requests. CloudFront handles this by caching the file at 400+ edge locations globally. The first download to each edge triggers an origin fetch from S3; subsequent downloads are served from the edge cache. CloudFront is a pull-based CDN with no push API, so for launch events the system pre-warms the cache by requesting the file through the major edge locations 1 hour before the scheduled release.
The download counter in DynamoDB must handle 50K concurrent decrements for the same product. Since each download token is unique (per-order), there's no hot-key problem: each decrement targets a different DynamoDB partition key. However, the Lambda@Edge function must handle 50K concurrent invocations. Lambda@Edge does not support provisioned concurrency, so cold starts are mitigated by keeping the function package small, and the concurrency quota has to be raised ahead of a launch.
Key Trade-offs
- Signed URLs over application-proxied downloads: CDN-direct delivery achieves 100Gbps throughput without application server bottlenecks, but requires Lambda@Edge for access control logic, which adds cold start latency (Lambda@Edge does not support provisioned concurrency, so this is mitigated by keeping the function small)
- HMAC-based license keys over random UUIDs: Deterministic generation enables offline validation in software products (no API call needed), but offline validation means the secret salt ships inside the software, and a leaked or extracted salt compromises all licenses generated with it
- Per-buyer watermarking over DRM: Less intrusive than DRM (no special software needed to open files), but watermarks can be stripped by determined pirates — the goal is to deter casual sharing, not prevent all piracy
- S3 Intelligent Tiering over single storage class: Reduces storage cost by 60% for the long tail of cold products, but retrieval from Glacier Instant Retrieval adds 50-200ms latency, which is acceptable since these products have minimal download traffic