
System Design: CDN for Video Delivery

System design of a Content Delivery Network optimized for video streaming, covering multi-tier caching, edge server architecture, origin shield, and cache invalidation for global video delivery.

18 min read · Updated Jan 15, 2025
Tags: system-design, cdn, video-delivery, caching, edge-computing

Requirements

Functional Requirements:

  • Serve video segments (HLS/DASH) from edge servers closest to viewers
  • Support adaptive bitrate streaming with per-segment URL resolution
  • Cache video content with multi-tier hierarchy (edge → regional → origin shield → origin)
  • Handle cache invalidation for updated or removed content within 60 seconds
  • Provide real-time analytics on cache hit rates, bandwidth, and viewer experience metrics
  • Support signed URLs for DRM-protected and access-controlled content

Non-Functional Requirements:

  • Serve 500+ Tbps aggregate bandwidth across 300+ PoPs worldwide
  • Cache hit ratio above 95% for popular content, above 80% overall
  • First-byte latency under 50ms for cached content at the edge
  • 99.99% availability per PoP; graceful failover on PoP outage
  • Support 10 billion HTTP requests per day

Scale Estimation

With 500 Tbps peak bandwidth and an average HLS segment size of 2 MB (a 4 Mbps rendition × 4-second segments = 16 Mbit ≈ 2 MB), peak demand works out to 500 Tbps ÷ 16 Mbit ≈ 31 million segment requests per second. Spread evenly across 300 PoPs that is roughly 100K req/sec per PoP at peak, but traffic is far from uniform: hot PoPs in major cities handle several times the average while small PoPs see a fraction of it. Edge storage: 300 PoPs × 200 TB each = 60 PB of total edge capacity. Content library: assume 50 PB of encoded video. Each PoP holds only 200 TB, about 0.4% of the library, so only the hottest slice of the catalog fits at any given edge; the long tail is served from the regional or origin tiers.
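As a sanity check, the estimates above can be reproduced with a few lines of arithmetic; the inputs are the figures already stated (500 Tbps peak, 2 MB segments, 300 PoPs, 200 TB per PoP, 50 PB library):

```python
# Back-of-the-envelope check of the scale estimates above.
PEAK_BANDWIDTH_BPS = 500e12          # 500 Tbps aggregate peak
SEGMENT_BITS = 4e6 * 4               # 4 Mbps rendition x 4 s segment = 16 Mbit (~2 MB)
POPS = 300
EDGE_STORAGE_PER_POP_TB = 200
LIBRARY_PB = 50

peak_segments_per_sec = PEAK_BANDWIDTH_BPS / SEGMENT_BITS                 # ~31 million/s
per_pop_peak = peak_segments_per_sec / POPS                               # ~100K/s per PoP
total_edge_pb = POPS * EDGE_STORAGE_PER_POP_TB / 1000                     # 60 PB
library_fraction_per_pop = EDGE_STORAGE_PER_POP_TB / (LIBRARY_PB * 1000)  # ~0.4%

print(f"peak segment rate: {peak_segments_per_sec / 1e6:.1f}M/s")
print(f"per-PoP peak (uniform spread): {per_pop_peak / 1e3:.0f}K/s")
print(f"total edge capacity: {total_edge_pb:.0f} PB")
print(f"share of library per PoP: {library_fraction_per_pop:.2%}")
```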

High-Level Architecture

The CDN architecture uses a four-tier hierarchy. Tier 1 (Edge PoPs): 300+ locations globally, each with 10-100 servers. Edge servers run NGINX with a custom caching module that stores video segments on NVMe SSDs. Tier 2 (Regional Hubs): 20-30 regional data centers that aggregate cache misses from multiple edge PoPs. Tier 3 (Origin Shield): 3-5 origin shield locations that coalesce cache misses before hitting the origin, protecting the origin from thundering herd. Tier 4 (Origin): the source of truth — typically an S3-compatible object store where all encoded content lives.
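One way to picture the hierarchy is as an ordered chain of caches that a miss falls through, with the result filled back on the way out. This is an illustrative sketch, not the actual component API; the tier lookup and fill-back callables are placeholders:

```python
from typing import Callable, Optional

# Illustrative tier chain: each tier is a (name, lookup) pair. A miss at one
# tier falls through to the next; when a lower tier answers, the tiers that
# missed are back-filled so subsequent requests hit closer to the viewer.
Tier = tuple[str, Callable[[str], Optional[bytes]]]

def fetch_through_tiers(segment_url: str, tiers: list[Tier],
                        fill_back: Callable[[str, str, bytes], None]) -> Optional[bytes]:
    missed: list[str] = []
    for name, lookup in tiers:                 # edge -> regional -> shield -> origin
        data = lookup(segment_url)
        if data is not None:
            for tier_name in missed:           # populate the tiers that missed
                fill_back(tier_name, segment_url, data)
            return data
        missed.append(name)
    return None                                # not found anywhere (origin miss)
```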

Request routing uses a combination of DNS-based and Anycast routing. When a viewer's player requests a video segment, DNS resolves the CDN hostname to the nearest edge PoP using latency-based routing (measuring RTT from the client's DNS resolver to each PoP). Within the PoP, an L4 load balancer distributes requests across edge servers using consistent hashing on the segment URL — this ensures all requests for the same segment hit the same server, maximizing cache hit rates.
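A minimal hash-ring sketch of the within-PoP routing described above; the server names and virtual-node count are illustrative:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Maps segment URLs to edge servers; adding or removing a server only
    remaps the keys adjacent to it on the ring, preserving most cached content."""
    def __init__(self, servers, vnodes=100):
        self._ring = []                        # sorted list of (hash, server)
        for server in servers:
            for i in range(vnodes):
                self._ring.append((self._hash(f"{server}#{i}"), server))
        self._ring.sort()
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def server_for(self, segment_url: str) -> str:
        idx = bisect.bisect(self._keys, self._hash(segment_url)) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing([f"edge-{i:02d}" for i in range(20)])
print(ring.server_for("/v1/segments/tt0111161/1080p/00042.ts"))
```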

The Fill Pipeline runs continuously, pushing content from origin to edge PoPs based on predicted demand. A Content Popularity Predictor (gradient boosting model trained on historical view counts, time-of-day patterns, and new release schedules) generates a ranked list of content per region. The top N titles are proactively pushed to edge PoPs during off-peak hours via a background transfer service (BitTorrent-inspired protocol between PoPs for efficient distribution).
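The output of the fill pipeline is essentially a per-region ranked list trimmed to a pre-fill budget. A simplified sketch, with made-up titles, view predictions, and a 20 TB budget standing in for the real predictor output:

```python
# Simplified pre-fill selection: rank titles by predicted regional demand and
# take as many as fit in the PoP's reserved pre-fill budget.
def select_prefill(titles, budget_bytes):
    ranked = sorted(titles, key=lambda t: t["predicted_views"], reverse=True)
    selected, used = [], 0
    for title in ranked:
        if used + title["size_bytes"] <= budget_bytes:
            selected.append(title["content_id"])
            used += title["size_bytes"]
    return selected

titles_seoul = [
    {"content_id": "drama-101", "predicted_views": 2_400_000, "size_bytes": 900e9},
    {"content_id": "film-007",  "predicted_views":   300_000, "size_bytes": 1.2e12},
]
print(select_prefill(titles_seoul, budget_bytes=20e12))   # 20 TB pre-fill budget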

Core Components

Edge Server Architecture

Each edge server runs a custom NGINX-based HTTP server with a two-tier local cache: an in-memory LRU cache (64GB RAM) for the hottest segments (manifests, first segments of popular videos) and an NVMe SSD cache (10-50TB) for the bulk of served content. Cache lookup is O(1) via a hash table indexed by segment URL. On a cache miss, the server fetches from the regional hub (Tier 2) via a persistent HTTP/2 connection pool. Request coalescing prevents multiple concurrent fetches for the same segment — the first miss triggers a fetch, subsequent requests for the same segment are queued and served from the fetched result.
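A minimal sketch of the request-coalescing behavior described above, assuming a synchronous fetch_upstream callable that stands in for the HTTP/2 fetch from the regional hub:

```python
import threading

class CoalescingFetcher:
    """Request coalescing: concurrent misses for the same segment share one
    upstream fetch instead of each opening its own request to the regional hub."""
    def __init__(self, fetch_upstream):
        self._fetch_upstream = fetch_upstream   # e.g. HTTP/2 GET to the regional hub
        self._inflight = {}                     # segment_url -> (Event, result holder)
        self._lock = threading.Lock()

    def get(self, segment_url: str) -> bytes:
        with self._lock:
            entry = self._inflight.get(segment_url)
            leader = entry is None
            if leader:
                entry = (threading.Event(), {})
                self._inflight[segment_url] = entry
        event, result = entry
        if leader:
            try:
                result["data"] = self._fetch_upstream(segment_url)
            finally:
                with self._lock:
                    self._inflight.pop(segment_url, None)
                event.set()                     # wake every queued follower
        else:
            event.wait()                        # follower: wait for the leader's fetch
        if "data" not in result:
            raise RuntimeError(f"upstream fetch failed for {segment_url}")
        return result["data"]
```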

Origin Shield

The origin shield is the critical layer between the CDN and the origin storage. Without it, a cache miss at 300 edge PoPs could result in 300 concurrent requests to the origin for the same segment. The origin shield collapses these into a single request. It runs on high-memory servers (1TB RAM, 100TB SSD) at 3-5 locations. The shield uses a consistent hashing ring to partition the content keyspace across shield servers, ensuring each segment is owned by a single shield server for maximum deduplication. Cache invalidation messages flow from the origin to the shield, then cascade to regional hubs and edge PoPs.

Cache Invalidation Service

Content removal or update triggers a cache purge that must propagate to all 300+ PoPs within 60 seconds. The Invalidation Service uses a Kafka-backed event bus. When the origin publishes a purge event (keyed by URL pattern or content ID), the Invalidation Service fans out purge commands to all PoPs via persistent gRPC connections. Each edge server receives the purge command, removes the content from its local cache, and ACKs. A reconciliation job runs every 5 minutes to catch any missed purges by comparing edge cache manifests against the origin's active content list.
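On the edge side, applying a purge amounts to matching cached keys against the event's URL pattern or content ID. A hedged sketch; the event fields shown are assumptions about the payload, not a documented schema:

```python
import fnmatch
import json

def handle_purge(event_bytes: bytes, cache_keys: set[str], purge_key) -> list[str]:
    """Apply one purge event to a local edge cache and return the purged keys,
    which would then be ACKed back to the Invalidation Service."""
    event = json.loads(event_bytes)            # e.g. {"pattern": "/v1/segments/tt123/*"}
    pattern = event.get("pattern")
    content_id = event.get("content_id")
    purged = []
    for key in list(cache_keys):
        if (pattern and fnmatch.fnmatch(key, pattern)) or \
           (content_id and f"/{content_id}/" in key):
            purge_key(key)                     # drop from both RAM and SSD caches
            purged.append(key)
    return purged
```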

Database Design

The CDN does not use a traditional database for the hot path. Edge server caches use a custom key-value store on NVMe with a memory-mapped index file. The key is a hash of the segment URL; the value is the file offset and length on the SSD. This index fits entirely in memory (64 bytes per entry × 50M segments = 3.2GB). Metadata about content (content_id, rendition_id, segment_number, codec, resolution, file_size, ttl) is stored in a PostgreSQL database at the origin, queried only during cache fill operations.
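A sketch of what one fixed-width, 64-byte index entry could look like, consistent with the sizing above; the exact field layout is an assumption:

```python
import hashlib
import struct

# Hypothetical 64-byte fixed-width entry for the memory-mapped index:
# 16-byte URL hash, 8-byte SSD offset, 4-byte length, 4-byte TTL epoch,
# padded to 64 bytes so 50M entries come to roughly 3.2 GB.
ENTRY_FMT = "<16sQII32x"
assert struct.calcsize(ENTRY_FMT) == 64

def pack_entry(segment_url: str, offset: int, length: int, expires_at: int) -> bytes:
    url_hash = hashlib.md5(segment_url.encode()).digest()   # 16 bytes
    return struct.pack(ENTRY_FMT, url_hash, offset, length, expires_at)

def unpack_entry(raw: bytes):
    url_hash, offset, length, expires_at = struct.unpack(ENTRY_FMT, raw)
    return url_hash, offset, length, expires_at
```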

Analytics data (request logs, cache hit/miss events, bandwidth counters, viewer experience metrics) is collected by each edge server and streamed to a centralized Kafka cluster via a lightweight log shipper (Fluent Bit). From Kafka, data flows into ClickHouse for real-time dashboards and S3 + Athena for batch analytics. Per-PoP aggregated metrics (bandwidth, hit rate, error rate) are computed in real-time using Flink and exposed via a metrics API.
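Each request could be logged as a compact record like the one below before Fluent Bit ships it to Kafka; the field names are illustrative rather than a fixed schema:

```python
import json
import time

def access_log_record(pop_id, server_id, segment_url, cache_status, bytes_sent, ttfb_ms):
    """One line of the edge access log streamed to Kafka for ClickHouse/Flink."""
    return json.dumps({
        "ts": time.time(),
        "pop_id": pop_id,                  # e.g. "fra-01"
        "server_id": server_id,
        "url": segment_url,
        "cache": cache_status,             # "hit-ram" | "hit-ssd" | "miss"
        "bytes": bytes_sent,
        "ttfb_ms": ttfb_ms,
    })
```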

API Design

  • GET /v1/segments/{content_id}/{rendition}/{segment_number}.ts — Fetch an HLS segment; routed to nearest edge PoP; returns binary with Cache-Control headers
  • GET /v1/manifests/{content_id}/master.m3u8 — Fetch the HLS master playlist; includes signed segment URLs with time-limited tokens (see the signing sketch after this list)
  • POST /v1/purge — Purge content from all PoPs; body contains URL pattern or content_id; returns purge_id for tracking
  • GET /v1/analytics/realtime?pop_id={id}&metric=bandwidth&window=5m — Fetch real-time analytics per PoP
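A minimal sketch of the time-limited signed segment URLs mentioned in the manifest endpoint above, using an HMAC over the path and expiry; the query-parameter names and signing scheme are assumptions:

```python
import hashlib
import hmac
import time

SIGNING_KEY = b"replace-with-a-real-secret"   # assumption: key shared by origin and edges

def sign_segment_url(path: str, ttl_seconds: int = 300) -> str:
    """Append an expiry and HMAC token to a segment path, e.g. the URLs
    embedded in master.m3u8."""
    expires = int(time.time()) + ttl_seconds
    msg = f"{path}?expires={expires}".encode()
    token = hmac.new(SIGNING_KEY, msg, hashlib.sha256).hexdigest()
    return f"{path}?expires={expires}&token={token}"

def verify_segment_url(path: str, expires: int, token: str) -> bool:
    """Edge-side check: reject expired or tampered URLs before serving bytes."""
    if expires < int(time.time()):
        return False
    msg = f"{path}?expires={expires}".encode()
    expected = hmac.new(SIGNING_KEY, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, token)

print(sign_segment_url("/v1/segments/tt0111161/1080p/00042.ts"))
```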

Scaling & Bottlenecks

The primary bottleneck is edge storage capacity during content catalog growth. The solution is intelligent cache eviction: each PoP runs a popularity-weighted LRU eviction policy. Content that has not been accessed in 24 hours and has low predicted future demand is evicted first. A feedback loop from the Content Popularity Predictor continuously adjusts the eviction policy per region — a Korean drama might be hot in Seoul but evictable in São Paulo. SSD write amplification is a secondary concern: video segments are large (1-4MB) and written sequentially, which aligns well with SSD write patterns.
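A hedged sketch of what a popularity-weighted eviction score could look like; the weighting and normalization are assumptions, and the real policy would be tuned per region by the predictor feedback loop:

```python
import time

def eviction_score(last_access_ts: float, predicted_daily_views: float,
                   size_bytes: int) -> float:
    """Lower score = evict first. Blends recency (LRU) with predicted regional
    demand, normalized by size so large cold objects free space fastest."""
    age_hours = (time.time() - last_access_ts) / 3600
    recency = 1.0 / (1.0 + age_hours)               # decays toward 0 after ~24h idle
    demand = predicted_daily_views / 1_000.0         # regional predictor output
    return (recency + demand) / (size_bytes / 1e6)   # per-MB value of keeping it

def pick_victims(entries, bytes_needed):
    """Evict the lowest-scoring entries until enough space is reclaimed."""
    victims, reclaimed = [], 0
    for entry in sorted(entries, key=lambda e: eviction_score(
            e["last_access_ts"], e["predicted_daily_views"], e["size_bytes"])):
        if reclaimed >= bytes_needed:
            break
        victims.append(entry["key"])
        reclaimed += entry["size_bytes"]
    return victims
```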

Network capacity at edge PoPs is the other key bottleneck. Each PoP is provisioned with 2-4x expected peak bandwidth. When a PoP approaches capacity (80% utilization), the DNS routing layer begins deflecting new requests to nearby PoPs. This overflow routing is transparent to the viewer — latency increases slightly (20-50ms additional RTT) but avoids congestion-induced packet loss. For massive events (World Cup final), temporary PoP capacity is added by deploying additional servers or activating reserved ISP peering capacity.
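The overflow decision can be modeled as a simple utilization check feeding the DNS routing layer; the 80% threshold comes from the text, while the PoP names and neighbor map are illustrative:

```python
def route_pop(primary_pop: str, utilization: dict[str, float],
              neighbors: dict[str, list[str]], threshold: float = 0.80) -> str:
    """Return the PoP a new viewer should be sent to: the nearest PoP unless it
    is above the utilization threshold, in which case the least-loaded nearby
    PoP under the threshold takes the overflow."""
    if utilization.get(primary_pop, 0.0) < threshold:
        return primary_pop
    candidates = [p for p in neighbors.get(primary_pop, [])
                  if utilization.get(p, 1.0) < threshold]
    if not candidates:
        return primary_pop              # every neighbor is hot: keep local
    return min(candidates, key=lambda p: utilization[p])

# Example: "ams-01" at 86% utilization deflects new viewers to "fra-01" at 55%.
print(route_pop("ams-01",
                {"ams-01": 0.86, "fra-01": 0.55, "lhr-01": 0.91},
                {"ams-01": ["fra-01", "lhr-01"]}))
```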

Key Trade-offs

  • Four-tier hierarchy vs flat CDN: More tiers increase complexity but dramatically reduce origin load and improve cache hit rates — the origin shield alone reduces origin requests by 95%
  • Push-based fill vs pull-on-miss: Proactive content push ensures zero cache misses for predicted popular content, but wastes bandwidth and storage for mispredictions — hybrid approach pushes top 20% by predicted popularity, pulls the rest
  • Consistent hashing for segment routing: Maximizes per-server cache hit rate but creates hot spots when a viral segment is hashed to a single server — mitigated by replication factor of 2 for segments above a request rate threshold
  • Signed URLs vs referer-based access control: Signed URLs provide cryptographic access control but increase URL length and prevent sharing; referer checks are simpler but easily spoofed — signed URLs are the industry standard for DRM content
