System Design: News Feed System
Generic system design for a scalable news feed, covering fan-out strategies, feed ranking, pull vs push architectures, and how to handle celebrity accounts efficiently.
Requirements
Functional Requirements:
- Users can create posts (text, images, links)
- Users follow other users; followers see posts in their feed
- Feed is sorted by recency or ranked by relevance
- Users can like and comment on posts
- Feed supports infinite scroll with pagination
- Push notifications for new posts from close connections
Non-Functional Requirements:
- 100M DAU; average user follows 300 accounts
- Feed load latency under 200ms (p99)
- Posts must be durable; eventual consistency for feed ordering is acceptable
- 99.99% availability; feed service must tolerate partial backend failures gracefully
Scale Estimation
100M DAU × 10 feed loads/day = 1B feed reads/day ≈ 11,574 reads/sec. Post writes: 100M users × 2 posts/day = 200M posts/day ≈ 2,315 posts/sec. Fan-out writes: each post triggers writes to 300 follower feeds on average → 2,315 × 300 ≈ 694,500 fan-out writes/sec on average; peaks run several times higher. For a user with 1M followers, a single post triggers 1M writes. Storage: 200M posts/day × 1KB average = 200GB/day of post metadata.
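The estimates above can be reproduced with a quick back-of-envelope script (constants taken directly from the requirements; nothing here is new data):

```python
# Back-of-envelope capacity math from the stated requirements.
DAU = 100_000_000
FEED_LOADS_PER_DAY = 10
POSTS_PER_USER_PER_DAY = 2
AVG_FOLLOWERS = 300
SECONDS_PER_DAY = 86_400

feed_reads_per_sec = DAU * FEED_LOADS_PER_DAY / SECONDS_PER_DAY
post_writes_per_sec = DAU * POSTS_PER_USER_PER_DAY / SECONDS_PER_DAY
fanout_writes_per_sec = post_writes_per_sec * AVG_FOLLOWERS

print(round(feed_reads_per_sec))    # ~11574 feed reads/sec
print(round(post_writes_per_sec))   # ~2315 post writes/sec
print(round(fanout_writes_per_sec)) # ~694444 fan-out writes/sec
```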
High-Level Architecture
The news feed system separates write and read paths with a fan-out layer in between. The write path: User creates post → Post Service writes to Post DB (MySQL/PostgreSQL, sharded by user_id) → emits PostCreated event to Kafka → Fan-Out Worker picks up the event, reads the poster's follower list from the Social Graph Service, and writes the post_id into each follower's feed cache in Redis.
The read path: User requests feed → Feed Service reads the top-20 post IDs from feed:{user_id} Redis list → batch-fetches post content from Post Service → fetches user metadata from User Service cache → assembles and returns the feed. The Feed Service is stateless and horizontally scalable behind a load balancer. Rate limiting and authentication are handled by an API Gateway layer (Kong or custom).
For celebrity/high-follower accounts (>100K followers), a hybrid fan-out strategy is used: their posts are NOT pushed to follower feeds. Instead, at read time, the Feed Service fetches recent posts from celebrity accounts the user follows directly from the Post Service, and merges them with the precomputed Redis feed. This bounded list of celebrity accounts per user is stored in a Celebrity Follow Cache.
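The read-time merge for the hybrid strategy can be sketched as follows. This is a simplified illustration, assuming Snowflake post IDs are time-ordered so sorting IDs descending approximates recency; the function names are hypothetical:

```python
import heapq

def merge_feed(precomputed_ids, celebrity_posts_by_account, limit=20):
    """Merge the precomputed Redis feed with celebrity posts fetched at read time.

    precomputed_ids: newest-first list of post IDs from feed:{user_id}.
    celebrity_posts_by_account: {celebrity_id: newest-first list of post IDs},
    one entry per celebrity in the user's Celebrity Follow Cache.
    """
    streams = [precomputed_ids] + list(celebrity_posts_by_account.values())
    # Each stream is already newest-first, so a k-way merge keeps recency order.
    merged = heapq.merge(*streams, reverse=True)
    # dict.fromkeys dedupes while preserving order (Python 3.7+).
    return list(dict.fromkeys(merged))[:limit]
```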
Core Components
Fan-Out Worker
The Fan-Out Worker is a Kafka consumer group. Each consumer picks up a PostCreated event, looks up the poster's follower list from the Social Graph Service (backed by Cassandra), and writes the post_id into each follower's Redis feed list using LPUSH feed:{follower_id} {post_id} followed by LTRIM feed:{follower_id} 0 999 to keep feeds bounded at 1,000 items. Workers are partitioned by poster_user_id to maintain ordering. The worker fleet autoscales on Kafka consumer lag — during peak hours, 10x more workers may be needed.
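The core fan-out step looks roughly like this. A minimal sketch, assuming a redis-py style client with pipelining and a hypothetical get_followers() helper backed by the Social Graph Service:

```python
FEED_CAP = 1_000  # per-feed bound from the design

def fan_out(redis_client, get_followers, poster_id, post_id):
    """Push post_id onto every follower's feed list, keeping each list bounded."""
    followers = get_followers(poster_id)
    pipe = redis_client.pipeline()  # batch commands to cut round trips
    for follower_id in followers:
        key = f"feed:{follower_id}"
        pipe.lpush(key, post_id)           # newest post at the head
        pipe.ltrim(key, 0, FEED_CAP - 1)   # cap the feed at 1,000 items
    pipe.execute()
```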
Feed Service
The Feed Service implements the read path. It retrieves post IDs from Redis, hydrates them via the Post Service (with local in-process caching for hot posts), and merges in celebrity posts fetched from the Celebrity Fast Path. Feed pagination uses a cursor encoding the last post_id seen, not a numeric offset, to handle concurrent insertions correctly. If the Redis feed is empty (cold user), the Feed Service falls back to a 'timeline reconstruction' query against the Post DB.
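The cursor-based pagination can be illustrated with a small sketch. It assumes time-ordered Snowflake IDs, so "everything strictly below the cursor" resumes correctly even after new posts are LPUSHed at the head between requests (a numeric offset would shift and repeat items):

```python
def paginate(feed_ids, cursor=None, limit=20):
    """Return one page of a newest-first feed plus the cursor for the next page.

    feed_ids: newest-first list of post IDs (e.g. from LRANGE).
    cursor:   the last post_id the client saw, or None for the first page.
    """
    if cursor is not None:
        # Resume strictly below the cursor; immune to head insertions.
        feed_ids = [pid for pid in feed_ids if pid < cursor]
    page = feed_ids[:limit]
    next_cursor = page[-1] if page else None
    return page, next_cursor
```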
Ranking Service
For relevance-ranked feeds, a Ranking Service runs a lightweight gradient boosting model over candidate posts. Features include: poster affinity score (how often the viewer engages with this poster), post freshness (exponential decay), engagement velocity (likes + comments in first 5 minutes), and content type preference (text vs image vs video). The ranked feed is cached in Redis with the user_id and a version number; background workers invalidate the cache when new posts arrive from close connections.
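An illustrative scoring function combining the features named above (in practice the gradient boosting model learns these interactions; the weights and the 6-hour decay constant here are invented for the sketch):

```python
import math

def score(post, viewer_profile, now):
    """Toy relevance score over the four feature families described above."""
    age_hours = (now - post["created_at"]) / 3600
    freshness = math.exp(-age_hours / 6)  # exponential decay, ~6h time constant
    # Engagement velocity: likes + comments accrued in the first 5 minutes.
    velocity = post["early_likes"] + post["early_comments"]
    type_pref = viewer_profile.get("type_pref", {}).get(post["type"], 1.0)
    affinity = viewer_profile.get(post["user_id"], 0.0)  # poster affinity
    return 2.0 * affinity + 1.0 * freshness + 0.1 * velocity + 0.5 * type_pref
```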
Database Design
Posts table (PostgreSQL, sharded by user_id): post_id (BIGINT, Snowflake), user_id, content (TEXT), media_urls (JSON array), created_at, like_count (INT), comment_count (INT). Indexed on (user_id, created_at DESC) for profile queries. Likes table: (post_id, user_id, created_at) in Cassandra for write throughput. Comments: threaded structure in PostgreSQL using an adjacency list (parent_comment_id FK). Social graph (follows): Cassandra with partition key follower_id and clustering key followee_id, so 'who does user X follow' is a single-partition scan.
Redis feed lists store post_ids (8 bytes each) capped at 1,000 → 8KB per user feed. 100M users × 8KB = 800GB RAM for all feed caches, which fits in a Redis Cluster of 10 × 128GB primary nodes with roughly 50% headroom; replication factor 2 (one replica per primary) doubles the total footprint for failover. Redis Cluster distributes keys across 16,384 hash slots assigned to the primaries.
API Design
- GET /v1/feed?limit=20&cursor={post_id}: Fetch the authenticated user's news feed
- POST /v1/posts: Create a new post; returns post_id and triggers async fan-out
- POST /v1/posts/{post_id}/likes: Like a post; idempotent upsert
- GET /v1/users/{user_id}/posts?limit=20&cursor={post_id}: Fetch a specific user's profile posts
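The idempotent like upsert can be reduced to a membership check on (post_id, user_id). This in-memory sketch stands in for an INSERT ... ON CONFLICT DO NOTHING against the Likes table; the helper name is hypothetical:

```python
def like_post(likes, post_id, user_id):
    """Record a like exactly once per (post, user) pair.

    likes: a set of (post_id, user_id) tuples, standing in for the Likes table.
    Returns True if this call added a new like, False on a repeat request.
    """
    key = (post_id, user_id)
    if key in likes:
        return False  # retry or double-tap: no double count
    likes.add(key)
    return True
```

Because repeats are no-ops, clients can safely retry the request on timeouts without inflating like_count.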
Scaling & Bottlenecks
The fan-out write amplification is the primary scaling challenge. At 2,315 posts/sec with 300 average followers, fan-out generates roughly 694,500 Redis writes/sec under normal load. Kafka consumer lag drives worker autoscaling. For bursty events (a celebrity posting at peak), the fan-out queue may grow; Kafka retains messages for 24 hours, so the system absorbs the backlog and catches up without losing posts. Priority queuing (high-affinity followers are fanned out first) ensures close friends see posts before distant followers.
The Post DB (PostgreSQL) is scaled with a write leader + 3 read replicas per shard. Shards are split when a shard exceeds 500GB or 5K writes/sec. A connection pooler (PgBouncer) sits in front of each shard to limit connection overhead. For global deployments, a multi-region active-active setup uses CRDTs for like counts and a last-write-wins strategy for post edits. The API Gateway handles geo-routing to the nearest regional cluster.
Key Trade-offs
- Fan-out on write vs read: Write fan-out (push) gives O(1) feed reads but O(followers) writes per post; pull gives O(1) writes but O(followed_accounts) reads per feed load — hybrid approach balances both based on follower count threshold
- Redis lists vs sorted sets for feeds: Lists (LPUSH/LRANGE) are simpler and faster for recency-sorted feeds; sorted sets (ZADD/ZREVRANGE) are needed for relevance-ranked feeds — choose based on ranking requirement
- Bounded feed cache (1,000 items): Capping feed length limits memory and keeps read latency bounded; users scrolling past 1,000 items (rare) fall back to a slower DB query — acceptable UX trade-off
- Eventual consistency for like counts: Buffering likes in Redis with periodic flush to PostgreSQL absorbs viral-post write storms, but displayed counts may lag by a few seconds; this is generally acceptable in social feed contexts
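The like-count buffering in the last trade-off can be sketched as a counter that batches increments and flushes them on a timer. The in-memory Counter stands in for Redis INCR, and flush_to_db is a hypothetical batched-update callback against PostgreSQL:

```python
from collections import Counter

class LikeBuffer:
    """Accumulate like increments in memory; flush them to the DB in batches."""

    def __init__(self, flush_to_db):
        self.pending = Counter()        # post_id -> pending increment
        self.flush_to_db = flush_to_db  # callback taking {post_id: delta}

    def incr(self, post_id, n=1):
        self.pending[post_id] += n      # absorbs viral-post write storms

    def flush(self):
        # Called periodically; one batched write replaces thousands of
        # per-like UPDATEs, at the cost of counts lagging by the interval.
        if self.pending:
            self.flush_to_db(dict(self.pending))
            self.pending.clear()
```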