System Design: News Feed System
Generic system design for a scalable news feed, covering fan-out strategies, feed ranking, pull vs push architectures, and how to handle celebrity accounts efficiently.
Requirements
Functional Requirements:
- Users can create posts (text, images, links)
- Users follow other users; followers see posts in their feed
- Feed is sorted by recency or ranked by relevance
- Users can like and comment on posts
- Feed supports infinite scroll with pagination
- Push notifications for new posts from close connections
Non-Functional Requirements:
- 100M DAU; average user follows 300 accounts
- Feed load latency under 200ms (p99)
- Posts must be durable; eventual consistency for feed ordering is acceptable
- 99.99% availability; feed service must tolerate partial backend failures gracefully
Scale Estimation
100M DAU × 10 feed loads/day = 1B feed reads/day ≈ 11,574 reads/sec. Post writes: 100M users × 2 posts/day = 200M posts/day ≈ 2,315 posts/sec. Fan-out writes: each post triggers writes to 300 follower feeds on average → 2,315 × 300 ≈ 694,500 fan-out writes/sec on average; peaks run several times higher. For a user with 1M followers, a single post triggers 1M writes. Storage: 200M posts/day × 1KB average = 200GB/day of post metadata.
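The estimates above can be reproduced with a quick back-of-envelope script (constants taken directly from the requirements; nothing here is new data):

```python
# Back-of-envelope capacity math from the stated requirements.
DAU = 100_000_000
FEED_LOADS_PER_DAY = 10
POSTS_PER_USER_PER_DAY = 2
AVG_FOLLOWERS = 300
SECONDS_PER_DAY = 86_400

feed_reads_per_sec = DAU * FEED_LOADS_PER_DAY / SECONDS_PER_DAY
post_writes_per_sec = DAU * POSTS_PER_USER_PER_DAY / SECONDS_PER_DAY
fanout_writes_per_sec = post_writes_per_sec * AVG_FOLLOWERS

print(round(feed_reads_per_sec))    # ~11574 feed reads/sec
print(round(post_writes_per_sec))   # ~2315 post writes/sec
print(round(fanout_writes_per_sec)) # ~694444 fan-out writes/sec
```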
High-Level Architecture
The news feed system separates write and read paths with a fan-out layer in between. The write path: User creates post → Post Service writes to Post DB (MySQL/PostgreSQL, sharded by user_id) → emits PostCreated event to Kafka → Fan-Out Worker picks up the event, reads the poster's follower list from the Social Graph Service, and writes the post_id into each follower's feed cache in Redis.
The read path: User requests feed → Feed Service reads the top-20 post IDs from feed:{user_id} Redis list → batch-fetches post content from Post Service → fetches user metadata from User Service cache → assembles and returns the feed. The Feed Service is stateless and horizontally scalable behind a load balancer. Rate limiting and authentication are handled by an API Gateway layer (Kong or custom).
For celebrity/high-follower accounts (>100K followers), a hybrid fan-out strategy is used: their posts are NOT pushed to follower feeds. Instead, at read time, the Feed Service fetches recent posts from celebrity accounts the user follows directly from the Post Service, and merges them with the precomputed Redis feed. This bounded list of celebrity accounts per user is stored in a Celebrity Follow Cache.
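The read-time merge for the hybrid strategy can be sketched as follows. This is a simplified illustration, assuming Snowflake post IDs are time-ordered so sorting IDs descending approximates recency; the function names are hypothetical:

```python
import heapq

def merge_feed(precomputed_ids, celebrity_posts_by_account, limit=20):
    """Merge the precomputed Redis feed with celebrity posts fetched at read time.

    precomputed_ids: newest-first list of post IDs from feed:{user_id}.
    celebrity_posts_by_account: {celebrity_id: newest-first list of post IDs},
    one entry per celebrity in the user's Celebrity Follow Cache.
    """
    streams = [precomputed_ids] + list(celebrity_posts_by_account.values())
    # Each stream is already newest-first, so a k-way merge keeps recency order.
    merged = heapq.merge(*streams, reverse=True)
    # dict.fromkeys dedupes while preserving order (Python 3.7+).
    return list(dict.fromkeys(merged))[:limit]
```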
Core Components
Fan-Out Worker
The Fan-Out Worker is a Kafka consumer group. Each consumer picks up a PostCreated event, looks up the poster's follower list from the Social Graph Service (backed by Cassandra), and writes the post_id into each follower's Redis feed list using LPUSH feed:{follower_id} {post_id} followed by LTRIM feed:{follower_id} 0 999 to keep feeds bounded at 1,000 items. Workers are partitioned by poster_user_id to maintain ordering. The worker fleet autoscales on Kafka consumer lag — during peak hours, 10x more workers may be needed.
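The core fan-out step looks roughly like this. A minimal sketch, assuming a redis-py style client with pipelining and a hypothetical get_followers() helper backed by the Social Graph Service:

```python
FEED_CAP = 1_000  # per-feed bound from the design

def fan_out(redis_client, get_followers, poster_id, post_id):
    """Push post_id onto every follower's feed list, keeping each list bounded."""
    followers = get_followers(poster_id)
    pipe = redis_client.pipeline()  # batch commands to cut round trips
    for follower_id in followers:
        key = f"feed:{follower_id}"
        pipe.lpush(key, post_id)           # newest post at the head
        pipe.ltrim(key, 0, FEED_CAP - 1)   # cap the feed at 1,000 items
    pipe.execute()
```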
Feed Service
The Feed Service implements the read path. It retrieves post IDs from Redis, hydrates them via the Post Service (with local in-process caching for hot posts), and merges in celebrity posts fetched from the Celebrity Fast Path. Feed pagination uses a cursor encoding the last post_id seen, not a numeric offset, to handle concurrent insertions correctly. If the Redis feed is empty (cold user), the Feed Service falls back to a 'timeline reconstruction' query against the Post DB.
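The cursor-based pagination can be illustrated with a small sketch. It assumes time-ordered Snowflake IDs, so "everything strictly below the cursor" resumes correctly even after new posts are LPUSHed at the head between requests (a numeric offset would shift and repeat items):

```python
def paginate(feed_ids, cursor=None, limit=20):
    """Return one page of a newest-first feed plus the cursor for the next page.

    feed_ids: newest-first list of post IDs (e.g. from LRANGE).
    cursor:   the last post_id the client saw, or None for the first page.
    """
    if cursor is not None:
        # Resume strictly below the cursor; immune to head insertions.
        feed_ids = [pid for pid in feed_ids if pid < cursor]
    page = feed_ids[:limit]
    next_cursor = page[-1] if page else None
    return page, next_cursor
```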
Ranking Service
For relevance-ranked feeds, a Ranking Service runs a lightweight gradient boosting model over candidate posts. Features include: poster affinity score (how often the viewer engages with this poster), post freshness (exponential decay), engagement velocity (likes + comments in first 5 minutes), and content type preference (text vs image vs video). The ranked feed is cached in Redis with the user_id and a version number; background workers invalidate the cache when new posts arrive from close connections.
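An illustrative scoring function combining the features named above (in practice the gradient boosting model learns these interactions; the weights and the 6-hour decay constant here are invented for the sketch):

```python
import math

def score(post, viewer_profile, now):
    """Toy relevance score over the four feature families described above."""
    age_hours = (now - post["created_at"]) / 3600
    freshness = math.exp(-age_hours / 6)  # exponential decay, ~6h time constant
    # Engagement velocity: likes + comments accrued in the first 5 minutes.
    velocity = post["early_likes"] + post["early_comments"]
    type_pref = viewer_profile.get("type_pref", {}).get(post["type"], 1.0)
    affinity = viewer_profile.get(post["user_id"], 0.0)  # poster affinity
    return 2.0 * affinity + 1.0 * freshness + 0.1 * velocity + 0.5 * type_pref
```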
Database Design
Posts table (PostgreSQL, sharded by user_id): post_id (BIGINT, Snowflake), user_id, content (TEXT), media_urls (JSON array), created_at, like_count (INT), comment_count (INT). Indexed on (user_id, created_at DESC) for profile queries. Likes table: (post_id, user_id, created_at) in Cassandra for write throughput. Comments: threaded structure in PostgreSQL using an adjacency list (parent_comment_id FK). Social graph (follows): Cassandra with partition key follower_id and clustering key followee_id, so 'who does user X follow' is a single-partition scan.
Redis feed lists store post_ids (8 bytes each) capped at 1,000 → 8KB per user feed. 100M users × 8KB = 800GB RAM for all feed caches, which fits in a Redis Cluster of 10 × 128GB primary nodes with roughly 50% headroom; replication factor 2 (one replica per primary) doubles the total footprint for failover. Redis Cluster distributes keys across 16,384 hash slots assigned to the primaries.
API Design
- GET /v1/feed?limit=20&cursor={post_id}: Fetch the authenticated user's news feed
- POST /v1/posts: Create a new post; returns post_id and triggers async fan-out
- POST /v1/posts/{post_id}/likes: Like a post; idempotent upsert
- GET /v1/users/{user_id}/posts?limit=20&cursor={post_id}: Fetch a specific user's profile posts
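The idempotent like upsert can be reduced to a membership check on (post_id, user_id). This in-memory sketch stands in for an INSERT ... ON CONFLICT DO NOTHING against the Likes table; the helper name is hypothetical:

```python
def like_post(likes, post_id, user_id):
    """Record a like exactly once per (post, user) pair.

    likes: a set of (post_id, user_id) tuples, standing in for the Likes table.
    Returns True if this call added a new like, False on a repeat request.
    """
    key = (post_id, user_id)
    if key in likes:
        return False  # retry or double-tap: no double count
    likes.add(key)
    return True
```

Because repeats are no-ops, clients can safely retry the request on timeouts without inflating like_count.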
Scaling & Bottlenecks
The fan-out write amplification is the primary scaling challenge. At 2,315 posts/sec with 300 average followers, fan-out generates roughly 694,500 Redis writes/sec under normal load. Kafka consumer lag drives worker autoscaling. For bursty events (a celebrity posting at peak), the fan-out queue may grow; Kafka retains messages for 24 hours, so the system absorbs the backlog and catches up without losing posts. Priority queuing (high-affinity followers are fanned out first) ensures close friends see posts before distant followers.
The Post DB (PostgreSQL) is scaled with a write leader + 3 read replicas per shard. Shards are split when a shard exceeds 500GB or 5K writes/sec. A connection pooler (PgBouncer) sits in front of each shard to limit connection overhead. For global deployments, a multi-region active-active setup uses CRDTs for like counts and a last-write-wins strategy for post edits. The API Gateway handles geo-routing to the nearest regional cluster.
Key Trade-offs
- Fan-out on write vs read: Write fan-out (push) gives O(1) feed reads but O(followers) writes per post; pull gives O(1) writes but O(followed_accounts) reads per feed load — hybrid approach balances both based on follower count threshold
- Redis lists vs sorted sets for feeds: Lists (LPUSH/LRANGE) are simpler and faster for recency-sorted feeds; sorted sets (ZADD/ZREVRANGE) are needed for relevance-ranked feeds — choose based on ranking requirement
- Bounded feed cache (1,000 items): Capping feed length limits memory and keeps read latency bounded; users scrolling past 1,000 items (rare) fall back to a slower DB query — acceptable UX trade-off
- Eventual consistency for like counts: Buffering likes in Redis with periodic flush to PostgreSQL absorbs viral-post write storms, but displayed counts may lag by a few seconds; this is generally acceptable in social feed contexts
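The like-count buffering in the last trade-off can be sketched as a counter that batches increments and flushes them on a timer. The in-memory Counter stands in for Redis INCR, and flush_to_db is a hypothetical batched-update callback against PostgreSQL:

```python
from collections import Counter

class LikeBuffer:
    """Accumulate like increments in memory; flush them to the DB in batches."""

    def __init__(self, flush_to_db):
        self.pending = Counter()        # post_id -> pending increment
        self.flush_to_db = flush_to_db  # callback taking {post_id: delta}

    def incr(self, post_id, n=1):
        self.pending[post_id] += n      # absorbs viral-post write storms

    def flush(self):
        # Called periodically; one batched write replaces thousands of
        # per-like UPDATEs, at the cost of counts lagging by the interval.
        if self.pending:
            self.flush_to_db(dict(self.pending))
            self.pending.clear()
```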