SYSTEM_DESIGN

System Design: Reddit

System design of Reddit covering the post/comment data model, vote counting, feed ranking algorithms (Hot, Top, New), and scaling a link aggregation platform with 1.5 billion monthly visitors.

17 min readUpdated Jan 15, 2025
system-designredditvotingfeed-ranking

Requirements

Functional Requirements:

  • Users submit posts (links, text, images, videos) to subreddits
  • Upvote/downvote on posts and comments
  • Threaded comment system with unlimited nesting
  • Multiple feed sorts: Hot, Top, New, Rising, Controversial
  • Subreddit subscriptions drive a personalized home feed
  • Awards, flairs, and moderation tools

Non-Functional Requirements:

  • 1.5B monthly visitors; 57M DAU; 500M+ posts and comments
  • Vote counts should be eventually consistent (±few votes acceptable)
  • Hot ranking must refresh within 30 seconds of significant vote changes
  • 99.9% uptime; post/comment data must never be lost

Scale Estimation

57M DAU × 30 page views/day = 1.7B page views/day = 19,676 reads/sec. Vote writes: 57M users × 20 votes/day = 1.14B votes/day = 13,194 votes/sec. New posts: 2M posts/day = 23 posts/sec. Comments: 10M comments/day = 116 comments/sec. Top subreddit (r/AskReddit) has 45M subscribers — a viral post could trigger millions of upvotes in minutes. Storage: average post 5KB text, top posts with 10K comments = 100KB per post thread → 2M posts × 100KB = 200GB/day for post data.

High-Level Architecture

Reddit's architecture centers on the vote aggregation and ranking pipeline. The write path: a Vote Service receives upvote/downvote events, writes to a Vote Store (Cassandra, chosen for write-heavy workload and wide-column structure), and emits events to Kafka. A Ranking Computation Service consumes Kafka events and recomputes Hot scores for affected posts, updating a Rankings Cache (Redis sorted sets per subreddit).

The read path: a user loads r/worldnews → the Feed Service reads the top-200 post IDs from a Redis sorted set (subreddit:{id}:hot) → batch-fetches post metadata from the Post Service (backed by PostgreSQL) → assembles the feed response. The Comment Service uses a tree data structure stored in PostgreSQL with adjacency list + path enumeration hybrid (Materialized Path pattern) for efficient nested comment retrieval without recursive CTEs.

Reddit migrated from Python/Pylons monolith to a Go-based microservices architecture (2017-2022). The primary data stores are PostgreSQL (post and comment metadata), Cassandra (votes, sessions, activity), Redis (feed caches, rate limiting), and Kafka (event streaming). Media (images, videos) is stored on S3 via Reddit's Media Service.

Core Components

Vote Service and Score Computation

Voting is the most write-intensive workload. The Vote Service uses an idempotent upsert model: each (user_id, post_id) pair has at most one vote record. Votes are written to Cassandra with a partition key of post_id for fast aggregation. Vote counts are NOT computed by summing rows at read time — instead, a materialized view (maintained by Cassandra's built-in counter columns) tracks upvote_count and downvote_count per post. The score formula for Hot ranking is Wilson score lower bound applied to the upvote ratio, decay function applied over time since posting.

Feed Ranking Engine

Each subreddit maintains 5 Redis sorted sets: hot, top_day, top_week, top_month, top_all_time, and new. The Hot sorted set is the most dynamic — scores decay over time, so a post's position changes even without new votes. Reddit uses a background cron job (running every 60 seconds) to recompute Hot scores for the top 1,000 posts per active subreddit and update Redis. For viral posts (vote velocity > threshold), score recomputation is triggered in near real-time via Kafka consumer.

Comment Tree Service

Reddit's nested comments use a Materialized Path pattern in PostgreSQL: each comment stores its full ancestry path (e.g., /root/c1/c2/c3/) allowing efficient subtree queries with WHERE path LIKE '/root/c1/%'. Top-level and second-level comments are pre-fetched; deeper nesting is loaded lazily on expand. The comment tree is rendered depth-first in the client. Sort order within a subtree is configurable (best, top, new, controversial) and uses the same Wilson score formula as post ranking.

Database Design

Posts table in PostgreSQL: post_id (UUID), subreddit_id, author_id, title, body (TEXT), link_url, post_type (link/text/image/video), score (INTEGER, materialized), upvote_count, downvote_count, created_at, flair, is_nsfw, is_locked. Indexed on (subreddit_id, created_at) for New sort, (subreddit_id, score) for Hot/Top. The score column is updated via DB trigger on vote changes for small subreddits and via async job for large ones.

Cassandra stores the raw vote table: votes (partition_key: post_id, clustering_key: user_id, value: +1/-1, timestamp). This enables efficient 'did user X vote on post Y' lookups (required for rendering vote state in the UI) and batch score computation. A separate Redis cluster with 64 shards stores the ranking sorted sets. Redis AOF persistence is enabled to survive restarts without losing ranking data.

API Design

  • GET /r/{subreddit}/hot?limit=25&after={post_id} — Fetch hot-ranked posts for a subreddit with cursor pagination
  • POST /api/vote — Submit a vote; body: {id: post_id, dir: 1|-1|0, rank: position_in_feed}
  • GET /comments/{post_id}?sort=best&limit=200 — Fetch post with nested comments
  • POST /api/submit — Submit a new post; body: {kind: link|self, title, url_or_text, sr: subreddit_name}

Scaling & Bottlenecks

The vote storm problem: a post reaching the Reddit front page can receive 100K+ upvotes in minutes, generating 100K+ writes/sec to the Vote Service. Cassandra handles this through its write-optimized LSM tree storage. However, score recomputation bottlenecks on the ranking service — solved by batching: instead of recomputing on every vote, the Ranking Service buffers votes for 5 seconds and batch-recomputes. For extreme viral events, the Hot score computation is throttled to update every 30 seconds even under high vote load.

Database read scaling for hot posts uses a two-layer cache: Redis L1 (full post metadata, 5-minute TTL) and a CDN (HTML-rendered post pages, 60-second TTL). Cache stampedes on TTL expiry are handled with jittered TTL (±30 seconds random) and a 'stale-while-revalidate' pattern. PostgreSQL read replicas handle profile and subreddit info queries. For the comment tree, the top-level comments (depth ≤ 2) are cached in Redis; deeper comments are served from PostgreSQL replicas.

Key Trade-offs

  • Materialized score vs real-time aggregation: Pre-computing and storing the vote score avoids slow COUNT aggregations at read time — the trade-off is eventual consistency (score may lag by seconds during vote storms)
  • Wilson score over raw score: Wilson score (considers both vote count and ratio) prevents small posts with 100% upvotes from outranking posts with thousands of votes — better quality signal at the cost of harder-to-explain ranking
  • Materialized Path for comments: Enables efficient subtree queries but makes comment reparenting (moving a comment to a different parent) expensive — Reddit disallows this operation, avoiding the limitation
  • Redis sorted sets for feeds: O(log N) insertion and O(log N + M) range queries make sorted sets ideal for leaderboard-style ranking — the entire subreddit top-25 is a single Redis ZREVRANGE command

GO DEEPER

Master this topic in our 12-week cohort

Our Advanced System Design cohort covers this and 11 other deep-dive topics with live sessions, assignments, and expert feedback.