
System Design: Content Moderation System

Comprehensive system design of a content moderation platform covering automated ML classifiers, human review queues, appeal workflows, and policy enforcement at scale for social media platforms.

17 min read · Updated Jan 15, 2025
system-design · content-moderation · machine-learning · trust-safety

Requirements

Functional Requirements:

  • Automatically classify user-generated content (text, images, video) against policy violations
  • Route flagged content to human reviewers with priority queues
  • Support user reporting with categorized report reasons
  • Provide an appeals workflow for content creators whose content was actioned
  • Maintain an audit trail of all moderation decisions
  • Support configurable policy rules that can be updated without code deployment

Non-Functional Requirements:

  • Process 500M pieces of content per day with automated classification
  • Automated classification latency under 200ms for text, under 2 seconds for images/video
  • False positive rate below 1% for automated takedowns
  • 99.99% availability — moderation pipeline must never drop content from the queue
  • Human review SLA: high-severity content (CSAM, terrorism) reviewed within 15 minutes

Scale Estimation

500M content items per day breaks down to roughly 5,800 items/sec. Assuming a 5% flagging rate from automated classifiers, 25M items/day enter the human review queue. With an average review time of 90 seconds per item, that is 2.25 billion review-seconds per day, which would require roughly 78,000 reviewers working 8-hour shifts; no platform staffs at that level, which is why the aggressive triage described under Scaling & Bottlenecks matters. Each content item averages 500KB (text metadata plus image/video reference), so the daily metadata throughput is 250TB. The ML inference fleet needs to handle 5,800 requests/sec with separate models for text (BERT-based), image (ResNet/EfficientNet), and video (frame sampling plus image classification), for a combined fleet of approximately 200 A100 GPUs.
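A short script makes these back-of-envelope numbers easy to check; every input below is a figure stated in this section, nothing else is assumed:

```python
# Reproducing the estimates above; every input is a figure from this section.
DAILY_ITEMS = 500_000_000
SECONDS_PER_DAY = 86_400

items_per_sec = DAILY_ITEMS / SECONDS_PER_DAY        # ~5,800 items/sec
flagged_per_day = DAILY_ITEMS * 0.05                 # 25M items to human review
review_seconds = flagged_per_day * 90                # 90s average per review
reviewers = review_seconds / (8 * 3600)              # 8-hour shifts -> ~78,000
metadata_tb = DAILY_ITEMS * 500_000 / 1e12           # 500KB/item -> 250TB/day

print(f"{items_per_sec:,.0f} items/sec, {reviewers:,.0f} reviewers, {metadata_tb:.0f} TB/day")
```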

High-Level Architecture

The moderation system operates as an asynchronous pipeline triggered by content creation events. When a user posts content, the Content Service emits a Kafka event to a content-created topic. A Moderation Router Service consumes these events and dispatches to the appropriate classification pipeline based on content type. Text goes to a Text Classification Service (BERT-based model served via Triton Inference Server on GPU nodes). Images go to an Image Classification Service (EfficientNet-B7 model). Video content is processed by a Video Analysis Service that samples frames at 1 fps, classifies each frame with the image model, and runs audio transcription through Whisper so spoken content can be analyzed as text.
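A minimal sketch of the router's consume-and-dispatch loop, assuming kafka-python, a local broker address, and a hypothetical event shape; the topic name comes from the text above:

```python
# Minimal sketch of the Moderation Router's consume-and-dispatch loop.
import json
from kafka import KafkaConsumer  # pip install kafka-python

# Hypothetical dispatch stub; the real system would call the Text/Image/Video
# classification services described above.
def dispatch(pipeline: str, event: dict) -> None:
    print(f"routing content {event['content_id']} to {pipeline}")

PIPELINES = {"text": "text-classifier", "image": "image-classifier", "video": "video-analysis"}

consumer = KafkaConsumer(
    "content-created",                    # topic name from the text above
    bootstrap_servers=["kafka:9092"],     # assumed broker address
    group_id="moderation-router",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    event = message.value  # assumed shape: {"content_id", "content_type", "payload_ref"}
    pipeline = PIPELINES.get(event["content_type"])
    if pipeline:
        dispatch(pipeline, event)
```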

Each classifier returns a confidence score per policy category (hate speech, nudity, violence, spam, misinformation, self-harm). A Decision Engine applies configurable policy thresholds: content above the auto-remove threshold (e.g., >0.95 confidence for CSAM) is immediately removed and logged. Content in the gray zone (0.6-0.95 confidence) is enqueued in a Human Review Queue backed by a priority queue in Redis. Content below 0.6 confidence is published. A separate User Reporting Service allows users to flag content, which creates entries in the same review queue with lower priority.
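The threshold logic itself is simple. A minimal sketch, assuming per-category scores arrive as a dict; the 0.95 and 0.60 cutoffs are from the text, the category names are illustrative:

```python
# Threshold logic of the Decision Engine; cutoffs are the values in the text,
# category names and score format are illustrative.
AUTO_REMOVE = 0.95   # lowered during surge events, see Scaling & Bottlenecks
HUMAN_REVIEW = 0.60

def decide(scores: dict[str, float]) -> str:
    """Map per-category confidence scores to an action."""
    top = max(scores.values())
    if top >= AUTO_REMOVE:
        return "remove"    # immediate takedown, written to the audit trail
    if top >= HUMAN_REVIEW:
        return "enqueue"   # gray zone: route to the human review queue
    return "publish"       # below threshold: content goes live

print(decide({"hate_speech": 0.12, "nudity": 0.97}))  # -> remove
print(decide({"spam": 0.71}))                          # -> enqueue
```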

Core Components

ML Classification Pipeline

The classification pipeline uses a multi-model ensemble. For text, a fine-tuned BERT model classifies against 15 policy violation categories. For images, an EfficientNet-B7 model trained on internal labeled datasets classifies nudity, violence, and hate symbols. A separate perceptual hashing service (using PhotoDNA for CSAM and pHash for known violating content) compares image hashes against a database of known bad content — this provides near-instant detection of previously identified material. Models are retrained weekly on a feedback loop from human reviewer decisions, deployed via a shadow deployment pattern where the new model runs alongside the production model for 24 hours before promotion.
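For the pHash layer, here is a sketch using the open-source imagehash library (PhotoDNA is proprietary and not shown); the sample hash and the Hamming-distance threshold of 8 are assumptions that would be tuned against labeled data:

```python
# pHash lookup against a known-bad set, using the open-source imagehash
# library (PhotoDNA is proprietary and not shown). The sample hash and the
# Hamming-distance threshold of 8 are assumptions.
import imagehash          # pip install imagehash
from PIL import Image

KNOWN_BAD = {imagehash.hex_to_hash("fa5c1f0e3b9d2c47")}  # hypothetical DB entry

def matches_known_bad(path: str, max_distance: int = 8) -> bool:
    h = imagehash.phash(Image.open(path))
    # Hamming distance between 64-bit pHashes; small distance = near-duplicate.
    return any(h - bad <= max_distance for bad in KNOWN_BAD)
```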

Human Review Queue Manager

The review queue is implemented as a multi-tier priority system. Priority 1 (immediate): CSAM, terrorism, imminent threats — routed to specialized reviewers with security clearances. Priority 2 (4-hour SLA): hate speech, severe harassment, dangerous misinformation. Priority 3 (24-hour SLA): nudity, spam, minor policy violations. The queue manager uses a Redis sorted set per priority tier with score = insertion_timestamp. A Task Assignment Service distributes items to available reviewers using a round-robin assignment with specialization matching — reviewers trained on hate speech get hate speech items. Each assignment has a lock with a 10-minute TTL; if the reviewer doesn't submit a decision, the item is re-queued.
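A sketch of the queue manager's core Redis operations, assuming redis-py; the key names are hypothetical, while the per-tier sorted sets, timestamp scores, and 10-minute lock TTL follow the description above:

```python
# Core Redis operations of the queue manager, assuming redis-py. Key names
# are hypothetical; the per-tier sorted sets, timestamp scores, and 10-minute
# lock TTL follow the description above.
import time
import redis  # pip install redis

r = redis.Redis(host="localhost", port=6379)

def enqueue(content_id: str, priority: int) -> None:
    # One sorted set per tier; score = insertion timestamp, so ZPOPMIN is FIFO.
    r.zadd(f"review-queue:p{priority}", {content_id: time.time()})

def assign_next(priority: int, reviewer_id: str) -> str | None:
    # Pop the oldest item in the tier and lock it to this reviewer.
    popped = r.zpopmin(f"review-queue:p{priority}")
    if not popped:
        return None
    content_id = popped[0][0].decode()
    r.set(f"review-lock:{content_id}", reviewer_id, ex=600)  # 10-minute TTL
    return content_id

def requeue_if_expired(content_id: str, priority: int) -> None:
    # Lock expired with no decision submitted: put the item back on the queue.
    if not r.exists(f"review-lock:{content_id}"):
        enqueue(content_id, priority)
```

The pop-then-lock pair is not atomic as written; a production version would combine the two steps in a Lua script so that a crash between them cannot lose an item.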

Appeals Workflow Engine

When a user appeals a moderation decision, the appeal enters a separate queue reviewed by senior moderators. The Appeal Service retrieves the original content, the automated classification scores, the initial reviewer's decision, and any user-provided context. Senior reviewers can overturn, uphold, or escalate to a policy team. Appeal decisions are fed back into the ML training pipeline as high-value labeled examples. The system tracks appeal overturn rates per reviewer and per model category to identify systematic errors.
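One way to compute the overturn-rate metrics described above; the field names and in-memory counters are illustrative, and a production system would aggregate these in the warehouse:

```python
# Illustrative overturn-rate tracking per reviewer and per model category.
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class AppealDecision:
    content_id: int
    original_reviewer_id: str
    model_category: str   # e.g. "hate_speech"
    outcome: str          # "overturn" | "uphold" | "escalate"

overturns: dict[str, int] = defaultdict(int)
totals: dict[str, int] = defaultdict(int)

def record(decision: AppealDecision) -> None:
    # Count per reviewer and per model category to surface systematic errors.
    for key in (decision.original_reviewer_id, decision.model_category):
        totals[key] += 1
        if decision.outcome == "overturn":
            overturns[key] += 1

def overturn_rate(key: str) -> float:
    return overturns[key] / totals[key] if totals[key] else 0.0
```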

Database Design

Moderation decisions are stored in a PostgreSQL cluster (sharded by content_id) with a schema: content_id (BIGINT), content_type (ENUM), classification_scores (JSONB), automated_decision (ENUM: approved/flagged/removed), human_decision (ENUM), reviewer_id, decision_timestamp, policy_categories (TEXT[]), appeal_status (ENUM). A separate Elasticsearch cluster indexes moderation logs for full-text search by policy teams investigating patterns. The known-bad-content hash database uses a bloom filter (1 billion hashes, 0.01% false positive rate) in Redis for O(1) lookups, backed by a PostgreSQL table for persistence.
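The bloom filter's memory footprint follows directly from the stated parameters, using the standard sizing formulas:

```python
# Sizing the bloom filter from the stated parameters (1B hashes, 0.01% false
# positives) with the standard formulas m = -n*ln(p)/(ln 2)^2, k = (m/n)*ln 2.
import math

n = 1_000_000_000   # expected number of hashes
p = 0.0001          # 0.01% false positive rate

m = -n * math.log(p) / math.log(2) ** 2   # bits required
k = (m / n) * math.log(2)                 # optimal number of hash functions

print(f"{m / 8 / 1e9:.1f} GB, {round(k)} hash functions")  # ~2.4 GB, 13
```

At roughly 2.4 GB the filter fits in a single Redis instance's memory, which is what makes the O(1) lookup practical.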

Audit logs are immutable and stored in an append-only PostgreSQL table with row-level security. Every action (classification, review, appeal, policy change) is logged with actor, timestamp, and before/after state. These logs are replicated to a data warehouse (BigQuery) for compliance reporting and regulatory audits. Retention period is 7 years per regulatory requirements.

API Design

  • POST /api/v1/moderate — Submit content for moderation; accepts content_id, content_type, content_payload; returns classification result and action taken (see the example call after this list)
  • GET /api/v1/review-queue?priority={tier}&limit=10 — Fetch next batch of items for human review; returns items with content, classification scores, and context
  • POST /api/v1/review/{content_id}/decision — Submit reviewer decision; body contains action (approve/remove/restrict), reason_code, and notes
  • POST /api/v1/appeals — Create an appeal; body contains content_id, appeal_reason; returns appeal_id and estimated review time
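A hypothetical call to the first endpoint; the host, payload values, and response shape are assumptions, while the field names follow the endpoint description:

```python
# Hypothetical call to POST /api/v1/moderate; host, payload values, and
# response shape are assumptions, field names follow the description above.
import requests

resp = requests.post(
    "https://moderation.example.com/api/v1/moderate",
    json={
        "content_id": 918273645,
        "content_type": "image",
        "content_payload": "s3://content-bucket/uploads/918273645.jpg",
    },
    timeout=5,
)
print(resp.json())
# e.g. {"classification_scores": {"nudity": 0.41, ...}, "action": "approved"}
```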

Scaling & Bottlenecks

The ML inference pipeline is the primary compute bottleneck. Image classification on EfficientNet-B7 takes approximately 15ms per image on an A100 GPU with batch inference (batch size 32). At 5,800 items/sec with 60% being images, that is 3,480 image classifications/sec requiring approximately 110 GPU inference instances (with headroom). Triton Inference Server handles dynamic batching — requests arriving within a 5ms window are batched together, increasing GPU utilization from 40% to 85%. Model serving uses Kubernetes with GPU node pools and horizontal pod autoscaling based on request queue depth.
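The GPU estimate can be reproduced from the stated figures:

```python
# Reproducing the GPU fleet estimate from the stated figures.
images_per_sec = 5_800 * 0.60        # 60% of traffic -> ~3,480 images/sec
per_gpu_throughput = 1 / 0.015       # 15ms/image -> ~67 images/sec per A100
gpus_at_full_load = images_per_sec / per_gpu_throughput
print(f"{gpus_at_full_load:.0f} GPUs at 100% utilization")  # ~52
# The ~110 instances in the text add roughly 2x headroom for failover,
# traffic spikes, and sub-100% achievable utilization.
```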

The human review queue can become a bottleneck during viral events when flagging rates spike from 5% to 20%. Auto-scaling the review workforce is impossible (humans cannot be spun up like containers), so the system implements triage: during surge events, the auto-remove threshold is lowered from 0.95 to 0.90 confidence, and the review queue prioritizes content by reach (items with more views or from accounts with more followers are reviewed first). A circuit breaker pauses low-priority reviews entirely during extreme surges.
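A sketch of the surge-triage policy: the 0.95 and 0.90 thresholds are from the text, while the surge trigger (flag rate above 10%) and the reach-score formula are assumptions standing in for whatever signal the platform actually uses:

```python
# Sketch of the surge-triage policy. The 0.95 -> 0.90 thresholds are from the
# text; the surge trigger and the reach score are assumptions.
def auto_remove_threshold(flag_rate: float) -> float:
    # Normal operation removes at >=0.95; during a surge the bar drops to 0.90.
    return 0.90 if flag_rate > 0.10 else 0.95

def reach_score(views: int, followers: int) -> float:
    # Higher reach = reviewed sooner; a simple weighted sum as a placeholder.
    return views + 0.5 * followers

queue = [{"id": 1, "views": 12, "followers": 40},
         {"id": 2, "views": 90_000, "followers": 2_000_000}]
queue.sort(key=lambda c: reach_score(c["views"], c["followers"]), reverse=True)
print([c["id"] for c in queue])  # the viral item (id 2) is reviewed first
```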

Key Trade-offs

  • Automated removal vs. human review: Auto-removing at high confidence (>0.95) catches the worst content instantly but risks false positives. A 1% false-positive rate on automated takedowns is still large at scale: if roughly 5M of the 500M daily items are auto-removed, that is 50K wrongly removed items per day
  • Perceptual hashing vs. ML classification: Hash matching is O(1) and deterministic for known bad content but cannot detect novel violations — both are needed as complementary layers
  • Reviewer specialization vs. generalization: Specialized reviewers are more accurate on their category but create queue imbalances — a hybrid model with primary specialization and secondary fallback categories balances throughput and accuracy
  • Real-time moderation vs. post-publication review: Pre-publication moderation adds latency to the posting experience; post-publication review allows harmful content to be visible briefly — most platforms use post-publication with fast automated removal
