
System Design: Content Moderation System

Comprehensive system design of a content moderation platform covering automated ML classifiers, human review queues, appeal workflows, and policy enforcement at scale for social media platforms.

17 min read · Updated Jan 15, 2025
system-design · content-moderation · machine-learning · trust-safety

Requirements

Functional Requirements:

  • Automatically classify user-generated content (text, images, video) against policy violations
  • Route flagged content to human reviewers with priority queues
  • Support user reporting with categorized report reasons
  • Provide an appeals workflow for content creators whose content was actioned
  • Maintain an audit trail of all moderation decisions
  • Support configurable policy rules that can be updated without code deployment

Non-Functional Requirements:

  • Process 500M pieces of content per day with automated classification
  • Automated classification latency under 200ms for text, under 2 seconds for images/video
  • False positive rate below 1% for automated takedowns
  • 99.99% availability — moderation pipeline must never drop content from the queue
  • Human review SLA: high-severity content (CSAM, terrorism) reviewed within 15 minutes

Scale Estimation

500M content items per day breaks down to roughly 5,800 items/sec. Assuming a 5% flagging rate from automated classifiers, 25M items/day enter the human review queue. With an average review time of 90 seconds per item, that is 2.25 billion review-seconds per day, which would require roughly 78,000 reviewers working 8-hour shifts; no platform staffs at that level, which is why the aggressive triage described under Scaling & Bottlenecks matters. Each content item averages 500KB (text metadata plus image/video reference), so the daily metadata throughput is 250TB. The ML inference fleet needs to handle 5,800 requests/sec with separate models for text (BERT-based), image (ResNet/EfficientNet), and video (frame sampling plus image classification), for a combined fleet of approximately 200 A100 GPUs.
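A short script makes these back-of-envelope numbers easy to check; every input below is a figure stated in this section, nothing else is assumed:

```python
# Reproducing the estimates above; every input is a figure from this section.
DAILY_ITEMS = 500_000_000
SECONDS_PER_DAY = 86_400

items_per_sec = DAILY_ITEMS / SECONDS_PER_DAY        # ~5,800 items/sec
flagged_per_day = DAILY_ITEMS * 0.05                 # 25M items to human review
review_seconds = flagged_per_day * 90                # 90s average per review
reviewers = review_seconds / (8 * 3600)              # 8-hour shifts -> ~78,000
metadata_tb = DAILY_ITEMS * 500_000 / 1e12           # 500KB/item -> 250TB/day

print(f"{items_per_sec:,.0f} items/sec, {reviewers:,.0f} reviewers, {metadata_tb:.0f} TB/day")
```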

High-Level Architecture

The moderation system operates as an asynchronous pipeline triggered by content creation events. When a user posts content, the Content Service emits a Kafka event to a content-created topic. A Moderation Router Service consumes these events and dispatches to the appropriate classification pipeline based on content type. Text goes to a Text Classification Service (BERT-based model served via Triton Inference Server on GPU nodes). Images go to an Image Classification Service (EfficientNet-B7 model). Video content is processed by a Video Analysis Service that samples frames at 1 fps, classifies each frame with the image model, and runs audio transcription through Whisper so spoken content can be analyzed as text.
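A minimal sketch of the router's consume-and-dispatch loop, assuming kafka-python, a local broker address, and a hypothetical event shape; the topic name comes from the text above:

```python
# Minimal sketch of the Moderation Router's consume-and-dispatch loop.
import json
from kafka import KafkaConsumer  # pip install kafka-python

# Hypothetical dispatch stub; the real system would call the Text/Image/Video
# classification services described above.
def dispatch(pipeline: str, event: dict) -> None:
    print(f"routing content {event['content_id']} to {pipeline}")

PIPELINES = {"text": "text-classifier", "image": "image-classifier", "video": "video-analysis"}

consumer = KafkaConsumer(
    "content-created",                    # topic name from the text above
    bootstrap_servers=["kafka:9092"],     # assumed broker address
    group_id="moderation-router",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    event = message.value  # assumed shape: {"content_id", "content_type", "payload_ref"}
    pipeline = PIPELINES.get(event["content_type"])
    if pipeline:
        dispatch(pipeline, event)
```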

Each classifier returns a confidence score per policy category (hate speech, nudity, violence, spam, misinformation, self-harm). A Decision Engine applies configurable policy thresholds: content above the auto-remove threshold (e.g., >0.95 confidence for CSAM) is immediately removed and logged. Content in the gray zone (0.6-0.95 confidence) is enqueued in a Human Review Queue backed by a priority queue in Redis. Content below 0.6 confidence is published. A separate User Reporting Service allows users to flag content, which creates entries in the same review queue with lower priority.
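The threshold logic itself is simple. A minimal sketch, assuming per-category scores arrive as a dict; the 0.95 and 0.60 cutoffs are from the text, the category names are illustrative:

```python
# Threshold logic of the Decision Engine; cutoffs are the values in the text,
# category names and score format are illustrative.
AUTO_REMOVE = 0.95   # lowered during surge events, see Scaling & Bottlenecks
HUMAN_REVIEW = 0.60

def decide(scores: dict[str, float]) -> str:
    """Map per-category confidence scores to an action."""
    top = max(scores.values())
    if top >= AUTO_REMOVE:
        return "remove"    # immediate takedown, written to the audit trail
    if top >= HUMAN_REVIEW:
        return "enqueue"   # gray zone: route to the human review queue
    return "publish"       # below threshold: content goes live

print(decide({"hate_speech": 0.12, "nudity": 0.97}))  # -> remove
print(decide({"spam": 0.71}))                          # -> enqueue
```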

Core Components

ML Classification Pipeline

The classification pipeline uses a multi-model ensemble. For text, a fine-tuned BERT model classifies against 15 policy violation categories. For images, an EfficientNet-B7 model trained on internal labeled datasets classifies nudity, violence, and hate symbols. A separate perceptual hashing service (using PhotoDNA for CSAM and pHash for known violating content) compares image hashes against a database of known bad content — this provides near-instant detection of previously identified material. Models are retrained weekly on a feedback loop from human reviewer decisions, deployed via a shadow deployment pattern where the new model runs alongside the production model for 24 hours before promotion.
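For the pHash layer, here is a sketch using the open-source imagehash library (PhotoDNA is proprietary and not shown); the sample hash and the Hamming-distance threshold of 8 are assumptions that would be tuned against labeled data:

```python
# pHash lookup against a known-bad set, using the open-source imagehash
# library (PhotoDNA is proprietary and not shown). The sample hash and the
# Hamming-distance threshold of 8 are assumptions.
import imagehash          # pip install imagehash
from PIL import Image

KNOWN_BAD = {imagehash.hex_to_hash("fa5c1f0e3b9d2c47")}  # hypothetical DB entry

def matches_known_bad(path: str, max_distance: int = 8) -> bool:
    h = imagehash.phash(Image.open(path))
    # Hamming distance between 64-bit pHashes; small distance = near-duplicate.
    return any(h - bad <= max_distance for bad in KNOWN_BAD)
```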

Human Review Queue Manager

The review queue is implemented as a multi-tier priority system. Priority 1 (immediate): CSAM, terrorism, imminent threats — routed to specialized reviewers with security clearances. Priority 2 (4-hour SLA): hate speech, severe harassment, dangerous misinformation. Priority 3 (24-hour SLA): nudity, spam, minor policy violations. The queue manager uses a Redis sorted set per priority tier with score = insertion_timestamp. A Task Assignment Service distributes items to available reviewers using a round-robin assignment with specialization matching — reviewers trained on hate speech get hate speech items. Each assignment has a lock with a 10-minute TTL; if the reviewer doesn't submit a decision, the item is re-queued.
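A sketch of the queue manager's core Redis operations, assuming redis-py; the key names are hypothetical, while the per-tier sorted sets, timestamp scores, and 10-minute lock TTL follow the description above:

```python
# Core Redis operations of the queue manager, assuming redis-py. Key names
# are hypothetical; the per-tier sorted sets, timestamp scores, and 10-minute
# lock TTL follow the description above.
import time
import redis  # pip install redis

r = redis.Redis(host="localhost", port=6379)

def enqueue(content_id: str, priority: int) -> None:
    # One sorted set per tier; score = insertion timestamp, so ZPOPMIN is FIFO.
    r.zadd(f"review-queue:p{priority}", {content_id: time.time()})

def assign_next(priority: int, reviewer_id: str) -> str | None:
    # Pop the oldest item in the tier and lock it to this reviewer.
    popped = r.zpopmin(f"review-queue:p{priority}")
    if not popped:
        return None
    content_id = popped[0][0].decode()
    r.set(f"review-lock:{content_id}", reviewer_id, ex=600)  # 10-minute TTL
    return content_id

def requeue_if_expired(content_id: str, priority: int) -> None:
    # Lock expired with no decision submitted: put the item back on the queue.
    if not r.exists(f"review-lock:{content_id}"):
        enqueue(content_id, priority)
```

The pop-then-lock pair is not atomic as written; a production version would combine the two steps in a Lua script so that a crash between them cannot lose an item.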

Appeals Workflow Engine

When a user appeals a moderation decision, the appeal enters a separate queue reviewed by senior moderators. The Appeal Service retrieves the original content, the automated classification scores, the initial reviewer's decision, and any user-provided context. Senior reviewers can overturn, uphold, or escalate to a policy team. Appeal decisions are fed back into the ML training pipeline as high-value labeled examples. The system tracks appeal overturn rates per reviewer and per model category to identify systematic errors.
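One way to compute the overturn-rate metrics described above; the field names and in-memory counters are illustrative, and a production system would aggregate these in the warehouse:

```python
# Illustrative overturn-rate tracking per reviewer and per model category.
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class AppealDecision:
    content_id: int
    original_reviewer_id: str
    model_category: str   # e.g. "hate_speech"
    outcome: str          # "overturn" | "uphold" | "escalate"

overturns: dict[str, int] = defaultdict(int)
totals: dict[str, int] = defaultdict(int)

def record(decision: AppealDecision) -> None:
    # Count per reviewer and per model category to surface systematic errors.
    for key in (decision.original_reviewer_id, decision.model_category):
        totals[key] += 1
        if decision.outcome == "overturn":
            overturns[key] += 1

def overturn_rate(key: str) -> float:
    return overturns[key] / totals[key] if totals[key] else 0.0
```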

Database Design

Moderation decisions are stored in a PostgreSQL cluster (sharded by content_id) with a schema: content_id (BIGINT), content_type (ENUM), classification_scores (JSONB), automated_decision (ENUM: approved/flagged/removed), human_decision (ENUM), reviewer_id, decision_timestamp, policy_categories (TEXT[]), appeal_status (ENUM). A separate Elasticsearch cluster indexes moderation logs for full-text search by policy teams investigating patterns. The known-bad-content hash database uses a bloom filter (1 billion hashes, 0.01% false positive rate) in Redis for O(1) lookups, backed by a PostgreSQL table for persistence.
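The bloom filter's memory footprint follows directly from the stated parameters, using the standard sizing formulas:

```python
# Sizing the bloom filter from the stated parameters (1B hashes, 0.01% false
# positives) with the standard formulas m = -n*ln(p)/(ln 2)^2, k = (m/n)*ln 2.
import math

n = 1_000_000_000   # expected number of hashes
p = 0.0001          # 0.01% false positive rate

m = -n * math.log(p) / math.log(2) ** 2   # bits required
k = (m / n) * math.log(2)                 # optimal number of hash functions

print(f"{m / 8 / 1e9:.1f} GB, {round(k)} hash functions")  # ~2.4 GB, 13
```

At roughly 2.4 GB the filter fits in a single Redis instance's memory, which is what makes the O(1) lookup practical.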

Audit logs are immutable and stored in an append-only PostgreSQL table with row-level security. Every action (classification, review, appeal, policy change) is logged with actor, timestamp, and before/after state. These logs are replicated to a data warehouse (BigQuery) for compliance reporting and regulatory audits. Retention period is 7 years per regulatory requirements.

API Design

  • POST /api/v1/moderate — Submit content for moderation; accepts content_id, content_type, content_payload; returns classification result and action taken (see the example call after this list)
  • GET /api/v1/review-queue?priority={tier}&limit=10 — Fetch next batch of items for human review; returns items with content, classification scores, and context
  • POST /api/v1/review/{content_id}/decision — Submit reviewer decision; body contains action (approve/remove/restrict), reason_code, and notes
  • POST /api/v1/appeals — Create an appeal; body contains content_id, appeal_reason; returns appeal_id and estimated review time
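A hypothetical call to the first endpoint; the host, payload values, and response shape are assumptions, while the field names follow the endpoint description:

```python
# Hypothetical call to POST /api/v1/moderate; host, payload values, and
# response shape are assumptions, field names follow the description above.
import requests

resp = requests.post(
    "https://moderation.example.com/api/v1/moderate",
    json={
        "content_id": 918273645,
        "content_type": "image",
        "content_payload": "s3://content-bucket/uploads/918273645.jpg",
    },
    timeout=5,
)
print(resp.json())
# e.g. {"classification_scores": {"nudity": 0.41, ...}, "action": "approved"}
```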

Scaling & Bottlenecks

The ML inference pipeline is the primary compute bottleneck. Image classification on EfficientNet-B7 takes approximately 15ms per image on an A100 GPU with batch inference (batch size 32). At 5,800 items/sec with 60% being images, that is 3,480 image classifications/sec requiring approximately 110 GPU inference instances (with headroom). Triton Inference Server handles dynamic batching — requests arriving within a 5ms window are batched together, increasing GPU utilization from 40% to 85%. Model serving uses Kubernetes with GPU node pools and horizontal pod autoscaling based on request queue depth.
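The GPU estimate can be reproduced from the stated figures:

```python
# Reproducing the GPU fleet estimate from the stated figures.
images_per_sec = 5_800 * 0.60        # 60% of traffic -> ~3,480 images/sec
per_gpu_throughput = 1 / 0.015       # 15ms/image -> ~67 images/sec per A100
gpus_at_full_load = images_per_sec / per_gpu_throughput
print(f"{gpus_at_full_load:.0f} GPUs at 100% utilization")  # ~52
# The ~110 instances in the text add roughly 2x headroom for failover,
# traffic spikes, and sub-100% achievable utilization.
```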

The human review queue can become a bottleneck during viral events when flagging rates spike from 5% to 20%. Auto-scaling the review workforce is impossible (humans cannot be spun up like containers), so the system implements triage: during surge events, the auto-remove threshold is lowered from 0.95 to 0.90 confidence, and the review queue prioritizes content by reach (items with more views or from accounts with more followers are reviewed first). A circuit breaker pauses low-priority reviews entirely during extreme surges.
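A sketch of the surge-triage policy: the 0.95 and 0.90 thresholds are from the text, while the surge trigger (flag rate above 10%) and the reach-score formula are assumptions standing in for whatever signal the platform actually uses:

```python
# Sketch of the surge-triage policy. The 0.95 -> 0.90 thresholds are from the
# text; the surge trigger and the reach score are assumptions.
def auto_remove_threshold(flag_rate: float) -> float:
    # Normal operation removes at >=0.95; during a surge the bar drops to 0.90.
    return 0.90 if flag_rate > 0.10 else 0.95

def reach_score(views: int, followers: int) -> float:
    # Higher reach = reviewed sooner; a simple weighted sum as a placeholder.
    return views + 0.5 * followers

queue = [{"id": 1, "views": 12, "followers": 40},
         {"id": 2, "views": 90_000, "followers": 2_000_000}]
queue.sort(key=lambda c: reach_score(c["views"], c["followers"]), reverse=True)
print([c["id"] for c in queue])  # the viral item (id 2) is reviewed first
```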

Key Trade-offs

  • Automated removal vs. human review: Auto-removing at high confidence (>0.95) catches the worst content instantly but risks false positives. A 1% false-positive rate on automated takedowns is still large at scale: if roughly 5M of the 500M daily items are auto-removed, that is 50K wrongly removed items per day
  • Perceptual hashing vs. ML classification: Hash matching is O(1) and deterministic for known bad content but cannot detect novel violations — both are needed as complementary layers
  • Reviewer specialization vs. generalization: Specialized reviewers are more accurate on their category but create queue imbalances — a hybrid model with primary specialization and secondary fallback categories balances throughput and accuracy
  • Real-time moderation vs. post-publication review: Pre-publication moderation adds latency to the posting experience; post-publication review allows harmful content to be visible briefly — most platforms use post-publication with fast automated removal
