System Design: Quiz & Assessment Platform
Design a scalable quiz and assessment platform supporting real-time quizzes, adaptive testing, auto-grading, and analytics for millions of concurrent test-takers. Covers anti-cheating mechanisms, question bank design, and result analytics.
Requirements
Functional Requirements:
- Educators create quizzes with multiple question types: MCQ, short answer, fill-in-the-blank, code snippets, and drag-and-drop ordering
- Students take timed or untimed quizzes with auto-submission on timer expiry
- Platform supports adaptive testing: question difficulty adjusts based on prior answers (Item Response Theory)
- Auto-grading for objective questions; manual review queue for subjective answers
- Real-time quiz mode (Kahoot-style) where all participants answer simultaneously with live score display
- Detailed analytics: per-question difficulty, average completion time, score distributions
Non-Functional Requirements:
- Support 500k simultaneous quiz-takers during peak exam windows
- Answer submission latency under 100ms at p99
- Quiz state must survive server restarts — no answer loss
- Anti-cheating: detect tab switching, copy-paste, and multiple device logins
- Question bank must support 10 million questions with sub-100ms random selection
Scale Estimation
During a university exam window (9 AM Monday), 500k students simultaneously taking a 60-question exam over 90 minutes submit 60 answers each: 30M answer submissions in 90 minutes, roughly 5.6k writes/second sustained, with bursts (students answering in lockstep after each question, or racing the timer at the end) pushing peaks several times higher. Each answer payload is ~200 bytes, so sustained write throughput is only ~1 MB/second; the write path is sized by the peak rate, not the average. For real-time quiz mode (Kahoot-style), 100k concurrent participants in a single quiz session require a pub/sub fanout of 100k messages per question reveal, a WebSocket broadcast workload. Question bank reads are the sharper spike: if all 500k students start within a one-minute window, each fetching 60 questions, that is 30M question reads in that minute, or ~500k reads/second, requiring aggressive caching.
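These figures can be sanity-checked with a quick back-of-envelope script (Python used purely as a calculator here; the five-fold burst multiplier is an assumed peak-to-average ratio, not a measured value):

```python
# Back-of-envelope sizing for the exam-window scenario above.
STUDENTS = 500_000
QUESTIONS = 60
EXAM_SECONDS = 90 * 60
ANSWER_BYTES = 200
BURST_FACTOR = 5  # assumed peak-to-average ratio, not from measurements

total_answers = STUDENTS * QUESTIONS                   # 30,000,000
sustained_wps = total_answers / EXAM_SECONDS           # ~5,556 writes/s
peak_wps = sustained_wps * BURST_FACTOR                # ~27,800 writes/s
write_mb_s = sustained_wps * ANSWER_BYTES / 1_000_000  # ~1.1 MB/s sustained

# Question-bank read spike if every student starts in the same minute:
peak_reads_per_s = STUDENTS * QUESTIONS / 60           # 500,000 reads/s

print(f"{sustained_wps:,.0f} writes/s sustained, peak ~{peak_wps:,.0f}; "
      f"{peak_reads_per_s:,.0f} question reads/s at quiz start")
```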
High-Level Architecture
The platform separates quiz authoring (low traffic, high complexity), quiz delivery (high traffic, simple reads), and result processing (batch + real-time) into independent services. The quiz authoring service is a standard CRUD API backed by PostgreSQL, used by educators to create questions, assemble quizzes, and configure settings (time limits, shuffling, anti-cheat options). Questions are stored in a normalized question bank with full-text search via Elasticsearch. Quiz delivery is a read-heavy service: when a student starts a quiz, a snapshot of the quiz (questions, options, settings) is materialized and cached in Redis with a TTL equal to the quiz duration plus a buffer. All subsequent answer submissions reference this cached session.
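The snapshot-caching step at quiz start can be sketched as follows. This is a minimal sketch, not the platform's actual code: the key format and the 10-minute buffer are assumptions, and `redis_client` is any object exposing Redis's `setex(name, time, value)`:

```python
import json

def cache_quiz_snapshot(redis_client, quiz: dict, duration_s: int,
                        buffer_s: int = 600) -> str:
    """Materialize a quiz's delivery snapshot and cache it with a TTL of
    quiz duration plus a buffer, so late finishers still hit the cache.
    Key format and 600s buffer are illustrative assumptions."""
    key = f"quiz_snapshot:{quiz['quiz_id']}"
    redis_client.setex(key, duration_s + buffer_s, json.dumps(quiz))
    return key
```

All answer submissions for the session then read questions from this key instead of touching PostgreSQL.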
Answer submissions flow through a write-optimized path: the answer API writes to a Kafka topic (durable, ordered, replicated) and returns 200 immediately. A stream processor (Flink) consumes the topic to update real-time leaderboards (Redis Sorted Sets) and detect anomalies (e.g., suspiciously fast completion). Answers are batch-persisted to PostgreSQL every 5 seconds via a bulk COPY operation. This two-phase write (Kafka for durability, PostgreSQL for query) ensures no answer is lost even if the DB write path has transient failures.
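The batch-persist step can be sketched as a micro-batcher. This is a hedged sketch: `flush_fn` stands in for the bulk COPY into PostgreSQL, the thresholds are illustrative, and the clock is injectable for testing:

```python
import time
from typing import Callable

class MicroBatcher:
    """Buffers answer records consumed from the stream and flushes them
    in bulk when either a size or a time threshold is hit, mirroring the
    Kafka-consumer -> bulk COPY step described above."""

    def __init__(self, flush_fn: Callable[[list], None],
                 max_batch: int = 10_000, interval_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic):
        self.flush_fn = flush_fn      # stand-in for the PostgreSQL COPY call
        self.max_batch = max_batch
        self.interval_s = interval_s
        self.clock = clock
        self.buffer: list = []
        self.last_flush = clock()

    def add(self, record: dict) -> None:
        self.buffer.append(record)
        if (len(self.buffer) >= self.max_batch or
                self.clock() - self.last_flush >= self.interval_s):
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.flush_fn(self.buffer)  # one bulk write instead of N inserts
            self.buffer = []
        self.last_flush = self.clock()
```

Durability still rests on Kafka: if the process dies before a flush, the consumer group re-reads the unacknowledged records on restart.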
Core Components
Adaptive Testing Engine
The adaptive testing engine implements a simplified Item Response Theory (IRT) model. Each question has a difficulty parameter (b) calibrated from historical response data. After each answer, the engine re-estimates the student's ability level (theta) using a maximum likelihood estimator. The next question is selected from the question bank to maximize information at the current theta estimate — typically the question whose difficulty parameter is closest to the current theta. This selection runs in O(log n) against a pre-built difficulty index (a Redis Sorted Set keyed by difficulty score, queried with ZRANGEBYSCORE). The engine terminates when the standard error of the theta estimate drops below a threshold or the maximum question count is reached.
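The estimate-then-select loop can be sketched under a one-parameter (Rasch) IRT model. This is a simplified sketch, not the engine itself: Newton-Raphson is one way to compute the MLE, a binary search stands in for the Redis difficulty index, and the clamp bounds and iteration count are assumptions:

```python
import math
from bisect import bisect_left

def estimate_theta(responses, theta=0.0, iters=20):
    """Newton-Raphson MLE of ability under a Rasch model,
    P(correct) = 1 / (1 + exp(-(theta - b))).
    `responses` is a list of (difficulty_b, correct) pairs."""
    for _ in range(iters):
        grad = hess = 0.0
        for b, correct in responses:
            p = 1.0 / (1.0 + math.exp(-(theta - b)))
            grad += (1.0 if correct else 0.0) - p   # dLL/dtheta
            hess += p * (1.0 - p)                   # -d2LL/dtheta2
        if hess == 0:
            break
        theta += grad / hess
        theta = max(-4.0, min(4.0, theta))  # clamp: MLE diverges on all-correct runs
    return theta

def next_question(sorted_difficulties, theta):
    """Pick the difficulty closest to theta; a binary-search stand-in
    for the Redis sorted-set lookup."""
    i = bisect_left(sorted_difficulties, theta)
    candidates = sorted_difficulties[max(0, i - 1):i + 1]
    return min(candidates, key=lambda b: abs(b - theta))
```

A production engine would also track the standard error of theta (from the accumulated information) to decide when to terminate.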
Anti-Cheat Detection Service
Anti-cheat operates at two levels: client-side signals and server-side anomaly detection. Client-side, the quiz JavaScript SDK captures: tab/window visibility change events, clipboard access attempts, full-screen exit events, and mouse-leave events (potential screen sharing). These signals are sent as a separate heartbeat stream to the anti-cheat service (not mixed with the answer submission path). Server-side, the anomaly detector flags: answer times below the minimum plausible reading time for the question text, IP address shared with another active session, identical answer sequences across students (potential collusion), and submissions after timer expiry. Flagged sessions are queued for manual educator review.
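One of the server-side rules above, the minimum-plausible-reading-time check, can be sketched in a few lines. The 150 ms/word rate (~400 wpm) is an assumed tuning constant, not a value from the design:

```python
def min_plausible_read_ms(question_text: str, ms_per_word: int = 150) -> int:
    """Rough floor on how long a student needs to read the question;
    150 ms/word (~400 wpm) is an assumed constant to be calibrated."""
    return len(question_text.split()) * ms_per_word

def flag_fast_answer(question_text: str, time_spent_ms: int) -> bool:
    """True if the answer arrived faster than the reading-time floor."""
    return time_spent_ms < min_plausible_read_ms(question_text)
```

In practice this rule would be one signal among several; a single fast answer queues a session for review rather than triggering automatic action.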
Real-Time Quiz Broadcast Service
For Kahoot-style synchronous quizzes, a dedicated WebSocket service manages room-based connections. Each quiz session is a "room" — the host (educator) controls question reveal timing, and all participants in the room receive question reveals and see live scores. WebSocket servers are stateless; room state (current question index, participant scores) lives in Redis. When a host advances to the next question, the host service writes the new question to Redis pub/sub; all WebSocket servers subscribed to that room's channel fan out the message to their connected participants. With rooms of up to 100k participants and thousands of concurrent rooms, this fanout is the scaling limit. Note that Redis delivers one pub/sub message per subscribed WebSocket server per channel, not per participant; the servers perform the per-connection fanout, so the pub/sub message rate scales with rooms × servers, which a Redis Cluster with separate channels per room handles at ~1M messages/second throughput.
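The channel-per-room fanout can be sketched with an in-process stand-in for Redis pub/sub. Each subscribed callback plays the role of one WebSocket server, which then fans the message out over its own connections; the class and channel names are illustrative:

```python
from collections import defaultdict
from typing import Callable

class RoomBroker:
    """In-process stand-in for Redis pub/sub with one channel per room.
    Each WebSocket server subscribes a callback for the rooms it serves;
    publish delivers one message per subscribed server, and each server
    fans out to its own connected participants."""

    def __init__(self):
        self.channels = defaultdict(list)

    def subscribe(self, room_id: str, on_message: Callable[[dict], None]) -> None:
        self.channels[f"room:{room_id}"].append(on_message)

    def publish(self, room_id: str, message: dict) -> int:
        """Deliver to every subscriber of the room's channel;
        returns the subscriber count, like Redis PUBLISH."""
        subscribers = self.channels[f"room:{room_id}"]
        for on_message in subscribers:
            on_message(message)
        return len(subscribers)
```

This mirrors why the design stays cheap on the Redis side: the broker only addresses servers, never individual participants.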
Database Design
PostgreSQL schema: questions (question_id, question_bank_id, type, body, difficulty, tags[], created_by), answer_options (option_id, question_id, body, is_correct, order_index), quizzes (quiz_id, title, config_json, created_by, published_at), quiz_questions (quiz_id, question_id, order_index), quiz_sessions (session_id, quiz_id, student_id, started_at, submitted_at, score), answers (answer_id, session_id, question_id, selected_option_id, response_text, answered_at, time_spent_ms). The answers table is partitioned by answered_at month. A separate ClickHouse table stores pre-aggregated analytics (per-question response distributions, average time_spent) refreshed hourly from PostgreSQL.
API Design
- POST /quiz-sessions: body {quiz_id}; initializes a session, returns session_id and the first question batch; the quiz snapshot is cached in Redis
- POST /quiz-sessions/{session_id}/answers: body {question_id, selected_option_id, time_spent_ms}; writes to Kafka, returns {correct: true/false} for immediate feedback
- POST /quiz-sessions/{session_id}/submit: finalizes the session, triggers score computation, returns the final score and result breakdown
- GET /analytics/quizzes/{quiz_id}: returns per-question stats (difficulty index, discrimination index, avg time) and a score distribution histogram
Scaling & Bottlenecks
The peak answer-submission rate, in the tens of thousands of writes/second, requires the Kafka-first write path: absorbing it with direct PostgreSQL writes would demand a large, carefully partitioned cluster. Kafka absorbs the burst and smooths writes to PostgreSQL via micro-batch inserts. The quiz session cache in Redis is critical: at quiz start, 500k students each trigger a cache miss if the quiz snapshot isn't pre-populated. Pre-warming quiz snapshots 5 minutes before the scheduled start time eliminates this thundering herd. Redis memory usage: 500k active sessions × 10 KB/session = 5 GB, manageable on a mid-sized Redis cluster.
Real-time leaderboard updates in Kahoot-style quizzes are a fan-out bottleneck: 100k participants each submitting an answer within 10 seconds means 10k score updates/second, each triggering a ZADD to Redis and a broadcast to all participants. Batching score broadcasts (push leaderboard snapshot every 500ms instead of per-answer) reduces WebSocket message rate by 100x without perceptible UX degradation.
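The 500ms snapshot batching can be sketched as a throttled broadcaster. A minimal sketch with an injectable clock for testability; `send_fn` stands in for the WebSocket broadcast, and the names are illustrative:

```python
import time
from typing import Callable

class ThrottledBroadcaster:
    """Coalesces per-answer score updates into periodic leaderboard
    snapshots: always remember the latest scores, but broadcast at most
    once per interval, as described above."""

    def __init__(self, send_fn: Callable[[dict], None],
                 interval_s: float = 0.5,
                 clock: Callable[[], float] = time.monotonic):
        self.send_fn = send_fn          # stand-in for the WebSocket fanout
        self.interval_s = interval_s
        self.clock = clock
        self.last_sent = float("-inf")  # so the first update always sends
        self.latest: dict = {}

    def on_score_update(self, scores: dict) -> bool:
        """Record the latest scores; broadcast only if the interval has
        elapsed. Returns True when a snapshot was actually sent."""
        self.latest = dict(scores)
        now = self.clock()
        if now - self.last_sent >= self.interval_s:
            self.send_fn(self.latest)
            self.last_sent = now
            return True
        return False
```

A production version would also schedule a trailing send so the final scores of a burst are never dropped; this sketch only shows the rate-limiting core.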
Key Trade-offs
- Immediate correctness feedback vs. anti-cheat: Returning {correct: true/false} immediately after each answer aids learning but allows students to brute-force MCQs by submitting all options; disabling immediate feedback during graded exams prevents this at the cost of a poorer learning experience.
- Adaptive testing complexity vs. fairness: IRT-based adaptive selection maximizes measurement precision but makes it impossible for two students to compare their exact question sets, complicating fairness audits; a hybrid approach (fixed item pool with adaptive ordering) balances these concerns.
- Kafka durability vs. latency: Kafka's default ack=all with 3 replicas adds ~5ms to the answer submission path; ack=1 reduces latency to ~1ms but risks answer loss on broker failure. For graded exams, ack=all is non-negotiable.
- Client-side anti-cheat vs. privacy: Capturing tab-switch and clipboard events requires explicit user consent in GDPR jurisdictions; overly aggressive monitoring creates legal risk and student backlash.