System Design: Learning Management System (LMS)

Requirements

Functional Requirements:

Institutions (universities, companies) manage courses, rosters, and curricula within isolated tenant environments
Instructors create and organize content modules: readings, videos, assignments, discussions, and quizzes
Grading workflows: assignment submission, rubric-based grading, grade override, and gradebook export
SCORM and xAPI compliance for importing third-party eLearning content packages
Calendar integration, assignment due-date reminders, and announcement broadcasting
Reporting dashboards for administrators: course completion rates, at-risk student identification, time-on-task

Non-Functional Requirements:

Multi-tenant architecture with data isolation between institutions
Support 10,000 institutions, each with up to 100,000 users
Gradebook queries (fetch all grades for a course of 500 students) must return in under 500ms
SCORM content packages up to 2 GB must be processable
FERPA/GDPR compliance: audit logging of all data access to student records

Scale Estimation

With 10,000 institutions and an average of 10,000 active users each, the platform serves 100M registered users. At any given time, 5% are active (5M concurrent), each generating 1-2 API requests per minute. That's ~100k requests/second. The gradebook is the most query-intensive component: a professor loading grades for a 500-student course with 20 assignments triggers a 10,000-cell read. With 10k such queries/hour at peak, gradebook read throughput is ~28M grade cells/hour. Content storage: average 500 MB of course content per course, 1M active courses = 500 TB of content to store and serve.

High-Level Architecture

The LMS uses a multi-tenant architecture with tenant-per-schema isolation in PostgreSQL. Each institution gets its own PostgreSQL schema (namespace), providing logical isolation without the overhead of per-tenant database clusters. A schema-routing middleware layer maps incoming requests (authenticated with JWT containing tenant_id) to the correct schema. This approach scales to 10k tenants on a shared PostgreSQL cluster with connection pooling (PgBouncer) and allows tenant-specific schema migrations without affecting other tenants.

Content delivery is handled separately from the transactional LMS database. Course files, videos, and SCORM packages are stored in S3, organized by tenant prefix. A content delivery service generates pre-signed S3 URLs (valid for 1 hour) for authorized requests, avoiding the need to proxy large files through the application tier. SCORM packages are unpacked server-side into a virtual filesystem structure in S3 and served via a SCORM runtime JavaScript SDK in the browser, which communicates learner state back to an xAPI endpoint (Learning Record Store).

The reporting and analytics plane is completely separated from the operational LMS. An ETL pipeline (Airflow + dbt) copies transactional data to a Redshift data warehouse nightly. Administrators query pre-computed materialized views for completion rates and at-risk reports rather than running expensive aggregations against the operational PostgreSQL. Real-time at-risk alerts (students who haven't logged in for 7 days, grade below 60%) are generated by a scheduled Kafka Streams job.

Core Components

Gradebook Service

The gradebook is a specialized service separate from the general content API. It manages a sparse matrix of (student × assignment) grade cells. Internally, grades are stored in a column-family structure: a PostgreSQL table with columns for each assignment is impractical for dynamic assignment counts, so grades are stored as rows in an assignments_grades table with (course_id, user_id, assignment_id, score, graded_at, graded_by). For the common query pattern (fetch all grades for a course), a materialized view pivots rows to columns and is refreshed on each grade change via a PostgreSQL trigger. Grade exports to CSV and SIS (Banner, PowerSchool) use a background job to avoid blocking the API.

SCORM/xAPI Compliance Layer

SCORM 1.2 and 2004 packages are uploaded as ZIP archives, validated against the SCORM manifest schema (imsmanifest.xml), and unpacked into S3. The SCORM runtime in the browser communicates via the SCORM JavaScript API (LMSInitialize, LMSSetValue, LMSCommit). The LMS provides a server-side SCORM API endpoint that receives these calls, stores learner state (completion status, score, suspend data) in PostgreSQL, and translates them to xAPI statements for the Learning Record Store. xAPI statements are stored in a dedicated LRS (Learning Locker or SCORM Cloud) that provides a standardized query API.

Notification and Announcement Service

Instructors publish announcements to courses, which fan out to all enrolled students. With 10k students in a large university course, a single announcement triggers 10k notifications. The notification service uses a fan-out-on-write model: on announcement publish, a Kafka topic receives the event; a consumer expands the enrollment list and enqueues individual email/push notifications in batches. Email delivery uses SES with per-tenant sending domains. Push notifications use FCM/APNs via a mobile push gateway. Users control notification preferences (email, push, in-app) which are respected during fan-out.

Database Design

PostgreSQL with schema-per-tenant. Within each tenant schema, core tables: users (user_id, email, role, sso_external_id), courses (course_id, term_id, title, status), enrollments (enrollment_id, user_id, course_id, role, section_id), assignments (assignment_id, course_id, title, due_at, points_possible, submission_type), submissions (submission_id, assignment_id, user_id, submitted_at, file_urls[], body_text), grades (grade_id, submission_id, score, grader_id, graded_at, rubric_data_json). A global tenants table in the public schema maps tenant slugs to schema names and configuration. PgBouncer pools connections per schema. Audit logs (FERPA compliance) are written to an immutable append-only table with row-level security preventing deletes.

API Design

GET /courses/{course_id}/gradebook — returns full grade matrix for the course (all students × all assignments); served from materialized view, cached in Redis per instructor session
POST /assignments/{assignment_id}/submissions — multipart upload for file submissions; stores metadata in PostgreSQL, file in S3, triggers plagiarism check async
PUT /grades/{grade_id} — body: {score, rubric_data, comment}, updates grade, invalidates gradebook cache, records audit log entry
POST /courses/{course_id}/announcements — body: {title, body, notify_by: ["email","push"]}, publishes announcement, triggers async fan-out

Scaling & Bottlenecks

The schema-per-tenant model works up to ~10k tenants on a single PostgreSQL cluster because PostgreSQL's planner handles schema routing efficiently with connection-level search_path setting. Beyond 10k tenants, shard tenants across multiple PostgreSQL clusters by institution size tier (small, medium, large). Connection pooling is critical: at 5M concurrent users × 1 connection per request, a naive approach would require 5M database connections. PgBouncer in transaction-pool mode reduces this to ~500 actual DB connections per shard, with requests waiting in a pool queue (median wait <1ms at normal load).

Gradebook materialized view refresh is a bottleneck for large courses: a course with 10k students and 50 assignments has 500k cells; refreshing the entire view on each grade change is too expensive. Use incremental refresh: maintain a Redis-based dirty set of (course_id) values needing refresh; a background job refreshes dirty courses in batches every 30 seconds. Instructors loading the gradebook see at-most-30-second-stale data, which is acceptable for the grading use case.

Key Trade-offs

Schema-per-tenant vs. row-level security: Schema isolation is simpler to reason about and prevents cross-tenant data leaks from query bugs, but complicates cross-tenant analytics and increases schema migration complexity; RLS is more flexible but requires rigorous policy testing.
Materialized view vs. on-demand aggregation for gradebook: Materialized views give sub-second reads but add write overhead and stale-data risk; on-demand aggregation always returns current data but is too slow for 10k-student courses.
SCORM server-side proxy vs. client-direct S3: Proxying SCORM content through the server enables access control and audit logging but adds egress cost and latency; pre-signed S3 URLs reduce server load but bypass server-side audit logging.
Multi-tenant cluster vs. dedicated clusters per enterprise: Shared clusters reduce operational overhead and cost for small institutions; large enterprise customers often require dedicated clusters for compliance (data residency, SOC 2 audit scope) and SLA guarantees.

System Design: Learning Management System (LMS)

Requirements

Scale Estimation

High-Level Architecture

Core Components

Gradebook Service

SCORM/xAPI Compliance Layer

Notification and Announcement Service

Database Design

API Design

Scaling & Bottlenecks

Key Trade-offs

Master this topic in our 12-week cohort

System Design: Multi-Tenant SaaS Architecture

System Design: Online Learning Platform (Coursera-scale)

System Design: Online Coding Judge (LeetCode-style)

System Design: Quiz & Assessment Platform

System Design: Flashcard & Spaced Repetition App

System Design: Video Lesson Platform