System Design: Single Sign-On (SSO) System
Design an enterprise-grade Single Sign-On system that authenticates users once and provides seamless access to multiple applications using SAML 2.0 and OpenID Connect. Covers session management, IdP federation, MFA integration, and session revocation.
Requirements
Functional Requirements:
- Authenticate users once and issue session tokens reusable across all registered applications
- Support SAML 2.0 (SP-initiated and IdP-initiated flows) and OIDC for application integration
- Federate with external identity providers: LDAP/Active Directory, Google Workspace, Okta
- Enforce multi-factor authentication (TOTP, WebAuthn/FIDO2, SMS) with per-application MFA policies
- Provide centralized session management: view active sessions, revoke specific sessions
- Support Just-In-Time (JIT) user provisioning: auto-create user accounts on first SSO login
Non-Functional Requirements:
- Authentication latency under 500ms for password + MFA combined
- Support 1 million active users with 100,000 peak concurrent sessions
- 99.99% availability; SSO downtime prevents access to all connected applications
- Session tokens encrypted with AES-256-GCM; transmitted only over TLS 1.3
- Compliance with SOC 2 Type II; audit log of all authentication events retained for 7 years
Scale Estimation
1 million users averaging 2 logins/day = 2 million authentication events/day ≈ 23 authentications/second on average (2,000,000 / 86,400), with 200/second at peak. Active SSO sessions at any time: 100,000 (users stay logged in during business hours). Each session record: 500 bytes → 50 MB in Redis. SAML assertions: 100 applications × 50 assertion validations/second = 5,000 SAML validations/second, each requiring XML signature verification (~2ms on CPU).
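The estimates above can be reproduced with a few lines of arithmetic. This is only a back-of-the-envelope check using the figures stated in this section; the variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Authentication rate
users = 1_000_000
logins_per_user_per_day = 2
auth_events_per_day = users * logins_per_user_per_day
avg_auth_per_second = auth_events_per_day / SECONDS_PER_DAY

# Session store footprint (decimal megabytes)
concurrent_sessions = 100_000
session_record_bytes = 500
session_store_mb = concurrent_sessions * session_record_bytes / 1_000_000

# SAML validation rate
applications = 100
validations_per_app_per_second = 50
saml_validations_per_second = applications * validations_per_app_per_second

print(f"{auth_events_per_day:,} authentication events/day")
print(f"{avg_auth_per_second:.1f} authentications/second on average")
print(f"{session_store_mb:.0f} MB of session data")
print(f"{saml_validations_per_second:,} SAML validations/second")
```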
High-Level Architecture
The SSO system acts as the Identity Provider (IdP) for connected applications (Service Providers). The core flow: user navigates to an SP → SP redirects to SSO IdP with a SAML AuthnRequest or OIDC authorization request → IdP authenticates the user (credential check + MFA) → IdP issues a signed SAML assertion or OIDC ID token → SP validates the assertion and creates a local session. Subsequent requests to the same or other SPs skip re-authentication: the user's SSO session cookie is presented to the IdP, which issues a new SP-specific assertion without prompting for credentials.
The SSO session layer maintains a central session store (Redis Cluster): key sso_session:{session_id} → {user_id, auth_time, mfa_methods_used, ip, user_agent, expiry, amr_claims}. The session_id is a 32-byte cryptographically random value stored in an HttpOnly, Secure, SameSite=Lax cookie on the IdP domain. When an SP redirects the user's browser to the IdP, the browser sends this cookie; the IdP looks up the session in Redis, checks expiry and MFA policy compliance, and issues a fresh SP assertion without prompting the user.
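A minimal sketch of the session layer described above, with a plain dictionary standing in for Redis Cluster. The class and function names, the cookie name, and the 8-hour lifetime are assumptions made for illustration; only the key format, the record fields, and the cookie attributes come from the design.

```python
import secrets
import time
from dataclasses import dataclass

SESSION_TTL_SECONDS = 8 * 60 * 60  # assumed lifetime; the design does not fix one


@dataclass
class SsoSession:
    user_id: str
    auth_time: float
    expiry: float
    mfa_methods_used: list
    amr_claims: list
    ip: str
    user_agent: str


class SessionStore:
    """In-memory stand-in for the Redis-backed session store."""

    def __init__(self):
        self._data = {}

    def create(self, user_id, amr_claims, ip, user_agent, now=None):
        now = time.time() if now is None else now
        # 32 bytes from the OS CSPRNG, URL-safe encoded for use in a cookie.
        session_id = secrets.token_urlsafe(32)
        self._data[f"sso_session:{session_id}"] = SsoSession(
            user_id=user_id,
            auth_time=now,
            expiry=now + SESSION_TTL_SECONDS,
            mfa_methods_used=[m for m in amr_claims if m != "pwd"],
            amr_claims=list(amr_claims),
            ip=ip,
            user_agent=user_agent,
        )
        return session_id

    def lookup(self, session_id, now=None):
        """Return the session, or None if it is missing or expired."""
        now = time.time() if now is None else now
        key = f"sso_session:{session_id}"
        session = self._data.get(key)
        if session is None:
            return None
        if now >= session.expiry:
            del self._data[key]
            return None
        return session


def session_cookie_header(session_id):
    # Cookie attributes as listed in the design; the cookie name is assumed.
    return (
        f"Set-Cookie: sso_session={session_id}; Path=/; "
        "HttpOnly; Secure; SameSite=Lax"
    )
```

In a real deployment the expiry check would typically be delegated to a Redis key TTL rather than performed in application code.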
MFA policy enforcement is application-level: each SP registers its required assurance level (AAL1: password only, AAL2: password + TOTP, AAL3: password + FIDO2 hardware key). If the current SSO session was established at a lower AAL than the SP requires, the IdP issues a step-up challenge for the missing factor only, without requiring full re-authentication. A successful step-up adds the new factor to the session's amr (Authentication Methods References) claims for the remainder of the session.
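The step-up decision can be sketched as a comparison between the session's current assurance level and the level the SP registered. The function names are hypothetical, and the amr values ("pwd", "otp", "hwk") follow the RFC 8176 registry as an assumption about how the factors would be recorded.

```python
# Factors required at each assurance level, following the mapping above.
REQUIRED_FACTORS = {
    1: {"pwd"},
    2: {"pwd", "otp"},
    3: {"pwd", "hwk"},
}


def session_aal(amr):
    """Highest assurance level satisfied by the session's amr claims."""
    amr = set(amr)
    for level in (3, 2, 1):
        if REQUIRED_FACTORS[level] <= amr:
            return level
    return 0


def step_up_factors(session_amr, required_aal):
    """Factors the IdP still has to challenge for; empty if none are needed."""
    amr = set(session_amr)
    if session_aal(amr) >= required_aal:
        return set()
    return REQUIRED_FACTORS[required_aal] - amr


def apply_step_up(session_amr, completed_factor):
    """Return the upgraded amr list after a successful step-up challenge."""
    if completed_factor in session_amr:
        return list(session_amr)
    return list(session_amr) + [completed_factor]
```

For example, a password-only session visiting an AAL2 application yields `{"otp"}`, so the user is asked for a TOTP code but not for the password again.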
Core Components
Authentication Service
The authentication service handles credential verification for each supported method:
- Password authentication: hash comparison using Argon2id (memory=65536 KB, iterations=3, parallelism=4) against a PostgreSQL-stored hash.
- LDAP/AD federation: bind to the AD domain controller with the user's credentials via LDAP over TLS; the credential check is delegated to AD.
- TOTP verification: HMAC-SHA1 keyed with the user's secret over the current time-step counter, per RFC 6238; allow ±1 time step for clock skew; enforce single-use by storing used TOTP codes in Redis with a 90-second TTL.
- WebAuthn: verify the assertion signature (computed over the authenticator data and the client data hash) against the stored credential public key, using a WebAuthn library.
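The TOTP check described above can be sketched with the standard library alone. The 30-second step and 6-digit codes are the common RFC 6238 defaults, assumed here; the `UsedCodes` class is an in-memory stand-in for the Redis set, and the key format is hypothetical.

```python
import hashlib
import hmac
import struct
import time

TIME_STEP_SECONDS = 30       # assumed RFC 6238 default step
USED_CODE_TTL_SECONDS = 90   # covers the current step plus one step either side


def totp(secret, counter, digits=6):
    """HOTP value (RFC 4226) for a time-step counter, as used by RFC 6238."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation
    value = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


class UsedCodes:
    """In-memory stand-in for a Redis 'SET key NX EX 90' style check."""

    def __init__(self):
        self._expiry = {}

    def mark_if_unused(self, key, now):
        expiry = self._expiry.get(key)
        if expiry is not None and expiry > now:
            return False  # code was already accepted recently
        self._expiry[key] = now + USED_CODE_TTL_SECONDS
        return True


def verify_totp(user_id, secret, submitted, used, now=None):
    now = time.time() if now is None else now
    current = int(now // TIME_STEP_SECONDS)
    # Accept the previous, current, and next time step to absorb clock skew.
    for counter in range(max(current - 1, 0), current + 2):
        if hmac.compare_digest(totp(secret, counter), submitted):
            return used.mark_if_unused(f"totp_used:{user_id}:{submitted}", now)
    return False
```

Comparing with `hmac.compare_digest` avoids leaking how many leading digits matched through timing differences.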
SAML Assertion Generation
SAML assertions are XML documents signed with the IdP's private key (RSA-2048, SHA-256 digest). The assertion includes: Issuer (IdP entity ID), Subject (user's NameID, typically email), Conditions (NotBefore, NotOnOrAfter with 5-minute validity window), AuthnContext (method used), and AttributeStatement (user attributes: roles, department, email). Assertions are signed at the response level; the SP verifies the signature against the IdP's published X.509 certificate at the SAML metadata endpoint. Replay attacks are prevented by the SP tracking used assertion IDs (stored in a short-lived set in Redis).
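The validity window and replay protection can be sketched from the SP's point of view as follows. XML signature verification is deliberately left out; the 60-second skew allowance, the function names, and the in-memory `ReplayCache` (standing in for the short-lived Redis set) are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

VALIDITY = timedelta(minutes=5)      # validity window from the design
CLOCK_SKEW = timedelta(seconds=60)   # assumed tolerance for clock drift


def build_conditions(issued_at):
    """Conditions the IdP would place in the assertion."""
    return {"NotBefore": issued_at, "NotOnOrAfter": issued_at + VALIDITY}


class ReplayCache:
    """Tracks assertion IDs until they can no longer be accepted."""

    def __init__(self):
        self._seen = {}

    def first_use(self, assertion_id, expires_at, now):
        # Drop IDs whose acceptance window has already closed.
        self._seen = {k: v for k, v in self._seen.items() if v > now}
        if assertion_id in self._seen:
            return False
        self._seen[assertion_id] = expires_at
        return True


def accept_assertion(assertion_id, conditions, cache, now):
    """Time-window and replay checks only; the signature check is omitted."""
    if now < conditions["NotBefore"] - CLOCK_SKEW:
        return False
    if now >= conditions["NotOnOrAfter"] + CLOCK_SKEW:
        return False
    return cache.first_use(
        assertion_id, conditions["NotOnOrAfter"] + CLOCK_SKEW, now
    )


issued = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
conditions = build_conditions(issued)
cache = ReplayCache()
first = accept_assertion("_a1", conditions, cache, issued + timedelta(minutes=2))
replayed = accept_assertion("_a1", conditions, cache, issued + timedelta(minutes=3))
```

Here `first` is accepted and `replayed` is rejected because the ID is still in the cache. An ID only needs to be remembered until the assertion expires, which keeps the set small.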
Session Revocation
Centralized session revocation: an admin or user revokes a session by deleting its Redis entry. The next time an SP redirects the user's browser to the IdP, the lookup for the presented session cookie returns no session, triggering re-authentication. For SPs with their own long-lived local sessions (not checking the IdP on every request), revocation propagation requires either: (1) the SP polls the IdP's session status endpoint, or (2) the IdP sends a SAML Single Logout (SLO) request to every SP that received an assertion during the SSO session, triggering SP-side session termination.
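A rough sketch of the bookkeeping behind revocation: the IdP records which SPs received assertions in each session, so that revoking the session both removes it and yields the list of SPs to notify. The class and method names are hypothetical, and dictionaries stand in for Redis.

```python
class SessionRegistry:
    """In-memory stand-in for the session store plus per-session SP tracking."""

    def __init__(self):
        self._sessions = {}        # session_id -> user_id
        self._sp_by_session = {}   # session_id -> SP ids that received assertions

    def start(self, session_id, user_id):
        self._sessions[session_id] = user_id
        self._sp_by_session[session_id] = set()

    def record_assertion(self, session_id, sp_id):
        # Called whenever the IdP issues an assertion for this session.
        if session_id in self._sessions:
            self._sp_by_session[session_id].add(sp_id)

    def is_active(self, session_id):
        # What a session status endpoint would report to a polling SP.
        return session_id in self._sessions

    def revoke(self, session_id):
        """Delete the session and return the SPs that need an SLO request."""
        self._sessions.pop(session_id, None)
        return sorted(self._sp_by_session.pop(session_id, set()))


registry = SessionRegistry()
registry.start("s1", "user-42")
registry.record_assertion("s1", "https://wiki.example.com/saml")
registry.record_assertion("s1", "https://crm.example.com/saml")
slo_targets = registry.revoke("s1")
```

After `revoke`, `is_active("s1")` returns `False`, which covers the polling option, and `slo_targets` lists the two SPs for the SLO option.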
Database Design
PostgreSQL tables:
- users (user_id UUID, email, password_hash, mfa_enabled, totp_secret_encrypted, webauthn_credentials JSONB, created_at, last_login_at, status)
- service_providers (sp_id, entity_id, acs_url, metadata_url, aal_required, attribute_mapping JSONB, created_at)
- audit_log (event_id, user_id, sp_id, event_type ENUM, ip_address, user_agent, success BOOL, failure_reason, created_at)
Redis: SSO sessions, TOTP used-code sets, rate-limit counters per IP/user.
API Design
GET /saml/idp/sso — Receive SP-initiated SAML AuthnRequest; validate and redirect to login if no active session.
POST /saml/idp/sso — Receive IdP-initiated login; issue SAML assertion to the specified SP.
GET /.well-known/openid-configuration — OIDC discovery document with endpoint URLs and supported features.
POST /sessions/{session_id}/revoke — Revoke a specific SSO session; triggers SLO for all active SP sessions.
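As an illustration of the discovery endpoint, the document it returns could be assembled as below. The field names are the ones OpenID Connect Discovery 1.0 marks as required (plus a few common optional ones); the endpoint paths under the issuer and the chosen algorithm and scopes are assumptions, not part of this design.

```python
import json


def discovery_document(issuer):
    """Build an OIDC discovery document; paths below the issuer are assumed."""
    return {
        "issuer": issuer,
        "authorization_endpoint": f"{issuer}/oidc/authorize",
        "token_endpoint": f"{issuer}/oidc/token",
        "userinfo_endpoint": f"{issuer}/oidc/userinfo",
        "jwks_uri": f"{issuer}/oidc/jwks",
        "response_types_supported": ["code"],
        "subject_types_supported": ["public"],
        "id_token_signing_alg_values_supported": ["RS256"],
        "scopes_supported": ["openid", "profile", "email"],
    }


doc = discovery_document("https://sso.example.com")
print(json.dumps(doc, indent=2))
```

Clients use the `issuer` value to validate the `iss` claim in ID tokens, so it has to match the URL the document was fetched from.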
Scaling & Bottlenecks
SAML XML signature verification is CPU-intensive: 2ms per verification × 5,000 validations/second = 10 CPU-seconds/second, requiring 10 dedicated CPU cores for SAML processing. Caching the parsed and validated SP metadata (public key certificates) in memory eliminates repeated XML parsing. WebAuthn verification at high AAL is comparatively cheap: an ECDSA P-256 signature check takes under 0.5ms.
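One way to keep parsed SP metadata in memory is a small TTL cache in front of the fetch-and-parse step. This is a sketch under assumptions: the `fetch` callable, the one-hour TTL, and the class name are hypothetical, and a production version would also need locking and a bound on the number of entries.

```python
import time


class MetadataCache:
    """Caches parsed SP metadata per entity ID for a fixed time-to-live."""

    def __init__(self, fetch, ttl_seconds=3600, clock=time.monotonic):
        self._fetch = fetch          # callable: entity_id -> parsed metadata
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}           # entity_id -> (expires_at, metadata)

    def get(self, entity_id):
        now = self._clock()
        entry = self._entries.get(entity_id)
        if entry is not None and entry[0] > now:
            return entry[1]          # cache hit: no XML parsing needed
        metadata = self._fetch(entity_id)
        self._entries[entity_id] = (now + self._ttl, metadata)
        return metadata


fetch_calls = []


def fake_fetch(entity_id):
    # Stand-in for downloading and parsing the SP's metadata XML.
    fetch_calls.append(entity_id)
    return {"entity_id": entity_id, "certificate": "<parsed X.509 certificate>"}


fake_now = [0.0]
cache = MetadataCache(fake_fetch, ttl_seconds=3600, clock=lambda: fake_now[0])
cache.get("https://crm.example.com/saml")
cache.get("https://crm.example.com/saml")   # served from memory
fake_now[0] = 3601.0
cache.get("https://crm.example.com/saml")   # TTL elapsed, fetched again
```

The TTL bounds how long a rotated or revoked SP certificate can remain trusted, so it trades parsing cost against staleness.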
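Redis is the critical dependency for session state and would be a single point of failure if deployed as one node. Redis Cluster with 3 primary + 3 replica nodes across 3 AZs provides HA: the cluster's built-in failover promotes a replica when a primary fails, with a target of completing within 30 seconds. (Redis Sentinel provides the equivalent for non-clustered deployments and is not combined with Redis Cluster.) For disaster recovery, session data is periodically snapshotted to S3; in a complete Redis failure, users are required to re-authenticate (acceptable since SSO downtime is already a catastrophic event).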
Key Trade-offs
- Centralized session store vs. stateless tokens: Centralized Redis enables instant session revocation; stateless JWT-based sessions eliminate the Redis dependency but make revocation impractical before token expiry.
- SAML vs. OIDC: SAML is XML-based and widely supported by enterprise applications (legacy SaaS); OIDC is JSON/REST-based and better suited for modern web and mobile apps. Supporting both maximizes SP compatibility.
- Strict vs. relaxed clock skew tolerance: SAML assertions have a NotBefore/NotOnOrAfter window; too strict (0 skew) rejects legitimate requests from servers with minor clock drift; too relaxed (>5 minutes) widens the window in which a captured assertion can be replayed.
- JIT provisioning vs. pre-provisioned users: JIT provisioning simplifies user lifecycle management but creates accounts on first login without admin review; pre-provisioning (via SCIM) gives IT full control but requires synchronization between the directory and each SP.