System Design: Telemedicine Platform

Deep dive into designing a HIPAA-compliant telemedicine platform supporting real-time video consultations, secure messaging, e-prescribing, and multi-provider scheduling for virtual healthcare delivery at scale.

Requirements

Functional Requirements:

  • Real-time video and audio consultations between patients and healthcare providers with screen sharing and live annotation
  • Secure HIPAA-compliant messaging with file attachments (images, documents, lab results)
  • Virtual waiting room with queue management and estimated wait time display
  • E-prescribing integration sending prescriptions electronically to pharmacies via NCPDP SCRIPT standard
  • Provider availability calendar with real-time slot management and multi-timezone support
  • Post-visit clinical note generation with ICD-10/CPT coding for billing

Non-Functional Requirements:

  • Support 100,000 concurrent video sessions with p99 video latency under 200ms
  • End-to-end encryption for all video, audio, and messaging content (SRTP for media, TLS 1.3 for signaling)
  • HIPAA compliance: signed BAA with all cloud providers, encryption at rest and in transit, access controls
  • 99.95% availability for the video path — dropped calls during consultations are clinically unacceptable
  • Graceful degradation: fall back to audio-only if bandwidth drops below 500kbps

Scale Estimation

With 100,000 concurrent video sessions, each session consumes approximately 2.5 Mbps bidirectional (720p video + audio). Total bandwidth: 250 Gbps aggregate across all media servers. However, with geographic distribution across 20 regions, each region handles ~5,000 sessions = 12.5 Gbps per region. Media server capacity: each server handles 500 simultaneous sessions, requiring 10 servers per region (200 servers globally). The signaling tier handles 100K concurrent WebSocket connections — at 1 signaling message/sec per session, that is 100K messages/sec. The messaging service handles 5M messages/day with an average payload of 2KB = 10GB/day. Visit records: 500K telehealth visits/day generating clinical documentation.
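
As a sanity check, the estimates above reduce to a few lines of arithmetic (a TypeScript sketch; the constants simply restate the numbers in this section):

```typescript
// Back-of-envelope check of the scale estimates above.
const sessions = 100_000;       // concurrent video sessions
const mbpsPerSession = 2.5;     // 720p video + audio, bidirectional

const totalGbps = (sessions * mbpsPerSession) / 1_000;    // 250 Gbps aggregate
const regions = 20;
const perRegionGbps = totalGbps / regions;                // 12.5 Gbps per region
const serversPerRegion = sessions / regions / 500;        // 10 SFUs at 500 sessions each
const serversGlobal = serversPerRegion * regions;         // 200 servers worldwide

console.log({ totalGbps, perRegionGbps, serversPerRegion, serversGlobal });
```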

High-Level Architecture

The telemedicine platform is built on a WebRTC-based architecture with a Selective Forwarding Unit (SFU) model for media routing. The client applications (iOS, Android, Web) establish WebRTC peer connections through the Signaling Service (WebSocket-based, running on Node.js). The Signaling Service coordinates session establishment using the ICE framework with STUN/TURN servers for NAT traversal. Media streams are routed through the SFU (built on mediasoup or Janus) which receives a single stream from each participant and selectively forwards it to others — this is more bandwidth-efficient than a full mesh for multi-party consultations.
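
A minimal browser-side sketch of the signaling flow (the WebSocket URL, message shapes, and ICE credentials are illustrative; mediasoup in particular exchanges transport parameters through its own protocol rather than raw SDP, but the transport pattern is the same):

```typescript
// Generic WebRTC offer/answer over a signaling WebSocket (illustrative).
const ws = new WebSocket("wss://signal.example.com/sessions/abc123");

const pc = new RTCPeerConnection({
  iceServers: [
    { urls: "stun:stun.example.com:3478" },
    { urls: "turn:turn.example.com:3478", username: "user", credential: "secret" },
  ],
});

// Trickle ICE: forward local candidates to the peer via the signaling channel.
pc.onicecandidate = (e) => {
  if (e.candidate) ws.send(JSON.stringify({ type: "candidate", candidate: e.candidate }));
};

ws.onmessage = async (e) => {
  const msg = JSON.parse(e.data);
  if (msg.type === "answer") await pc.setRemoteDescription(msg.sdp);
  else if (msg.type === "candidate") await pc.addIceCandidate(msg.candidate);
};

ws.onopen = async () => {
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  ws.send(JSON.stringify({ type: "offer", sdp: pc.localDescription }));
};
```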

The application layer consists of domain services: Visit Service (manages the lifecycle of a telehealth encounter from scheduling through post-visit documentation), Messaging Service (secure asynchronous communication using end-to-end encryption with the Signal Protocol adapted for healthcare), Prescription Service (generates NCPDP SCRIPT messages and routes them to pharmacy networks via Surescripts), and Queue Service (manages the virtual waiting room with priority-based queuing for urgent cases). All services communicate through an API Gateway (Envoy) handling OAuth 2.0 with SMART on FHIR scopes for authorization.

Infrastructure is deployed across multiple AWS regions with media servers placed in edge locations to minimize latency. All PHI (Protected Health Information) is encrypted at rest with AES-256 via AWS KMS and in transit with TLS 1.3. Video recordings (when consented) are stored in S3 with server-side encryption and a lifecycle policy moving recordings to Glacier after 90 days. A dedicated Compliance Service handles consent management, audit logging, and BAA tracking.

Core Components

Media Engine (SFU)

The Selective Forwarding Unit is the core of the video system, built on mediasoup and running on dedicated bare-metal servers. Each SFU instance handles 500 concurrent sessions by forwarding (not transcoding) media streams between participants. Simulcast is enabled: the sender encodes the video at three quality levels (180p, 360p, 720p), and the SFU selectively forwards the appropriate level based on the receiver's bandwidth and viewport size. Bandwidth estimation runs continuously via REMB (Receiver Estimated Maximum Bitrate), and the SFU dynamically switches simulcast layers when network conditions change. For graceful degradation, if bandwidth drops below 500kbps, the SFU stops forwarding video and switches to audio-only with a static provider photo. Session recording uses a headless Chrome instance that joins the session as a silent participant and writes the composite stream to S3 via a GStreamer pipeline.
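
The layer-selection policy can be sketched as a pure function; the bitrate thresholds below are assumptions, and a production SFU adds hysteresis so layers do not flap on noisy estimates (in mediasoup, the chosen spatial layer would typically be applied via the consumer's setPreferredLayers() call):

```typescript
type Layer = 0 | 1 | 2; // 0 = 180p, 1 = 360p, 2 = 720p

// Illustrative simulcast layer selection from a REMB-style bandwidth estimate.
function selectLayer(estimatedKbps: number, viewportHeightPx: number): Layer | null {
  if (estimatedKbps < 500) return null; // graceful degradation: audio-only
  // Never forward a higher resolution than the receiver's viewport can show.
  const byViewport: Layer = viewportHeightPx >= 720 ? 2 : viewportHeightPx >= 360 ? 1 : 0;
  // Thresholds (assumed): 720p needs ~1.5 Mbps, 360p needs ~800 kbps.
  const byBandwidth: Layer = estimatedKbps >= 1500 ? 2 : estimatedKbps >= 800 ? 1 : 0;
  return Math.min(byViewport, byBandwidth) as Layer;
}
```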

Virtual Waiting Room & Queue Service

The Queue Service manages patient flow through the virtual clinic. When a patient joins for their appointment, they enter a virtual waiting room implemented as a priority queue in Redis Sorted Sets. Priority factors include: appointment time, provider assignment, urgency level (flagged by triage), and wait time. Providers see their queue ordered by priority and can admit the next patient with one click, which triggers the Signaling Service to establish the WebRTC session. Real-time wait time estimation uses an exponential moving average of recent visit durations per provider. The queue publishes position updates via Server-Sent Events (SSE) to the patient's browser — the patient sees their position and estimated wait time updating live. If a provider is running late, the system automatically sends SMS notifications to waiting patients via Twilio.
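
A minimal sketch of the waiting room against Redis Sorted Sets, using ioredis; the key names and the urgency weighting are assumptions, not the platform's actual scoring:

```typescript
import Redis from "ioredis";

const redis = new Redis(); // single node for illustration; production uses Redis Cluster

// Lower score = higher priority. Assumed weighting: each triage urgency level
// effectively moves the patient 15 minutes earlier in line.
function priorityScore(scheduledMs: number, urgency: 0 | 1 | 2): number {
  return scheduledMs - urgency * 15 * 60 * 1000;
}

export async function enqueue(
  providerId: string, patientId: string, scheduledMs: number, urgency: 0 | 1 | 2,
): Promise<void> {
  await redis.zadd(`waiting:${providerId}`, priorityScore(scheduledMs, urgency), patientId);
}

// "Admit next": atomically pop the highest-priority (lowest-score) patient.
export async function admitNext(providerId: string): Promise<string | null> {
  const [patientId] = await redis.zpopmin(`waiting:${providerId}`);
  return patientId ?? null;
}

// 0-based queue position, pushed to the patient's browser over SSE.
export async function position(providerId: string, patientId: string): Promise<number | null> {
  return redis.zrank(`waiting:${providerId}`, patientId);
}
```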

Secure Messaging Service

The Messaging Service provides HIPAA-compliant asynchronous communication between patients and care teams. Messages are encrypted end-to-end using a healthcare-adapted implementation of the Signal Protocol (Double Ratchet with AES-256-GCM). Encryption keys are managed per conversation thread with key rotation on every message. Message metadata (sender, recipient, timestamp) is stored in PostgreSQL, while encrypted message bodies are stored in S3. File attachments (lab results, wound photos) are encrypted client-side before upload with a per-file AES key that is then encrypted with the conversation key. The service supports group threads for care team communication (primary physician, specialist, nurse, patient) with per-participant key distribution. Message delivery status (sent, delivered, read) is tracked and surfaced to the sender.
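
A sketch of the attachment envelope encryption using Node's built-in crypto, assuming the 32-byte conversation key has already been derived by the ratchet layer:

```typescript
import { createCipheriv, randomBytes } from "node:crypto";

// Per-file envelope encryption: the file body is encrypted with a fresh
// AES-256-GCM key, and that key is itself wrapped under the conversation key.
function encryptAttachment(plaintext: Buffer, conversationKey: Buffer) {
  const fileKey = randomBytes(32); // fresh key for this attachment only

  // Encrypt the attachment body with the per-file key.
  const bodyIv = randomBytes(12);
  const bodyCipher = createCipheriv("aes-256-gcm", fileKey, bodyIv);
  const encryptedBody = Buffer.concat([bodyCipher.update(plaintext), bodyCipher.final()]);
  const bodyTag = bodyCipher.getAuthTag();

  // Wrap the per-file key under the conversation key.
  const keyIv = randomBytes(12);
  const keyCipher = createCipheriv("aes-256-gcm", conversationKey, keyIv);
  const wrappedKey = Buffer.concat([keyCipher.update(fileKey), keyCipher.final()]);
  const keyTag = keyCipher.getAuthTag();

  // encryptedBody goes to S3; the wrapped key, IVs, and auth tags travel as metadata.
  return { encryptedBody, bodyIv, bodyTag, wrappedKey, keyIv, keyTag };
}
```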

Database Design

The Visit Service uses PostgreSQL with the following tables:

  • visits (visit_id, patient_id, provider_id, scheduled_time, actual_start, actual_end, status, visit_type, chief_complaint, clinical_notes_id, recording_url_encrypted, billing_codes JSONB)
  • providers (provider_id, npi, specialty, facility_id, license_states, available_hours JSONB)
  • visit_participants (visit_id, user_id, role, join_time, leave_time, connection_quality_metrics JSONB)

The messaging schema:

  • conversations (conversation_id, type DIRECT/GROUP, created_at)
  • conversation_participants (conversation_id, user_id, public_key, joined_at)
  • messages (message_id, conversation_id, sender_id, encrypted_body_s3_key, content_type, sent_at, delivered_at, read_at)

Redis stores ephemeral state: active sessions (session_id → participant list, SFU server assignment), waiting room queues (sorted sets per provider), and WebSocket connection mappings (user_id → WebSocket server for signaling routing). Session state has a 4-hour TTL.
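
Writing the session mapping with that TTL might look like the following (key shape and payload fields are illustrative):

```typescript
import Redis from "ioredis";

const redis = new Redis();

// Store active-session state with the 4-hour TTL described above.
async function saveSession(sessionId: string, participants: string[], sfuId: string): Promise<void> {
  await redis.set(`session:${sessionId}`, JSON.stringify({ participants, sfuId }), "EX", 4 * 3600);
}
```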

API Design

  • POST /v1/visits — Schedule a telehealth visit; body contains patient_id, provider_id, scheduled_time, visit_type (VIDEO/AUDIO/CHAT), chief_complaint; returns visit_id and join_url
  • POST /v1/visits/{visit_id}/join — Patient or provider joins the visit; triggers WebRTC session setup; returns signaling WebSocket URL and ICE server credentials
  • POST /v1/messages — Send a secure message; body contains conversation_id, encrypted_body, content_type, attachment_keys; returns message_id and sent_at timestamp
  • POST /v1/prescriptions — Submit an e-prescription; body contains patient_id, medication (RxNorm code), dosage, quantity, pharmacy_ncpdp_id; routes through Surescripts network
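
A hypothetical client call to the scheduling endpoint; the base URL, IDs, and token handling are placeholders, and the field names mirror the sketch above:

```typescript
// Assumes an ES module context (top-level await) and an OAuth token from the gateway.
const accessToken = process.env.ACCESS_TOKEN ?? "";

const res = await fetch("https://api.example.com/v1/visits", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${accessToken}`,
  },
  body: JSON.stringify({
    patient_id: "pat_123",
    provider_id: "prov_456",
    scheduled_time: "2025-02-01T15:30:00Z",
    visit_type: "VIDEO",
    chief_complaint: "Medication follow-up",
  }),
});

const { visit_id, join_url } = await res.json();
console.log(visit_id, join_url); // patient is redirected to join_url at visit time
```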

Scaling & Bottlenecks

The media servers are the primary scalability challenge. Each SFU server is stateful (holds active WebRTC sessions), so horizontal scaling requires session-aware routing. New sessions are assigned to the least-loaded SFU in the nearest geographic region via a Session Router service that maintains real-time load metrics (CPU, bandwidth, session count) from each SFU in a Redis cluster. Mid-session failover is handled by the client reconnecting to a backup SFU — the Signaling Service maintains a secondary SFU assignment for each session. TURN server bandwidth is the cost bottleneck: approximately 15% of connections require TURN relay due to symmetric NATs, consuming roughly 37.5 Gbps of relay bandwidth (15% of the 250 Gbps aggregate). This is mitigated by deploying TURN servers in edge locations using coturn with bandwidth quotas per session.
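
A sketch of the Session Router's assignment path, assuming each SFU periodically reports a composite load score into a per-region sorted set (key names and weights are illustrative):

```typescript
import Redis from "ioredis";

const redis = new Redis();

// Each SFU reports load every few seconds; the blend of CPU, bandwidth
// utilization, and session count below is an assumed weighting.
export async function reportLoad(
  region: string, sfuId: string, cpu: number, bwUtil: number, sessions: number,
): Promise<void> {
  const score = 0.5 * cpu + 0.3 * bwUtil + 0.2 * (sessions / 500);
  await redis.zadd(`sfu_load:${region}`, score, sfuId);
}

// New sessions go to the least-loaded SFU in the caller's nearest region.
export async function assignSfu(region: string): Promise<string | null> {
  const [sfuId] = await redis.zrange(`sfu_load:${region}`, 0, 0);
  return sfuId ?? null;
}
```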

The Signaling Service must handle 100K concurrent WebSocket connections. Each Node.js instance handles 10K connections using the ws library. Horizontal scaling uses 10+ instances with sticky sessions (based on user_id hash) via the load balancer. Cross-instance signaling (when two participants of the same session are connected to different signaling servers) uses Redis Pub/Sub with channel-per-session for message routing.
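
A minimal sketch of the channel-per-session fan-out with ioredis and the ws library (message shapes are assumptions):

```typescript
import Redis from "ioredis";
import WebSocket from "ws";

// Redis pub/sub needs a dedicated subscriber connection, hence two clients.
const pub = new Redis();
const sub = new Redis();

// WebSocket connections owned by this signaling instance (user_id -> socket).
const localSockets = new Map<string, WebSocket>();

// Every instance hosting a participant subscribes to the session's channel.
export async function joinSession(sessionId: string): Promise<void> {
  await sub.subscribe(`session:${sessionId}`);
}

sub.on("message", (_channel: string, raw: string) => {
  const { toUserId, payload } = JSON.parse(raw);
  // Deliver only if the recipient's socket lives on this instance.
  localSockets.get(toUserId)?.send(JSON.stringify(payload));
});

// Any instance can publish; whichever instance holds the recipient delivers.
export async function relaySignal(sessionId: string, toUserId: string, payload: unknown): Promise<void> {
  await pub.publish(`session:${sessionId}`, JSON.stringify({ toUserId, payload }));
}
```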

Key Trade-offs

  • SFU over MCU (Multipoint Control Unit): SFU forwards streams without transcoding, reducing server CPU by 10x and latency by 50ms, but shifts bandwidth and decoding burden to clients — acceptable for 1-on-1 consultations but may struggle for large group sessions
  • End-to-end encryption over server-side encryption only: E2E encryption means the platform operator cannot access message content (stronger HIPAA posture), but prevents server-side features like content search and spam filtering — mitigated by client-side search on decrypted local cache
  • Simulcast over SVC (Scalable Video Coding): Simulcast is simpler to implement and widely supported by browsers, but the sender must encode and upload all three quality layers simultaneously, increasing upstream bandwidth and CPU — acceptable given that providers typically have high-bandwidth clinical workstations
  • Redis for ephemeral session state over persistent database: Redis provides sub-millisecond latency for session lookups critical in the WebRTC signaling path, but session data is lost on Redis failure — mitigated by Redis Cluster with replication and client-side reconnection logic
