System Design: Workplace Collaboration Tool
Explore the architecture behind a Slack- or Microsoft Teams-scale workplace collaboration platform. Covers real-time messaging, presence, channels, file sharing, and notification delivery for millions of concurrent users.
Requirements
Functional Requirements:
- Real-time messaging in channels (public, private) and direct messages
- File uploads with inline previews (images, PDFs, code snippets)
- Presence indicators (online, away, do not disturb, offline)
- Message threading, reactions, and editing/deletion with history
- Notifications (push, email digest) with user-configurable preferences
- Search across messages, files, and people within a workspace
Non-Functional Requirements:
- Message delivery latency under 100ms at the 99th percentile
- Support 10M concurrent connections across the platform
- Messages must be durably persisted; no acknowledged message may ever be lost
- Notification fan-out to 100k+ channel members must complete within 5 seconds
- End-to-end encryption option for direct messages
Scale Estimation
At Slack scale: 20M daily active users averaging 200 messages/user/day = 4B messages/day, or ~46,000 messages/second on average (peak rates run several times higher). Each message fans out to an average of 50 recipients = ~2.3M delivery events/second. File storage: 50M uploads/day at an average of 500KB = 25TB/day. Presence: each connected client sends a heartbeat every 30 seconds, so the 10M-connection target generates ~330k heartbeats/second, approaching ~660k if all 20M daily actives are online at once.
High-Level Architecture
The platform is composed of a WebSocket Gateway tier, a Message Service, a Presence Service, a Notification Service, and a Search Service. WebSocket gateways are stateful: each server maintains open connections and routes incoming messages to the appropriate services via an internal message bus (Kafka). Connection state is tracked in a distributed registry (Redis) so any gateway can look up which server hosts a given user's connection.
Message flow: client sends a message over WebSocket → gateway publishes to Kafka (partitioned by channel ID) → Message Service consumes, persists to the database, and computes the fan-out list → Fan-out Service pushes to recipient WebSocket connections via the registry, or enqueues offline notifications. This decoupled pipeline lets each stage scale independently.
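A minimal sketch of the Message Service stage of this pipeline, assuming segmentio/kafka-go; the broker address, topic name, consumer group, and the persist/fanOut helpers are illustrative placeholders for the Cassandra write and fan-out enqueue described in later sections:

```go
package main

import (
	"context"
	"encoding/json"
	"log"

	"github.com/segmentio/kafka-go"
)

type InboundMessage struct {
	ChannelID      string `json:"channel_id"`
	SenderID       string `json:"sender_id"`
	Content        string `json:"content"`
	IdempotencyKey string `json:"idempotency_key"`
}

// persist and fanOut stand in for the Cassandra write and the fan-out
// enqueue covered in later sections.
func persist(ctx context.Context, m InboundMessage) error { return nil }
func fanOut(ctx context.Context, m InboundMessage)        {}

func main() {
	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers: []string{"kafka-1:9092"},
		GroupID: "message-service",
		Topic:   "channel-messages", // single logical topic, partitioned by channel ID
	})
	defer r.Close()

	ctx := context.Background()
	for {
		// ReadMessage blocks for the next record and commits offsets for the group.
		m, err := r.ReadMessage(ctx)
		if err != nil {
			log.Fatalf("kafka read: %v", err)
		}
		var msg InboundMessage
		if err := json.Unmarshal(m.Value, &msg); err != nil {
			continue // malformed frame; production code would dead-letter it
		}
		if err := persist(ctx, msg); err != nil {
			log.Printf("persist failed: %v", err)
			continue
		}
		fanOut(ctx, msg)
	}
}
```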
The Presence Service is a separate concern. Clients send periodic heartbeats; the service maintains presence state in Redis with TTL-based expiry. Presence changes are published to a fan-out topic and pushed to all watchers of a given user. To avoid thundering herd when a popular user goes online, presence updates are coalesced with a short debounce before broadcasting.
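A sketch of both mechanisms, assuming go-redis v9; the key naming, 90-second TTL (three missed heartbeats), and debounce structure are illustrative choices, not prescribed by the design above:

```go
package presence

import (
	"context"
	"sync"
	"time"

	"github.com/redis/go-redis/v9"
)

const presenceTTL = 90 * time.Second // three missed 30s heartbeats => key expires => offline

// OnHeartbeat refreshes the user's presence key; if the client stops
// heartbeating, the key simply ages out and lookups report offline.
func OnHeartbeat(ctx context.Context, rdb *redis.Client, userID, state string) error {
	return rdb.Set(ctx, "presence:"+userID, state, presenceTTL).Err()
}

// Debouncer coalesces rapid presence flaps (offline/online/offline) into a
// single broadcast after a quiet period, avoiding thundering-herd fan-out.
type Debouncer struct {
	mu     sync.Mutex
	wait   time.Duration
	timers map[string]*time.Timer
}

func NewDebouncer(wait time.Duration) *Debouncer {
	return &Debouncer{wait: wait, timers: make(map[string]*time.Timer)}
}

func (d *Debouncer) NotifyChange(userID string, broadcast func(userID string)) {
	d.mu.Lock()
	defer d.mu.Unlock()
	if t, ok := d.timers[userID]; ok {
		t.Stop() // restart the quiet period on every new change
	}
	d.timers[userID] = time.AfterFunc(d.wait, func() { broadcast(userID) })
}
```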
Core Components
WebSocket Gateway
A horizontally scaled pool of long-lived connection servers. Each server uses an async event loop (Node.js or Go) to handle tens of thousands of concurrent connections per instance. Connections authenticate via JWT on upgrade. The gateway is thin: it receives client frames, forwards to Kafka, and delivers inbound frames from the fan-out service. Gateway state (connection map) is replicated to Redis so other services can address any connected user.
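A sketch of the authenticated upgrade path, assuming gorilla/websocket, golang-jwt/jwt/v5, and go-redis; the conn:<user_id> registry layout, the 5-minute TTL, and passing the JWT as a query parameter are illustrative assumptions:

```go
package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"time"

	"github.com/golang-jwt/jwt/v5"
	"github.com/gorilla/websocket"
	"github.com/redis/go-redis/v9"
)

var (
	upgrader  = websocket.Upgrader{CheckOrigin: func(*http.Request) bool { return true }}
	rdb       = redis.NewClient(&redis.Options{Addr: "registry:6379"})
	gatewayID = os.Getenv("GATEWAY_ID") // which gateway instance holds the socket
)

func connect(w http.ResponseWriter, r *http.Request) {
	// Authenticate before upgrading; here the JWT arrives as a query parameter.
	token, err := jwt.Parse(r.URL.Query().Get("token"), func(*jwt.Token) (any, error) {
		return []byte(os.Getenv("JWT_SECRET")), nil
	})
	if err != nil || !token.Valid {
		http.Error(w, "unauthorized", http.StatusUnauthorized)
		return
	}
	userID, _ := token.Claims.GetSubject()

	conn, err := upgrader.Upgrade(w, r, nil)
	if err != nil {
		return
	}
	// Register user -> gateway so the fan-out service can route pushes here.
	// The TTL bounds staleness if this gateway dies without cleaning up.
	rdb.Set(context.Background(), "conn:"+userID, gatewayID, 5*time.Minute)
	go readLoop(userID, conn)
}

// readLoop would forward client frames to Kafka; omitted here.
func readLoop(userID string, c *websocket.Conn) {}

func main() {
	http.HandleFunc("/ws/v1/connect", connect)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```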
Message & Storage Service
Consumes from Kafka, validates, deduplicates via a per-client idempotency key, and persists messages to a Cassandra cluster partitioned by (workspace_id, channel_id, time_bucket). Cassandra's wide-row model suits append-heavy, time-series chat history with fast range reads by channel. Message edits and deletions are appended as new events (event-sourced), preserving history for compliance. A separate DynamoDB table stores metadata (channel membership, message counts) for low-latency lookups.
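The dedup step can be as small as one atomic Redis operation. A sketch assuming go-redis; the dedup: key prefix and the 24-hour window are illustrative:

```go
package dedup

import (
	"context"
	"time"

	"github.com/redis/go-redis/v9"
)

// IsDuplicate reports whether this idempotency key was already processed.
// SetNX claims the key atomically, so concurrent consumers of a redelivered
// message agree on exactly one winner.
func IsDuplicate(ctx context.Context, rdb *redis.Client, idempotencyKey string) (bool, error) {
	claimed, err := rdb.SetNX(ctx, "dedup:"+idempotencyKey, 1, 24*time.Hour).Result()
	if err != nil {
		return false, err
	}
	return !claimed, nil
}
```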
Notification & Fan-out Service
For large channels (>10k members), a tiered fan-out strategy is used. The service reads channel membership from a pre-computed list in Redis, batches the recipient set into chunks of 1,000, and delivers the chunks via parallel worker pools (sketched below). Online users receive messages directly via WebSocket push. Offline users are routed to push notification queues (APNs, FCM) or email digest queues. User notification preferences are cached in Redis and applied before any delivery attempt.
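A sketch of the chunk-and-deliver loop in Go; the batch size matches the 1,000 above, while deliverBatch stands in for the registry lookup and the WebSocket-push / offline-queue routing:

```go
package fanout

import "sync"

const batchSize = 1000

// fanOut splits the recipient list into batches and delivers them through a
// fixed-size pool of workers, bounding concurrency regardless of channel size.
func fanOut(recipients []string, workers int) {
	batches := make(chan []string)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for b := range batches {
				deliverBatch(b)
			}
		}()
	}
	for start := 0; start < len(recipients); start += batchSize {
		end := start + batchSize
		if end > len(recipients) {
			end = len(recipients)
		}
		batches <- recipients[start:end]
	}
	close(batches)
	wg.Wait()
}

// deliverBatch looks up each recipient in the connection registry and pushes
// over WebSocket, or enqueues to APNs/FCM/email queues for offline users.
func deliverBatch(batch []string) { /* omitted */ }
```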
Database Design
Messages are stored in Cassandra with primary key ((workspace_id, channel_id, time_bucket), message_id) where time_bucket is a truncated timestamp (e.g., hourly). This distributes writes across nodes and enables efficient time-range scans for chat history pagination. Message content above 64KB (large pastes) is stored in S3 with only the S3 key in Cassandra.
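A sketch of the write path under this key layout, assuming gocql and an hourly bucket; using a time-UUID for message_id also provides the per-channel ordering discussed under the trade-offs below:

```go
package store

import (
	"time"

	"github.com/gocql/gocql"
)

// timeBucket truncates to the hour so one channel-hour of messages shares a
// partition and history pages read contiguously from a single node.
func timeBucket(t time.Time) time.Time { return t.Truncate(time.Hour) }

// insertMessage writes one message row; gocql.TimeUUID() doubles as a
// time-ordered message ID within the channel.
func insertMessage(s *gocql.Session, workspaceID, channelID, content string) error {
	now := time.Now().UTC()
	return s.Query(
		`INSERT INTO messages (workspace_id, channel_id, time_bucket, message_id, content)
		 VALUES (?, ?, ?, ?, ?)`,
		workspaceID, channelID, timeBucket(now), gocql.TimeUUID(), content,
	).Exec()
}
```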
Workspace metadata, user profiles, and channel configurations live in PostgreSQL for ACID consistency. Channel membership is stored in Redis Sorted Sets (scored by join time) for fast fan-out list generation, and in PostgreSQL as the durable source of truth. A search index in Elasticsearch ingests messages via Kafka consumer, enabling full-text search with workspace-scoped filtering.
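A sketch of the Redis-materialized membership list, assuming go-redis; the members:<channel_id> key name is an illustrative choice:

```go
package membership

import (
	"context"
	"time"

	"github.com/redis/go-redis/v9"
)

// addMember records a join; scoring by join time keeps ZRANGE output in
// join order without a separate sort.
func addMember(ctx context.Context, rdb *redis.Client, channelID, userID string) error {
	return rdb.ZAdd(ctx, "members:"+channelID, redis.Z{
		Score:  float64(time.Now().Unix()),
		Member: userID,
	}).Err()
}

// fanOutList reads the full membership set for the fan-out service.
func fanOutList(ctx context.Context, rdb *redis.Client, channelID string) ([]string, error) {
	return rdb.ZRange(ctx, "members:"+channelID, 0, -1).Result()
}
```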
API Design
WebSocket /ws/v1/connect — establishes a persistent connection; the client sends {type: "message", channel_id, content, idempotency_key} frames.
POST /api/v1/channels/{channelId}/messages — REST fallback for message send (mobile clients in poor connectivity).
GET /api/v1/channels/{channelId}/messages?before={cursor}&limit=50 — paginated message history.
POST /api/v1/files/upload — multipart upload returning a file token for embedding in messages.
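Illustrative Go wire types for these endpoints; the client frame fields follow the shape shown above, and anything beyond them (sender_id, sent_at, next_cursor) is an assumption:

```go
package api

// ClientFrame is what the client sends over /ws/v1/connect.
type ClientFrame struct {
	Type           string `json:"type"` // "message", "typing", ...
	ChannelID      string `json:"channel_id"`
	Content        string `json:"content"`
	IdempotencyKey string `json:"idempotency_key"` // lets the server dedupe retries
}

// HistoryPage is a plausible response shape for the paginated history endpoint.
type HistoryPage struct {
	Messages   []Message `json:"messages"`
	NextCursor string    `json:"next_cursor"` // opaque "before" cursor for the next page
}

type Message struct {
	ID        string `json:"id"` // time-ordered UUID for per-channel ordering
	ChannelID string `json:"channel_id"`
	SenderID  string `json:"sender_id"`
	Content   string `json:"content"`
	SentAt    int64  `json:"sent_at"`
}
```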
Scaling & Bottlenecks
The hardest scaling challenge is fan-out to very large channels. A workspace-wide announcement channel with 50k members generates 50k delivery operations per message. This is handled by pre-materializing channel membership lists in Redis and using parallel fan-out workers. Channels above a configurable threshold switch to a pub/sub broadcast model (Redis Pub/Sub or Kafka consumer groups) in which each gateway subscribes to the channels for which it has connected members, eliminating per-user delivery loops.
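A sketch of the broadcast model using Redis Pub/Sub via go-redis; each gateway subscribes once per hot channel and fans out locally, so the publisher's cost is O(gateways) rather than O(members). The bcast: key prefix and localPush hook are illustrative:

```go
package broadcast

import (
	"context"

	"github.com/redis/go-redis/v9"
)

// subscribeHotChannel gives this gateway a single subscription for the whole
// channel; localPush then delivers to every local socket belonging to a member.
func subscribeHotChannel(ctx context.Context, rdb *redis.Client, channelID string, localPush func(payload []byte)) {
	sub := rdb.Subscribe(ctx, "bcast:"+channelID)
	go func() {
		defer sub.Close()
		for msg := range sub.Channel() {
			localPush([]byte(msg.Payload))
		}
	}()
}
```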
Presence at scale is the second bottleneck. Naively broadcasting every presence change to every member of every shared channel creates O(users²) fan-out. The solution is lazy presence: clients only subscribe to presence for users currently visible in the UI. The backend tracks active subscriptions per connection and only delivers presence events to active subscribers, reducing fan-out by ~95%.
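A sketch of the subscription gate, with hypothetical names: the gateway tracks which user IDs each connection currently watches and drops presence events for everyone else.

```go
package gateway

import "sync"

// PresenceSubs tracks active presence subscriptions per connection.
type PresenceSubs struct {
	mu   sync.RWMutex
	subs map[string]map[string]struct{} // connID -> set of watched user IDs
}

// Watch is called when a user scrolls someone into view in the client UI;
// an Unwatch counterpart (omitted) removes entries as they scroll away.
func (p *PresenceSubs) Watch(connID, userID string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.subs == nil {
		p.subs = make(map[string]map[string]struct{})
	}
	if p.subs[connID] == nil {
		p.subs[connID] = make(map[string]struct{})
	}
	p.subs[connID][userID] = struct{}{}
}

// ShouldDeliver gates every presence event before it reaches the socket.
func (p *PresenceSubs) ShouldDeliver(connID, userID string) bool {
	p.mu.RLock()
	defer p.mu.RUnlock()
	_, ok := p.subs[connID][userID]
	return ok
}
```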
Key Trade-offs
- Push vs. pull for message delivery: WebSocket push minimizes latency but requires stateful gateway management; long-polling is simpler to scale but adds 1-2s latency.
- Cassandra vs. PostgreSQL for messages: Cassandra handles high write throughput and time-range scans well but lacks flexible querying; PostgreSQL with partitioning is simpler but struggles at billions of rows.
- Eager vs. lazy fan-out: Eager fan-out ensures all recipients get messages instantly but is expensive for large channels; lazy fan-out (clients poll on reconnect) reduces server load but increases perceived latency.
- Message ordering guarantees: Per-channel ordering is achievable with a Cassandra time-UUID key; global ordering across channels is prohibitively expensive and generally not needed.