SYSTEM_DESIGN
System Design: Sensor Data Processing Pipeline
Design a high-throughput sensor data processing pipeline that ingests time-series data from distributed sensors, applies real-time transformations and anomaly detection, and serves aggregated data to dashboards and alerting systems.
Requirements
Functional Requirements:
- Ingest raw sensor readings from distributed sensors at high frequency (1-100 Hz per sensor)
- Apply real-time transformations: unit conversion, calibration corrections, and rolling window aggregations
- Detect anomalies: threshold breaches, statistical outliers, and pattern deviations in real time
- Write processed data to a time-series database for querying and visualization
- Generate alerts when sensor values cross configurable thresholds
- Support sensor metadata management: sensor registry, calibration parameters, alert thresholds
Non-Functional Requirements:
- Ingest 50 million sensor readings per second at peak
- End-to-end processing latency (sensor reading to alert) under 1 second
- Zero data loss for critical sensors; best-effort for low-priority sensors
- Time-series data retained for 5 years with downsampling for older data
- Support 1 million distinct sensors with independent calibration and alert configurations
Scale Estimation
1M sensors × 50 readings/second average = 50M readings/second. At 100 bytes per reading (sensor_id, value, timestamp, quality_flag), that's 5 GB/second of ingestion. Kafka at 5 GB/second requires ~50 brokers (100 MB/second per broker with 3× replication factor). Processing latency budget: ingest → Kafka (100ms) → stream processor (200ms) → anomaly check (100ms) → DB write (300ms) → alert (200ms) = 900ms total — fits within the 1-second SLA. Time-series storage: 50M readings/second × 86,400 seconds/day = 4.3 trillion readings/day. Compressed at 10 bytes/reading (Gorilla compression for time-series), that's 43 TB/day. Over 5 years = 78 PB; downsampling older data (5-minute averages after 90 days) reduces this to ~5 PB.
High-Level Architecture
The pipeline uses a Lambda-like architecture with a speed layer (stream processing) and a batch layer (historical aggregation and model training). The speed layer handles real-time ingestion, transformation, anomaly detection, and alerting. The batch layer handles historical aggregation for storage efficiency, ML model retraining for adaptive anomaly detection, and compliance reporting.
Ingestion: sensors publish readings to the ingestion gateway (MQTT or HTTP) which writes to Kafka after schema validation and enrichment (appending sensor metadata like location and calibration coefficients from a Redis-cached sensor registry). Kafka topics are partitioned by sensor_id hash, ensuring all readings from one sensor go to the same partition and arrive in order.
Stream processing (Apache Flink): operators consume from Kafka, apply calibration corrections (multiply raw ADC value by calibration coefficient from sensor registry), compute rolling window aggregations (60-second average, min, max, std_dev using Flink's sliding window API), run anomaly detection (threshold check + 3σ statistical outlier detection against the 60-second baseline), and write processed readings to the time-series database. Anomaly events are published to an alert Kafka topic consumed by the alerting service.
Core Components
Ingestion Gateway
The gateway accepts sensor data via MQTT (persistent, low-overhead protocol for constrained devices) and HTTP (for cloud-native sensors). Validation: each incoming message is validated against the sensor's registered schema (from the sensor registry cache) — data type, value range plausibility, and timestamp recency (reject messages older than 5 minutes to prevent replay attacks). Enrichment: the gateway adds calibration coefficients and quality flags (from sensor registry Redis cache) to each reading before publishing to Kafka. The gateway is stateless and horizontally scaled; authentication uses pre-shared API keys per sensor (stored in Redis for sub-millisecond lookup).
Stream Processing Engine
Flink jobs consume from Kafka with operator parallelism matching Kafka partition count (1,000 partitions = 1,000 parallel operators). The transformation operator applies: unit conversion (e.g., raw voltage to Celsius using polynomial calibration), timestamp normalization (convert device local time to UTC using per-sensor timezone), and data quality scoring (flag readings with noise above calibration tolerance). The aggregation operator maintains keyed state (keyed by sensor_id) with two tumbling windows: 1-minute and 1-hour. Per window, it computes (count, sum, min, max, sum_of_squares) — from these, mean and variance are derived without storing all values. Aggregated summaries are written to TimescaleDB as continuous aggregates.
Anomaly Detection Engine
Anomaly detection runs as a Flink operator after transformation. Two detection methods: (1) static threshold — each sensor has configured high/low bounds stored in the sensor registry; a reading outside bounds triggers an immediate alert; (2) dynamic anomaly detection — using a streaming z-score: maintain a running mean and standard deviation using Welford's online algorithm; a reading more than 3σ from the rolling mean is flagged as an anomaly. Dynamic detection adapts to sensors with diurnal patterns (e.g., temperature sensors) — the baseline updates as the reading series evolves. Anomaly events are de-duplicated (same sensor, same type, within 5-minute window = one alert, not thousands) using a Redis-based deduplication set.
Database Design
Kafka: sensor-raw (ingested readings, Protobuf-serialized, 7-day retention), sensor-anomalies (anomaly events, 30-day retention). TimescaleDB: sensor_readings (sensor_id, value, unit, quality_score, recorded_at) — hypertable with 7-day chunks, compressed after 7 days using the Gorilla algorithm (time-series-aware delta-of-delta encoding); sensor_aggregates_1m (sensor_id, avg, min, max, stddev, count, bucket — 1-minute continuous aggregate); sensor_aggregates_1h (1-hour aggregate). PostgreSQL: sensors (sensor_id, name, location_json, unit, calibration_coefficients_json, alert_config_json, owner_id, status). Redis: sensor:{sensor_id}:meta (calibration, alert thresholds — TTL 1h), anomaly_dedup:{sensor_id}:{type} (deduplication set, TTL 5m). S3: long-term archive (Parquet, 90+ days, partitioned by sensor_id/year/month).
API Design
POST /ingest/{sensor_id}— body:{value, timestamp, unit}or batch array, validates, publishes to Kafka; rate-limited per sensor_idGET /sensors/{sensor_id}/readings?from={ts}&to={ts}&resolution={1m|1h|1d}— returns time-series data from TimescaleDB; resolution maps to continuous aggregate tableGET /sensors/{sensor_id}/anomalies?from={ts}&to={ts}— returns anomaly events with context (value at time, threshold, 3σ bound)PUT /sensors/{sensor_id}/config— body:{alert_thresholds, calibration_coefficients}, updates sensor registry in PostgreSQL and invalidates Redis cachePOST /alerts/subscribe— body:{sensor_ids[], webhook_url, alert_types[]}, registers webhook for alert delivery
Scaling & Bottlenecks
Kafka ingestion at 5 GB/second is the primary throughput challenge. Kafka's write throughput is primarily limited by disk I/O (WAL writes) and network. Using NVMe SSDs and Kafka's log compaction disabled for high-throughput topics, a single broker handles ~500 MB/second write. 50 brokers handle the load with 3× replication. Flink's state store (RocksDB) for 1M keyed sensor states in the aggregation operator requires ~1 GB RAM per task manager at 1 KB/sensor state. With 1,000 Flink task slots across 100 task managers, each task manager holds ~10k sensor states = 10 MB RAM — trivial.
TimescaleDB write throughput: 50M rows/second cannot be written row-by-row. Flink's TimescaleDB sink batches 10,000 rows per INSERT, reducing write calls to 5,000/second. TimescaleDB with 10 nodes, each receiving 500 COPY calls/second, handles this workload. The continuous aggregate refresh (1-minute aggregates) runs every 30 seconds via a scheduled background worker in TimescaleDB — decoupled from the insert path.
Key Trade-offs
- Static vs. dynamic anomaly detection: Static thresholds are simple to configure and explain to operators but miss context-dependent anomalies (e.g., a temperature of 50°C is normal in a furnace room but anomalous in an office); dynamic 3σ detection adapts but takes time to calibrate for new sensors and generates false positives during sudden legitimate regime changes.
- Protobuf vs. JSON for sensor payloads: Protobuf reduces payload size by 5-10× compared to JSON (100 bytes vs. 500 bytes per reading) and is significantly faster to serialize/deserialize, but requires schema management (registry) and client library support; JSON is universally supported but wastes bandwidth at 50M messages/second.
- TimescaleDB vs. InfluxDB vs. Apache Druid: TimescaleDB offers SQL and PostgreSQL compatibility; InfluxDB excels at single-metric time series with a purpose-built storage engine; Druid provides the best query performance for large-scale analytics aggregations. For a mixed workload (operational queries + analytics), TimescaleDB is the best general choice.
- In-stream vs. out-of-stream anomaly detection: Running anomaly detection in the Flink stream provides sub-second latency but limits model complexity; offloading to a separate ML inference service (calling a REST API per reading) adds ~50ms latency but enables complex models. For 50M readings/second, the REST API approach would require 50M API calls/second — impractical; in-stream Flink operators are the only viable option at this scale.
GO DEEPER
Master this topic in our 12-week cohort
Our Advanced System Design cohort covers this and 11 other deep-dive topics with live sessions, assignments, and expert feedback.
// RELATED_DESIGNS
System Design: IoT Data Ingestion Platform
Design a scalable IoT data ingestion platform capable of receiving telemetry from millions of connected devices via MQTT and HTTP, with device authentication, data validation, and routing to downstream storage and processing systems.
System Design: IoT Analytics Platform
Design a scalable IoT analytics platform that processes telemetry from millions of devices, generates operational insights, supports ad-hoc queries over historical data, and powers real-time dashboards for device fleet operators.
System Design: Streaming Analytics (Viewer Metrics)
System design of a streaming analytics platform for viewer metrics covering real-time event ingestion, stream processing, time-series storage, and dashboard visualization for video platform analytics.
System Design: Real-Time GPS Tracking
Learn how to build a real-time GPS tracking system for fleets, deliveries, or assets — covering high-frequency location ingestion, live map updates, geofencing, and time-series storage.
System Design: Fitness Tracking App (Fitbit-scale)
Design a Fitbit-scale fitness tracking application ingesting continuous biometric data from wearable devices, supporting activity logging, health analytics, and personalized goal tracking for millions of users.
System Design: Time Series Database
Design a time series database optimized for high-throughput metric ingestion and fast range queries, supporting billions of data points from IoT devices, application monitoring, and financial tick data.