
System Design: Fitness Tracking App (Fitbit-scale)

Design a Fitbit-scale fitness tracking application that ingests continuous biometric data from wearable devices and supports activity logging, health analytics, and personalized goal tracking for millions of users.

13 min read · Updated Jan 15, 2025
Tags: system-design, fitness, wearables, time-series, iot, health-data

Requirements

Functional Requirements:

  • Continuously sync biometric data from wearable devices (heart rate, steps, sleep, SpO2) via Bluetooth and cloud sync
  • Display activity summaries: daily step count, calories burned, active minutes, sleep stages
  • Goal setting and progress tracking with streaks and achievement badges
  • Historical trend analysis: weekly, monthly, and yearly views with charts
  • Heart rate zone training and workout detection with auto-exercise recognition
  • Data sharing with third-party health apps (Apple Health, Google Fit) via authorized OAuth

Non-Functional Requirements:

  • Ingest 100M data points/minute from 20M active devices
  • Activity summary available within 30 seconds of data sync
  • Historical queries for 1 year of per-minute data must return in under 2 seconds
  • 99.9% uptime; users rely on daily health data
  • HIPAA-adjacent data handling: explicit consent for health data sharing; encryption at rest

Scale Estimation

At Fitbit scale: 20M active devices each syncing every 15 minutes = ~22k syncs/second. Each sync batch: ~900 data points (60 per minute × 15 min) × ~20 bytes each = 18KB/sync. Total ingestion: 22k × 18KB = ~396MB/second. Storage: 20M devices × 1M data points/device/year = 20 trillion data points/year. At 20 bytes each = 400TB/year of raw data. Aggregated summaries (daily, hourly) dramatically reduce query data volume.
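The back-of-envelope arithmetic above can be written out explicitly. This is a sketch of the same estimate; all constants come from the paragraph above and are rounded illustrations, not measured figures:

```python
# Back-of-envelope scale estimation (constants from the text above).
DEVICES = 20_000_000
SYNC_INTERVAL_S = 15 * 60            # each device syncs every 15 minutes
POINTS_PER_SYNC = 60 * 15            # ~60 data points/minute over a 15-minute window
BYTES_PER_POINT = 20

syncs_per_sec = DEVICES / SYNC_INTERVAL_S                  # ~22,222 syncs/second
sync_payload_bytes = POINTS_PER_SYNC * BYTES_PER_POINT     # 18,000 bytes (~18 KB)
ingest_bytes_per_sec = syncs_per_sec * sync_payload_bytes  # ~400 MB/second

points_per_device_year = 1_000_000   # rounded, as in the estimate above
raw_bytes_per_year = DEVICES * points_per_device_year * BYTES_PER_POINT  # 400 TB

print(f"{syncs_per_sec:,.0f} syncs/s, {ingest_bytes_per_sec / 1e6:.0f} MB/s, "
      f"{raw_bytes_per_year / 1e12:.0f} TB/year of raw data")
```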

High-Level Architecture

The platform is organized around an Ingestion Layer, a Stream Processing Layer, and a Query/Serving Layer. The Ingestion Layer receives device sync payloads via HTTPS (mobile app) or direct device cloud sync, validates, and publishes to Kafka. The Stream Processing Layer (Apache Flink) consumes the Kafka stream to compute real-time activity metrics: running step totals, active minute detection, heart rate zone classification, and sleep stage inference. Results are written to both a time-series store (raw data) and a pre-aggregated summaries store (daily/hourly rollups).

Device sync is designed to be idempotent. Each sync payload includes device serial number, sync timestamp, and sequence numbers for each data type. The ingestion service deduplicates based on (device_id, data_type, timestamp) using a Bloom filter backed by Redis for fast duplicate detection before database writes.
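The dedup check can be sketched with a small in-memory Bloom filter. In production the filter would live in Redis (e.g. via the RedisBloom module) so all ingestion nodes share it; sizes and hash counts here are illustrative:

```python
import hashlib

class SyncDeduper:
    """Minimal in-memory Bloom filter for sync deduplication (sketch).
    Production would back this with a shared Redis Bloom filter; the
    key shape mirrors (device_id, data_type, timestamp) from the text."""

    def __init__(self, num_bits=1 << 20, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, key: str):
        # Derive k independent bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def seen_before(self, device_id: str, data_type: str, timestamp: int) -> bool:
        """True if this point was (probably) already ingested; otherwise
        records it and returns False. False positives are possible, false
        negatives are not, which is the right trade-off for dedup."""
        key = f"{device_id}:{data_type}:{timestamp}"
        hits = 0
        for pos in self._positions(key):
            byte, bit = divmod(pos, 8)
            if self.bits[byte] & (1 << bit):
                hits += 1
            else:
                self.bits[byte] |= 1 << bit
        return hits == self.num_hashes
```

A resent batch (after a lost ACK) hits the filter and is dropped before any database write, which is what makes the sync path idempotent.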

The Query/Serving Layer provides read APIs backed by pre-aggregated summaries for common dashboard views (today's stats, this week) and the raw time-series store for detailed historical analysis. The pre-aggregation dramatically reduces query cost — a user's weekly step chart fetches 7 daily summary records rather than 10,080 per-minute records.

Core Components

Device Sync & Ingestion Service

Accepts compressed (gzip) batch sync payloads from mobile apps. Validates device ownership and payload format. Deduplicates using a Redis Bloom filter keyed by hash(device_id + data_type + timestamp). Valid data points are published to partitioned Kafka topics (partitioned by user_id for processing locality). The service returns a sync acknowledgment with the server-side received timestamp, used by the device to advance its sync watermark. Devices that fail to receive an ACK resend the batch on next sync — idempotent handling ensures no duplicates.
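The sync endpoint's body can be sketched as follows. Field names (`points`, `ts`, `next_sync_after`) are illustrative; the duplicate check is passed in as a function standing in for the Redis Bloom filter, and the Kafka publish is elided:

```python
import gzip
import json
import time

def handle_sync(raw_body: bytes, is_duplicate) -> dict:
    """Sketch of the sync endpoint: decompress the gzipped batch, drop
    duplicate points, and return the ACK the device uses to advance its
    sync watermark. `is_duplicate(device_id, data_type, ts)` stands in
    for the Redis Bloom-filter check described above."""
    batch = json.loads(gzip.decompress(raw_body))
    fresh = [p for p in batch["points"]
             if not is_duplicate(batch["device_id"], p["data_type"], p["ts"])]
    # `fresh` would now be published to Kafka, partitioned by user_id.
    now = int(time.time())
    return {
        "ack_timestamp": now,          # device advances its watermark to this
        "accepted": len(fresh),
        "next_sync_after": now + 15 * 60,
    }
```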

Stream Processing (Flink)

Consumes Kafka streams and maintains per-user state (running daily totals, workout session detection) in Flink's managed state backend (RocksDB). Activity detection uses a sliding-window algorithm: if heart rate > threshold for 10 minutes and step cadence > 80 steps/minute, classify as a workout. Sleep stage inference uses heart rate variability and accelerometer patterns with a rule-based classifier. Computed metrics are written to: (1) InfluxDB for raw time-series, (2) PostgreSQL for daily summary records, (3) Redis for live dashboard values (today's stats).
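The workout-detection rule above can be sketched as a pure function over per-minute samples. In Flink this window would live in keyed RocksDB state; the thresholds here are the illustrative values from the text, not production-tuned:

```python
from collections import deque

def detect_workout(samples, hr_threshold=120, cadence_threshold=80,
                   window_minutes=10):
    """Rule-based workout detection over per-minute samples: flag a workout
    once heart rate AND step cadence stay above their thresholds for a full
    sliding window. `samples` is an iterable of (heart_rate, steps_per_minute).
    Returns the minute index where the workout started, or None."""
    window = deque(maxlen=window_minutes)
    for minute, (hr, cadence) in enumerate(samples):
        window.append(hr > hr_threshold and cadence > cadence_threshold)
        if len(window) == window_minutes and all(window):
            return minute - window_minutes + 1
    return None
```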

Health Analytics & Trend Service

Pre-computes weekly and monthly aggregates nightly via a scheduled batch job (Spark). Aggregates include: average daily steps, resting heart rate trend, sleep score trend, active days streak, and cardio fitness score (VO2 max estimate). Results are stored as summary records in PostgreSQL and exposed via a dedicated Analytics API. Percentile rankings ("You're in the top 25% for steps among users your age") are computed from population-level anonymized aggregates, stored in Redis sorted sets, and refreshed weekly.
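The percentile ranking reduces to a rank lookup in a sorted population. In production the anonymized population lives in a Redis sorted set (rank via ZCOUNT/ZCARD); this pure-Python sketch shows the same math:

```python
import bisect

def step_percentile(user_steps: int, population_steps: list) -> int:
    """Percentile rank of a user's step count within an anonymized
    population sample (sketch). Mirrors a Redis sorted-set rank query:
    fraction of the population strictly below the user's value."""
    ranked = sorted(population_steps)
    below = bisect.bisect_left(ranked, user_steps)
    return round(100 * below / len(ranked))
```

A user at the 75th percentile or above would see the "top 25% for steps" message.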

Database Design

Raw time-series data in InfluxDB: measurement biometrics, tags {user_id, device_id, data_type}, fields {value}, timestamp in nanoseconds. Retention policy: 90 days of raw per-minute data; 1 year of per-hour aggregates; indefinite for daily aggregates. InfluxDB continuous queries perform the per-hour and per-day aggregations automatically.
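The schema above maps directly to InfluxDB line protocol. A sketch of the point formatter (a real sink would batch thousands of these lines per HTTP write; tag values are assumed to be pre-escaped):

```python
def to_line_protocol(user_id: str, device_id: str, data_type: str,
                     value: float, ts_ns: int) -> str:
    """Formats one biometric point as InfluxDB line protocol, matching the
    schema above: measurement `biometrics`, tags {user_id, device_id,
    data_type}, field `value`, nanosecond timestamp."""
    return (f"biometrics,user_id={user_id},device_id={device_id},"
            f"data_type={data_type} value={value} {ts_ns}")
```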

PostgreSQL stores: daily_summaries (user_id, date, steps, calories, active_minutes, sleep_minutes, avg_heart_rate, sleep_score). workouts (workout_id, user_id, started_at, ended_at, type, calories, avg_hr, steps, map_data_s3_key). goals (goal_id, user_id, metric, target_value, period, start_date, end_date). achievements (user_id, achievement_type, earned_at). Users and devices in PostgreSQL with full relational integrity.

API Design

POST /api/v1/devices/{deviceId}/sync — accepts compressed batch payload; idempotent; returns {ack_timestamp, next_sync_after}.

GET /api/v1/users/{userId}/summary/today — returns live daily dashboard stats from Redis cache.

GET /api/v1/users/{userId}/trends?metric=steps&period=last_30_days&granularity=daily — returns trend data for charts.

GET /api/v1/users/{userId}/workouts?from={date}&to={date} — returns workout history with optional GPS track download link.
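The trends endpoint is served straight from pre-aggregated daily summaries. A sketch of shaping those rows into the chart response (field names are illustrative; missing days are zero-filled so charts render gapless):

```python
from datetime import date, timedelta

def build_trend(daily_summaries: dict, metric: str, end: date, days: int) -> list:
    """Shapes daily_summaries rows (date -> summary dict, as fetched from
    PostgreSQL) into the trends-API response for the requested metric.
    Days with no summary row are reported as 0."""
    out = []
    for i in range(days - 1, -1, -1):
        d = end - timedelta(days=i)
        row = daily_summaries.get(d, {})
        out.append({"date": d.isoformat(), "value": row.get(metric, 0)})
    return out
```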

Scaling & Bottlenecks

The ingestion rate of ~22k syncs/second is manageable with a horizontally scaled ingestion fleet behind an NLB. Kafka is the critical buffer: with 20 partitions per topic, each partition receives ~1,100 messages/second, well within Kafka's per-partition limits. The Flink stream processor is the main stateful bottleneck, since per-user state (daily running totals) must be recovered from the RocksDB checkpoint on job restart. Incremental checkpointing every 30 seconds keeps recovery time after failures short.
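Partitioning by user_id keeps one user's events on one partition, which preserves per-user ordering for Flink's keyed state. A sketch of a stable partitioner (Kafka's default key partitioner behaves similarly; the hash choice here is illustrative):

```python
import hashlib

NUM_PARTITIONS = 20  # per the sizing above

def partition_for(user_id: str) -> int:
    """Deterministically maps a user_id to a Kafka partition so all of a
    user's events land on the same partition across syncs."""
    digest = hashlib.md5(user_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS
```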

InfluxDB write throughput can become a bottleneck at 100M data points/minute. InfluxDB handles this via line protocol batch writes; the Flink output sink writes to InfluxDB in micro-batches of 10k points, reducing TCP round-trip overhead. InfluxDB is deployed as a cluster with shard groups distributed by time (each shard group covers a 1-week window) and series (partitioned by user_id hash) to distribute both write and read load.
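The micro-batching sink can be sketched as a simple buffer that flushes at a size threshold. `write_batch` stands in for the line-protocol HTTP write; a production sink would also flush on a timer and on checkpoint:

```python
class MicroBatchSink:
    """Buffers points and writes them in micro-batches, as the Flink ->
    InfluxDB sink above does with 10k-point writes (sketch)."""

    def __init__(self, write_batch, batch_size=10_000):
        self.write_batch = write_batch  # callable taking a list of points
        self.batch_size = batch_size
        self.buffer = []

    def add(self, point: str):
        self.buffer.append(point)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        # One network round trip per batch instead of per point.
        if self.buffer:
            self.write_batch(self.buffer)
            self.buffer = []
```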

Key Trade-offs

  • Raw storage vs. aggressive aggregation: Storing raw per-second data enables retrospective analysis with new algorithms but at roughly 60x the storage of per-minute data; keeping per-minute raw data for 90 days with per-hour aggregates thereafter balances analytical flexibility against storage cost.
  • On-device processing vs. cloud processing: Processing on the wearable (steps, sleep stages) reduces cloud bandwidth but is limited by device CPU; cloud-side processing can use more sophisticated models and be updated without firmware changes.
  • Flink managed state vs. external state store: Flink's managed state (RocksDB) is fast and recoverable but ties state to the job topology, making re-scaling complex; an external state store (Redis) is more flexible but adds network latency to every state read/write.
  • InfluxDB vs. TimescaleDB: InfluxDB is purpose-built for time-series with automatic aggregation and retention but lacks relational joins; TimescaleDB (PostgreSQL extension) enables complex relational queries but is slower for pure time-series write throughput.
