Time-Series Data Modeling Explained: Storage, Indexing, and Query Patterns

How to model time-series data effectively — partitioning by time, downsampling, retention policies, and choosing between TimescaleDB, InfluxDB, and Cassandra.

Tags: time-series, data-modeling, databases, timescaledb, monitoring

Time-Series Data Modeling

Time-series data modeling is the practice of designing schemas, storage, and query patterns for data that is indexed primarily by time — metrics, logs, IoT sensor readings, financial ticks, and event streams.

What It Really Means

Time-series data has unique characteristics that make general-purpose databases a poor fit at scale. The data is append-heavy (you rarely update historical metrics). Queries almost always filter by time range. Old data becomes less valuable and can be downsampled or deleted. And the volume is enormous — a fleet of 10,000 servers reporting 100 metrics every 10 seconds generates 100,000 data points per second, or 6 million per minute.

General-purpose databases like PostgreSQL can handle small-scale time-series workloads, but they struggle with the write throughput and query patterns at scale. Specialized time-series databases (TimescaleDB, InfluxDB, QuestDB, ClickHouse) optimize for these access patterns with columnar storage, automatic time-based partitioning, built-in downsampling, and efficient time-range queries.

Understanding time-series data modeling is essential for designing monitoring systems, IoT platforms, financial trading systems, and analytics pipelines.

How It Works in Practice

Data Characteristics

Time-series workloads share a few defining traits:

  • Append-heavy: new points arrive constantly, while historical values are rarely updated
  • Time-ordered access: queries almost always filter or group by a time range
  • Decaying value: older data matters less and can be downsampled or deleted
  • High volume: write rates range from thousands to millions of points per second

Schema Design Approaches

Wide table (one metric per column):

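A minimal sketch of a wide-table schema, assuming a server-monitoring workload (table and column names are illustrative):

```sql
-- One column per metric: compact storage and fast scans,
-- but adding a new metric means altering the schema.
CREATE TABLE server_metrics (
    time       TIMESTAMPTZ      NOT NULL,
    server_id  INT              NOT NULL,
    cpu_usage  DOUBLE PRECISION,
    mem_usage  DOUBLE PRECISION,
    disk_io    DOUBLE PRECISION,
    -- Composite key: timestamps alone are not unique across servers.
    PRIMARY KEY (server_id, time)
);
```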

Narrow table (EAV style):

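A corresponding narrow-table sketch, where each row stores one (source, metric, time) observation and the metric name is data rather than schema:

```sql
-- EAV style: flexible when metric names change frequently,
-- at the cost of more rows and repeated metric-name strings.
CREATE TABLE metrics (
    time        TIMESTAMPTZ      NOT NULL,
    server_id   INT              NOT NULL,
    metric_name TEXT             NOT NULL,
    value       DOUBLE PRECISION NOT NULL,
    PRIMARY KEY (server_id, metric_name, time)
);
```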

Time-Based Partitioning

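A sketch using native PostgreSQL declarative range partitioning (daily partitions; table names and dates are illustrative):

```sql
-- Parent table is partitioned by time range.
CREATE TABLE metrics (
    time      TIMESTAMPTZ      NOT NULL,
    server_id INT              NOT NULL,
    value     DOUBLE PRECISION NOT NULL
) PARTITION BY RANGE (time);

-- One partition per day; typically created ahead of time by a job.
CREATE TABLE metrics_2024_01_01 PARTITION OF metrics
    FOR VALUES FROM ('2024-01-01') TO ('2024-01-02');

-- Retention: dropping a whole partition is instant,
-- unlike a row-by-row DELETE.
DROP TABLE metrics_2024_01_01;
```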

Time-based partitioning gives two major benefits:

  1. Fast range queries: a query for the last 24 hours scans only 1-2 partitions instead of the entire table
  2. Efficient data retention: Drop old partitions instantly instead of DELETE (which is slow and creates bloat)

Implementation

TimescaleDB setup and queries:

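A possible setup using TimescaleDB's `create_hypertable` and `time_bucket` functions (table and column names are illustrative):

```sql
-- Start from a plain PostgreSQL table...
CREATE TABLE metrics (
    time      TIMESTAMPTZ      NOT NULL,
    server_id INT              NOT NULL,
    cpu_usage DOUBLE PRECISION NOT NULL
);

-- ...and convert it into a hypertable, which TimescaleDB
-- automatically splits into time-based chunks.
SELECT create_hypertable('metrics', 'time');

-- Typical dashboard query: bucketed aggregation over a time range.
SELECT time_bucket('5 minutes', time) AS bucket,
       server_id,
       avg(cpu_usage) AS avg_cpu
FROM metrics
WHERE time > now() - INTERVAL '24 hours'
GROUP BY bucket, server_id
ORDER BY bucket;
```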

Downsampling strategy:
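One way to implement downsampling, sketched with TimescaleDB continuous aggregates and a retention policy (the intervals shown are illustrative, not recommendations):

```sql
-- Hourly rollup, kept up to date automatically as raw data arrives.
CREATE MATERIALIZED VIEW metrics_hourly
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', time) AS bucket,
       server_id,
       avg(cpu_usage) AS avg_cpu,
       max(cpu_usage) AS max_cpu
FROM metrics
GROUP BY bucket, server_id;

-- Drop raw data after 7 days; the hourly rollup survives
-- and can be retained for months or years at a fraction of the cost.
SELECT add_retention_policy('metrics', INTERVAL '7 days');
```

Tiering works the same way at each level: a daily rollup can be built on top of the hourly one, so storage shrinks as resolution drops with age.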

Trade-offs

Time-series database comparison:

| Aspect | TimescaleDB | InfluxDB | ClickHouse | Cassandra |
|---|---|---|---|---|
| SQL support | Full PostgreSQL | InfluxQL/Flux | SQL-like | CQL |
| Write throughput | High | Very high | Very high | Very high |
| Aggregation speed | Good | Good | Excellent | Moderate |
| Ecosystem | PostgreSQL tools | Purpose-built | OLAP tools | Distributed |
| Complexity | Low (PostgreSQL) | Medium | Medium | High |

When to use a time-series database:

  • Ingesting more than 100K data points per second
  • Queries are primarily time-range based
  • You need built-in downsampling and retention
  • Data is append-only or append-heavy

When PostgreSQL is sufficient:

  • Fewer than 10K data points per second
  • Time-series is a small part of the application (mixed workload)
  • You need ACID transactions on time-series data
  • Team expertise is in PostgreSQL

Common Misconceptions

  • "You always need a specialized time-series database" — PostgreSQL with proper partitioning handles small to medium time-series workloads well. TimescaleDB adds time-series features while keeping full PostgreSQL compatibility.
  • "Schemaless is better for time-series" — Typed schemas (wide tables) are more efficient for storage and queries. Use schemaless (narrow tables) only when metric names change frequently.
  • "Store everything at maximum resolution forever" — This is economically impractical. Downsample older data to reduce storage costs by 99%+ with minimal information loss.
  • "Time-series databases replace relational databases" — They complement each other. Store metadata (sensor locations, thresholds) in PostgreSQL and time-series data in a specialized store.
  • "Timestamps are sufficient as a primary key" — Multiple sources can report at the same timestamp. Use (timestamp, source_id) as a composite key.

How This Appears in Interviews

  1. "Design a monitoring system for 10,000 servers" — Time-series database with time-based partitioning, downsampling for retention, and pre-aggregated dashboards.
  2. "How do you handle millions of IoT sensor readings per second?" — LSM-tree based time-series database, partition by time, batch inserts, out-of-order handling.
  3. "Your metrics dashboard is slow. How do you speed it up?" — Pre-aggregate data at query-time bucket sizes using materialized views or continuous aggregates.
  4. "How do you manage storage costs for time-series data?" — Tiered retention: raw data for days, downsampled for months, aggregated for years.
