
Data Structures for System Design Interview Questions for Senior Engineers (2026)

Master interview questions on data structures used in system design — Bloom filters, skip lists, consistent hashing, LSM trees, B-trees, tries, and more for senior engineering roles.

20 min read · Updated Apr 19, 2026
Tags: interview-questions · data-structures-system-design · senior-engineer

Why Data Structures for System Design Matters in Interviews

System design interviews at the senior level go beyond high-level architecture diagrams. Interviewers want to see that you understand the data structures that power the systems you design. Knowing when to use a Bloom filter, how a B-tree differs from an LSM tree, or why consistent hashing solves the redistribution problem separates engineers who can only sketch architectures from those who truly understand how systems work.

At companies like Google, Meta, and Amazon, these data structures appear in real systems daily — Bloom filters in databases, skip lists in Redis, LSM trees in LevelDB/RocksDB, and consistent hashing in distributed caches. Interviewers test whether you can match the right data structure to the right system design constraint.

This guide covers the most important data structures for system design with interview questions and structured answer frameworks. For broader context, see our system design interview guide, distributed systems guide, and backend development crash course.


1. Explain Bloom filters and their applications in system design.

What the interviewer is really asking: Can you apply probabilistic data structures to reduce unnecessary work in large-scale systems?

Answer framework:

A Bloom filter is a space-efficient probabilistic data structure that tests set membership. It can tell you "definitely not in the set" or "probably in the set" (with a configurable false positive rate). It never produces false negatives.

How it works: a bit array of m bits, initialized to 0. k hash functions map each element to k positions in the array. To add an element, set all k positions to 1. To query, check all k positions — if all are 1, the element is probably in the set. If any is 0, the element is definitely not.

False positive rate: approximately (1 - e^(-kn/m))^k where n is the number of elements. With 10 bits per element and 7 hash functions, the false positive rate is about 1%.
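To make this concrete, here is a minimal Python sketch. The double-hashing scheme for deriving the k positions is one common technique, not the only option, and the sizes in the usage lines are illustrative.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: m bits, k derived hash positions per item."""
    def __init__(self, m: int, k: int):
        self.m, self.k = m, k
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, item: str):
        # Derive k positions from two 64-bit halves of one SHA-256 digest
        # (the Kirsch-Mitzenmacher double-hashing technique).
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        return ((h1 + i * h2) % self.m for i in range(self.k))

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item: str) -> bool:
        # All k bits set -> "probably in"; any bit clear -> "definitely not".
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

bf = BloomFilter(m=10_000, k=7)
bf.add("user:42")
print(bf.might_contain("user:42"))  # True
print(bf.might_contain("user:99"))  # False (with high probability)
```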

System design applications:

Database query optimization: Before reading from disk, check the Bloom filter. If the key is not in the filter, skip the disk read entirely. LSM tree databases (LevelDB, RocksDB, Cassandra) use Bloom filters on each SSTable to avoid reading SSTables that don't contain the queried key.

Web crawlers: Track visited URLs. Before fetching a URL, check the Bloom filter. If it says "probably visited," skip it (accepting occasional re-crawls). Storing billions of URLs as a set would require hundreds of GB; a Bloom filter uses a few GB.

Cooperative caches and CDNs: Cache nodes keep Bloom filter summaries of what their peers have cached. Before going to the origin, a node checks whether a peer probably holds the object and fetches it from the peer instead, reducing origin load at the cost of occasional false-positive fetches.

Distributed systems deduplication: Check if an event has been processed before. False positives mean occasionally reprocessing an event (acceptable in idempotent systems); false negatives (never happens) would mean missing events.

Limitations: standard Bloom filters don't support deletion (setting a bit to 0 would affect other elements). Counting Bloom filters (use counters instead of bits) support deletion. Bloom filters can't enumerate contained elements or count them.

See our how databases work guide, system design interview guide, and distributed systems guide.

Follow-up questions:

  • How do you choose the optimal number of hash functions and bit array size?
  • What is a Counting Bloom filter, and when would you use it?
  • How does Cassandra use Bloom filters to optimize reads?

2. Compare B-trees and LSM trees for database storage engines.

What the interviewer is really asking: Do you understand the fundamental trade-off between read-optimized and write-optimized storage structures?

Answer framework:

B-trees (used in PostgreSQL, MySQL InnoDB, SQLite): Balanced tree where each node maps to a fixed-size page on disk. Updates modify pages in-place. Reads follow the tree from root to leaf — O(log n) page reads. Writes update the target page and may split nodes, requiring O(log n) page writes.

Strengths: excellent read performance (single lookup follows a short path), consistent read/write performance (no background compaction), efficient range scans (leaves are linked), mature and well-understood.

Weaknesses: random I/O for writes (modifying pages in different locations on disk), write amplification from page splits and WAL, pages may become partially filled (space overhead).

LSM trees (used in LevelDB, RocksDB, Cassandra, ScyllaDB): Writes go to an in-memory buffer (memtable). When the memtable fills, it's flushed to disk as a sorted immutable file (SSTable). Background compaction merges SSTables to remove duplicates and tombstones, organizing data into levels.

Strengths: excellent write throughput (sequential I/O for flushing memtables), writes never modify existing files (append-only, crash-safe), high compression ratios (SSTables are sorted and compressed), good for write-heavy workloads.

Weaknesses: read amplification (may need to check multiple SSTables — mitigated by Bloom filters), write amplification from compaction (data is written multiple times as it moves through levels), compaction can cause latency spikes (competes for I/O with reads/writes), space amplification (old versions exist until compacted).
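A toy sketch of the write path behind these trade-offs, assuming a plain dict as the memtable and sorted in-memory runs standing in for on-disk SSTables (no WAL, compaction, or Bloom filters):

```python
class ToyLSM:
    """Toy LSM write path: memtable -> immutable sorted runs (SSTables)."""
    def __init__(self, memtable_limit: int = 4):
        self.memtable = {}          # in-memory write buffer
        self.sstables = []          # immutable sorted runs, newest last
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value  # writes touch memory only
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # One sequential write of a sorted, immutable run; never modified after.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:            # freshest data first
            return self.memtable[key]
        for run in reversed(self.sstables): # then newest SSTable first; real
            for k, v in run:                # engines binary-search the sorted
                if k == key:                # run and consult a Bloom filter
                    return v                # before touching it at all
        return None
```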

When to choose:

  • Read-heavy, latency-sensitive workload → B-tree (PostgreSQL, MySQL)
  • Write-heavy, throughput-oriented workload → LSM tree (Cassandra, time-series databases)
  • Mixed workload with tuning flexibility → LSM tree with Bloom filters and leveled compaction

See our PostgreSQL interview questions, how databases work, and system design interview guide.

Follow-up questions:

  • What is write amplification, and how do B-trees and LSM trees compare?
  • How does RocksDB's level compaction differ from size-tiered compaction?
  • What is the RUM conjecture (Read, Update, Memory trade-off)?

3. Explain consistent hashing and its use in distributed systems.

What the interviewer is really asking: Do you understand how distributed systems partition data across nodes and handle node additions/removals?

Answer framework:

Consistent hashing solves the problem of distributing data across a cluster where nodes can be added or removed dynamically. With simple modular hashing (hash(key) % N), adding or removing a node changes N, causing almost all keys to be remapped — catastrophic for caches.

How it works: imagine a hash ring covering positions 0 to 2^32 − 1. Each node is placed on the ring at a position determined by hashing its identifier. Each key is placed on the ring by hashing the key, then walking clockwise to find the first node — that node owns the key.

When a node is added, it takes ownership of keys in its range — only keys that fall between the new node and its predecessor are reassigned (approximately 1/N of all keys). When a node is removed, its keys move to the next node clockwise. In both cases, most keys remain on their existing nodes.

Virtual nodes: In basic consistent hashing, nodes may receive uneven key distributions. Virtual nodes solve this — each physical node is placed at multiple positions on the ring (e.g., 100-200 virtual nodes per physical node). This provides uniform distribution and allows heterogeneous nodes (a more powerful machine gets more virtual nodes).
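A minimal ring with virtual nodes might look like the following sketch; the MD5 hash and the vnode count of 100 are illustrative choices.

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, nodes=(), vnodes: int = 100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, node) pairs
        for n in nodes:
            self.add_node(n)

    def add_node(self, node: str) -> None:
        # Place each physical node at many positions on the ring.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [(p, n) for p, n in self._ring if n != node]

    def get_node(self, key: str) -> str:
        pos = _hash(key)
        idx = bisect.bisect(self._ring, (pos, ""))  # first vnode clockwise
        if idx == len(self._ring):
            idx = 0  # wrap around the ring
        return self._ring[idx][1]

ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
print(ring.get_node("user:42"))  # one of the three nodes, stable per key
```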

Applications:

  • Distributed caches (Memcached, Redis Cluster): Determine which cache node stores a key. Adding a cache node redistributes only a fraction of keys, minimizing cache misses.
  • Distributed databases (DynamoDB, Cassandra): Partition data across nodes. Cassandra uses virtual nodes (vnodes) for balanced data distribution.
  • Load balancers: Route requests to servers consistently (same client always goes to the same server for session affinity).
  • CDNs: Map content to edge servers.

Alternatives: rendezvous hashing (highest random weight) provides similar properties without a ring structure. Jump consistent hashing provides an almost perfectly uniform distribution with minimal memory, though it only supports numbered buckets that grow or shrink at the end of the range.

See our consistent hashing concepts, distributed systems guide, and system design interview guide.

Follow-up questions:

  • How do virtual nodes improve load balancing in consistent hashing?
  • What happens to data availability when a node fails in a consistent hashing ring?
  • How does DynamoDB use consistent hashing for partitioning?

4. How do tries (prefix trees) power autocomplete and routing systems?

What the interviewer is really asking: Can you connect a fundamental data structure to real system design problems?

Answer framework:

A trie (pronounced "tree" or "try") is a tree-like data structure where each node represents a character (or prefix) and paths from root to nodes represent strings. Tries enable O(L) lookup time where L is the length of the string, regardless of the number of stored strings.

Variants:

  • Standard trie: Each node has up to 26 children (for lowercase English). Each edge represents a character. Nodes marked as "end" indicate complete strings.
  • Compressed trie (Patricia tree): Chains of single-child nodes are compressed into a single node with a multi-character label. Reduces memory usage significantly.
  • Radix tree: Similar to a compressed trie. Each node stores a string prefix. Used in the Linux kernel for IP routing tables.

System design applications:

Autocomplete/typeahead: Store all searchable terms in a trie. As the user types each character, traverse the trie to find all completions. Augment each node with popularity scores to return the top-k completions. For large-scale autocomplete (Google, Amazon), the trie is partitioned across servers by prefix range.
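A simplified sketch of this idea follows; production systems precompute and cache the top-k completions at each node rather than running a traversal per keystroke, and the terms and scores here are made up.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.score = None  # popularity score if this node ends a term

class Autocomplete:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, term: str, score: int) -> None:
        node = self.root
        for ch in term:
            node = node.children.setdefault(ch, TrieNode())
        node.score = score

    def top_k(self, prefix: str, k: int):
        node = self.root
        for ch in prefix:            # O(len(prefix)) descent
            if ch not in node.children:
                return []
            node = node.children[ch]
        results = []
        def dfs(n, path):            # collect every completion under the prefix
            if n.score is not None:
                results.append((n.score, prefix + path))
            for ch, child in n.children.items():
                dfs(child, path + ch)
        dfs(node, "")
        return [term for _, term in sorted(results, reverse=True)[:k]]

ac = Autocomplete()
for term, score in [("system", 90), ("systems design", 70), ("syscall", 40)]:
    ac.insert(term, score)
print(ac.top_k("sys", 2))  # ['system', 'systems design']
```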

IP routing tables: Routers use tries (specifically Patricia trees) for longest-prefix matching. Each prefix in the routing table is stored in the trie. When a packet arrives, the router walks the trie with the destination IP to find the longest matching prefix, which determines the next hop.

Spell checking: Store a dictionary in a trie. For a misspelled word, traverse the trie with edit distance tolerance (substitution, insertion, deletion) to find similar words.

URL routing in web frameworks: Fastify (via its find-my-way router), Go's httprouter, and similar frameworks use radix tree-based routers. URL patterns are stored in the tree, and incoming URLs are matched by traversing it. This is faster than checking each route pattern sequentially, as regex-list routers do.

Memory optimization: for large tries, use compressed representations (LOUDS, succinct tries) that use near-optimal space while supporting efficient queries. FSTs (Finite State Transducers) — used in Elasticsearch's term dictionary — are a further optimization that shares suffixes as well as prefixes.

See our search engine system design, system design interview guide, and Elasticsearch interview questions.

Follow-up questions:

  • How would you implement autocomplete for a system with 10 billion search terms?
  • What is the memory difference between a standard trie and a compressed trie?
  • How does Elasticsearch's FST differ from a standard trie?

5. Explain skip lists and their use in databases and caches.

What the interviewer is really asking: Do you know alternatives to balanced trees that are simpler to implement with good concurrent properties?

Answer framework:

A skip list is a probabilistic data structure that provides O(log n) average-case search, insertion, and deletion — similar to a balanced binary tree but simpler to implement, especially for concurrent access.

Structure: a skip list is a multi-level linked list. The bottom level is a sorted linked list of all elements. Each higher level is a "highway" that skips over elements. Each element is promoted to the next level with probability p (typically 1/2). Searching starts at the top level and drops down when the next element at the current level exceeds the target.
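A compact sketch of search and insert, using p = 1/2 and an illustrative level cap:

```python
import random

class Node:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * (level + 1)  # one next-pointer per level

class SkipList:
    MAX_LEVEL = 16
    P = 0.5  # promotion probability per level

    def __init__(self):
        self.head = Node(None, self.MAX_LEVEL)
        self.level = 0

    def _random_level(self) -> int:
        lvl = 0
        while random.random() < self.P and lvl < self.MAX_LEVEL:
            lvl += 1
        return lvl

    def search(self, key) -> bool:
        node = self.head
        # Start at the top level; drop down when the next key overshoots.
        for i in range(self.level, -1, -1):
            while node.forward[i] and node.forward[i].key < key:
                node = node.forward[i]
        node = node.forward[0]
        return node is not None and node.key == key

    def insert(self, key) -> None:
        update = [self.head] * (self.MAX_LEVEL + 1)
        node = self.head
        for i in range(self.level, -1, -1):
            while node.forward[i] and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node  # last node before the insertion point per level
        lvl = self._random_level()
        self.level = max(self.level, lvl)
        new = Node(key, lvl)
        for i in range(lvl + 1):  # splice in with plain linked-list updates
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new

sl = SkipList()
for key in [3, 7, 1, 9]:
    sl.insert(key)
print(sl.search(7), sl.search(5))  # True False
```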

Why skip lists over balanced trees:

  • Simpler implementation: No complex rotation logic (like AVL or Red-Black trees). Insertion and deletion are straightforward linked list operations at each level.
  • Better concurrency: Lock-free skip lists are simpler to implement than lock-free balanced trees. Each level can be locked independently. ConcurrentSkipListMap in Java provides a lock-free sorted map.
  • Range queries: The sorted linked list structure naturally supports efficient range scans (iterate the bottom level).

System design applications:

Redis sorted sets: Redis uses a skip list (combined with a hash table) for its sorted set data structure. The skip list provides O(log n) insertion and range queries, while the hash table provides O(1) member lookup.

LevelDB/RocksDB memtable: The in-memory write buffer (memtable) uses a skip list for sorted insertion and efficient scanning when flushing to an SSTable.

Lucene: Some Lucene internals use skip lists for posting list intersection (efficiently finding documents that contain multiple terms).

Trade-offs vs balanced trees: skip lists use more memory (pointers at multiple levels — approximately 2x a single linked list with p=1/2), have probabilistic rather than deterministic O(log n) guarantees, and may have worse cache locality (pointer chasing across levels). But their simplicity and concurrent-friendliness often outweigh these costs.

See our how Redis works guide, data structures concepts, and system design interview guide.

Follow-up questions:

  • What is the expected space overhead of a skip list compared to a sorted array?
  • How does Redis combine a skip list with a hash table for sorted sets?
  • When would you choose a skip list over a Red-Black tree?

6. How do HyperLogLog and Count-Min Sketch solve the cardinality and frequency estimation problems?

What the interviewer is really asking: Can you apply probabilistic data structures to solve counting problems at scale where exact counting is too expensive?

Answer framework:

HyperLogLog (HLL): Estimates the cardinality (number of unique elements) of a set using O(1) memory (typically 12KB for a standard error of ~0.8%). Used when exact counting would require O(n) memory.

How it works: hash each element, observe the position of the leftmost 1-bit in the hash. The maximum observed position across all elements estimates log2(cardinality). Use many independent estimators (registers) and harmonically average them for accuracy.

Applications: counting unique visitors on a website (Redis's PFADD/PFCOUNT), counting distinct search queries, estimating distinct-value cardinalities in query engines, and monitoring unique events in data streams.

Count-Min Sketch (CMS): Estimates the frequency of elements (how many times each element appears) using a fixed-size 2D array. Multiple hash functions map each element to positions in the array, incrementing counters. To query frequency, take the minimum counter across all hash functions (hence "count-min"). Always overestimates (never undercounts).
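A minimal sketch, with a salted SHA-256 standing in for the per-row hash functions and illustrative dimensions:

```python
import hashlib

class CountMinSketch:
    """depth rows x width counters; each row has its own hash function."""
    def __init__(self, width: int, depth: int):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _col(self, item: str, row: int) -> int:
        # Per-row hash: salt the item with the row index (illustrative scheme).
        digest = hashlib.sha256(f"{row}:{item}".encode()).hexdigest()
        return int(digest, 16) % self.width

    def add(self, item: str, count: int = 1) -> None:
        for row in range(self.depth):
            self.table[row][self._col(item, row)] += count

    def estimate(self, item: str) -> int:
        # Collisions only inflate counters, so the minimum across rows is the
        # tightest available estimate (never an undercount).
        return min(self.table[row][self._col(item, row)] for row in range(self.depth))

cms = CountMinSketch(width=1000, depth=4)
for url in ["/home", "/home", "/about", "/home"]:
    cms.add(url)
print(cms.estimate("/home"))  # 3 (or slightly more under collisions)
```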

Applications: heavy hitter detection (find the most frequent elements in a stream — top URLs, top search queries), network traffic analysis (identify the highest-bandwidth flows), and approximate frequency counting in Spark and Flink.

Comparison:

  • HLL answers "how many unique elements?" — cardinality estimation
  • CMS answers "how many times does this element appear?" — frequency estimation
  • Bloom filter answers "is this element in the set?" — membership test

All three are streaming algorithms — they process elements one at a time, use sublinear memory, and provide approximate answers with configurable error bounds.

See our distributed systems guide, system design interview guide, and how Redis works guide.

Follow-up questions:

  • How does Redis implement HyperLogLog, and what is its memory usage?
  • What is the relationship between Count-Min Sketch accuracy and the size of the 2D array?
  • How would you combine HyperLogLog counts from multiple servers for a distributed unique count?

7. Explain how Merkle trees are used in distributed systems.

What the interviewer is really asking: Do you understand how distributed systems efficiently detect and reconcile data inconsistencies?

Answer framework:

A Merkle tree (hash tree) is a binary tree where each leaf node contains the hash of a data block, and each internal node contains the hash of its children's hashes. The root hash summarizes the entire dataset.

Key property: if two Merkle trees have the same root hash, all their data is identical. If root hashes differ, you can efficiently find which data blocks differ by comparing subtrees — at each level, only follow the branches where hashes differ. This provides O(log n) comparison complexity instead of O(n) for a block-by-block comparison.
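Computing a root is short enough to sketch; this version duplicates the last hash on odd-sized levels, which is one common convention:

```python
import hashlib

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(blocks: list[bytes]) -> bytes:
    # Leaves are hashes of data blocks; each parent hashes the concatenation
    # of its two children, up to a single root.
    level = [_h(b) for b in blocks]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate last hash on odd levels
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

a = merkle_root([b"block0", b"block1", b"block2", b"block3"])
b = merkle_root([b"block0", b"blockX", b"block2", b"block3"])
print(a != b)  # True: a single changed block changes the root
```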

Distributed systems applications:

Anti-entropy repair (Cassandra, DynamoDB): Replicas periodically exchange Merkle tree root hashes. If they match, the replicas are consistent. If they differ, they traverse the tree to identify and sync only the inconsistent data ranges. This minimizes network traffic for consistency checks.

Blockchain: Each block contains a Merkle root of all transactions. This allows efficient verification that a specific transaction is included in a block (Merkle proof) without downloading all transactions — essential for lightweight clients.

Git: Git stores content as a tree of hashes. Each commit points to a tree hash, which recursively contains hashes of subtrees and blob hashes. This enables efficient diff computation and integrity verification.

Certificate transparency: Merkle trees enable auditable logs where entries can be proven to exist (inclusion proof) and the log can be proven to be append-only (consistency proof).

IPFS/content-addressable storage: Files are split into blocks, hashed into a Merkle DAG. The root hash serves as the content address. Changes to any block propagate a new root hash.

See our how blockchain works guide, distributed systems guide, and consistency concepts.

Follow-up questions:

  • How does Cassandra build and exchange Merkle trees for anti-entropy repair?
  • What is a Merkle proof (inclusion proof) and how is it used in blockchains?
  • What is the overhead of maintaining Merkle trees for large datasets?

8. How do inverted indexes work in search engines?

What the interviewer is really asking: Do you understand the core data structure that makes full-text search possible?

Answer framework:

An inverted index maps each unique term (word) to the list of documents containing that term (posting list). This is "inverted" compared to a forward index (document → terms).

Structure:

  • Dictionary: Sorted list of all unique terms. Often stored as a trie or FST (Finite State Transducer) for space-efficient prefix lookup.
  • Posting lists: For each term, a sorted list of document IDs, often with additional information: term frequency (TF — how often the term appears in the document), positions (character or word positions for phrase queries), and payloads (custom scoring data).

Building an inverted index:

  1. Tokenize: Split documents into terms ("The quick brown fox" → ["the", "quick", "brown", "fox"])
  2. Normalize: Lowercase, stem ("running" → "run"), remove stop words ("the", "is", "at")
  3. Build posting lists: For each term, append the document ID to its posting list
  4. Compress: Posting lists are sorted, so delta encoding (store differences between consecutive IDs) combined with variable-byte encoding dramatically reduces size

Query processing:

  • Single term: Look up the term in the dictionary, return its posting list.
  • Boolean AND: Intersect posting lists. Since lists are sorted, use a merge-join algorithm (O(n+m)) as in the sketch after this list. Skip lists within posting lists accelerate intersection.
  • Boolean OR: Union posting lists.
  • Phrase queries: Check positions — terms must appear consecutively.
  • Ranking: Score each matching document using TF-IDF or BM25 (term frequency, inverse document frequency, field length normalization).
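A minimal sketch of the build steps and the merge-join intersection described above, over a made-up three-document corpus:

```python
from collections import defaultdict

STOP_WORDS = {"the", "is", "at", "a"}

def tokenize(text: str) -> list[str]:
    # Split, lowercase, drop stop words (stemming omitted for brevity).
    return [t for t in text.lower().split() if t not in STOP_WORDS]

docs = {
    1: "The quick brown fox",
    2: "The quick red fox",
    3: "A brown dog",
}

# Build: term -> sorted list of doc IDs (posting list).
index = defaultdict(list)
for doc_id in sorted(docs):
    for term in set(tokenize(docs[doc_id])):
        index[term].append(doc_id)

def intersect(p1: list[int], p2: list[int]) -> list[int]:
    # Merge-join two sorted posting lists in O(n + m).
    out, i, j = [], 0, 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i]); i += 1; j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return out

print(intersect(index["quick"], index["fox"]))    # [1, 2]
print(intersect(index["quick"], index["brown"]))  # [1]
```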

Scale considerations: Elasticsearch and Solr shard the inverted index across multiple nodes. Each shard is a self-contained Lucene index. Queries run on all shards in parallel, and results are merged by a coordinating node.

See our search engine system design, Elasticsearch interview questions, and how Elasticsearch works guide.

Follow-up questions:

  • How does posting list compression work (delta encoding, variable-byte encoding)?
  • What is the difference between TF-IDF and BM25 scoring?
  • How do you handle index updates in a system receiving millions of new documents per day?

9. Explain the role of write-ahead logs (WAL) in system reliability.

What the interviewer is really asking: Do you understand the fundamental mechanism for durability in databases and distributed systems?

Answer framework:

A Write-Ahead Log (WAL) ensures durability by writing changes to a sequential log on disk before modifying the actual data structures. If the system crashes, it replays the log from the last checkpoint to recover to a consistent state.

The key principle: sequential writes to the WAL are much faster than random writes to data files (modifying B-tree pages, updating index entries). The WAL allows the system to batch and defer the expensive random writes while providing immediate durability.
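A toy log illustrating the principle: append and fsync the record before acknowledging, then rebuild state by replay. The JSON line format and names are illustrative, not any real database's.

```python
import json
import os

class ToyWAL:
    """Append-only log: record the change durably before applying it."""
    def __init__(self, path: str):
        self.path = path
        self.f = open(path, "a", encoding="utf-8")

    def append(self, record: dict) -> None:
        self.f.write(json.dumps(record) + "\n")  # sequential append
        self.f.flush()
        os.fsync(self.f.fileno())  # durability point: survives a crash after this

    def replay(self) -> dict:
        # Crash recovery: rebuild in-memory state by replaying the log.
        state = {}
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                rec = json.loads(line)
                state[rec["key"]] = rec["value"]
        return state

wal = ToyWAL("toy.wal")
wal.append({"key": "balance:alice", "value": 100})
print(wal.replay())  # {'balance:alice': 100}
```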

WAL usage patterns:

Databases (PostgreSQL, MySQL, SQLite): Every transaction is first written to the WAL. Periodic checkpoints flush modified pages to data files. On crash recovery, replay WAL entries after the last checkpoint. PostgreSQL's WAL also powers streaming replication — standbys apply WAL records to maintain near-identical copies.

Distributed consensus (Raft, Paxos): Each node logs proposed commands to its WAL before voting. This ensures committed entries survive crashes. The WAL is the authoritative record of the committed command sequence.

Message queues (Kafka): Kafka's commit log is essentially a WAL — messages are appended to segment files sequentially, providing high write throughput and durability. Old segments are compacted or deleted based on retention policies.

Key-value stores (RocksDB): Writes go to the WAL and memtable simultaneously. The WAL ensures durability; the memtable provides fast reads. When the memtable is flushed to an SSTable, the corresponding WAL entries are no longer needed.

Design considerations:

  • Sync frequency: fsync after every write (safest, slowest) vs batched fsync (risk losing the last few writes on crash, but much faster). PostgreSQL's synchronous_commit setting controls this trade-off.
  • WAL size management: Checkpointing advances the low-water mark, allowing old WAL entries to be recycled. If checkpoints fall behind, WAL grows unboundedly.
  • Group commit: Batch multiple transactions' WAL entries into a single fsync call. Dramatically improves throughput with minimal latency increase.

See our PostgreSQL interview questions, how databases work guide, and distributed systems guide.

Follow-up questions:

  • What is the performance difference between synchronous and asynchronous WAL writes?
  • How does PostgreSQL use the WAL for both crash recovery and replication?
  • What is group commit, and how does it improve transaction throughput?

10. How do LRU caches work, and what variations exist for production systems?

What the interviewer is really asking: Can you implement and optimize a cache eviction policy for real-world workloads?

Answer framework:

LRU (Least Recently Used): Evict the item that was accessed least recently. Implementation: a hash map for O(1) lookup combined with a doubly-linked list for O(1) order maintenance. On access, move the item to the head of the list. On eviction, remove from the tail.
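In Python, collections.OrderedDict already pairs a hash map with a doubly-linked list, so a minimal sketch is only a few lines:

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()  # hash map + doubly-linked list in one

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)          # O(1): mark as most recently used
        return self.data[key]

    def put(self, key, value) -> None:
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)   # O(1): evict least recently used

cache = LRUCache(2)
cache.put("a", 1); cache.put("b", 2)
cache.get("a")         # touches "a", so "b" becomes the LRU entry
cache.put("c", 3)      # evicts "b"
print(cache.get("b"))  # None
```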

Limitations of pure LRU: one-time scans (a batch job reading many items once) pollute the cache, evicting frequently accessed items. A single scan of a large dataset can flush the entire cache.

Production variations:

LRU-K (used in databases): Track the last K accesses to each item. Evict the item whose K-th most recent access is the oldest. LRU-2 (K=2) is most common. This prevents one-time scans from evicting frequently accessed items because an item must be accessed at least K times to have a K-th access time.

W-TinyLFU (used in Caffeine, a high-performance Java caching library): Combines a small LRU window (admission filter) with a large SLRU (Segmented LRU) main cache. New items enter the window. When evicted from the window, they compete with the main cache's victim using a TinyLFU frequency sketch (Count-Min Sketch). This achieves near-optimal hit rates by combining recency (LRU) and frequency (LFU).

CLOCK (second-chance algorithm): Approximation of LRU used in operating systems for page replacement. Items are arranged in a circular buffer with a "referenced" bit. The clock hand sweeps, clearing referenced bits. When it finds an unreferenced item, it evicts it. Simpler than full LRU (no linked list manipulation) with similar performance.

ARC (Adaptive Replacement Cache): Dynamically balances between recency and frequency by maintaining ghost lists of recently evicted items. If a recently evicted recency item is accessed again, increase the recency portion of the cache. If a recently evicted frequency item is accessed, increase the frequency portion.

For distributed caches, the eviction policy runs independently on each node. The key design decision is the partitioning strategy (which node holds which keys), not the eviction policy.

See our how caching works guide, system design interview guide, and how Redis works guide.

Follow-up questions:

  • How does Caffeine's W-TinyLFU achieve better hit rates than LRU?
  • What is the memory overhead of maintaining an LRU linked list for millions of cache entries?
  • How would you choose a cache eviction policy for a CDN vs a database buffer pool?

11. How are quadtrees and geohashes used in location-based systems?

What the interviewer is really asking: Can you choose the right spatial data structure for proximity searches at scale?

Answer framework:

Quadtree: Recursively divides 2D space into four quadrants. Each node represents a region; when a region contains more than a threshold of points, it subdivides into four children. Adaptive — densely populated areas have deeper subdivisions.

For a "find nearby restaurants" query, the quadtree narrows the search to the relevant geographic region(s), then scans only the points in those regions. Insertion and querying are O(log n) average case.

Geohash: Encodes a 2D coordinate (latitude, longitude) into a 1D string using interleaved binary representations. Nearby locations share common prefixes. Geohash precision levels: 1 character ≈ 5000km, 5 characters ≈ 5km, 8 characters ≈ 19m.
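A minimal encoder showing the bit interleaving; the base-32 alphabet below is the standard one (it omits a, i, l, and o), and the coordinates in the usage lines are illustrative:

```python
# Interleave longitude/latitude bits (longitude first) and map each
# 5-bit group to a base-32 character.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet

def geohash_encode(lat: float, lon: float, precision: int = 8) -> str:
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    bits, bit_count = 0, 0
    even, result = True, []  # even-indexed bits encode longitude
    while len(result) < precision:
        rng = lon_range if even else lat_range
        val = lon if even else lat
        mid = (rng[0] + rng[1]) / 2
        bits <<= 1
        if val >= mid:           # each bit halves the current cell
            bits |= 1
            rng[0] = mid
        else:
            rng[1] = mid
        even = not even
        bit_count += 1
        if bit_count == 5:       # every 5 bits -> one base-32 character
            result.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(result)

# Nearby points share prefixes: both fall in the same ~5 km cell.
print(geohash_encode(37.7749, -122.4194, 5))  # 9q8yy
print(geohash_encode(37.7849, -122.4094, 5))  # 9q8yy
```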

Advantage: geohashes turn 2D proximity queries into 1D prefix queries, which standard databases and caches handle efficiently. WHERE geohash LIKE 'dpz83%' finds all locations in a ~5km cell. Can be stored in any string-indexed data structure (B-tree, Redis sorted set).

Disadvantage: edge cases at geohash cell boundaries. Two locations 1 meter apart may have completely different geohashes if they fall in adjacent cells. Solution: query the target cell and all 8 neighboring cells.

S2 Geometry (Google's approach): Divides the Earth's surface using a Hilbert space-filling curve on a cube projected onto the sphere. S2 cells have predictable shapes and sizes, and S2 is used in Google Maps. Uber's H3, a similar concept built on hexagonal cells, is used for ride matching.

System design applications:

  • Uber ride matching: Use a geospatial index (H3 or S2) to find nearby drivers. The index maps cell IDs to lists of drivers in each cell.
  • Yelp nearby search: PostgreSQL with the PostGIS extension uses R-tree-style spatial indexes built on GiST. Or use Elasticsearch's geo_point type with BKD trees.
  • Food delivery radius: Geohash-scored Redis sorted sets (Redis's GEO commands) for fast lookup of restaurants within a delivery radius.

See our location-based service system design, Elasticsearch interview questions, and system design interview guide.

Follow-up questions:

  • How does Uber use H3 hexagonal cells for ride matching?
  • What is the edge problem with geohashes, and how do you solve it?
  • When would you choose a quadtree over geohashes?

12. Explain how CRDTs enable conflict-free distributed data structures.

What the interviewer is really asking: Do you understand eventual consistency primitives that can resolve conflicts without coordination?

Answer framework:

CRDTs (Conflict-free Replicated Data Types) are data structures that can be replicated across nodes and modified independently, with a mathematical guarantee that all replicas converge to the same state — without coordination or conflict resolution.

Two types:

  • State-based CRDTs (CvRDTs): Replicas send their full state to other replicas. States are merged using a commutative, associative, idempotent merge function. The merge function computes the join (least upper bound) of two states.
  • Operation-based CRDTs (CmRDTs): Replicas broadcast operations to other replicas. Operations must be commutative (applying ops in any order yields the same result). Requires reliable delivery: each op must reach every replica exactly once (or be idempotent), typically in causal order.

Common CRDT types:

G-Counter (grow-only counter): Each replica maintains its own counter. The merged value is the sum of all replicas' counters. Only supports increment.
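A sketch of a G-Counter, showing why the merge is commutative, associative, and idempotent (it takes a per-replica max):

```python
class GCounter:
    def __init__(self, replica_id: str):
        self.replica_id = replica_id
        self.counts = {}  # replica_id -> that replica's local increment count

    def increment(self, n: int = 1) -> None:
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Per-replica max is commutative, associative, and idempotent,
        # so replicas converge regardless of merge order or repetition.
        for rid, c in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), c)

a, b = GCounter("a"), GCounter("b")
a.increment(3); b.increment(2)   # concurrent, uncoordinated updates
a.merge(b); b.merge(a)
print(a.value(), b.value())      # 5 5 -- both replicas converge
```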

PN-Counter: Two G-Counters — one for increments, one for decrements. The value is the difference. Supports both increment and decrement.

LWW-Register (Last Writer Wins): Each value has a timestamp. The merge function keeps the value with the highest timestamp. Simple but depends on clock accuracy.

OR-Set (Observed-Remove Set): Supports add and remove. Each element is tagged with a unique identifier when added. Remove operations only remove specific tags, not the element universally. An element is in the set if it has any un-removed tags.

Real-world applications:

  • Redis CRDT (Redis Enterprise): Active-active replication across data centers with automatic conflict resolution using CRDTs.
  • Riak: Distributed key-value store that offers CRDT-based data types (counters, sets, maps) for conflict-free multi-master replication.
  • Collaborative editing: Google Docs-style collaboration uses CRDT-like structures (or operational transformation) for conflict-free concurrent editing.
  • Shopping cart: An OR-Set represents the cart. Items can be added and removed concurrently across replicas without conflicts.

See our consistency concepts, CAP theorem, and distributed systems guide.

Follow-up questions:

  • What is the space overhead of an OR-Set compared to a simple set?
  • When would you use a CRDT vs a consensus protocol (Raft) for distributed state?
  • How does collaborative editing (Google Docs) achieve conflict-free concurrent edits?

13. How do ring buffers (circular buffers) power high-performance systems?

What the interviewer is really asking: Do you know about the data structure that underlies many high-performance systems and why it's so efficient?

Answer framework:

A ring buffer is a fixed-size buffer that wraps around — when the write pointer reaches the end, it wraps to the beginning (overwriting old data if the consumer hasn't caught up). It provides O(1) enqueue and dequeue with zero allocation overhead.

Why ring buffers are fast: fixed memory allocation (no GC pressure), CPU cache-friendly (contiguous memory), no pointer chasing (unlike linked lists), and minimal synchronization needed (single producer, single consumer can be lock-free with atomic read/write pointers).
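A minimal single-threaded sketch of the wrapping logic; this variant rejects writes when full (backpressure) rather than overwriting, and a lock-free SPSC version would make the read and write counters atomic:

```python
class RingBuffer:
    """Fixed-capacity FIFO; indices wrap with modular arithmetic."""
    def __init__(self, capacity: int):
        self.buf = [None] * capacity  # pre-allocated once, never resized
        self.capacity = capacity
        self.read = 0   # monotonic count of items consumed
        self.write = 0  # monotonic count of items produced

    def push(self, item) -> bool:
        if self.write - self.read == self.capacity:
            return False  # full: caller must apply backpressure
        self.buf[self.write % self.capacity] = item
        self.write += 1
        return True

    def pop(self):
        if self.read == self.write:
            return None  # empty
        item = self.buf[self.read % self.capacity]
        self.read += 1
        return item

rb = RingBuffer(2)
rb.push("a"); rb.push("b")
print(rb.push("c"))  # False: full until a consumer catches up
print(rb.pop(), rb.pop())  # a b
```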

System design applications:

LMAX Disruptor: A ring buffer-based inter-thread messaging library that achieves millions of operations per second with nanosecond latency. Used in financial trading systems. The key insight: pre-allocate all entries in the ring buffer, avoiding allocation during operation. Multiple consumers can read from the same ring buffer at different positions.

Linux kernel: NIC packet descriptor rings, io_uring submission and completion queues, and kernel trace buffers are all ring buffers, enabling zero-copy, low-synchronization data passing between hardware, kernel, and user space.

Logging: High-performance loggers (Log4j2's AsyncLogger) use ring buffers to decouple log production from file writing. Application threads write log entries to the ring buffer; a background thread writes to disk.

Network I/O: DPDK and similar high-performance networking frameworks use ring buffers for passing packets between processing stages without copying.

Metrics collection: Circular buffers store the last N data points for rolling window calculations (moving average, percentile computation) without dynamic allocation.

Trade-offs: fixed size (must be chosen upfront), old data is overwritten (not suitable when data loss is unacceptable), and bounded capacity (must handle backpressure when the buffer is full).

See our system design interview guide and backend development crash course.

Follow-up questions:

  • How does the LMAX Disruptor achieve such high throughput?
  • What is the cache line padding technique used in high-performance ring buffers?
  • How do you size a ring buffer for a producer-consumer system?

14. How are priority queues and heaps used in system design?

What the interviewer is really asking: Can you identify system design problems that are essentially priority queue problems?

Answer framework:

A priority queue provides O(log n) insertion and O(1) access to the highest (or lowest) priority element, with O(log n) removal. Typically implemented as a binary heap.

System design applications:

Task scheduling: OS process schedulers and application task schedulers use priority queues to select the highest-priority ready task. The Kubernetes scheduler uses a priority queue to order pods waiting for scheduling.

Rate limiting (token bucket with priority): When multiple requests compete for limited capacity, a priority queue ensures high-priority requests are served first.

Dijkstra's algorithm / shortest path: Finding the shortest path in a graph (used in network routing, map directions) relies on a min-heap to always expand the nearest unvisited node.

Merge K sorted lists/streams: In merge operations (K-way merge sort, merging SSTable files in LSM tree compaction), a min-heap efficiently selects the minimum element across K sorted inputs. O(n log K) total complexity.

Event-driven simulation: Discrete event simulators use a priority queue ordered by event time to process events in chronological order.

Top-K problems: Finding the top-K elements from a stream. Maintain a min-heap of size K. For each new element, if it's larger than the heap's minimum, replace the minimum. After processing all elements, the heap contains the top K.
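A sketch of this pattern with Python's heapq (a binary min-heap):

```python
import heapq

def top_k(stream, k: int):
    # Keep a min-heap of the k largest items seen so far: the root is the
    # smallest of the current top k, so each element costs O(log k)
    # and the whole stream costs O(n log k).
    heap = []
    for x in stream:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:
            heapq.heapreplace(heap, x)  # pop the min, push x in one step
    return sorted(heap, reverse=True)

print(top_k([5, 1, 9, 3, 7, 8, 2], 3))  # [9, 8, 7]
```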

Median finding: Maintain two heaps — a max-heap for the lower half and a min-heap for the upper half. The median is at the top of one or both heaps. O(log n) per insertion, O(1) median access.

Specialized heaps: Fibonacci heap (O(1) amortized decrease-key, used in Dijkstra's for dense graphs), d-ary heap (wider fan-out, better cache utilization), and pairing heap (simpler implementation with good practical performance).

See our system design interview guide and distributed systems guide.

Follow-up questions:

  • How does a priority queue help in the K-way merge during LSM tree compaction?
  • What is the time complexity of finding the top-K elements from a stream of N elements?
  • When would you use a Fibonacci heap over a binary heap?

15. How do DAGs (Directed Acyclic Graphs) represent dependencies in data pipelines and build systems?

What the interviewer is really asking: Can you model dependency relationships and understand their implications for scheduling and execution?

Answer framework:

A DAG (Directed Acyclic Graph) is a graph with directed edges and no cycles. It's the natural data structure for representing dependencies where some tasks must complete before others can begin.

Topological sort: processing nodes in an order where every node appears after its dependencies. This is how DAGs are executed. Multiple valid topological orders may exist, enabling parallelism — tasks without dependencies between them can execute concurrently.
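A sketch of Kahn's algorithm, one standard way to topologically sort a dependency map and detect cycles; the task names are illustrative:

```python
from collections import deque

def topo_sort(deps: dict[str, list[str]]) -> list[str]:
    """deps maps each task to the tasks it depends on."""
    indegree = {t: 0 for t in deps}
    dependents = {t: [] for t in deps}
    for task, prereqs in deps.items():
        for p in prereqs:
            indegree[task] += 1
            dependents[p].append(task)
    ready = deque(t for t, d in indegree.items() if d == 0)
    order = []
    while ready:  # everything currently in `ready` could run concurrently
        t = ready.popleft()
        order.append(t)
        for d in dependents[t]:
            indegree[d] -= 1
            if indegree[d] == 0:
                ready.append(d)
    if len(order) < len(deps):
        raise ValueError("cycle detected")  # leftover nodes form a cycle
    return order

print(topo_sort({"extract": [], "transform": ["extract"], "load": ["transform"]}))
# ['extract', 'transform', 'load']
```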

Critical path: The longest path through the DAG determines the minimum possible execution time (assuming unlimited parallelism). Identifying and optimizing the critical path is key to reducing overall execution time.

System design applications:

Data pipeline orchestration (Airflow, Dagster): Each task is a node; edges represent data dependencies. Airflow's scheduler executes tasks in topological order, running independent tasks in parallel. The DAG structure prevents cycles (which would create infinite loops) and ensures deterministic execution order.

Build systems (Make, Bazel, Gradle): Source files and build targets form a DAG. Targets depend on source files and other targets. The build system topologically sorts the DAG and builds targets in dependency order. Incremental builds re-execute only targets whose inputs have changed.

Package managers (npm, pip, cargo): Package dependencies form a DAG. Cycle detection identifies invalid dependency configurations. Topological sort determines installation order.

Distributed computation (Spark, Flink): Data transformations form a DAG of stages. The scheduler optimizes execution by: pipelining stages that don't require data shuffling, parallelizing independent stages, and re-executing failed stages without recomputing the entire DAG.

Version control (Git): Commits form a DAG. Merge commits have multiple parents. The DAG structure enables efficient common ancestor computation (for merge operations) and reachability queries (which commits are ancestors of a given commit).

See our data engineering interview questions, system design interview guide, and distributed systems guide.

Follow-up questions:

  • How do you detect cycles in a dependency graph?
  • What is the critical path method, and how does it apply to data pipeline optimization?
  • How does Spark's DAG scheduler optimize execution?

Common Mistakes in Data Structure Interviews

  1. Not connecting data structures to real systems — Describing a Bloom filter's mechanism without explaining where it's used in production shows textbook knowledge without practical depth.

  2. Ignoring space-time trade-offs — Every data structure has a space-time trade-off. Discuss memory usage alongside time complexity.

  3. Forgetting about concurrent access — In system design, data structures are accessed by multiple threads/processes. Discuss thread-safety implications.

  4. Using the wrong data structure — Choosing a hash map when you need ordering (use a B-tree) or a linked list when you need random access (use an array) shows weak fundamentals.

  5. Not considering disk-based variants — In-memory data structures don't work for datasets larger than memory. Know the disk-based variants (B-tree vs binary tree, LSM tree vs hash table).

  6. Over-complicating — Sometimes an array with binary search is better than a complex data structure. Show judgment in choosing simplicity when appropriate.

How to Prepare

Week 1: Review fundamental data structures (hash tables, trees, heaps, graphs). Implement an LRU cache and a Bloom filter from scratch.

Week 2: Study probabilistic data structures (HyperLogLog, Count-Min Sketch, skip lists). Understand their error bounds and memory usage.

Week 3: Study distributed data structures (consistent hashing, Merkle trees, CRDTs). Understand how they solve specific distributed systems challenges.

Week 4: Practice connecting data structures to system design problems. For each system design question, identify which data structures are critical and why.

For comprehensive preparation, see our system design interview guide and explore the learning paths. Check our pricing plans for full access.
