
Concurrency Interview Questions for Senior Engineers (2026)

Master advanced concurrency interview questions covering thread safety, lock-free data structures, async programming, deadlock prevention, memory models, and concurrent system design for senior roles.


Why Concurrency Matters in Senior Engineering Interviews

Concurrency is one of the most challenging topics in software engineering, and it separates senior engineers from mid-level ones in interviews. Every production system deals with concurrency — web servers handling thousands of concurrent requests, databases managing simultaneous transactions, distributed systems coordinating across nodes, and modern CPUs executing instructions out of order. Bugs in concurrent code are among the hardest to find and reproduce because they depend on timing, ordering, and system load.

Companies like Google, Amazon, and Meta test concurrency deeply because their systems process millions of concurrent operations. Interviewers want to see that you understand thread safety, can identify race conditions, know when to use locks versus lock-free approaches, and can design concurrent systems that are both correct and performant.

This guide covers the most challenging concurrency interview questions with structured answer frameworks. For related topics, see our distributed systems guide, system design interview guide, and backend development crash course.


1. Explain the difference between concurrency and parallelism.

What the interviewer is really asking: Do you have a precise understanding of these commonly confused terms, and can you reason about which model fits different problems?

Answer framework:

Concurrency is about dealing with multiple things at once — structuring a program so it can handle multiple tasks that overlap in time. These tasks may or may not execute simultaneously. The focus is on program structure and correctness.

Parallelism is about doing multiple things at once — using multiple processing units (CPU cores, machines) to execute tasks simultaneously. The focus is on performance and throughput.

A single-core CPU can be concurrent (handling multiple tasks by switching between them rapidly) but not truly parallel (only one instruction executes at any moment). A multi-core CPU enables both concurrency and parallelism.

Practical implications: an async web server (Node.js, Go's goroutines, Python's asyncio) is concurrent — it handles thousands of connections on a single thread by switching between I/O-bound tasks. A MapReduce job is parallel — it processes data on hundreds of machines simultaneously. Many systems use both: a web server concurrently handles requests, each of which may parallelize computation across CPU cores.

Rob Pike's formulation: "Concurrency is about structure, parallelism is about execution." You design for concurrency (your program structure handles multiple tasks correctly) and leverage parallelism (your runtime uses multiple cores or machines for speed).

See our distributed systems guide for how concurrency extends to multi-machine systems.

Follow-up questions:

  • Is Go's goroutine model concurrent, parallel, or both?
  • How does Python's GIL affect the distinction between concurrency and parallelism?
  • When would you choose concurrency without parallelism?

2. What is a race condition, and how do you prevent it?

What the interviewer is really asking: Can you identify subtle concurrency bugs and choose appropriate solutions?

Answer framework:

A race condition occurs when two or more threads access shared mutable state, at least one access is a write, and the accesses are not synchronized. The result depends on the unpredictable timing of thread execution.

Classic example: counter++ is not atomic — it involves read, increment, and write. Two threads can read the same value, increment independently, and write back, losing one increment.
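
A minimal Go sketch of this lost-update race and an atomic fix (assuming Go 1.19+ for atomic.Int64); running it with go run -race reports the race:

    package main

    import (
        "fmt"
        "sync"
        "sync/atomic"
    )

    func main() {
        var racy int64        // unsynchronized counter: increments can be lost
        var safe atomic.Int64 // atomic counter: hardware fetch-and-add

        var wg sync.WaitGroup
        for i := 0; i < 1000; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                racy++      // read-modify-write race: not atomic
                safe.Add(1) // never loses an increment
            }()
        }
        wg.Wait()
        fmt.Println(racy, safe.Load()) // racy is often < 1000; safe is always 1000
    }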

Prevention strategies:

Mutual exclusion (locks/mutexes): Protect shared state with a lock. Only one thread can hold the lock at a time. Simple and correct, but can cause contention (threads waiting for the lock) and deadlocks (two threads waiting for each other's locks).

Atomic operations: Hardware-supported atomic instructions (compare-and-swap, fetch-and-add) for simple operations on single values. AtomicInteger.incrementAndGet() in Java, sync/atomic in Go, std::atomic in C++. Lock-free and high-performance, but limited to simple operations.

Immutability: If data cannot be modified after creation, no race conditions are possible. Functional programming languages emphasize this. In practice: use immutable data structures, make defensive copies, and use builder patterns for construction.

Thread-local storage: Each thread has its own copy of the data. No sharing, no races. Useful for per-request context (request ID, user session) in web servers.

Message passing: Instead of sharing memory, threads communicate by sending messages through channels (Go channels, Erlang mailboxes, actor model). "Don't communicate by sharing memory; share memory by communicating."

Confinement: Ensure each piece of mutable state is accessed by only one thread. The owning thread performs all reads and writes. Other threads send requests to the owner thread. This is the principle behind single-threaded event loops (Node.js, Redis).

See our backend development crash course and system design interview guide.

Follow-up questions:

  • How do you detect race conditions in production code?
  • What is a TOCTOU (Time of Check to Time of Use) race condition?
  • How does Go's race detector work?

3. Explain deadlocks and the strategies to prevent them.

What the interviewer is really asking: Can you systematically reason about deadlock conditions and design systems that avoid them?

Answer framework:

A deadlock occurs when two or more threads are permanently blocked, each waiting for a resource held by the other. Four conditions must be simultaneously true (Coffman conditions):

  1. Mutual exclusion: Resources cannot be shared (a lock is held exclusively).
  2. Hold and wait: A thread holds one resource while waiting for another.
  3. No preemption: Resources cannot be forcibly taken from a thread.
  4. Circular wait: A circular chain of threads exists, each waiting for a resource held by the next.

Breaking any one condition prevents deadlock:

Break hold and wait: Acquire all needed locks at once (atomically) or release all before acquiring new ones. Practical implementation: try-lock with rollback — attempt to acquire all locks, if any fails, release all acquired locks and retry.

Break circular wait (lock ordering): Define a global ordering of locks (by address, by name, by hierarchy). All threads must acquire locks in the same order. If Lock A < Lock B, no thread ever acquires B before A. This is the most commonly used strategy in practice.
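
A minimal Go sketch of lock ordering for the classic two-account transfer (the Account type and its ID field are illustrative):

    package bank

    import "sync"

    // Account is an illustrative type; ID defines the global lock order.
    type Account struct {
        ID      int
        mu      sync.Mutex
        Balance int
    }

    // Transfer always locks the account with the smaller ID first, so no
    // two goroutines can ever hold the two locks in opposite orders.
    func Transfer(from, to *Account, amount int) {
        first, second := from, to
        if to.ID < from.ID {
            first, second = to, from
        }
        first.mu.Lock()
        defer first.mu.Unlock()
        second.mu.Lock()
        defer second.mu.Unlock()

        from.Balance -= amount
        to.Balance += amount
    }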

Break no preemption (timeout): Use lock timeout — tryLock(timeout). If a thread cannot acquire a lock within the timeout, it releases all held locks and retries. This can convert a potential deadlock into a livelock (threads repeatedly time out and retry), which is usually preferable because backing off between retries lets the system recover.

Avoid mutual exclusion: Use lock-free data structures, or relax exclusivity with read-write locks (multiple readers, one exclusive writer). Read-write locks reduce contention for read-heavy workloads.

Detection and recovery: In databases, deadlock detectors run periodically (or on lock wait timeout), build a wait-for graph, detect cycles, and abort one transaction (the victim). PostgreSQL and MySQL both use this approach. Application-level deadlock detection is less common but possible with monitoring.

See our ACID properties and distributed systems guide.

Follow-up questions:

  • How would you debug a deadlock in a production system?
  • What is the difference between deadlock and livelock?
  • How do databases handle distributed deadlocks (across multiple nodes)?

4. Describe lock-free and wait-free data structures.

What the interviewer is really asking: Do you understand advanced concurrency beyond basic locks? Can you reason about correctness of lock-free algorithms?

Answer framework:

Lock-free: An algorithm is lock-free if at least one thread always makes progress, regardless of what other threads do. Individual threads may stall, but the system as a whole moves forward. This eliminates deadlocks and reduces priority inversion issues.

Wait-free: Stronger than lock-free — every thread completes its operation in a bounded number of steps, regardless of other threads. No thread can starve another. More difficult to implement and often slower in practice than lock-free.

The foundation of lock-free programming is the Compare-and-Swap (CAS) operation: atomically compare a memory location to an expected value and, if they match, replace it with a new value. If they don't match, the operation fails and the thread retries. This is a hardware-level primitive on modern CPUs.

Common lock-free data structures:

Lock-free stack (Treiber stack): Push uses CAS to atomically update the head pointer. Pop uses CAS to atomically read and remove the head. On CAS failure (another thread modified the head), retry.
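
A Treiber stack sketch in Go (assuming Go 1.19+ for atomic.Pointer; note that Go's garbage collector conveniently sidesteps the ABA problem described below, because a popped node cannot be recycled while another goroutine still holds a reference to it):

    package stack

    import "sync/atomic"

    type node[T any] struct {
        value T
        next  *node[T]
    }

    type Stack[T any] struct {
        head atomic.Pointer[node[T]]
    }

    // Push loops until CAS installs the new node as the head.
    func (s *Stack[T]) Push(v T) {
        n := &node[T]{value: v}
        for {
            old := s.head.Load()
            n.next = old
            if s.head.CompareAndSwap(old, n) {
                return
            }
        }
    }

    // Pop loops until CAS unlinks the current head; ok is false when empty.
    func (s *Stack[T]) Pop() (v T, ok bool) {
        for {
            old := s.head.Load()
            if old == nil {
                return v, false
            }
            if s.head.CompareAndSwap(old, old.next) {
                return old.value, true
            }
        }
    }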

Lock-free queue (Michael-Scott queue): Uses two CAS operations — one for the tail (enqueue) and one for the head (dequeue). This allows concurrent enqueue and dequeue without blocking.

Concurrent hash map (like Java's ConcurrentHashMap): Uses fine-grained locking (per-bucket) or lock-free linked lists within each bucket. ConcurrentHashMap in Java 8+ uses CAS to insert into empty bins and synchronizes on a bin's head node for other updates, so it is not strictly lock-free but avoids any global lock.

The ABA problem: CAS can be fooled if a value changes from A to B and back to A between the read and CAS. The CAS succeeds because the value matches, but the underlying data structure may have changed. Solutions: tagged pointers (add a version counter), hazard pointers (track references), or epoch-based reclamation.

Practical guidance: prefer proven concurrent data structures from standard libraries (Java's java.util.concurrent, Go's sync.Map, C++'s std::atomic) over custom lock-free implementations. Lock-free programming is notoriously difficult to get right.

See our data structures for system design and backend development crash course.

Follow-up questions:

  • What is the ABA problem and how is it solved?
  • When would you choose a lock-free data structure over a simple mutex-protected one?
  • How do you test lock-free data structures for correctness?

5. How does the Java Memory Model (or Go's/C++'s memory model) affect concurrent programming?

What the interviewer is really asking: Do you understand memory visibility, ordering guarantees, and the gap between what your code says and what the hardware does?

Answer framework:

Modern CPUs and compilers reorder instructions and cache memory values for performance. Without a memory model, one thread's writes may not be visible to another thread, even after the writing thread completes.

The Java Memory Model (JMM) defines "happens-before" relationships that guarantee visibility:

  • Monitor lock: Unlocking a monitor happens-before subsequent locking of the same monitor. All writes before the unlock are visible to code after the lock.
  • volatile: A write to a volatile variable happens-before subsequent reads of the same variable. This provides visibility guarantee without mutual exclusion.
  • Thread start/join: thread.start() happens-before any action in the started thread. All actions in a thread happen-before thread.join() returns.
  • Final fields: Properly constructed objects with final fields are safely published — other threads see the correct final field values without synchronization.

Common pitfalls:

Double-checked locking: without volatile, a thread may see a non-null reference to an object that is not fully constructed, because the reference can be published before the constructor's writes. The fix: declare the field volatile (Java 5+) or use the initialization-on-demand holder pattern.

Instruction reordering: x = 1; flag = true; — another thread checking flag before reading x may see flag == true but x == 0 because the CPU or compiler reordered the writes. Memory barriers (via volatile, synchronized, or atomic operations) prevent this.

Go's memory model: Similar principles. Goroutines communicate safely through channels (which provide happens-before guarantees) or sync primitives (Mutex, WaitGroup). The go statement and channel operations establish happens-before relationships.
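
A minimal Go sketch of the reordering pitfall above and its idiomatic fix: a bare boolean flag provides no happens-before edge, while a channel operation does.

    package main

    import "fmt"

    var x int

    func main() {
        done := make(chan struct{})
        go func() {
            x = 1
            close(done) // closing a channel happens-before the receive below
        }()
        <-done         // after this receive, the write x = 1 is guaranteed visible
        fmt.Println(x) // always prints 1

        // By contrast, spinning on a plain bool (for !flag {}) creates no
        // happens-before edge: the loop may never observe the write.
    }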

C++'s memory model (C++11): Most flexible. Atomic operations have configurable memory ordering: memory_order_relaxed (no ordering guarantees), memory_order_acquire/release (one-directional barriers), memory_order_seq_cst (strongest, sequential consistency — the default). This gives systems programmers fine-grained control at the cost of complexity.

Practical advice: use high-level synchronization primitives (locks, channels, concurrent data structures) that provide correct memory ordering automatically. Drop to low-level memory ordering only when performance measurements show it's necessary.

See our backend development crash course and distributed systems guide.

Follow-up questions:

  • What is the difference between volatile in Java and volatile in C/C++?
  • How do memory barriers affect CPU pipeline performance?
  • When would you use relaxed memory ordering in C++?

6. Compare async/await, threads, and green threads/goroutines.

What the interviewer is really asking: Can you choose the right concurrency model for different workloads?

Answer framework:

OS threads (Java threads, pthreads): Managed by the operating system kernel. Each thread has its own stack (1-8MB default), and context switching involves a system call. Creating thousands of threads is expensive (memory overhead, context switch cost). Best for CPU-bound parallelism where you want each core executing a thread.

Green threads / goroutines / fibers: User-space threads managed by the language runtime, not the OS. Much lighter than OS threads (goroutines start at ~2KB stack, growing as needed). A runtime scheduler multiplexes thousands of green threads onto a smaller number of OS threads. Context switching happens in user space without system calls.

Go goroutines, Java virtual threads (Project Loom, Java 21+), Erlang processes, and Kotlin coroutines use this model. Best for I/O-bound workloads with many concurrent connections (web servers, microservices).
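
A quick sketch of how cheap goroutines are: launching 100,000 of them is routine, while 100,000 OS threads at roughly 1MB of stack each would need on the order of 100GB.

    package main

    import (
        "fmt"
        "sync"
    )

    func main() {
        var wg sync.WaitGroup
        results := make(chan int, 100000)

        // 100,000 goroutines at ~2KB of initial stack is a few hundred MB;
        // the runtime multiplexes them onto a handful of OS threads.
        for i := 0; i < 100000; i++ {
            wg.Add(1)
            go func(n int) {
                defer wg.Done()
                results <- n * n
            }(i)
        }
        wg.Wait()
        close(results)
        fmt.Println(len(results)) // 100000
    }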

Async/await (JavaScript, Python asyncio, Rust async, C#): Cooperative multitasking on a single thread (or a small thread pool). Functions yield control at await points, and the event loop schedules other tasks while waiting for I/O. No thread creation overhead, no context switching between OS threads.

Best for I/O-bound workloads. Not suitable for CPU-bound work on a single thread (it blocks the event loop). Node.js uses this model — a single event loop handles thousands of concurrent connections, but CPU-intensive operations must be offloaded to worker threads.

Comparison:

  • Memory per unit: OS thread (~1MB) >> green thread (~2KB) > async task (~few hundred bytes)
  • CPU-bound work: OS threads (true parallelism) > green threads (runtime-scheduled parallelism) > async (single-threaded, no parallelism)
  • I/O-bound work: async (lowest overhead) ≈ green threads (natural programming model) > OS threads (too heavyweight for thousands of connections)
  • Programming model: async/await (suffers from the function-coloring problem — async functions can only be awaited from async contexts) vs green threads (blocking code looks synchronous, runtime handles scheduling) vs threads (explicit synchronization required)

See our backend development crash course and system design interview guide.

Follow-up questions:

  • What is the "function coloring" problem with async/await?
  • How does Go's goroutine scheduler work (M:N threading model)?
  • When would you choose OS threads over goroutines in Go?

7. How do you design a thread-safe cache?

What the interviewer is really asking: Can you apply concurrency concepts to a practical data structure with specific performance requirements?

Answer framework:

A thread-safe cache must support concurrent reads and writes while maintaining consistency and good performance.

Simple approach — synchronized map: Wrap a HashMap with a mutex. Thread-safe but poor performance — every operation (read or write) holds the exclusive lock. Reads block other reads unnecessarily.

Read-write lock: Use a RWMutex (Go) or ReentrantReadWriteLock (Java). Multiple readers can hold the read lock simultaneously; writers need the exclusive write lock. Good for read-heavy workloads. Limitation: write starvation — if readers are continuous, writers may wait indefinitely.

Striped locking: Partition the key space into N buckets (stripes), each with its own lock. Operations on different stripes don't contend. Before Java 8, ConcurrentHashMap used this approach with 16 segments by default; Java 8+ replaced segments with per-bin locking and CAS. Effective for uniformly distributed keys.
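
A minimal striped-lock cache sketch in Go (16 stripes mirrors the old Java default; FNV hashing is an illustrative choice):

    package cache

    import (
        "hash/fnv"
        "sync"
    )

    const stripes = 16

    type shard struct {
        mu sync.RWMutex
        m  map[string]string
    }

    type StripedCache struct {
        shards [stripes]shard
    }

    func New() *StripedCache {
        c := &StripedCache{}
        for i := range c.shards {
            c.shards[i].m = make(map[string]string)
        }
        return c
    }

    // shardFor hashes the key to a stripe; operations on different stripes
    // never contend on the same lock.
    func (c *StripedCache) shardFor(key string) *shard {
        h := fnv.New32a()
        h.Write([]byte(key))
        return &c.shards[h.Sum32()%stripes]
    }

    func (c *StripedCache) Get(key string) (string, bool) {
        s := c.shardFor(key)
        s.mu.RLock()
        defer s.mu.RUnlock()
        v, ok := s.m[key]
        return v, ok
    }

    func (c *StripedCache) Set(key, value string) {
        s := c.shardFor(key)
        s.mu.Lock()
        defer s.mu.Unlock()
        s.m[key] = value
    }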

Lock-free with CAS: Use a lock-free hash map backed by atomic operations. Java's ConcurrentHashMap (Java 8+) uses CAS for most operations. Highest throughput under heavy contention.

Copy-on-write: Maintain an immutable map reference. Reads access the current map without locking. Writes create a new copy with the modification and atomically swap the reference. Good for read-heavy caches with infrequent writes. Bad for write-heavy workloads (copying is expensive).

Eviction policy implementation:

  • LRU with concurrent access: Java's LinkedHashMap is not thread-safe. Options: synchronize access (simple but contention), use a concurrent skip list for ordering (complex), or use a segmented LRU (Caffeine library's approach — segment the access order list to reduce contention).
  • Caffeine (Java): High-performance concurrent cache using W-TinyLFU eviction policy. Uses a striped ring buffer for recording accesses, amortized O(1) operations, and write buffering.

Expiration: lazy expiration (check TTL on access, remove if expired) vs proactive expiration (background thread scans for expired entries periodically). Combine both for best balance.

See our how caching works, data structures for system design, and system design interview guide.

Follow-up questions:

  • How would you implement a distributed cache that's thread-safe across multiple servers?
  • What is the thundering herd problem in caching, and how do you prevent it?
  • How does Go's sync.Map differ from a mutex-protected map?

8. Explain the producer-consumer pattern and its implementations.

What the interviewer is really asking: Can you implement a fundamental concurrent design pattern with proper synchronization?

Answer framework:

The producer-consumer pattern decouples data production from consumption using a bounded buffer. Producers add items to the buffer; consumers remove items. The buffer provides backpressure (producers block when the buffer is full) and smooths bursts (consumers process at their own pace).

Blocking queue implementation: Java's BlockingQueue interface (ArrayBlockingQueue, LinkedBlockingQueue) provides put() (blocks if full) and take() (blocks if empty) with built-in synchronization. Under the hood, ArrayBlockingQueue uses a single ReentrantLock with two conditions (notFull and notEmpty); LinkedBlockingQueue uses separate put and take locks for higher throughput.

Go channels: Go channels are the idiomatic producer-consumer mechanism. ch := make(chan Task, bufferSize) creates a buffered channel. ch <- task blocks if the channel is full (backpressure). task := <-ch blocks if the channel is empty. Channels provide memory safety (no shared state) and clean syntax.
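
A minimal producer-consumer sketch with Go channels; the Task type, buffer size, and worker count are illustrative:

    package main

    import (
        "fmt"
        "sync"
    )

    type Task struct{ ID int }

    func main() {
        tasks := make(chan Task, 8) // buffer of 8 provides backpressure
        var wg sync.WaitGroup

        // Three consumers drain the channel concurrently.
        for w := 0; w < 3; w++ {
            wg.Add(1)
            go func(worker int) {
                defer wg.Done()
                for t := range tasks { // exits when the channel is closed and drained
                    fmt.Printf("worker %d processed task %d\n", worker, t.ID)
                }
            }(w)
        }

        // The producer blocks on send whenever the buffer is full.
        for i := 0; i < 20; i++ {
            tasks <- Task{ID: i}
        }
        close(tasks) // graceful shutdown: consumers finish the remaining items
        wg.Wait()
    }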

Ring buffer (lock-free): For maximum performance, use a lock-free ring buffer (LMAX Disruptor pattern). Producer and consumer maintain separate position counters. The producer advances its counter after writing; the consumer advances after reading. Memory barriers (and, with multiple producers, CAS) ensure thread safety without locks. The Disruptor achieves millions of operations per second with nanosecond latency.

Design considerations:

  • Buffer size: Too small — producers block frequently, reducing throughput. Too large — high memory usage and increased latency (items sit in the buffer longer). Common approach: start with a reasonable estimate (e.g., 10x expected throughput per second), tune based on production metrics.
  • Multiple producers/consumers: Multiple producers can safely write to a blocking queue or channel. Multiple consumers enable parallel processing. Ensure work distribution is fair.
  • Graceful shutdown: Use a poison pill (special sentinel value) or close the channel (Go) to signal consumers to stop. Drain remaining items before shutdown.
  • Error handling: If a consumer fails on an item, should it retry, skip, or dead-letter the item? Design error handling that doesn't block other consumers.

See our how Kafka works (Kafka is essentially a distributed producer-consumer system) and backend development crash course.

Follow-up questions:

  • How does the LMAX Disruptor achieve such high throughput?
  • What happens when a consumer is consistently slower than the producer?
  • How do you monitor and tune a producer-consumer system in production?

9. How do you handle concurrency in database access?

What the interviewer is really asking: Can you prevent data corruption and race conditions when multiple application threads or services access the same database?

Answer framework:

Database concurrency operates at two levels: application-level concurrency (multiple threads in your application) and system-level concurrency (multiple application instances or services).

Optimistic concurrency control: Read the record with a version number. When updating, include the version in the WHERE clause: UPDATE accounts SET balance = 100, version = version + 1 WHERE id = 123 AND version = 5. If another transaction modified the row (version changed), the UPDATE affects 0 rows, and the application retries. Best for low-contention scenarios where conflicts are rare.
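
A sketch of the optimistic retry loop using Go's database/sql with PostgreSQL placeholders (the accounts schema and the retry bound are illustrative assumptions):

    package store

    import (
        "database/sql"
        "fmt"
    )

    // updateBalance re-reads the row's version on each attempt and gives up
    // after maxRetries lost races.
    func updateBalance(db *sql.DB, id, newBalance int) error {
        const maxRetries = 3
        for attempt := 0; attempt < maxRetries; attempt++ {
            var version int
            err := db.QueryRow("SELECT version FROM accounts WHERE id = $1", id).Scan(&version)
            if err != nil {
                return err
            }
            res, err := db.Exec(
                "UPDATE accounts SET balance = $1, version = version + 1 WHERE id = $2 AND version = $3",
                newBalance, id, version)
            if err != nil {
                return err
            }
            if n, _ := res.RowsAffected(); n == 1 {
                return nil // our version matched; the update applied
            }
            // 0 rows affected: another transaction bumped the version; retry
        }
        return fmt.Errorf("account %d: too many concurrent updates", id)
    }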

Pessimistic concurrency control: Lock the row when reading: SELECT * FROM accounts WHERE id = 123 FOR UPDATE. This prevents other transactions from modifying the row until the lock-holder commits. Best for high-contention scenarios where conflicts are frequent.

Isolation levels: Choose the right transaction isolation level.

  • READ COMMITTED (default in PostgreSQL): Each statement sees the latest committed data. Prevents dirty reads. Allows non-repeatable reads.
  • REPEATABLE READ: Transaction sees a consistent snapshot from its start. Prevents dirty and non-repeatable reads. In PostgreSQL, uses MVCC snapshots.
  • SERIALIZABLE: Transactions behave as if executed serially. Prevents all anomalies but reduces concurrency. PostgreSQL implements this with Serializable Snapshot Isolation (SSI).

Advisory locks: PostgreSQL's advisory locks provide application-level locking. SELECT pg_advisory_lock(hashtext('resource_123')) acquires a lock. Useful for coordinating across application instances without locking database rows.

Connection pooling: Application threads share database connections through a pool. Configure pool size based on optimal concurrent connections for the database (see PostgreSQL connection pooling). Use transaction-mode pooling (PgBouncer) for short transactions.

Idempotent operations: Design database operations to be safely retryable. Use UPSERT (INSERT ON CONFLICT DO UPDATE) instead of INSERT that fails on duplicates. Use conditional updates that check preconditions.

See our PostgreSQL interview questions, ACID properties, and system design interview guide.

Follow-up questions:

  • When would you choose optimistic vs pessimistic concurrency control?
  • How does PostgreSQL's MVCC handle concurrent transactions internally?
  • What is the lost update problem, and how do different isolation levels handle it?

10. Explain the actor model and when to use it.

What the interviewer is really asking: Do you know concurrency models beyond locks and threads?

Answer framework:

The actor model (Carl Hewitt, 1973) is a concurrency model where the fundamental unit is an actor — a lightweight entity that:

  • Has its own private state (not shared with any other actor)
  • Receives messages from other actors (via a mailbox/queue)
  • Processes messages one at a time (no concurrent access to its state)
  • Can create new actors, send messages, and modify its own state

Because an actor processes messages sequentially and its state is private, there are no race conditions or deadlocks within an actor. Concurrency comes from many actors processing messages simultaneously.
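
A minimal actor sketch in Go: a goroutine owns private state and serializes all access through its mailbox channel (the deposit/balance message types are illustrative).

    package main

    import "fmt"

    type deposit struct{ amount int }
    type balance struct{ reply chan int }

    // accountActor owns total privately: only this goroutine touches it,
    // so no locks are needed and no data race is possible.
    func accountActor(mailbox <-chan any) {
        total := 0
        for msg := range mailbox {
            switch m := msg.(type) {
            case deposit:
                total += m.amount
            case balance:
                m.reply <- total
            }
        }
    }

    func main() {
        mailbox := make(chan any, 16)
        go accountActor(mailbox)

        mailbox <- deposit{amount: 50}
        mailbox <- deposit{amount: 25}

        reply := make(chan int)
        mailbox <- balance{reply: reply}
        fmt.Println(<-reply) // 75
        close(mailbox)
    }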

Implementations:

  • Erlang/Elixir: Actors (called "processes") are first-class. The BEAM VM schedules millions of lightweight processes. OTP supervision trees provide fault tolerance — when an actor crashes, its supervisor restarts it.
  • Akka (Java/Scala): Actor framework on the JVM. Typed actors, persistent actors (event sourcing), cluster-aware actors (distributed across nodes).
  • Microsoft Orleans (.NET): Virtual actors ("grains") with automatic activation/deactivation. Grains have identity, are location-transparent, and scale across a cluster.

When to use the actor model:

  • Chat systems (each user/room is an actor processing messages)
  • IoT device management (each device is an actor maintaining its state)
  • Game servers (each game session or entity is an actor)
  • Financial systems (each account is an actor ensuring serialized transactions)
  • Any system with millions of independent stateful entities

When not to use: simple request-response web applications (actors add unnecessary complexity), bulk data processing (Spark/Flink are better for data parallelism), or systems where shared state is genuinely needed (actors communicate by message passing, which adds latency vs direct memory access).

See our distributed systems guide and backend development crash course.

Follow-up questions:

  • How does Erlang's "let it crash" philosophy relate to the actor model?
  • What is the performance overhead of message passing vs shared memory?
  • How do you handle ordering guarantees between actors?

11. How do you implement rate limiting in a concurrent system?

What the interviewer is really asking: Can you implement a practical concurrent utility that's correct under high contention?

Answer framework:

Token bucket (most common): A bucket holds tokens (up to a maximum). Tokens are added at a fixed rate. Each request consumes one token. If the bucket is empty, the request is rejected (or queued). Allows bursts up to the bucket capacity while enforcing an average rate.

Concurrent implementation: use an atomic variable for the token count. Refill is computed lazily — on each request, calculate how many tokens should have been added since the last access (based on elapsed time), add them (capped at max), then attempt to consume with CAS. This avoids a background refill thread.

The entire check-and-update must be atomic. Options: mutex (simple, correct), CAS loop (lock-free, higher throughput), or Redis with Lua script (distributed, atomic).
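
A minimal mutex-based token bucket with lazy refill in Go (the parameters are illustrative; in production you would likely reach for a library such as golang.org/x/time/rate):

    package ratelimit

    import (
        "sync"
        "time"
    )

    type TokenBucket struct {
        mu         sync.Mutex
        tokens     float64   // current token count
        capacity   float64   // maximum burst size
        ratePerSec float64   // refill rate in tokens per second
        last       time.Time // time of the last refill
    }

    func New(capacity, ratePerSec float64) *TokenBucket {
        return &TokenBucket{
            tokens: capacity, capacity: capacity,
            ratePerSec: ratePerSec, last: time.Now(),
        }
    }

    // Allow refills lazily from the elapsed time, then tries to consume one
    // token. The mutex makes the whole check-and-update atomic.
    func (b *TokenBucket) Allow() bool {
        b.mu.Lock()
        defer b.mu.Unlock()

        now := time.Now()
        b.tokens += now.Sub(b.last).Seconds() * b.ratePerSec
        if b.tokens > b.capacity {
            b.tokens = b.capacity
        }
        b.last = now

        if b.tokens >= 1 {
            b.tokens--
            return true
        }
        return false
    }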

Sliding window log (for distributed systems): Use Redis with a sorted set. Each request adds an entry keyed by its timestamp. Count entries within the window and remove entries older than the window. Individual Redis commands are atomic thanks to its single-threaded model; wrap the add/count/trim steps in a MULTI block or Lua script to make the whole check atomic.

Leaky bucket: Requests enter a queue (bucket) and are processed at a fixed rate. Queue overflow rejects new requests. Smooths bursts (unlike token bucket which allows bursts). Implementation: a queue with a worker thread draining at a fixed rate.

Distributed rate limiting: For multi-server applications, rate limits must be enforced globally. Options: centralized counter in Redis (simple, single point of contention), local rate limiters with periodic sync (approximate but lower latency), or a dedicated rate limiting service.

See our system design for rate limiters, how caching works, and backend development crash course.

Follow-up questions:

  • How would you implement rate limiting per user across 100 application servers?
  • What is the race condition in a naive Redis-based rate limiter, and how do you solve it?
  • How do you choose between token bucket and sliding window?

12. How do you debug concurrency bugs in production?

What the interviewer is really asking: Have you actually dealt with concurrency bugs in production? These are notoriously difficult to reproduce and diagnose.

Answer framework:

Concurrency bugs are hard to debug because they're non-deterministic — they depend on timing, thread scheduling, and system load. They may not reproduce in development environments.

Prevention (better than debugging):

  • Use thread-safe data structures from standard libraries
  • Minimize shared mutable state
  • Use the Go race detector, ThreadSanitizer (for C/C++), or Valgrind's Helgrind during testing
  • Code reviews with specific attention to concurrency patterns
  • Property-based testing with concurrent test scenarios

Detection in production:

  • Monitoring for symptoms: deadlocks (threads in BLOCKED state), livelocks (high CPU with no progress), race conditions (inconsistent data, occasionally wrong results, duplicate operations)
  • Thread dumps (Java): jstack or kill -3 to capture thread states. Look for threads BLOCKED on the same lock, or WAITING indefinitely. Periodic thread dumps reveal patterns.
  • Goroutine dumps (Go): runtime/pprof or SIGQUIT to dump all goroutine stacks (see the sketch after this list). The runtime detects deadlocks (all goroutines blocked) and reports them.
  • Logging with correlation IDs: Log thread/goroutine ID and timestamps to reconstruct the sequence of operations. Use structured logging that can be queried.
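
A sketch of exposing goroutine stacks on demand via runtime/pprof (the endpoint path and port are illustrative; importing net/http/pprof gives equivalent endpoints out of the box):

    package main

    import (
        "net/http"
        "runtime/pprof"
    )

    func main() {
        // GET /debug/stacks writes every goroutine's stack to the response,
        // enough to spot goroutines blocked on the same lock or channel.
        http.HandleFunc("/debug/stacks", func(w http.ResponseWriter, r *http.Request) {
            pprof.Lookup("goroutine").WriteTo(w, 2) // debug=2: full stack traces
        })
        http.ListenAndServe("localhost:6060", nil)
    }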

Diagnosis techniques:

  • Reproduce under stress: Use load testing tools to increase concurrency until the bug manifests. Increase logging verbosity temporarily.
  • Minimize the reproducing case: Identify the specific interleaving that causes the bug. Often requires careful reading of code to identify the window where the race condition can occur.
  • Lock ordering analysis: Map all lock acquisition paths. Check for cycles (potential deadlocks). Tools: jcmd thread dump analysis, Go's -race flag.

Fixes:

  • Apply the minimum fix (add a lock, use an atomic operation, fix the lock ordering)
  • Add a regression test that exercises the concurrent scenario
  • Add monitoring for the specific symptom so you detect recurrence

See our monitoring system design and system design interview guide.

Follow-up questions:

  • How do you write a regression test for a race condition that's hard to reproduce?
  • What is the performance overhead of running Go's race detector in production?
  • How would you diagnose a memory leak caused by a concurrency bug (e.g., goroutine leak)?

13. How does connection pooling work, and what concurrency issues does it solve?

What the interviewer is really asking: Do you understand this critical production pattern and its thread-safety requirements?

Answer framework:

Connection pooling maintains a set of pre-established connections (database, HTTP, gRPC) that are reused across requests. This avoids the overhead of creating and destroying connections for every operation.

Concurrency requirements: the pool must be thread-safe. Multiple threads concurrently borrow and return connections. Key operations:

  • borrow(): Get a connection from the pool. If all connections are in use, wait (blocking) or fail (non-blocking).
  • return(): Return a connection to the pool. Must validate the connection is still healthy before making it available.

Internal data structure: a thread-safe queue (blocking queue) of available connections. Semaphore limits the maximum pool size. A separate thread validates idle connections periodically (removing stale ones).
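
A minimal Go sketch where a buffered channel serves as both the thread-safe queue and the semaphore bounding pool size (the Conn type and timeout handling are illustrative; health validation is omitted):

    package pool

    import (
        "errors"
        "time"
    )

    type Conn struct{ /* wraps a real network connection */ }

    type Pool struct {
        conns chan *Conn // buffered channel: queue and max-size semaphore in one
    }

    func New(size int) *Pool {
        p := &Pool{conns: make(chan *Conn, size)}
        for i := 0; i < size; i++ {
            p.conns <- &Conn{} // pre-establish connections
        }
        return p
    }

    // Borrow blocks until a connection is free or the timeout fires.
    func (p *Pool) Borrow(timeout time.Duration) (*Conn, error) {
        select {
        case c := <-p.conns:
            return c, nil
        case <-time.After(timeout):
            return nil, errors.New("pool exhausted: borrow timed out")
        }
    }

    // Return hands a connection back; real code would validate health first.
    func (p *Pool) Return(c *Conn) {
        p.conns <- c
    }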

Key parameters:

  • Max pool size: Maximum concurrent connections. For databases, this should match the database's optimal concurrent connection count (e.g., a common PostgreSQL heuristic is (2 × cores) + number of disks).
  • Min idle: Minimum connections kept alive. Avoids cold-start latency.
  • Connection timeout: Maximum time to wait for a connection from the pool. Prevents request pile-up when the pool is exhausted.
  • Idle timeout: Close connections idle for too long (prevents stale connections).
  • Max lifetime: Close connections after a maximum age (prevents issues with server-side connection limits, DNS changes).

Common issues:

  • Connection leak: Application borrows a connection but never returns it (exception during processing, forgot to close). Pool eventually runs out. Solution: connection wrappers that return on close(), leak detection (alert when a connection is held too long).
  • Pool exhaustion: All connections are in use. New requests wait or fail. Monitor pool utilization and set alerts at 80% capacity.
  • Stale connections: A connection is returned to the pool but the server closed it. Validation on borrow (send a test query) detects this.

See our PostgreSQL interview questions (connection pooling) and backend development crash course.

Follow-up questions:

  • How does PgBouncer differ from application-level connection pooling?
  • What is the optimal pool size for a PostgreSQL database on a 16-core machine?
  • How do you handle connection pooling in a serverless (Lambda) environment?

14. Explain the concept of linearizability and its importance in concurrent systems.

What the interviewer is really asking: Do you understand the strongest consistency model and when it's necessary?

Answer framework:

Linearizability is the strongest single-object consistency guarantee. It requires that:

  1. Every operation appears to take effect instantaneously at some point between its invocation and response (the linearization point).
  2. The order of non-overlapping operations is preserved.
  3. All clients see the same order of operations.

In simpler terms: a linearizable system behaves as if there's a single copy of the data, and all operations are atomic and occur in a well-defined order. If operation A completes before operation B starts, then B sees the effects of A.

Why it matters: without linearizability, a system might return stale reads (client writes a value, immediately reads it, gets the old value), show inconsistent views (two clients read different values for the same key at the same time), or allow conflicting operations (two clients both successfully "win" an auction).

Examples requiring linearizability: leader election (at most one leader at a time), distributed locks (at most one lock holder), unique username registration (no duplicates), and CAS operations (the compare-and-swap must be atomic).

Implementation cost: linearizability requires coordination between nodes, which adds latency (network round-trips for consensus) and reduces availability (cannot serve reads during network partitions without risking staleness). This is why many systems offer weaker consistency models by default.

Related concepts: sequential consistency (preserves each client's operation order but different clients may see different orders), causal consistency (preserves causally related operations but allows concurrent operations to be seen in different orders), and eventual consistency (all replicas eventually converge but no ordering guarantees).

See our consistency models, CAP theorem, and distributed systems guide.

Follow-up questions:

  • What is the difference between linearizability and serializability?
  • How does Raft consensus protocol achieve linearizable reads?
  • When is eventual consistency an acceptable alternative to linearizability?

15. How do you design a concurrent task scheduler?

What the interviewer is really asking: Can you apply multiple concurrency concepts together in a real system design?

Answer framework:

A task scheduler manages the execution of tasks across a pool of worker threads, with features like priority, scheduling, dependency management, and resource limits.

Core components:

  1. Task queue: A priority blocking queue where tasks are ordered by priority and scheduled time. Use a min-heap (priority queue) protected by a mutex, or a concurrent skip list for lock-free ordering.

  2. Worker pool: A fixed-size thread pool that pulls tasks from the queue. Each worker runs a loop: dequeue a task, execute it, handle errors, dequeue the next (a minimal Go sketch of this loop follows the list). Java's ThreadPoolExecutor provides this. Go uses a goroutine pool with a channel-based work queue.

  3. Scheduler thread: Manages delayed and periodic tasks. Maintains a delay queue (tasks sorted by next execution time). When the next task's time arrives, moves it to the work queue. Java's ScheduledThreadPoolExecutor implements this.

  4. Dependency management: Some tasks depend on others. Maintain a DAG of task dependencies. A task is eligible for execution only when all its dependencies have completed. When a task completes, check if any dependent tasks are now eligible and enqueue them.

  5. Cancellation: Tasks can be cancelled before or during execution. Use a cancellation token (CancellationToken in C#, Context in Go) that workers check periodically. Cancelled tasks are removed from the queue or interrupted during execution.

  6. Resource management: Limit concurrent tasks per resource type (e.g., at most 10 concurrent database queries, at most 5 concurrent API calls). Use semaphores per resource type.
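
A minimal Go sketch of components 2 and 5 together: a fixed worker pool pulling from a channel-based queue, with context.Context as the cancellation token (the Task shape and error handling are illustrative).

    package scheduler

    import (
        "context"
        "log"
        "sync"
    )

    type Task struct {
        ID  int
        Run func(ctx context.Context) error
    }

    // StartWorkers launches a fixed pool. Each worker loops: dequeue a task,
    // execute it, handle the error, repeat, until the queue closes or the
    // context is cancelled.
    func StartWorkers(ctx context.Context, n int, queue <-chan Task) *sync.WaitGroup {
        var wg sync.WaitGroup
        for i := 0; i < n; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                for {
                    select {
                    case t, ok := <-queue:
                        if !ok {
                            return // queue closed: graceful shutdown
                        }
                        if err := t.Run(ctx); err != nil {
                            log.Printf("task %d failed: %v", t.ID, err)
                        }
                    case <-ctx.Done():
                        return // cancellation token fired
                    }
                }
            }()
        }
        return &wg
    }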

Concurrency considerations:

  • Thread pool sizing: for CPU-bound tasks, use numCPUs workers. For I/O-bound tasks, use more workers (2x-10x CPUs). For mixed workloads, use separate pools.
  • Work stealing: idle workers steal tasks from busy workers' local queues. Java's ForkJoinPool uses this for better load distribution.
  • Graceful shutdown: stop accepting new tasks, wait for in-progress tasks to complete (with a timeout), then terminate workers.

See our system design interview guide and distributed systems guide.

Follow-up questions:

  • How would you extend this to a distributed task scheduler across multiple machines?
  • How do you handle a worker thread that hangs on a task?
  • What is work stealing and when does it improve performance?

Common Mistakes in Concurrency Interviews

  1. Using locks everywhere — Locks are not the only tool. Show knowledge of lock-free structures, channels, immutability, and the actor model.

  2. Ignoring memory visibility — Adding a lock solves mutual exclusion but candidates forget about visibility guarantees. Demonstrate understanding of happens-before relationships.

  3. Not considering contention — A solution with a single global lock is correct but may not scale. Discuss how to reduce contention with fine-grained locking, lock-free structures, or partitioning.

  4. Confusing concurrency with parallelism — Precise vocabulary matters at the senior level. Use these terms correctly.

  5. Forgetting about error handling — What happens when a concurrent operation fails? How do you ensure consistency? Discuss cleanup, rollback, and recovery.

  6. Over-engineering — Sometimes a simple mutex is the right answer. Don't reach for lock-free structures when a mutex with microsecond contention is perfectly adequate.

How to Prepare

Week 1: Review concurrency primitives in your primary language (mutexes, atomics, channels, thread pools). Implement a thread-safe cache and a producer-consumer queue.

Week 2: Study lock-free programming. Implement a lock-free stack using CAS. Understand the ABA problem and memory ordering.

Week 3: Practice concurrent system design. Design a rate limiter, connection pool, and task scheduler. Focus on correctness under concurrency.

Week 4: Study real-world concurrency bugs and their solutions. Read about database concurrency (MVCC, isolation levels) and distributed concurrency (consensus algorithms, distributed locks).

For comprehensive preparation, see our system design interview guide and explore the learning paths. Check our pricing plans for full access.
