Backend Development Interview Questions for Senior Engineers (2026)
Top backend development interview questions with detailed answer frameworks covering server architecture, API design patterns, database interactions, caching strategies, concurrency models, error handling, and performance optimization for senior and staff-level roles.
Why Backend Development Expertise Matters in Senior Engineering Interviews
Backend development sits at the core of every production system. When companies interview senior and staff engineers, they are not looking for someone who can stand up a REST endpoint. They are looking for engineers who understand how server architecture decisions cascade through the entire stack, how API contracts shape the developer experience for years, and how database interaction patterns determine whether a system can scale from a thousand users to ten million without a rewrite.
Backend interviews at the senior level probe for judgment. Interviewers want to hear how you reason about trade-offs between consistency and availability in your data layer, why you chose a particular concurrency model for a high-throughput service, or how you designed error handling that keeps a distributed system resilient when individual components fail. These are the decisions that separate senior engineers from mid-level developers who can write working code but have never had to keep it running under real production pressure.
The questions in this guide are drawn from real interviews at top-tier technology companies. Each question includes what the interviewer is actually evaluating, a structured answer framework you can adapt, and code examples where they strengthen the response. Use these alongside the system design interview guide and related concept deep-dives to build a complete preparation strategy.
Question 1: How do you decide between a monolithic architecture and microservices for a new backend system?
What the interviewer is really asking: They want to see that you do not reflexively choose microservices because they are trendy. They are testing whether you understand the operational cost of distributed systems and can match architecture to team size, domain complexity, and growth trajectory.
Answer framework:
Start by establishing that this is a context-dependent decision, not a binary choice. A monolith is the right starting point for most new systems because it minimizes operational complexity, simplifies debugging, and allows rapid iteration when the domain model is still being discovered.
Explain the specific inflection points where microservices become justified: when independent teams need to deploy at different cadences, when specific components have dramatically different scaling requirements, or when a bounded context has stabilized enough that the service boundary is unlikely to shift. Reference the distributed systems principle that network boundaries introduce failure modes that do not exist in a monolith.
Describe the middle ground. A well-structured monolith with clear module boundaries (sometimes called a "modular monolith") gives you most of the organizational benefits of microservices without the operational overhead. You can extract services later when you have concrete evidence that extraction will solve a real problem.
Address the operational requirements that microservices demand: service discovery, distributed tracing, circuit breakers, contract testing between services, and a deployment pipeline mature enough to handle dozens of independent deployments per day. If the team does not have these capabilities, microservices will slow them down rather than speed them up.
Question 2: Walk me through how you design a RESTful API for a complex domain. What principles guide your decisions?
What the interviewer is really asking: They are testing your understanding of API design as a long-term contract. They want to hear about versioning strategy, resource modeling, error responses, and how you balance REST purity with practical developer experience.
Answer framework:
Begin with resource identification. Good API design starts by modeling the domain as resources, not by mirroring database tables or backend procedures. Explain that you work with domain experts to identify the nouns that external consumers care about, which often differ from internal data models.
Cover URL structure and naming conventions. Resources should be plural nouns. Relationships should be expressed through nesting only when there is a strong parent-child relationship. Avoid deeply nested URLs (more than two levels) because they create rigid coupling.
Discuss versioning strategy. Explain the trade-offs between URL versioning (/v2/users), header versioning (Accept: application/vnd.api+json;version=2), and query parameter versioning. URL versioning is the most explicit and works best when breaking changes are infrequent. Header versioning is cleaner but harder to test in a browser.
Address error handling. A well-designed API returns consistent error responses with machine-readable error codes, human-readable messages, and enough context for the client to recover. Reference how this connects to broader error handling patterns.
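To make this concrete, here is a minimal sketch of a consistent error body. The field names (`code`, `message`, `details`) are illustrative conventions, not a standard:

```python
def error_response(code, message, details=None):
    """Build an error body with a stable, machine-readable shape."""
    return {
        "error": {
            "code": code,              # stable identifier clients can branch on
            "message": message,        # human-readable explanation
            "details": details or {},  # extra context for client recovery
        }
    }

body = error_response("user_not_found", "No user with id 42",
                      details={"user_id": 42})
```

Because every endpoint returns the same shape, clients can write one error handler instead of special-casing each route.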
Finish with pagination, filtering, and rate limiting. Cursor-based pagination is more reliable than offset-based for large datasets. Filtering should use query parameters with clear naming conventions. Rate limiting should return 429 status codes with Retry-After headers.
For deeper patterns, see the API design comparison guide.
Question 3: How do you handle database connection pooling in a high-throughput backend service?
What the interviewer is really asking: They are testing whether you understand the resource constraints that databases impose, how connection management affects latency and throughput, and whether you have dealt with connection exhaustion in production.
Answer framework:
Start by explaining why connection pooling matters. Database connections are expensive to establish (TCP handshake, TLS negotiation, authentication). Without pooling, a service handling 1,000 concurrent requests would need 1,000 database connections, which exceeds the limits of most database servers.
Describe pool sizing. The optimal pool size is not "as large as possible." It is constrained by the database server's capacity, the query execution time, and the number of application instances. A common starting point is pool_size = (core_count * 2) + effective_spindle_count, but you should always benchmark with realistic traffic patterns.
Discuss connection lifecycle management: validation on checkout (is the connection still alive?), maximum lifetime (connections should be recycled periodically to handle DNS changes and server-side timeouts), and idle timeout (release connections that have not been used recently).
Address what happens when the pool is exhausted. The application should not block indefinitely waiting for a connection. Set a checkout timeout and return a clear error to the caller. Instrument the pool so you can monitor checkout wait times, active connections, and pool exhaustion events.
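A toy pool illustrates the fail-fast checkout behavior described above. This is a deliberately simplified sketch: real pools (HikariCP, pgbouncer clients, SQLAlchemy's pool) also validate connections on checkout and recycle them after a maximum lifetime.

```python
import queue

class PoolExhaustedError(Exception):
    """Raised when no connection frees up within the checkout timeout."""

class ConnectionPool:
    """Toy pool: a fixed set of connections in a thread-safe queue."""
    def __init__(self, size, connect):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(connect())

    def checkout(self, timeout):
        try:
            # Bounded wait: fail fast with a clear error instead of
            # blocking the request indefinitely.
            return self._idle.get(timeout=timeout)
        except queue.Empty:
            raise PoolExhaustedError("pool exhausted; retry or shed load") from None

    def checkin(self, conn):
        self._idle.put(conn)
```

The checkout timeout is the key detail: when the pool is exhausted, callers get an actionable error they can convert into a 503 rather than hanging.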
Question 4: Explain your approach to caching in backend systems. When do you cache, and what invalidation strategies do you use?
What the interviewer is really asking: Caching is deceptively simple to implement and notoriously difficult to get right. They want to see that you understand cache invalidation as the hard problem, not the cache hit path.
Answer framework:
Establish the caching hierarchy. Backend caching operates at multiple levels: application-level in-memory caches (fastest, limited by process memory), distributed caches like Redis or Memcached (shared across instances, adds network latency), CDN caches (for static and semi-static content), and database query caches (generally a last resort). Each level has different consistency and latency characteristics.
Explain when caching is appropriate. Cache data that is read frequently, expensive to compute, and tolerant of staleness. Data that is written as often as it is read, or data that must always be perfectly fresh, is a poor caching candidate.
Discuss invalidation strategies in depth. Time-based expiration (TTL) is the simplest and works well for data where a few seconds or minutes of staleness is acceptable. Write-through caching updates the cache on every write, giving strong consistency but adding write latency. Event-driven invalidation uses change events from the database or a message queue to invalidate cache entries when the underlying data changes. This is the most complex but gives the best balance of freshness and performance.
Address cache stampede. When a popular cache entry expires, hundreds of requests may simultaneously hit the database to rebuild it. Solutions include probabilistic early expiration, lock-based rebuilding (only one request rebuilds while others wait), and background refresh.
For a detailed comparison of caching technologies, see Redis vs Memcached.
Question 5: How do you handle concurrency in backend services? Compare different concurrency models and when you would choose each.
What the interviewer is really asking: They want to know if you understand the fundamental differences between threads, processes, async I/O, and actor models, and whether you can match the right concurrency model to the workload characteristics.
Answer framework:
Start by distinguishing CPU-bound and I/O-bound workloads, because the optimal concurrency model differs dramatically between them.
For I/O-bound workloads (most web services), async I/O is typically the best fit. A single thread can handle thousands of concurrent connections because most of the time is spent waiting for network responses, database queries, or file reads. Explain the event loop model used by Node.js and Python's asyncio, and contrast it with Go's goroutines, which the runtime multiplexes across a small pool of OS threads so that blocking-style code gets similar scalability.
For CPU-bound workloads (data processing, image manipulation, ML inference), multi-processing or threading with native threads is necessary because async I/O does not help when the bottleneck is CPU computation. In languages with a GIL like Python, multi-processing is required for true parallelism.
Discuss the actor model (Erlang/Elixir, Akka) as a concurrency abstraction that eliminates shared mutable state. Each actor has its own state and communicates through message passing. This model is excellent for systems with many independent entities that occasionally interact, like chat servers or IoT device management.
Address thread safety and shared state. When multiple threads or coroutines access shared data, you need synchronization primitives. Cover mutexes, read-write locks, and lock-free data structures. Emphasize that the best solution is usually to avoid shared mutable state entirely through immutable data structures or message passing.
See also: concurrency patterns and Go vs Rust for backend services.
Question 6: Describe how you would design a robust error handling strategy for a distributed backend system.
What the interviewer is really asking: They are testing whether you think about errors as a first-class architectural concern rather than an afterthought. They want to see that you understand the difference between transient and permanent failures, and how error handling interacts with observability.
Answer framework:
Start with error classification. Not all errors are equal, and your handling strategy should differ based on the type. Transient errors (network timeouts, temporary unavailability) should be retried. Permanent errors (invalid input, resource not found) should be reported immediately. Partial failures (some items in a batch succeeded, others failed) need the most careful handling.
Discuss retry strategies. Exponential backoff with jitter is the standard approach for transient errors. Explain why jitter matters: without it, all clients retry at the same time, creating a synchronized thundering herd. Set maximum retry counts and total timeout budgets to prevent a single slow dependency from consuming all your resources.
Cover circuit breaker patterns. When a downstream service is consistently failing, continuing to send requests wastes resources and can cascade the failure. A circuit breaker tracks failure rates and "opens" when they exceed a threshold, failing fast for subsequent requests. After a cooldown period, it allows a few probe requests through to test if the service has recovered.
Address error propagation across service boundaries. Each service should translate internal errors into meaningful responses for its callers. A database constraint violation should not bubble up as a raw SQL error to an API consumer. Map internal errors to domain-specific error codes.
Related reading: distributed systems error handling and the system design interview guide.
Question 7: How do you approach database schema design for a backend system that needs to evolve over time?
What the interviewer is really asking: They want to know if you can design schemas that support backward-compatible migrations, whether you understand the operational challenges of running migrations on large tables, and how you handle schema changes in a zero-downtime deployment environment.
Answer framework:
Start with the principle of evolutionary design. The schema you launch with will not be the schema you have in a year. Every design decision should be evaluated not just for current requirements but for how easily it can be changed.
Discuss migration strategies for large tables. Adding a column to a table with a billion rows can lock the table for hours on some databases. Explain online DDL tools like pt-online-schema-change for MySQL or pg_repack for PostgreSQL. Describe the expand-contract pattern: first add the new column (expand), deploy code that writes to both old and new columns, backfill existing data, then remove the old column (contract).
Address backward compatibility. In a zero-downtime deployment, old and new versions of your application run simultaneously during rollout. Schema changes must be compatible with both versions. This means you never rename or remove a column in the same deployment that changes the code using it. These must be separate, sequential deployments.
Cover indexing strategy. Explain that indexes should be driven by query patterns, not added speculatively. Every index speeds up reads but slows down writes. Composite indexes should be ordered by selectivity (most selective column first for equality comparisons, range columns last). Partial indexes and expression indexes can be powerful tools for specific query patterns.
Question 8: How do you design backend APIs to handle partial failures gracefully?
What the interviewer is really asking: They are testing whether you have built systems where a single API call depends on multiple downstream services, and whether you know how to provide a useful response when some of those services fail.
Answer framework:
Define what partial failure means in practice. A product detail page might need data from the catalog service, pricing service, recommendations service, and reviews service. If the recommendations service is down, you should still return the product page with catalog, pricing, and review data, not a 500 error.
Discuss the response envelope pattern. Design your API responses so that each section of data includes its own status indicator. The client can then decide how to handle missing sections rather than treating the entire response as failed.
Cover fallback strategies. When a non-critical dependency fails, you can return cached data (possibly stale), default values, or omit that section entirely. The key is classifying dependencies as critical (the request cannot succeed without them) versus degraded (the request can succeed with reduced functionality).
Address timeout budgets. When an API call fans out to multiple services, you need a total timeout budget that is divided among the calls. If the first call takes 80% of the budget, the remaining calls get shorter timeouts. This prevents a single slow service from causing the entire request to time out.
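A deadline-based budget is one simple way to implement this (the class and its injectable clock are illustrative): each downstream call asks for the smaller of its own cap and whatever time remains on the overall deadline.

```python
import time

class TimeoutBudget:
    """Track one total deadline across sequential downstream calls."""
    def __init__(self, total, clock=time.monotonic):
        self.clock = clock
        self.deadline = clock() + total

    def remaining(self):
        return max(0.0, self.deadline - self.clock())

    def for_call(self, per_call_cap):
        # A slow earlier call shrinks the timeout for later calls.
        return min(per_call_cap, self.remaining())
```

A `for_call` result of zero means the budget is spent and the handler should fail fast rather than issue another downstream request.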
Question 9: How would you implement rate limiting for a backend API? What algorithms would you consider?
What the interviewer is really asking: They are testing whether you understand rate limiting beyond a simple counter. They want to hear about different algorithms, distributed rate limiting challenges, and how you balance protection with user experience.
Answer framework:
Start with why rate limiting matters: protecting backend resources from abuse, ensuring fair usage across clients, and preventing cascade failures during traffic spikes.
Compare the major algorithms. Fixed window counting is simple but has the burst problem at window boundaries (a client can make 2x the limit by timing requests across the boundary). Sliding window log is precise but memory-intensive because it stores every request timestamp. Sliding window counter approximates the sliding window with much less memory by weighting the current and previous windows. Token bucket allows controlled bursting while maintaining a long-term rate limit, and is the most commonly used in production.
Discuss distributed rate limiting. When your API runs on multiple instances, each instance cannot maintain its own counter. You need a shared store like Redis. Explain the trade-off between accuracy and latency: checking Redis on every request adds latency, but local counters with periodic sync to Redis can allow brief overages.
Cover client identification. Rate limits can be applied per API key, per IP address, per user, or per endpoint. Different endpoints may need different limits (a search endpoint might allow 10 requests per second while a data export endpoint might allow 1 per minute).
For more on protecting backend systems, see rate limiting strategies and the API gateway comparison.
Question 10: How do you approach performance optimization in a backend system? Walk me through your methodology.
What the interviewer is really asking: They want to see a systematic approach to performance work, not ad hoc guessing. They are testing whether you measure before optimizing, whether you understand where bottlenecks typically occur, and whether you consider the trade-offs of optimization.
Answer framework:
Start with measurement. The cardinal rule of performance optimization is to measure first. Profile the system under realistic load to identify actual bottlenecks. The bottleneck is almost never where you think it is. Tools include application performance monitoring (APM) like Datadog or New Relic, profilers (cProfile, pprof, async-profiler), and distributed tracing.
Describe the optimization hierarchy. In order of typical impact: algorithm and data structure improvements (O(n) vs O(n^2)), I/O reduction (fewer database queries, batching, caching), concurrency improvements (parallel execution of independent I/O), and finally micro-optimizations (which rarely matter in backend systems).
Discuss the N+1 query problem as the single most common backend performance issue. When fetching a list of items and then making a separate database query for each item's related data, the number of queries grows linearly with the result set. The fix is eager loading or batching related data queries.
Cover load testing. Performance optimization is meaningless without load testing to validate the improvement. Explain how you use tools like k6, Locust, or Gatling to simulate realistic traffic patterns, and how you establish performance budgets (p50 latency under 100ms, p99 under 500ms).
Question 11: How do you design a backend system for idempotency? Why does it matter?
What the interviewer is really asking: They are testing whether you understand that in distributed systems, exactly-once delivery is practically impossible, and that idempotency is the pragmatic solution. They want to see that you have dealt with duplicate requests, retries, and at-least-once delivery semantics.
Answer framework:
Define idempotency clearly. An idempotent operation produces the same result whether it is executed once or multiple times. GET, PUT, and DELETE are naturally idempotent in REST semantics. POST is not, which is why payment processing and order creation need explicit idempotency mechanisms.
Explain why idempotency matters in practice. Network failures, client retries, message queue redelivery, and load balancer retries all create scenarios where the same request is processed multiple times. Without idempotency, this leads to duplicate charges, duplicate orders, or inconsistent state.
Describe the idempotency key pattern. The client generates a unique key (typically a UUID) for each logical operation and includes it in the request header. The server stores the result associated with that key and returns the stored result for any subsequent requests with the same key.
Discuss implementation details: where to store idempotency keys (database with TTL, Redis), how long to retain them, race conditions when two identical requests arrive simultaneously (use database constraints or distributed locks), and what to return for in-progress operations.
Related: idempotency in distributed systems and payment system design.
Question 12: How do you implement authentication and authorization in a backend API? What are the trade-offs between different approaches?
What the interviewer is really asking: They are testing whether you understand the difference between authentication (who are you) and authorization (what can you do), and whether you can design a system that is both secure and performant.
Answer framework:
Start by clearly separating authentication from authorization. Authentication verifies identity. Authorization determines permissions. Many systems conflate these, leading to security gaps and inflexible permission models.
Compare authentication approaches. Session-based authentication stores state on the server and is simpler to revoke but harder to scale across multiple services. JWT (JSON Web Tokens) are stateless and work well in microservice architectures but cannot be revoked without additional infrastructure (blacklists or short expiration with refresh tokens). OAuth 2.0 with OpenID Connect is the standard for third-party authentication and provides a battle-tested framework.
Discuss authorization models. Role-based access control (RBAC) assigns permissions to roles, and users are assigned roles. This is simple but becomes unwieldy with complex permission requirements. Attribute-based access control (ABAC) makes decisions based on attributes of the user, resource, and environment, offering fine-grained control at the cost of complexity. Relationship-based access control (ReBAC), as implemented by Google's Zanzibar, models permissions as relationships between entities and is excellent for systems where permissions are derived from data ownership.
Address security considerations: token storage (HttpOnly cookies for web, secure storage for mobile), refresh token rotation, CSRF protection, and rate limiting on authentication endpoints to prevent brute force attacks.
Question 13: How do you design a backend system to handle background job processing reliably?
What the interviewer is really asking: They want to see that you understand the challenges of asynchronous processing: at-least-once delivery, job deduplication, priority handling, dead letter queues, and monitoring.
Answer framework:
Explain when to use background jobs versus synchronous processing. Any operation that takes more than a few hundred milliseconds, is not needed for the immediate response, or can fail independently should be a background job. Examples: sending emails, generating reports, processing uploaded files, syncing data with external systems.
Discuss queue selection. Redis-backed queues (Sidekiq, Bull, Celery with Redis) are fast and simple but can lose data if Redis is not configured for persistence. Dedicated message brokers (RabbitMQ, Kafka, SQS) provide stronger durability guarantees. Kafka is ideal when you need to replay events or when multiple consumers need to process the same events independently.
Cover reliability patterns. Jobs must be idempotent because at-least-once delivery means they may execute more than once. Implement dead letter queues for jobs that fail repeatedly so they can be investigated without blocking other jobs. Use job-specific timeouts to prevent stuck jobs from consuming workers indefinitely.
Address monitoring and observability. Track queue depth (growing queues indicate processing is falling behind), job latency (time from enqueue to completion), failure rates, and retry rates. Set alerts on queue depth thresholds to catch problems before they become user-visible.
See message queue comparisons and event-driven architecture.
Question 14: How do you handle data consistency in a system that uses both a primary database and a cache?
What the interviewer is really asking: This is a deeper version of the caching question. They want to hear about specific consistency patterns, the consequences of getting them wrong, and how you detect and recover from inconsistencies.
Answer framework:
Frame the fundamental tension. A cache is a copy of data. Any time you have two copies of data, they can diverge. The question is not whether they will diverge but how you minimize the window of inconsistency and what happens when it occurs.
Compare the three main patterns. Cache-aside (lazy loading) is the most common: the application checks the cache first, falls back to the database, and populates the cache on miss. Write-through updates the cache synchronously on every write, providing strong consistency at the cost of write latency. Write-behind (write-back) buffers writes in the cache and flushes to the database asynchronously, providing the best write performance but risking data loss if the cache fails before flushing.
Discuss the double-write problem. When your code updates the database and then invalidates the cache, there is a window between these two operations where another request could read stale data from the cache or repopulate the cache with stale data from a replica that has not yet received the write. Solutions include using database change data capture (CDC) to drive cache invalidation, or accepting brief inconsistency with short TTLs.
Address cache warming. When deploying a new version of your application or after a cache failure, the cache is cold and every request hits the database. This can overwhelm the database. Strategies include gradual rollout, pre-warming the cache from a database snapshot, and request coalescing (multiple requests for the same key are collapsed into a single database query).
Question 15: How do you structure logging, metrics, and tracing in a backend system for effective observability?
What the interviewer is really asking: They are testing whether you treat observability as an architectural concern that is designed into the system from the start, not bolted on after something breaks in production.
Answer framework:
Define the three pillars of observability. Logs provide detailed records of discrete events. Metrics provide aggregate measurements over time (request rate, error rate, latency distributions). Traces show the path of a single request across multiple services. Each pillar answers different questions, and you need all three for effective debugging.
Discuss structured logging. Unstructured log messages (logger.info("User logged in")) are nearly useless at scale because they cannot be queried or aggregated. Structured logs (logger.info("user.login", extra={"user_id": "123", "method": "oauth", "ip": "..."})) can be indexed and queried in tools like Elasticsearch or Loki.
Cover the RED method for service metrics: Rate (requests per second), Errors (error rate as a percentage), and Duration (latency distribution). Every service should expose these three metrics as a minimum. Use histograms rather than averages for latency because averages hide outliers.
Explain distributed tracing. In a microservice architecture, a single user request might touch ten services. Without tracing, debugging a slow request requires correlating logs across all ten services manually. Distributed tracing (OpenTelemetry, Jaeger, Zipkin) assigns a trace ID to each request and propagates it across service boundaries, allowing you to see the entire request timeline.
For more on observability, see monitoring and alerting and the system design interview guide.
How to Practice
Backend development interviews reward depth over breadth. Here is a structured approach to preparation:
- Build a complete backend service from scratch. Choose a project with real complexity: a payment processing API, a notification delivery system, or an event-driven order pipeline. Deploy it, load test it, and break it deliberately. You will learn more from debugging a production-like failure than from reading ten architecture blog posts.
- Study real-world outage reports. Companies like Cloudflare, GitHub, and Stripe publish detailed postmortems. These teach you what actually goes wrong in backend systems and how experienced teams diagnose and recover. Pay attention to the cascading failures, because those are the patterns interviewers ask about.
- Practice explaining your decisions. Backend interviews are not coding contests. They are conversations about trade-offs. For every architectural decision you make, practice articulating what you considered, what you rejected, and why. Record yourself explaining a system design and listen for gaps in your reasoning.
- Use algoroq's interview practice modules to simulate real interview scenarios with immediate feedback on your architectural reasoning and communication clarity.
- Read system design case studies. Understanding how companies like Uber, Stripe, and Netflix solve backend challenges gives you concrete examples to reference in interviews. See the system design interview guide for structured preparation.
Common Mistakes to Avoid
- Jumping to microservices without justification. Interviewers will probe why you chose microservices. If your answer is "because that is how Netflix does it" rather than a specific technical or organizational reason, you will lose credibility. Start with a monolith and explain when you would extract services.
- Ignoring operational concerns. A backend design is incomplete without deployment strategy, monitoring, and failure recovery. If you design a beautiful architecture but cannot explain how you would detect and recover from a database failover at 3 AM, the interviewer will doubt your production experience.
- Over-optimizing prematurely. Adding caching layers, read replicas, and message queues to a system that serves 100 requests per second signals that you do not understand cost-benefit analysis. Start simple and explain the specific metrics that would trigger each optimization.
- Treating databases as black boxes. Senior backend engineers need to understand query execution plans, indexing strategies, and replication lag. If you cannot explain why a query is slow or how adding an index changes the execution plan, study database internals more deeply. See database internals for a deep dive.
- Neglecting error handling in system designs. Many candidates describe the happy path in detail but wave their hands at error scenarios. Interviewers specifically look for how you handle network timeouts, partial failures, data inconsistencies, and resource exhaustion. Make error handling a first-class part of your design.
- Forgetting about backward compatibility. Every API change, database migration, and configuration update must be backward compatible in a zero-downtime deployment environment. If your design requires "just restart all the servers at once," it will not work at any company operating at scale.
GO DEEPER
Master this topic in our 12-week cohort
Our Advanced System Design cohort covers this and 11 other deep-dive topics with live sessions, assignments, and expert feedback.