
API Design Interview Questions for Senior Engineers (2026)

Top API design interview questions with detailed answer frameworks covering REST principles, versioning, pagination, error handling, and API security for senior engineering interviews.

20 min read · Updated Apr 19, 2026
Tags: interview-questions, api-design, system-design, senior-engineer

Why API Design Matters in Senior Engineering Interviews

APIs are the contracts between services, teams, and organizations. A well-designed API accelerates development, reduces bugs, and enables independent team velocity. A poorly designed API creates confusion, breaks clients, and accumulates technical debt that is extremely costly to fix once external consumers depend on it.

Senior engineers are expected to design APIs that are intuitive, consistent, evolvable, and performant. In interviews, API design questions test your ability to think about the consumer experience, backward compatibility, error handling, and the operational aspects of running an API at scale. Companies like Google, Stripe, and Twilio are known for excellent API design, and their interview processes reflect this emphasis.

This guide covers the most common API design interview questions with structured frameworks. For related preparation, see our REST API deep dive, the system design interview guide, and our learning paths.

1. How do you design a RESTful API for a resource-oriented service?

What the interviewer is really asking: Do you understand REST principles deeply enough to apply them consistently, and do you know when to deviate?

Answer framework:

Start with resource identification. Every entity in the system should be a resource with a unique URI. Use nouns, not verbs: /users not /getUsers, /orders/123/items not /getOrderItems. Use plural nouns consistently: /users/123 not /user/123.

Map HTTP methods to CRUD operations: GET (read, idempotent, cacheable), POST (create), PUT (full update, idempotent), PATCH (partial update), DELETE (remove, idempotent). Explain idempotency: calling PUT with the same body multiple times produces the same result. POST is not idempotent because each call creates a new resource.

Design the URL hierarchy to reflect resource relationships: /users/123/orders (orders belonging to user 123), /orders/456/items (items in order 456). Limit nesting to 2-3 levels. Deeper nesting suggests the resources are too tightly coupled.

Discuss response codes: 200 (OK), 201 (Created, with Location header), 204 (No Content, for DELETE), 400 (Bad Request, client error), 401 (Unauthorized, not authenticated), 403 (Forbidden, not authorized), 404 (Not Found), 409 (Conflict, e.g., duplicate), 429 (Too Many Requests), 500 (Internal Server Error). Using the correct status code is essential for API usability.

For pagination, discuss cursor-based vs offset-based. Offset-based (page=3&limit=20) is simple but breaks when items are inserted or deleted. Cursor-based (cursor=abc123&limit=20) is stable and more performant for large datasets. Return next and previous cursor values in the response.

For filtering and sorting: use query parameters (/users?status=active&sort=created_at&order=desc). Support multiple filters with logical AND. For complex queries, consider a search endpoint with a POST body.
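
As a concrete illustration, here is a minimal sketch of a cursor-paginated, filterable list endpoint. It assumes Flask and a hypothetical fetch_users query helper; the opaque cursor simply encodes the last seen ID.

```python
# Hypothetical Flask sketch of a cursor-paginated, filterable list endpoint.
import base64
import json
from flask import Flask, request, jsonify

app = Flask(__name__)

def encode_cursor(last_id: int) -> str:
    # Opaque cursor: clients must not parse or construct it themselves.
    return base64.urlsafe_b64encode(json.dumps({"last_id": last_id}).encode()).decode()

def decode_cursor(cursor: str) -> int:
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))["last_id"]

@app.get("/users")
def list_users():
    limit = min(int(request.args.get("limit", 20)), 100)   # cap the page size
    status = request.args.get("status")                    # optional filter
    after_id = decode_cursor(request.args["cursor"]) if "cursor" in request.args else 0

    # fetch_users is a stand-in for the real query:
    #   WHERE id > after_id [AND status = ?] ORDER BY id LIMIT limit + 1
    rows = fetch_users(after_id=after_id, status=status, limit=limit + 1)
    has_more = len(rows) > limit
    items = rows[:limit]

    return jsonify({
        "data": items,
        "next_cursor": encode_cursor(items[-1]["id"]) if has_more else None,
    })
```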

Follow-up questions:

  • When would you use a non-RESTful RPC-style API instead of REST?
  • How do you handle actions that do not map cleanly to CRUD (e.g., sending an email)?
  • How would you design an API for batch operations?

2. How do you handle API versioning?

What the interviewer is really asking: Do you understand the practical challenges of evolving an API without breaking existing clients?

Answer framework:

Discuss the three main approaches with trade-offs.

URL versioning (/v1/users, /v2/users): most common and most visible. Easy for clients to understand and for routing infrastructure to handle. Used by Google, Twitter, and Facebook. Disadvantage: version is in the URL, which is a resource identifier, not a representation. Philosophically impure but practically excellent.

Header versioning (Accept: application/vnd.myapi.v2+json): cleaner semantically since the URL identifies the resource and the header specifies the representation. Used by GitHub (Accept: application/vnd.github.v3+json). Disadvantage: harder to test (cannot just change the URL in a browser), harder to cache (varies on headers).

Query parameter versioning (/users?version=2): similar to URL versioning but keeps the path clean. Disadvantage: easy to forget the parameter, default version behavior can be confusing.

Recommendation: use URL versioning for major versions (breaking changes) and make non-breaking changes within a version. Define what constitutes a breaking change: removing a field, changing a field type, changing URL structure, changing error formats. Non-breaking: adding a field, adding an optional parameter, adding a new endpoint.

Discuss version lifecycle: announce the deprecation timeline well in advance (minimum 6-12 months for external APIs). Monitor usage of deprecated versions. Provide migration guides. Use sunset headers (Sunset: Thu, 31 Dec 2026 23:59:59 GMT) to programmatically notify clients.

Discuss the Stripe approach: version by date (2026-04-19). Every API request can specify a version date. The server transforms the response to match the requested version. This allows fine-grained evolution without version number jumps. Internally, Stripe uses a chain of version transformers.
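
A minimal sketch of the transformer-chain idea follows; the version dates and field names are hypothetical, and real implementations transform requests as well as responses.

```python
# Sketch of date-based response transformers (the idea behind Stripe-style versioning).
from typing import Callable

Transform = Callable[[dict], dict]

def downgrade_2026_03_01(resp: dict) -> dict:
    # Older clients expect a single "name" field instead of first/last (illustrative).
    resp["name"] = f'{resp.pop("first_name", "")} {resp.pop("last_name", "")}'.strip()
    return resp

# Ordered newest -> oldest: each entry downgrades to the version *before* that date.
TRANSFORMS: list[tuple[str, Transform]] = [
    ("2026-03-01", downgrade_2026_03_01),
]

def render_for_version(resp: dict, client_version: str) -> dict:
    # Apply every transformer introduced after the client's pinned version date.
    for introduced_on, transform in TRANSFORMS:
        if client_version < introduced_on:
            resp = transform(resp)
    return resp
```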

For internal APIs between microservices, versioning is less formal. Use backward-compatible changes (additive only) and the tolerant reader pattern (ignore unknown fields). Proto3 in gRPC supports this natively.

Follow-up questions:

  • How would you migrate millions of API clients from v1 to v2?
  • What is the Stripe approach to API versioning and why is it effective?
  • How do you handle versioning in event-driven APIs?

3. Design a rate limiting strategy for a public API

What the interviewer is really asking: Can you protect your API from abuse while providing a good experience for legitimate users?

Answer framework:

Discuss the algorithms: token bucket (smooth rate, allows bursts up to bucket size), sliding window counter (accurate, moderate memory), fixed window counter (simple, but 2x burst at window boundaries), and leaky bucket (smooth output rate, no bursts). Token bucket is the industry standard, used by AWS, Stripe, and most API gateways.

For a multi-tier rate limiting strategy: per-user limits (1000 requests/minute for free tier, 10000 for paid), per-endpoint limits (writes more restricted than reads), global limits (protect the entire service regardless of individual limits), and IP-based limits (catch unauthenticated abuse).

Implementation: use Redis for distributed rate limiting counters. For the token bucket: store last_refill_time and tokens_remaining per user. On each request: calculate tokens to add since last refill, add them (up to max), consume one token. If tokens_remaining <= 0, reject with 429.
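
A minimal sketch of that token bucket, assuming redis-py and illustrative limits; a production version would wrap the read-modify-write in a Lua script so it runs atomically.

```python
# Token-bucket rate limiter backed by Redis (redis-py); names and limits are illustrative.
import time
import redis

r = redis.Redis()
RATE = 1000 / 60.0   # tokens added per second (1000 requests/minute)
BURST = 1000         # bucket capacity

def allow_request(user_id: str) -> bool:
    key = f"ratelimit:{user_id}"
    now = time.time()
    state = r.hgetall(key)
    tokens = float(state.get(b"tokens", BURST))
    last = float(state.get(b"last_refill", now))

    # Refill based on elapsed time, capped at the bucket size.
    tokens = min(BURST, tokens + (now - last) * RATE)
    if tokens < 1:
        return False          # caller should respond with 429 + Retry-After
    r.hset(key, mapping={"tokens": tokens - 1, "last_refill": now})
    r.expire(key, 120)        # let idle buckets expire
    return True
```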

Return rate limit headers so clients can self-regulate: X-RateLimit-Limit (total allowed), X-RateLimit-Remaining (remaining in current window), X-RateLimit-Reset (when the window resets, as Unix timestamp), Retry-After (seconds to wait, included with 429 responses).

Discuss the client experience: provide clear documentation of rate limits, return informative error messages with the 429 response, offer higher limits through paid tiers, and consider a grace period for small overages.

For distributed rate limiting across multiple API servers: centralized (all servers check Redis, adds latency), local with periodic sync (each server tracks locally and syncs periodically, less precise but faster), or a hybrid (local check with async Redis sync).

Discuss adaptive rate limiting: during system stress, dynamically lower rate limits. Prioritize authenticated users over anonymous, and paid users over free. Use load balancing in conjunction with rate limiting.

Follow-up questions:

  • How do you handle rate limiting for WebSocket connections?
  • How would you implement fair rate limiting for a multi-tenant API?
  • What is the difference between rate limiting and throttling?

4. How do you design error responses for an API?

What the interviewer is really asking: Do you think about the developer experience holistically, including when things go wrong?

Answer framework:

A good error response must be machine-parseable (consistent structure), human-readable (clear message), actionable (what to do about it), and traceable (correlation ID for debugging).

Design a consistent error envelope. Every error response should have the same structure regardless of the error type. A well-designed error response includes: an error code (machine-readable, stable identifier like INSUFFICIENT_FUNDS), a message (human-readable description), a details array (field-level errors for validation), a request_id (for tracing), and a documentation URL (link to docs about this error).
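
A sketch of such an envelope, with illustrative field names that mirror the elements above:

```python
# Helper that builds a consistent error envelope; field names are illustrative.
import uuid

def error_response(status: int, code: str, message: str, details: list[dict] | None = None) -> tuple[dict, int]:
    body = {
        "error": {
            "code": code,                      # stable, machine-readable identifier
            "message": message,                # human-readable, safe to show to developers
            "details": details or [],          # field-level validation errors
            "request_id": str(uuid.uuid4()),   # normally taken from the request context
            "doc_url": f"https://docs.example.com/errors/{code}",
        }
    }
    return body, status

# Example: a validation failure with field-level details.
body, status = error_response(
    400, "VALIDATION_FAILED", "One or more fields are invalid.",
    details=[{"field": "email", "issue": "must be a valid email address"}],
)
```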

Map errors to HTTP status codes correctly: 400 for invalid input (include field-level details), 401 for missing or invalid authentication, 403 for valid auth but insufficient permissions, 404 for resource not found (but consider returning 403 for security when revealing existence is a risk), 409 for conflicts (duplicate creation, version mismatch), 422 for semantically invalid input (syntactically correct but business rule violation), 429 for rate limiting, and 500 for unexpected server errors (never leak internal details).

Discuss error categories: validation errors (list each invalid field with the specific issue), business logic errors (explain what rule was violated and what the user can do), authentication/authorization errors (distinguish between not authenticated and not authorized), and transient errors (signal that a retry might succeed using Retry-After header).

For API design in practice: use a global error code registry that all services share. Document every error code with examples. Include the error code in logs for correlation. Version your error format as part of your API versioning strategy.

Discuss the security perspective: never include stack traces, database queries, or internal service names in error responses. For 500 errors, return a generic message with a request_id that the user can share with support.

Follow-up questions:

  • How do you handle errors in GraphQL where everything returns 200?
  • How do you localize error messages for international users?
  • How do you handle partial failures in batch API requests?

5. Design an authentication and authorization system for an API

What the interviewer is really asking: Do you understand the difference between authentication and authorization, and can you design a secure, scalable auth system?

Answer framework:

Authentication (who are you) and authorization (what can you do) are separate concerns that should be implemented independently.

For authentication, discuss the common approaches. API keys: simple, good for server-to-server. Include in a header (X-API-Key or Authorization: Bearer). Hash and store keys, never log them. Rotate regularly. OAuth 2.0 with JWT: the standard for user-facing APIs. The authorization server issues access tokens (short-lived, 15-60 minutes) and refresh tokens (long-lived). JWTs contain claims (user ID, scopes, expiration) and are signed. Stateless validation means the API server can verify the token without calling the auth server.
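
As an illustration of stateless token validation, here is a minimal sketch using the PyJWT library; the key handling, audience, and claim names are simplified assumptions.

```python
# Stateless access-token validation sketch with PyJWT; claim names are illustrative.
import jwt  # pip install PyJWT

PUBLIC_KEY = "-----BEGIN PUBLIC KEY-----\n...\n-----END PUBLIC KEY-----"

def authenticate(token: str) -> dict:
    # Signature and expiration are verified locally; no call to the auth server.
    claims = jwt.decode(token, PUBLIC_KEY, algorithms=["RS256"], audience="my-api")
    return {"user_id": claims["sub"], "scopes": claims.get("scope", "").split()}

def authorize(identity: dict, required_scope: str) -> bool:
    # Authorization is a separate check: who you are vs. what you may do.
    return required_scope in identity["scopes"]
```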

For authorization, discuss RBAC (Role-Based Access Control): users have roles (admin, editor, viewer), roles have permissions. Simple and sufficient for most applications. ABAC (Attribute-Based Access Control): policies based on attributes of the user, resource, and environment. More flexible but more complex. Example: a user can edit documents only in their department and only during business hours.

For the architecture: use an API gateway for authentication (validate tokens, extract user identity) and push authorization to individual services (each service knows its own permission model). Use a shared authorization library for consistency.

Discuss token security: store refresh tokens securely (encrypted, HttpOnly cookies for web, secure storage for mobile), use short expiration for access tokens, implement token revocation (a revocation list or short enough TTL that revocation is not needed), and use PKCE for public clients (mobile, SPA).

Discuss rate limiting integration: rate limits should be per-authenticated-user for authenticated requests and per-IP for unauthenticated requests.

For microservices: service-to-service auth using mutual TLS (mTLS) or service tokens. Propagate user context through request headers so downstream services can make authorization decisions.

Follow-up questions:

  • How do you handle token revocation in a distributed system?
  • How would you implement API key rotation without downtime?
  • How do you secure webhooks (API sending data to the client)?

6. How do you design APIs for backward compatibility?

What the interviewer is really asking: Can you evolve an API over years without breaking existing integrations?

Answer framework:

The cardinal rule: only make additive changes within a version. Adding fields, endpoints, and optional parameters is safe. Removing fields, changing types, or changing semantics is breaking.

Use the tolerant reader pattern: clients should ignore unknown fields. Servers should accept missing optional fields with defaults. This allows both sides to evolve independently. In protocol buffers (gRPC), this is built into the format. In JSON APIs, document this expectation clearly.
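
A minimal sketch of a tolerant reader on the client side; the User fields are illustrative.

```python
# Tolerant reader: read only the fields you need, ignore everything else,
# and fall back to defaults for missing optional fields.
from dataclasses import dataclass

@dataclass
class User:
    id: str
    email: str
    display_name: str = ""   # optional field added in a later, compatible release

def parse_user(payload: dict) -> User:
    # Unknown fields in `payload` are simply ignored; new server fields never break us.
    return User(
        id=payload["id"],
        email=payload["email"],
        display_name=payload.get("display_name", ""),
    )
```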

Field deprecation: do not remove fields immediately. Mark them as deprecated in documentation, stop populating them (return null or a default), and remove them only in a new major version. Use custom response headers (Deprecated: true) to warn clients programmatically.

For request compatibility: new required fields are breaking. Make new fields optional with sensible defaults. If a field must be required, introduce it as optional first, give clients time to adopt, then enforce in a new version.

For response compatibility: adding new fields is safe (if clients use the tolerant reader pattern). Changing a field from a scalar to a list, or changing its type, is breaking. Renaming a field is breaking. If you need a different structure, add new fields alongside old ones.

Discuss the expand pattern for evolving responses: return a minimal response by default, allow clients to request additional data via an expand parameter (/users/123?expand=orders,address). This lets you add new expandable relationships without changing the default response.

For database design alignment: your internal data model will evolve faster than your API. Use a separate API model (DTO) layer that translates between internal and external representations. This decouples API compatibility from database schema changes.

Contract testing: use tools like Pact to verify that your API changes do not break existing consumers. Run these tests in CI/CD to catch breaking changes before deployment.

Follow-up questions:

  • How do you handle a breaking change that is absolutely necessary?
  • How would you migrate a field from one type to another without breaking clients?
  • What is the role of API gateways in managing backward compatibility?

7. Design a webhook system for notifying external consumers of events

What the interviewer is really asking: Can you design a reliable notification system that handles the challenges of delivering events to unreliable external endpoints?

Answer framework:

A webhook system allows consumers to register callback URLs and receive HTTP POST requests when events occur. This is the inverse of polling and is far more efficient.

For the registration API: consumers register webhooks with a target URL, the event types they are interested in, and an optional secret for signature verification. Validate the URL on registration (send a verification request that the consumer must acknowledge).

For delivery architecture: when an event occurs, the service publishes it to a message queue like Kafka. A webhook delivery service consumes events, looks up registered webhooks, and sends HTTP POST requests. Use a separate delivery queue per consumer to prevent one slow consumer from blocking others.

For reliability: implement retry with exponential backoff (1s, 2s, 4s, 8s up to a maximum of 24 hours). After exhausting retries, mark the webhook as disabled and notify the consumer. Maintain a delivery log so consumers can query the status of each delivery attempt. Implement a manual replay endpoint so consumers can re-request missed events.

For security: sign each webhook payload with an HMAC using the consumer's secret. Include the signature in a header (X-Webhook-Signature). The consumer verifies the signature to confirm the payload is authentic and has not been tampered with. Use HTTPS for all webhook deliveries.
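
A minimal sketch of that signing scheme using Python's standard hmac and hashlib modules; the header name is illustrative.

```python
# HMAC payload signing for webhooks using the standard library.
import hashlib
import hmac

def sign_payload(secret: str, payload: bytes) -> str:
    return hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()

# Producer side: send the signature alongside the payload, e.g.
# headers["X-Webhook-Signature"] = sign_payload(consumer_secret, body)

def verify_signature(secret: str, payload: bytes, received_sig: str) -> bool:
    expected = sign_payload(secret, payload)
    # compare_digest prevents timing attacks on the comparison.
    return hmac.compare_digest(expected, received_sig)
```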

For idempotency: include a unique event ID in each payload. Consumers should deduplicate based on this ID. Retries will send the same event ID, so consumers know it is a duplicate.

Discuss ordering: webhooks may arrive out of order due to retries and parallel delivery. Include a timestamp and a sequence number. Consumers that need ordering must buffer and reorder. For most use cases, design for out-of-order delivery.

For scalability: partition the delivery queue by consumer to distribute load. Monitor delivery success rates per consumer. Implement circuit breakers for consumers that are consistently failing to avoid wasting resources on dead endpoints.

Follow-up questions:

  • How do you handle a consumer that is temporarily down for hours?
  • How do you prevent webhook abuse (a consumer registering webhooks that point to a victim's server)?
  • How would you support webhook payload filtering (consumers only want certain fields)?

8. How do you design an API for file uploads?

What the interviewer is really asking: Can you handle the practical challenges of large file transfers including reliability, progress tracking, and resource management?

Answer framework:

For small files (under 5MB): direct upload via multipart/form-data POST is sufficient. The API server receives the file, validates it (size, type, content scanning), stores it, and returns the file metadata.

For large files: use presigned URLs. The client requests an upload URL from the API (POST /uploads with metadata). The server generates a presigned URL pointing directly to object storage (S3, GCS) and returns it. The client uploads directly to storage, bypassing the API server. This saves bandwidth, reduces API server load, and supports larger files (5GB+).
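
A sketch of the presigned-URL flow assuming boto3 and S3; the bucket name, key scheme, and expiry are illustrative.

```python
# Issue a presigned PUT URL so the client uploads directly to object storage.
import uuid
import boto3

s3 = boto3.client("s3")

def create_upload(filename: str, content_type: str) -> dict:
    key = f"uploads/{uuid.uuid4()}/{filename}"
    url = s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": "my-uploads-bucket", "Key": key, "ContentType": content_type},
        ExpiresIn=900,  # the client has 15 minutes to start the upload
    )
    # The API returns the URL; the client PUTs the file bytes directly to S3.
    return {"upload_url": url, "file_key": key}
```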

For very large files: use chunked/resumable uploads. The client initiates an upload session (POST /uploads), receives a session ID, then uploads chunks (PUT /uploads/{session}/chunks/{index}). Each chunk is 5-10MB. If the upload fails, the client can resume from the last successful chunk rather than starting over. Google Drive and Dropbox use this approach.

For the API design: POST /uploads (initiate, returns upload session with ID), PUT /uploads/{id}/chunks/{index} (upload a chunk), POST /uploads/{id}/complete (signal all chunks uploaded, triggers assembly), GET /uploads/{id} (check upload status). Include Content-Range headers for chunk tracking.

For validation: check file type using content sniffing (not just file extension). Scan for malware asynchronously. Enforce size limits per tier. Generate thumbnails and previews asynchronously.

For storage: use object storage (S3, GCS) for file bytes. Store metadata (filename, size, content type, owner, upload date, storage path) in a database. Generate CDN URLs for downloads. Use caching for frequently accessed files.

Discuss concurrency: use optimistic locking or versioning to handle concurrent uploads replacing the same file. For collaborative scenarios, apply a last-write-wins policy or a merge strategy.

Follow-up questions:

  • How do you handle a client that starts a resumable upload but never completes it?
  • How would you implement virus scanning without blocking the upload response?
  • How do you generate previews for different file types at scale?

9. How do you design an API for search functionality?

What the interviewer is really asking: Can you design a search API that is both powerful for advanced users and simple for basic use cases?

Answer framework:

Start with the simple case: a query parameter on a list endpoint (GET /products?q=wireless+headphones). This handles keyword search and is intuitive for consumers.

For structured search, add filter parameters: GET /products?q=headphones&category=electronics&price_min=50&price_max=200&sort=rating&order=desc. Use consistent parameter naming across all searchable endpoints.

For complex search, consider a POST endpoint with a JSON body: POST /products/search with a structured query object supporting nested boolean logic (AND, OR, NOT), range queries, full-text search with relevance scoring, and faceted filtering. This is the approach Elasticsearch exposes and is common for e-commerce and content platforms.
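
To make the shape concrete, here is an illustrative request body (shown as a Python dict) of the kind such an endpoint might accept; every field name is hypothetical.

```python
# Illustrative structured search request for POST /products/search.
search_request = {
    "query": {
        "and": [
            {"text": {"fields": ["title", "description"], "match": "wireless headphones"}},
            {"range": {"field": "price", "gte": 50, "lte": 200}},
            {"not": {"term": {"field": "status", "value": "discontinued"}}},
        ]
    },
    "facets": ["category", "brand"],
    "sort": [{"field": "_relevance", "order": "desc"}],
    "limit": 20,
    "search_after": None,   # cursor token from the previous page, if any
}
```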

For pagination of search results, always use cursor-based pagination. Search results can change between pages (new items matching the query, relevance scores changing), so offset-based pagination leads to missing or duplicate results. Return a search_after token that clients include in the next request.

For response design: include total count (or an estimate for large result sets), the items, facets (counts per category, price range distribution), and suggested queries (did-you-mean, related searches). Return only the fields needed for list display; let clients expand for full details.

Discuss performance: search queries are computationally expensive. Use caching aggressively for common queries. Implement query complexity limits (maximum number of clauses, maximum result window). Use rate limiting specific to search endpoints (lower limits than CRUD endpoints).

For typeahead/autocomplete: a separate endpoint (GET /products/suggest?q=head) optimized for speed (under 50ms response). Use a prefix index, return suggestions as the user types, limit to 5-10 suggestions. Debounce on the client side (200-300ms).

Discuss search relevance: the API should support boosting certain fields (title matches weighted higher than description), filters vs queries (filters are binary and cached, queries score by relevance), and highlighting (return which parts of the text matched the query).

Follow-up questions:

  • How do you handle search in multiple languages?
  • How would you implement saved searches and search alerts?
  • How do you A/B test search relevance changes?

10. Design an idempotent API for financial operations

What the interviewer is really asking: Can you prevent the critical bugs that arise from network retries in systems where duplicate operations cause real harm?

Answer framework:

Idempotency means that making the same request multiple times produces the same result as making it once. This is crucial for financial APIs where a duplicate charge or transfer has real consequences.

The standard approach: require clients to provide an Idempotency-Key header (a UUID) with every mutating request. The server stores the key along with the response. On subsequent requests with the same key, return the stored response without re-executing the operation.

Implementation details: store idempotency records in a database with the key, request hash (to detect different requests with the same key), response status, response body, and creation timestamp. Use a unique constraint on the key to prevent race conditions. TTL of 24-48 hours for stored records (after which the key can be reused).

For the critical race condition: two requests with the same idempotency key arrive simultaneously. Without protection, both might execute. Solution: use a database lock or INSERT ... ON CONFLICT to ensure only one request proceeds. The second request waits (short poll) or receives a 409 Conflict.
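
A sketch of that pattern, shown with sqlite3 for brevity (the same INSERT ... ON CONFLICT approach applies to PostgreSQL); table, column, and helper names such as execute_charge are illustrative.

```python
# Idempotency-key handling: the INSERT decides which concurrent request "owns" the key.
import hashlib
import json
import sqlite3

db = sqlite3.connect("api.db")
db.execute("""CREATE TABLE IF NOT EXISTS idempotency_keys (
    key TEXT PRIMARY KEY, request_hash TEXT, status INTEGER, body TEXT)""")

def handle_payment(idempotency_key: str, request: dict) -> tuple[int, dict]:
    req_hash = hashlib.sha256(json.dumps(request, sort_keys=True).encode()).hexdigest()
    # Only one concurrent request can claim the key; the loser sees 0 rows changed.
    cur = db.execute(
        "INSERT INTO idempotency_keys (key, request_hash) VALUES (?, ?) "
        "ON CONFLICT(key) DO NOTHING", (idempotency_key, req_hash))
    db.commit()
    if cur.rowcount == 0:
        row = db.execute(
            "SELECT request_hash, status, body FROM idempotency_keys WHERE key = ?",
            (idempotency_key,)).fetchone()
        if row[0] != req_hash:
            return 422, {"error": "idempotency key reused with a different request"}
        if row[1] is None:
            return 409, {"error": "a request with this key is still in flight"}
        return row[1], json.loads(row[2])           # replay the stored response

    status, body = execute_charge(request)          # stand-in for the real operation
    db.execute("UPDATE idempotency_keys SET status = ?, body = ? WHERE key = ?",
               (status, json.dumps(body), idempotency_key))
    db.commit()
    return status, body
```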

Distinguish between request-level and operation-level idempotency. Request-level: the same HTTP request produces the same response (handled by the idempotency key mechanism). Operation-level: the business operation itself is idempotent (e.g., "set balance to $100" is idempotent, but "add $100 to balance" is not). Design operations to be naturally idempotent when possible.

For distributed systems: if the API server crashes after executing the operation but before storing the idempotency record, the client will retry and the operation will execute twice. Solution: store the idempotency record in the same database transaction as the operation. If either fails, both fail, and a retry is safe.

Discuss Stripe's implementation as the gold standard: idempotency keys persist for 24 hours, the first request's response is cached and replayed, different request bodies with the same key return an error, and in-flight requests with the same key return 409.

Follow-up questions:

  • How do you handle idempotency for long-running operations?
  • What happens when the stored response is an error; should retries re-execute?
  • How do you clean up idempotency records without losing safety?

11. How do you design APIs for real-time data?

What the interviewer is really asking: Do you understand the trade-offs between polling, long polling, SSE, and WebSockets, and when to use each?

Answer framework:

Four approaches for real-time data, each with different trade-offs.

Polling: the client repeatedly calls a GET endpoint at a fixed interval. Simple to implement, uses standard HTTP, works through all proxies and firewalls. But wasteful when data changes infrequently (many empty responses) and introduces latency equal to the polling interval. Use for: dashboards that update every 30-60 seconds, background sync.

Long polling: the client sends a GET request, and the server holds the connection open until new data is available or a timeout occurs. More efficient than polling (fewer empty responses) but still creates a new connection for each update cycle. Use for: notification systems, chat (before WebSocket adoption).

Server-Sent Events (SSE): a persistent HTTP connection where the server pushes events. Uses standard HTTP (works through proxies), automatic reconnection built into the browser API, supports event types and IDs. But unidirectional (server to client only) and limited to text data. Use for: live feeds, stock tickers, build/deployment status.

WebSockets: a persistent bidirectional connection. Full duplex communication, binary and text support, lowest latency. But requires special server infrastructure, does not work through some proxies, connection management is complex. Use for: chat, collaborative editing, gaming, real-time trading.

For the API design with SSE: GET /events?stream=true returns a text/event-stream response. Each event has an ID, event type, and data field. Clients include Last-Event-ID on reconnection to resume from where they left off. Server maintains a buffer of recent events for replay.
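
A minimal sketch of such an SSE endpoint, assuming Flask and a hypothetical events_since replay helper:

```python
# SSE endpoint: stream events with IDs so clients can resume via Last-Event-ID.
import json
from flask import Flask, Response, request

app = Flask(__name__)

@app.get("/events")
def stream_events():
    last_id = int(request.headers.get("Last-Event-ID", 0))

    def generate():
        for event in events_since(last_id):   # replays missed events, then tails new ones
            yield (
                f"id: {event['id']}\n"
                f"event: {event['type']}\n"
                f"data: {json.dumps(event['data'])}\n\n"
            )

    return Response(generate(), mimetype="text/event-stream")
```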

For WebSocket API design: define a message protocol (JSON with type, action, and payload fields). Implement heartbeat/ping-pong for connection health. Handle reconnection with a message sequence number so the client can request missed messages.

Discuss infrastructure: load balancing persistent connections requires session affinity or a connection-aware load balancer. WebSocket connections are long-lived, so connection count becomes the bottleneck (not requests per second). A single server can handle 50K-100K concurrent WebSocket connections.

Follow-up questions:

  • How do you handle authentication for WebSocket connections?
  • How would you scale SSE to millions of concurrent connections?
  • When would you choose SSE over WebSockets?

12. How do you design a GraphQL API and when is it appropriate?

What the interviewer is really asking: Do you understand GraphQL's strengths and weaknesses compared to REST, and can you design a schema that avoids common pitfalls?

Answer framework:

GraphQL shines when clients need flexible data fetching. Mobile clients may need a subset of fields (save bandwidth), web dashboards may need data from multiple resources in one request, and different clients may need different shapes of the same data.

Schema design principles: define types that mirror your domain model, use interfaces for shared fields across types, use unions for polymorphic types. The schema is the contract, so design it carefully.

For query complexity control (critical for production): implement query depth limiting (prevent deeply nested queries that could cause N+1 database queries), query cost analysis (assign a cost to each field, reject queries exceeding a threshold), and pagination requirements (never allow unbounded lists, require first/last arguments on connections).

For N+1 query prevention: use DataLoader (batching pattern). When resolving a list of users and their posts, DataLoader batches all post lookups into a single query instead of one query per user. This is essential for performance.
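
A stripped-down illustration of the batching idea (real servers use a DataLoader library and async resolvers); fetch_posts_for_users is a hypothetical single IN-query helper.

```python
# Instead of one posts query per user (N+1), collect all user IDs and fetch in one query.
from collections import defaultdict

def resolve_users_with_posts(users: list[dict]) -> list[dict]:
    user_ids = [u["id"] for u in users]
    posts = fetch_posts_for_users(user_ids)      # single batched query: WHERE user_id IN (...)
    by_user = defaultdict(list)
    for post in posts:
        by_user[post["user_id"]].append(post)
    return [{**u, "posts": by_user[u["id"]]} for u in users]
```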

Compare with REST for the interview: GraphQL advantages include no over-fetching or under-fetching, strongly typed schema, self-documenting (introspection), and a single endpoint. REST advantages include caching (HTTP caching works naturally with URLs, GraphQL needs custom caching), simpler error handling (HTTP status codes vs GraphQL error arrays), better tooling for monitoring and rate limiting, and easier to learn.

When to use GraphQL: multiple client types with different data needs, rapid frontend iteration, complex data relationships. When to use REST: simple CRUD APIs, public APIs (broader adoption), microservices internal communication (gRPC is even better here).

See our compare technologies guide for the specific trade-offs between GraphQL and REST in different scenarios.

Follow-up questions:

  • How do you handle authentication and authorization in GraphQL?
  • How do you version a GraphQL schema?
  • How do you implement real-time subscriptions in GraphQL?

13. Design an API gateway architecture

What the interviewer is really asking: Do you understand the cross-cutting concerns of API management and how to centralize them without creating a bottleneck?

Answer framework:

An API gateway sits between clients and backend services, handling cross-cutting concerns: authentication, rate limiting, request routing, protocol translation, response aggregation, and observability.

Core responsibilities: request authentication (validate tokens, API keys), rate limiting (enforce per-user and per-endpoint limits), request routing (map external URLs to internal services), protocol translation (REST to gRPC, HTTP/1.1 to HTTP/2), response transformation (remove internal fields, rename fields for API version compatibility), and observability (logging, metrics, distributed tracing).

Architectural patterns: single gateway (one gateway for all clients, simple but can become a bottleneck), BFF (Backend for Frontend, separate gateways for web, mobile, and third-party, each optimized for its client type), and federated gateway (the gateway delegates to domain-specific sub-gateways, reducing centralized complexity).

For high availability: the gateway is on the critical path for every request. Deploy multiple instances behind a load balancer. Use health checks and automatic failover. Keep the gateway stateless (rate limit counters in external Redis, no session state).

For performance: the gateway adds latency to every request (typically 1-5ms). Minimize processing in the gateway. Cache authentication results. Use connection pooling to backend services. Implement circuit breakers for backend services so the gateway can return errors quickly when a service is down.
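
A minimal per-backend circuit breaker of the kind a gateway might keep in memory; thresholds and timings are illustrative.

```python
# Simple circuit breaker: fail fast when a backend keeps failing, probe after a cooldown.
import time

class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_after: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at: float | None = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if time.time() - self.opened_at >= self.reset_after:
            self.opened_at = None          # half-open: let one request probe the backend
            self.failures = 0
            return True
        return False                       # fail fast (e.g. 503) instead of waiting on a dead service

    def record(self, success: bool) -> None:
        if success:
            self.failures = 0
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.time()
```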

Discuss the anti-pattern: putting business logic in the gateway. The gateway should only handle cross-cutting concerns. Business logic in the gateway couples all services to the gateway team and makes it a development bottleneck.

For microservices: service mesh (Istio, Linkerd) handles service-to-service communication concerns (mTLS, retries, circuit breakers), while the API gateway handles external-facing concerns. They are complementary, not alternatives.

Follow-up questions:

  • How do you prevent the API gateway from becoming a single point of failure?
  • How would you implement request aggregation (combining multiple backend calls into one response)?
  • How do you handle API gateway deployments without downtime?

14. How do you design APIs for multi-tenancy?

What the interviewer is really asking: Can you design APIs that serve multiple customers with proper isolation while sharing infrastructure?

Answer framework:

Tenant identification: how does the API know which tenant the request belongs to? Options include subdomain (tenant1.api.example.com), URL path (/tenants/tenant1/resources), header (X-Tenant-ID), and JWT claim (tenant_id in the token). Subdomain is the most common for SaaS products because it provides natural isolation and simplifies routing.

Data isolation: per-tenant databases (strongest isolation, highest cost), per-tenant schema within a shared database (good isolation, moderate cost), or shared tables with a tenant_id column (lowest cost, weakest isolation). The choice depends on compliance requirements, data volume per tenant, and the isolation promises you make to customers. See our multi-tenancy design patterns for detailed trade-offs.

API design for multi-tenancy: tenant-scoped resources (/api/v1/users returns only the current tenant's users), cross-tenant queries are forbidden by default (enforce at the data layer, not just the API layer), and tenant-specific configuration (rate limits, feature flags, custom fields).
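
One way to make tenant scoping hard to bypass is to enforce it in a data-access layer rather than in each endpoint; a sketch with illustrative names:

```python
# Repository that refuses to run any query without a tenant ID.
class TenantScopedRepository:
    def __init__(self, db, tenant_id: str):
        if not tenant_id:
            raise ValueError("tenant_id is required; cross-tenant queries are forbidden")
        self.db = db
        self.tenant_id = tenant_id

    def list_users(self, status: str | None = None) -> list[dict]:
        sql = "SELECT id, email, status FROM users WHERE tenant_id = ?"
        params = [self.tenant_id]          # every query is tenant-scoped by construction
        if status:
            sql += " AND status = ?"
            params.append(status)
        return self.db.execute(sql, params).fetchall()
```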

For rate limiting: separate rate limit pools per tenant. A noisy neighbor should not affect other tenants. Implement per-tenant quotas that align with their pricing tier.

For scalability: use tenant-aware load balancing to route each tenant to specific backend instances. This enables per-tenant scaling and simplifies debugging. Large tenants can get dedicated instances.

Discuss security: tenant isolation must be enforced at every layer (API, service, database). A bug that leaks data between tenants is a critical security incident. Use automated tests that verify tenant isolation. Implement row-level security at the database level as a defense in depth measure.

Follow-up questions:

  • How do you handle a tenant migration from shared to dedicated infrastructure?
  • How do you implement tenant-specific customizations without forking the codebase?
  • How do you prevent one tenant from consuming all shared resources?

15. Design a comprehensive API observability strategy

What the interviewer is really asking: Can you build visibility into how your API is used, where it fails, and how it performs?

Answer framework:

Three pillars of API observability: metrics, logging, and tracing.

Metrics: track request rate (QPS per endpoint and status code), latency (p50, p95, p99 per endpoint), error rate (4xx and 5xx per endpoint), and business metrics (API key usage, endpoint popularity, payload sizes). Use a time-series database for storage. Create dashboards showing golden signals (latency, traffic, errors, saturation). Set up alerts on anomalies.

Logging: log every request with method, path, status code, latency, request ID, user ID, and client version. Do not log sensitive data (tokens, passwords, PII). Use structured logging (JSON) for machine parsing. Include a correlation ID that flows through all services for a single request.
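
A sketch of a structured access-log helper with a correlation ID; the field names follow the list above and are otherwise illustrative.

```python
# Structured (JSON) request logging with a correlation ID; no sensitive data is logged.
import json
import logging
import time
import uuid

logger = logging.getLogger("api.access")

def log_request(method: str, path: str, status: int, started: float,
                user_id: str | None, request_id: str | None = None) -> str:
    request_id = request_id or str(uuid.uuid4())   # reuse the inbound X-Request-ID if present
    logger.info(json.dumps({
        "request_id": request_id,      # flows to downstream services for correlation
        "method": method,
        "path": path,
        "status": status,
        "latency_ms": round((time.time() - started) * 1000, 1),
        "user_id": user_id,
    }))
    return request_id
```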

Distributed tracing: for APIs that call multiple backend services, trace the full request path. Each service adds a span to the trace. Use OpenTelemetry for instrumentation, and Jaeger or Zipkin for visualization. This reveals which service is causing latency or errors.

API-specific observability: track deprecation usage (how many requests use deprecated endpoints or fields), version distribution (percentage of traffic per API version), and client SDK versions (detect outdated clients). Use these to make informed decisions about sunset timelines.

For fault tolerance: the observability system itself must be reliable. Use sampling for high-volume tracing (sample 1 percent of requests, 100 percent of errors). Buffer metrics and logs locally before shipping. If the observability backend is down, the API must continue serving requests.

Discuss SLOs and SLIs: define Service Level Objectives for your API (99.9 percent availability, p99 latency under 200ms). Measure Service Level Indicators continuously. Error budgets determine how aggressively you can deploy changes.

Follow-up questions:

  • How do you handle observability for async API operations (webhooks, background jobs)?
  • How do you detect and alert on API contract violations?
  • How do you track API usage for billing purposes?

Common Mistakes in API Design Interviews

  1. Inconsistent naming and conventions. Using camelCase in one endpoint and snake_case in another. Mixing plural and singular nouns. Pick a convention and enforce it everywhere.

  2. Ignoring backward compatibility. Treating breaking changes as minor updates. Not having a versioning strategy. Underestimating the cost of migrating API consumers.

  3. Poor error responses. Returning generic error messages, exposing internal implementation details, or using incorrect HTTP status codes.

  4. Not thinking about pagination from the start. Returning unbounded lists is a ticking time bomb. Always paginate, always limit.

  5. Over-designing for flexibility. A generic query language for every endpoint when simple filters would suffice. Match the API complexity to actual use cases.

  6. Ignoring security. Not discussing authentication, authorization, input validation, or rate limiting until prompted.

How to Prepare for API Design Interviews

Study the APIs of companies known for great API design: Stripe (excellent documentation, idempotency), Twilio (consistent resource model), GitHub (good versioning), and Slack (events API). Understand why their design choices work.

Practice designing APIs on paper: pick a product (e-commerce, social media, calendar) and design the full API including resource models, endpoints, error handling, and authentication. Limit yourself to 30 minutes.

Read the REST API design guide, study GraphQL patterns, and understand gRPC for service-to-service communication. Explore our learning paths for structured preparation.

For candidates targeting staff roles, the senior to staff engineer transition guide covers the elevated expectations for API design thinking.
