
Elasticsearch Interview Questions for Senior Engineers (2026)

Master advanced Elasticsearch interview questions covering distributed search architecture, indexing strategies, relevance tuning, cluster management, and production scaling for senior engineering roles.

20 min read · Updated Apr 19, 2026
Tags: interview-questions, elasticsearch, senior-engineer

Why Elasticsearch Matters in Senior Engineering Interviews

Elasticsearch is the backbone of search and observability infrastructure at companies like Netflix, Uber, and GitHub. Whether powering product search for e-commerce platforms, log aggregation for DevOps teams, or real-time analytics dashboards, Elasticsearch appears in nearly every modern tech stack. Senior engineers are expected to understand not just the query DSL, but the distributed systems fundamentals that determine whether a cluster handles 10 requests per second or 100,000.

At the senior level, interviewers test your ability to design mappings that balance search relevance with storage efficiency, configure sharding strategies that scale horizontally, tune relevance scoring for real-world search quality, and diagnose cluster health issues under production pressure. This guide covers the most challenging Elasticsearch interview questions with structured frameworks for comprehensive answers.

For broader search system architecture, see our search engine system design and how Elasticsearch works. For comparison context, see our Elasticsearch vs Solr guide.


1. Explain Elasticsearch's distributed architecture and how data is stored across nodes.

What the interviewer is really asking: Do you understand the shard-based distribution model, node roles, and how Elasticsearch achieves horizontal scalability and fault tolerance?

Answer framework:

Elasticsearch distributes data using a shard-based architecture. An index is divided into primary shards, each of which can have replica shards. Shards are the unit of distribution — each shard is a self-contained Lucene index that can be placed on any node in the cluster.

When you create an index with 5 primary shards and 1 replica, Elasticsearch creates 10 total shards (5 primary + 5 replica). The cluster automatically distributes these across available nodes, ensuring replicas are never on the same node as their primary (for fault tolerance).
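
A minimal sketch of creating such an index (index name and counts illustrative):

```json
PUT /products
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  }
}
```

With one replica per primary, the cluster can lose any single data node without losing data, since each primary's copy lives on a different node.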

Node roles in a production cluster:

  • Master-eligible nodes: Participate in cluster state management (index creation, shard allocation, node tracking). Run 3 dedicated master nodes for cluster stability.
  • Data nodes: Store data and execute search/aggregation queries. These are your workhorses.
  • Coordinating nodes: Route requests, then merge and reduce the partial results returned by data nodes. Dedicated coordinating nodes are useful for heavy aggregation workloads.
  • Ingest nodes: Run ingest pipelines to transform documents before indexing.

Document routing: when you index a document, Elasticsearch determines its shard using the formula shard = hash(routing_value) % number_of_primary_shards. The default routing value is the document's _id. This is why the number of primary shards cannot be changed after index creation (it would change the routing formula).

Search execution: a coordinating node receives the search request, forwards it to one copy of each shard (primary or replica), each shard executes the query locally and returns its top results, and the coordinating node merges and re-sorts the results. This scatter-gather pattern means search performance scales with the number of shards but adds coordination overhead.

For deeper distributed systems context, see our distributed systems guide and consistent hashing concepts.

Follow-up questions:

  • How does Elasticsearch handle a node failure? What happens to the shards on that node?
  • What is the split-brain problem, and how does Elasticsearch prevent it?
  • How would you decide between more primary shards vs more replicas?

2. How do you design Elasticsearch mappings for optimal search and storage?

What the interviewer is really asking: Can you balance search flexibility with storage efficiency and query performance? Do you understand field types and their implications?

Answer framework:

Mapping design determines how Elasticsearch indexes, stores, and searches your data. Key decisions:

Dynamic vs explicit mappings: Dynamic mapping auto-detects field types from the first document. This is convenient for development but dangerous in production — a string field might be mapped as both text (for full-text search) and keyword (for exact match), doubling storage. Use explicit mappings in production with dynamic: strict to reject unmapped fields.

Text vs keyword: text fields are analyzed (tokenized, lowercased, stemmed) for full-text search. keyword fields are stored as-is for exact match, aggregations, and sorting. A product name needs text for search and keyword (as a multi-field) for sorting and faceting: {"name": {"type": "text", "fields": {"keyword": {"type": "keyword"}}}}.
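
A minimal explicit mapping sketch combining these ideas (index and field names illustrative):

```json
PUT /products
{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "name": {
        "type": "text",
        "fields": { "keyword": { "type": "keyword" } }
      },
      "category": { "type": "keyword" },
      "price": { "type": "float" },
      "created_at": { "type": "date" }
    }
  }
}
```

With dynamic: strict, a document containing an unmapped field is rejected instead of silently widening the mapping.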

Analyzers: Choose analyzers based on your search requirements. The standard analyzer works for most English text. Use english analyzer for stemming ("running" matches "run"). Use custom analyzers for edge n-grams (autocomplete), phonetic matching, or domain-specific tokenization.

Nested vs object: Object fields flatten nested JSON, which breaks correlation between fields in arrays. If you have an array of objects and need to query field combinations within the same object, use nested type (each nested object is indexed as a hidden separate document). Nested queries are slower — use them only when needed.

doc_values vs stored fields: doc_values are columnar on-disk structures used for sorting, aggregations, and scripting. They're enabled by default for non-text fields (text fields use fielddata instead, which is disabled by default). For keyword or numeric fields you never sort or aggregate on, disable doc_values to save disk space.

_source field: Contains the original JSON document. Disabling _source saves storage but prevents update operations, reindexing, and highlighting. Almost always keep _source enabled; use source filtering at query time instead.

See our Elasticsearch deep dive and search engine system design for broader context.

Follow-up questions:

  • How would you handle mapping conflicts when merging indices?
  • What is the performance impact of nested fields, and when would you use flattened type instead?
  • How do you evolve mappings without reindexing?

3. How do you tune search relevance in Elasticsearch?

What the interviewer is really asking: Do you understand BM25 scoring, and can you go beyond default relevance to build a search experience that actually works for users?

Answer framework:

Elasticsearch uses BM25 (Best Matching 25) as its default similarity algorithm, which considers term frequency (how often the term appears in the document), inverse document frequency (how rare the term is across all documents), and field length normalization (shorter fields get a boost).

Relevance tuning strategies:

Field boosting: Weight fields differently. A match in the title should score higher than a match in the description. Use multi_match with field boosts: {"multi_match": {"query": "kubernetes", "fields": ["title^3", "description^2", "body"]}}.

Function score queries: Combine text relevance with business signals. For example, boost recently updated content, popular products, or items with high ratings. function_score query with field_value_factor, decay functions (freshness decay), and script_score for custom logic.
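
A hedged sketch of combining text relevance with popularity and freshness signals (index and field names assumed):

```json
GET /products/_search
{
  "query": {
    "function_score": {
      "query": { "match": { "title": "kubernetes" } },
      "functions": [
        { "field_value_factor": { "field": "popularity", "modifier": "log1p", "missing": 1 } },
        { "gauss": { "published_at": { "origin": "now", "scale": "30d", "decay": 0.5 } } }
      ],
      "score_mode": "multiply",
      "boost_mode": "multiply"
    }
  }
}
```

Here the BM25 text score is multiplied by a log-scaled popularity factor and a freshness decay, so recent, popular documents rank above stale exact matches.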

Synonyms and analysis: Configure synonym filters to match "laptop" when users search "notebook". Use char filters for domain-specific normalization. Test with the _analyze API to verify tokenization.

Phrase matching and slop: match_phrase requires terms in order. Adding slop allows terms to be separated by a number of positions. Use phrase matching to boost exact phrases while still returning partial matches.

Rescore queries: Apply expensive relevance calculations to only the top N results. First pass uses a simple query for recall, second pass uses a complex query (e.g., phrase matching, ML-based scoring) for precision on the top results.

Learning to Rank (LTR): For advanced relevance, use the LTR plugin to train ML models on click-through data. Features include BM25 score, freshness, popularity, and click-through rate. This is how Google, Amazon, and Netflix achieve superior search relevance.

See our search engine design and how recommendation systems work for related patterns.

Follow-up questions:

  • How would you A/B test relevance changes in a production search system?
  • What is the difference between should clauses and boosting for relevance tuning?
  • How would you handle multi-language search relevance?

4. How do you handle index lifecycle management and time-series data in Elasticsearch?

What the interviewer is really asking: Do you understand ILM policies, rollover, and the operational strategies that keep Elasticsearch clusters healthy with growing data?

Answer framework:

Elasticsearch's Index Lifecycle Management (ILM) automates index management through four phases: Hot, Warm, Cold, and Delete.

Hot phase: Actively writing and searching. Use fast SSDs. Configure rollover to create a new index when the current one reaches a size (50GB) or age (1 day) threshold.

Warm phase: No longer writing but still querying. Move to cheaper storage. Reduce replicas. Force merge segments to reduce overhead. Shrink shards.

Cold phase: Rarely queried. Use cheapest storage. Can use searchable snapshots (data stored in object storage like S3, loaded on demand).

Delete phase: Remove indices past retention period.
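
A sketch of an ILM policy covering these phases (policy name and thresholds illustrative):

```json
PUT _ilm/policy/logs-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_primary_shard_size": "50gb", "max_age": "1d" }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "forcemerge": { "max_num_segments": 1 },
          "shrink": { "number_of_shards": 1 },
          "allocate": { "number_of_replicas": 1 }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": { "delete": {} }
      }
    }
  }
}
```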

For time-series data (logs, metrics, events), use data streams (Elasticsearch 7.9+). A data stream is an abstraction over time-based indices with automatic rollover. Write to the data stream; Elasticsearch routes to the current write index. Searches span all backing indices transparently.

Index template configuration: define mappings, settings, and ILM policy in an index template. Data streams automatically apply the template to each new backing index.
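
A sketch of a composable index template that backs a data stream and attaches the ILM policy above (template and pattern names illustrative):

```json
PUT _index_template/logs-template
{
  "index_patterns": ["logs-myapp-*"],
  "data_stream": {},
  "template": {
    "settings": {
      "number_of_shards": 1,
      "index.lifecycle.name": "logs-policy"
    },
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date" },
        "message": { "type": "text" }
      }
    }
  }
}
```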

Key sizing decisions:

  • Shard size: Target 20-50GB per shard. Smaller shards increase cluster overhead; larger shards slow recovery.
  • Shard count: More shards enable parallel search but increase memory overhead on the master node. A cluster with 100,000+ shards will have stability issues.
  • Refresh interval: Default 1 second makes new documents searchable within 1 second. For high-throughput indexing, increase to 30 seconds or disable during bulk imports.

For observability platform design, see our monitoring system design and distributed systems guide.

Follow-up questions:

  • How would you migrate an existing index-per-day pattern to data streams?
  • What are searchable snapshots and when would you use them?
  • How do you handle retention policies that vary by data type or customer?

5. How do you diagnose and fix performance problems in an Elasticsearch cluster?

What the interviewer is really asking: Do you have real production experience troubleshooting Elasticsearch, or just textbook knowledge?

Answer framework:

Systematic performance diagnosis follows this flow:

Cluster health check: GET _cluster/health — is the cluster green, yellow, or red? Yellow means some replica shards are unassigned (usually a capacity issue). Red means primary shards are unassigned (data loss risk). Check _cluster/allocation/explain for why shards are unassigned.

Hot threads: GET _nodes/hot_threads reveals what the JVM threads are doing. High CPU on search threads indicates query bottlenecks. High CPU on bulk threads indicates indexing pressure. Long GC pauses indicate memory pressure.

Slow query log: Enable slow search and slow index logs to capture queries exceeding a threshold. Analyze slow queries for missing mappings, expensive wildcards, deep aggregations, or large result sets.
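
A sketch of enabling the slow logs on an index (index name and thresholds illustrative):

```json
PUT /products/_settings
{
  "index.search.slowlog.threshold.query.warn": "2s",
  "index.search.slowlog.threshold.fetch.warn": "1s",
  "index.indexing.slowlog.threshold.index.warn": "5s"
}
```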

Common performance issues and fixes:

  1. Search latency spikes: Often caused by expensive queries (leading wildcards, regex, deeply nested aggregations), insufficient replicas (all queries hitting the same shards), or GC pauses. Fix by optimizing queries, adding replicas, or increasing heap.

  2. Indexing throughput drops: Usually caused by too many segments (segment merging overhead), small bulk requests (increase batch size to 5-15MB), too-frequent refreshes (increase refresh_interval), or disk I/O saturation.

  3. Disk watermarks: Elasticsearch stops allocating shards when disk usage exceeds thresholds (85% low, 90% high, 95% flood). Monitor proactively. ILM policies and index curation prevent this.

  4. JVM heap pressure: Elasticsearch recommends heap no larger than 50% of RAM (max 31GB for compressed oops). High heap usage causes frequent GC pauses. Common causes: fielddata on text fields (use keyword fields for aggregations), too many shards (each shard consumes heap), and large aggregation cardinalities.

  5. Cluster instability: Master node overload from too many shards, pending tasks backing up, or frequent shard reallocation. Dedicated master nodes with adequate resources solve this.

Tooling: use Kibana Stack Monitoring, _cat/shards, _cat/nodes, _nodes/stats, and _cluster/stats for operational visibility.

Follow-up questions:

  • How would you handle a cluster that's stuck in yellow status for hours?
  • What is the impact of too many shards on cluster performance?
  • How do you plan capacity for an Elasticsearch cluster?

6. Explain the Elasticsearch query execution model and the difference between queries and filters.

What the interviewer is really asking: Do you understand scoring vs filtering and how to structure queries for optimal performance?

Answer framework:

Elasticsearch has two contexts for search clauses: query context (how well does this document match? — produces a relevance score) and filter context (does this document match yes/no? — no scoring, cacheable).

Filters are faster because: they skip scoring computation, they're cached in a bitset for reuse, and they enable short-circuit evaluation. Place exact-match conditions (status, category, date range) in filters and full-text conditions in queries.

Optimal query structure using bool:

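A minimal sketch of such a bool query (field names and values illustrative):

```json
GET /articles/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "kubernetes" } }
      ],
      "filter": [
        { "term": { "status": "published" } },
        { "range": { "published_at": { "gte": "now-30d/d" } } }
      ],
      "should": [
        { "match_phrase": { "title": "kubernetes operators" } }
      ],
      "must_not": [
        { "term": { "category": "archived" } }
      ]
    }
  }
}
```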

must clauses must match and contribute to the score. filter clauses must match but are not scored (and are cached). should clauses optionally boost the score. must_not clauses exclude documents without scoring (also cached).

Query execution phases:

  1. Query phase: Each shard executes the query locally, producing a priority queue of matching document IDs and scores. The coordinating node merges these queues.
  2. Fetch phase: The coordinating node requests full documents for only the top N results from the relevant shards.

This two-phase approach means only top results are fetched, making deep pagination expensive (from: 10000, size: 10 still requires sorting 10,010 results per shard). For deep pagination, use search_after or point-in-time APIs instead.
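
A sketch of deep pagination with a point-in-time and search_after (index, sort field, and values illustrative; the search_after values come from the sort values of the last hit on the previous page):

```json
POST /articles/_pit?keep_alive=1m

GET /_search
{
  "size": 10,
  "query": { "match": { "title": "kubernetes" } },
  "pit": { "id": "<pit id from the previous call>", "keep_alive": "1m" },
  "sort": [
    { "published_at": "desc" },
    { "_shard_doc": "asc" }
  ],
  "search_after": [1735689600000, 42]
}
```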

See our system design interview guide for how search systems fit into larger architectures.

Follow-up questions:

  • How does the filter cache work, and when are caches invalidated?
  • What is the performance difference between term query in filter context vs query context?
  • How does minimum_should_match affect scoring and performance?

7. How would you implement autocomplete/search-as-you-type in Elasticsearch?

What the interviewer is really asking: Can you design a responsive search experience that handles partial input, typos, and suggestions at scale?

Answer framework:

Four approaches, each with different trade-offs:

Completion suggester: Built specifically for autocomplete. Uses an in-memory FST (Finite State Transducer) data structure for extremely fast prefix lookups. Limitations: prefix-only matching ("kub" matches "kubernetes" but "bern" doesn't), limited filtering, and the FST must fit in memory.

Edge n-gram analyzer: Tokenize text into prefixes at index time. Configure a custom analyzer with edge_ngram token filter (min_gram: 2, max_gram: 15). At search time, use a standard analyzer (not edge_ngram) to match against the pre-built prefixes. This is more flexible than the completion suggester and supports full scoring.

search_as_you_type field type (Elasticsearch 7.2+): A purpose-built field type that automatically creates sub-fields with edge n-grams and shingles. It handles prefix, infix, and phrase matching out of the box. This is the recommended approach for most search-as-you-type implementations.
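
A sketch of the search_as_you_type approach (index and field names assumed):

```json
PUT /articles
{
  "mappings": {
    "properties": {
      "title": { "type": "search_as_you_type" }
    }
  }
}

GET /articles/_search
{
  "query": {
    "multi_match": {
      "query": "kuber",
      "type": "bool_prefix",
      "fields": ["title", "title._2gram", "title._3gram"]
    }
  }
}
```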

Fuzzy matching: Use fuzziness: "AUTO" to handle typos. Elasticsearch uses Levenshtein distance to find terms within an edit distance. Combine with the above approaches for typo-tolerant autocomplete.

Performance considerations: autocomplete queries must return in under 100ms for a good UX. Use _source: false and only return the fields needed for display. Limit results to 5-10 suggestions. Use the filter context for category/scope restrictions.

For context-aware suggestions (showing different results based on user history or category), combine the suggestion query with function_score to personalize results.

See our search engine system design for the full architecture.

Follow-up questions:

  • How would you implement "did you mean" suggestions for misspelled queries?
  • How do you handle autocomplete for multi-language content?
  • What is the memory impact of the completion suggester on large datasets?

8. How do you handle Elasticsearch cluster upgrades with zero downtime?

What the interviewer is really asking: Do you have operational experience with production Elasticsearch clusters? Can you plan and execute a safe upgrade?

Answer framework:

Elasticsearch supports rolling upgrades within the same major version (e.g., 8.8 to 8.12) and requires a more careful approach for major version upgrades (7.x to 8.x).

Rolling upgrade process:

  1. Disable replica allocation (set cluster.routing.allocation.enable to primaries) to prevent unnecessary shard movement while nodes restart
  2. Flush the indices (POST _flush) to speed up shard recovery after restart; the older synced flush (_flush/synced) was removed in 8.0 (see the command sketch after this list)
  3. Stop the first node, upgrade its Elasticsearch version, restart it
  4. Re-enable shard allocation and wait for the cluster to return to green
  5. Repeat for each node
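
A sketch of the allocation toggle and flush from steps 1, 2, and 4 (persistent settings shown; adjust to your version's conventions):

```json
PUT _cluster/settings
{
  "persistent": { "cluster.routing.allocation.enable": "primaries" }
}

POST _flush

PUT _cluster/settings
{
  "persistent": { "cluster.routing.allocation.enable": null }
}
```

The final request resets the setting to its default once the upgraded node has rejoined, letting replicas reallocate and the cluster return to green.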

Major version upgrades require checking compatibility first: use the Deprecation API to identify breaking changes, ensure all indices are compatible with the target version (you cannot read indices from two major versions back), and test the upgrade in a staging environment with production data.

Reindex-based migration (safest for major changes): create a new cluster running the target version, reindex data from the old cluster using remote reindex, switch traffic to the new cluster once validation passes. This requires more resources but provides clean rollback — simply switch back to the old cluster.

Plugin compatibility: verify all plugins (analysis, security, monitoring) support the target version before upgrading. Plugin incompatibility is a common upgrade blocker.

See our system design interview guide and distributed systems guide for operational best practices.

Follow-up questions:

  • What happens if a node fails to restart after an upgrade?
  • How do you validate that search relevance hasn't changed after an upgrade?
  • How would you handle an upgrade that requires a mapping change?

9. How would you design a multi-tenant search system using Elasticsearch?

What the interviewer is really asking: Can you balance isolation, performance, and operational complexity for a SaaS search platform?

Answer framework:

Three multi-tenancy approaches:

Index per tenant: Each tenant gets their own Elasticsearch index. Maximum isolation, easy to manage per-tenant settings and mappings. Disadvantage: doesn't scale beyond a few hundred tenants because each index has overhead (shards, mappings, cluster state). Suitable for small numbers of large tenants.

Shared index with routing: All tenants share one index (or a set of time-based indices). Use a tenant_id field and custom routing (routing=tenant_123) to ensure all documents for a tenant are on the same shard. This enables shard-level isolation during search and avoids scatter-gather queries. Add a filter for tenant_id to every query. Disadvantage: noisy neighbor risk, mapping must accommodate all tenants.
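
A sketch of tenant routing at index and query time (index, tenant, and field names illustrative):

```json
PUT /search-content/_doc/1?routing=tenant_123
{
  "tenant_id": "tenant_123",
  "title": "Quarterly report"
}

GET /search-content/_search?routing=tenant_123
{
  "query": {
    "bool": {
      "filter": [ { "term": { "tenant_id": "tenant_123" } } ],
      "must": [ { "match": { "title": "report" } } ]
    }
  }
}
```

The routing parameter limits the search to the tenant's shard, while the tenant_id filter guards against other tenants whose routing value hashes to the same shard.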

Index per tenant group (tiered): Group small tenants into shared indices, give large tenants dedicated indices. This balances isolation and resource efficiency. Use index aliases to abstract the tenant-to-index assignment — the application always queries the alias, which points to either a shared or dedicated index.

Security: use Elasticsearch's document-level security (DLS) with roles that restrict access by tenant_id. Never rely solely on application-level filtering — a bug could expose data across tenants.

Performance isolation: for shared indices, use shard-level request routing to prevent one tenant's heavy queries from affecting others. Monitor per-tenant query latency and move noisy tenants to dedicated indices proactively.

See our system design interview guide and backend development crash course.

Follow-up questions:

  • How would you handle a tenant that generates 100x the data volume of average tenants?
  • What is the cluster state impact of thousands of indices?
  • How would you implement per-tenant search analytics?

10. Explain cross-cluster search and cross-cluster replication in Elasticsearch.

What the interviewer is really asking: Do you understand multi-cluster architectures and when they're necessary?

Answer framework:

Cross-Cluster Search (CCS) allows a query to span indices across multiple Elasticsearch clusters. The local cluster acts as a coordinating node, forwarding the search request to remote clusters and merging results. Use cases: searching across geographically distributed clusters (each region has its own cluster), searching across organizational boundaries, or reducing blast radius (a cluster failure only affects one region's data).

Configuration: define remote clusters with connection strings. The local cluster maintains persistent connections to remote clusters. CCS handles network failures gracefully — if a remote cluster is unreachable, results from available clusters are still returned (with a flag indicating partial results).
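
A sketch of registering a remote cluster and searching it alongside local indices (cluster alias, seed address, and index pattern assumed):

```json
PUT _cluster/settings
{
  "persistent": {
    "cluster.remote.eu_cluster.seeds": ["10.0.1.10:9300"]
  }
}

GET /logs-*,eu_cluster:logs-*/_search
{
  "query": { "match": { "message": "timeout" } }
}
```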

Cross-Cluster Replication (CCR) replicates indices from a leader cluster to follower clusters. Use cases: disaster recovery (follower in another region for failover), geo-proximity (replicate data close to users for lower latency), and centralized reporting (replicate from multiple production clusters to a reporting cluster).

CCR is index-level: you configure which indices to replicate and to which follower clusters. Replication is near real-time. Follower indices are read-only — writes go to the leader.

Architectural decision: CCS vs CCR. Use CCS when you need to search across data that naturally lives in different clusters. Use CCR when you need copies of the same data in multiple locations for availability or latency.

See our distributed systems guide and how CDNs work for related distribution patterns.

Follow-up questions:

  • What is the latency impact of cross-cluster search?
  • How do you handle a scenario where cross-cluster replication lag causes stale search results?
  • When would you use cross-cluster search versus replicating data to a central cluster?

11. How do you implement geospatial search in Elasticsearch?

What the interviewer is really asking: Can you build location-based features that perform well at scale?

Answer framework:

Elasticsearch supports geospatial data types and queries for building location-based search features.

Field types: geo_point stores a latitude/longitude pair. geo_shape stores complex shapes (polygons, lines, circles) for boundary-based queries.

Common geospatial queries:

  • geo_distance: Find all restaurants within 5km of the user. Uses a bounding box optimization internally for efficiency.
  • geo_bounding_box: Find documents within a rectangular area (typical map viewport). Very efficient, good for map-based UIs.
  • geo_shape: Find documents within a polygon (e.g., find all properties within a school district boundary). Uses a spatial index (BKD tree) internally.

Combining geospatial with relevance: use function_score with a decay function to boost nearby results while still considering text relevance. For a restaurant search, combine text matching (cuisine, name) with distance decay so nearby relevant restaurants rank higher than distant exact matches.

Performance at scale: geospatial queries can be expensive. Use geo_bounding_box as a pre-filter (it uses the index efficiently) before applying more expensive geo_distance calculations. For aggregations on large datasets, use geohash_grid or geotile_grid to bucket results into geographic cells.
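
A sketch of a nearby-restaurants query combining text relevance with a distance filter (assumes a location field mapped as geo_point; names and coordinates illustrative):

```json
GET /restaurants/_search
{
  "query": {
    "bool": {
      "must": [ { "match": { "cuisine": "ramen" } } ],
      "filter": [
        {
          "geo_distance": {
            "distance": "5km",
            "location": { "lat": 40.7128, "lon": -74.0060 }
          }
        }
      ]
    }
  }
}
```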

See our search engine design and the location-based service system design for full architectural context.

Follow-up questions:

  • How would you implement a "show results as I move the map" feature?
  • What is the precision trade-off with geohash aggregations?
  • How do you index and search complex boundaries (city limits, delivery zones)?

12. How do you secure an Elasticsearch cluster in production?

What the interviewer is really asking: Elasticsearch has had a notorious history of exposed clusters. Do you know how to properly secure it?

Answer framework:

Elasticsearch security (free at the basic license level since 7.1 and enabled by default since 8.0) covers multiple layers:

Authentication: Enable the built-in security features. Use the native realm for local users, LDAP/Active Directory for enterprise identity, SAML or OpenID Connect for SSO, and API keys for service-to-service communication. Elasticsearch 8.x enables security by default with auto-generated credentials.

Authorization (RBAC): Define roles with index-level privileges (read, write, manage), document-level security (DLS — filter documents a role can see), and field-level security (FLS — restrict which fields a role can access). Map users to roles via role mappings.
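
A sketch of a role using document-level security to restrict reads to one tenant (role, index, and field names assumed):

```json
PUT _security/role/tenant_123_reader
{
  "indices": [
    {
      "names": ["search-content"],
      "privileges": ["read"],
      "query": { "term": { "tenant_id": "tenant_123" } }
    }
  ]
}
```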

Encryption: TLS for node-to-node communication (transport layer) and client-to-node communication (HTTP layer). Use certificate-based authentication between nodes. Enable encryption at rest using filesystem-level encryption or the Elasticsearch keystore for sensitive settings.

Network security: Never expose Elasticsearch directly to the internet. Place behind a reverse proxy or API gateway. Use network-level restrictions (security groups, firewall rules). Disable unnecessary features (scripting, dynamic index creation for unauthorized users).

Audit logging: Enable audit logging to track authentication attempts, access to sensitive indices, and configuration changes. Forward audit logs to a separate system for tamper resistance.

See our cryptography concepts and backend development crash course.

Follow-up questions:

  • How would you implement API key rotation without downtime?
  • What is document-level security and how does it impact query performance?
  • How do you handle compliance requirements like GDPR right to erasure in Elasticsearch?

13. How would you use Elasticsearch for log aggregation and observability?

What the interviewer is really asking: Do you understand the ELK/Elastic Stack architecture and the unique challenges of log-scale data?

Answer framework:

The Elastic Stack (Elasticsearch, Logstash, Kibana, Beats) is the most widely deployed observability platform. The architecture:

Data collection: Beats (lightweight agents) collect logs (Filebeat), metrics (Metricbeat), and network data (Packetbeat) from servers. Elastic Agent (unified agent) consolidates multiple beats.

Processing: Logstash or Elasticsearch ingest pipelines parse, transform, and enrich log data. Common transformations: grok patterns for parsing unstructured logs, GeoIP enrichment for IP addresses, date parsing, and field extraction.
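
A sketch of an ingest pipeline with grok parsing and GeoIP enrichment (pipeline name and log format assumed):

```json
PUT _ingest/pipeline/web-logs
{
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": ["%{IP:client_ip} %{WORD:http_method} %{URIPATHPARAM:request_path} %{NUMBER:status_code}"]
      }
    },
    { "geoip": { "field": "client_ip" } },
    { "convert": { "field": "status_code", "type": "integer" } }
  ]
}
```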

Storage: Elasticsearch stores log data in time-based indices (data streams). ILM policies manage the lifecycle — hot tier for recent logs (SSDs), warm tier for older logs (HDDs), cold tier or frozen tier (searchable snapshots on object storage).

Visualization: Kibana provides dashboards, discover (ad-hoc log search), and alerting.

Scale considerations for log data:

  • Ingestion throughput: size your cluster for peak log volume. A single data node can typically handle 20-50K events/second depending on document complexity. Use ingest nodes to offload parsing.
  • Retention vs cost: logs can grow to terabytes per day. Use data tiers (hot/warm/cold/frozen) aggressively. Set ILM policies with appropriate retention (7 days hot, 30 days warm, 90 days cold, then delete).
  • Index strategy: use daily or weekly indices based on volume. Set appropriate shard count per index (a 50GB daily index needs 1-2 primary shards, not 5).
  • Mapping optimization: disable _source for log indices if you don't need the original document (saves ~40% storage). Use keyword instead of text for fields you only filter on (no full-text search needed). Disable dynamic mapping to prevent mapping explosions from unstructured logs.

Alternatives: for pure log aggregation, consider whether Elasticsearch is the right choice. Lighter solutions like Loki (log aggregation without full-text indexing) or ClickHouse (columnar analytics) may be more cost-effective at scale.

See our monitoring system design for the complete observability architecture.

Follow-up questions:

  • How would you handle a mapping explosion from unstructured JSON logs?
  • What is the cost comparison between Elasticsearch and Loki for log storage?
  • How would you implement correlation across logs, metrics, and traces?

14. How does Elasticsearch handle text analysis, and how would you build a custom analyzer?

What the interviewer is really asking: Do you understand the analysis pipeline that converts text into searchable tokens? Can you customize it for specific search requirements?

Answer framework:

Text analysis is the process of converting text into tokens (terms) that are stored in the inverted index. The analysis pipeline has three components:

Character filters: Process the raw text before tokenization. Examples: html_strip (removes HTML tags), pattern_replace (regex-based replacement), mapping (character-level substitutions).

Tokenizer: Splits text into tokens. standard tokenizer splits on word boundaries (handles most languages). whitespace splits only on whitespace. pattern splits using a regex. keyword treats the entire input as a single token.

Token filters: Process tokens after tokenization. lowercase (normalize case), stop (remove common words), stemmer (reduce words to root form — "running" becomes "run"), synonym (expand terms — "k8s" becomes "kubernetes"), edge_ngram (create prefix tokens for autocomplete).

Custom analyzer example for e-commerce product search:

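One possible definition (synonyms and filter choices illustrative), using an analyzer named product_analyzer:

```json
PUT /products
{
  "settings": {
    "analysis": {
      "filter": {
        "product_synonyms": {
          "type": "synonym",
          "synonyms": ["notebook, laptop", "tv, television"]
        }
      },
      "analyzer": {
        "product_analyzer": {
          "type": "custom",
          "char_filter": ["html_strip"],
          "tokenizer": "standard",
          "filter": ["lowercase", "product_synonyms", "porter_stem"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": { "type": "text", "analyzer": "product_analyzer" }
    }
  }
}
```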

Debug analysis with the _analyze API, run against the index that defines the analyzer: POST /products/_analyze {"analyzer": "product_analyzer", "text": "Running shoes for sale"} shows the output tokens.

Key concept: index-time vs search-time analysis. You can use different analyzers for indexing and searching. For autocomplete, index with edge_ngram (creates "k", "ku", "kub", "kube", "kuber"...) but search with standard analyzer (the user's query "kube" matches the pre-built token).

Follow-up questions:

  • How do synonym filters affect index size and query performance?
  • What is the difference between synonym and synonym_graph token filters?
  • How would you implement language detection and route to the correct analyzer?

15. How do you handle data consistency between Elasticsearch and a primary database?

What the interviewer is really asking: Elasticsearch is usually a secondary store. Do you understand the synchronization challenges and patterns?

Answer framework:

Elasticsearch is rarely the source of truth — it's typically synchronized from a primary database (PostgreSQL, MongoDB, etc.). Keeping them in sync is a critical operational challenge.

Synchronization patterns:

Dual write: Application writes to both the primary database and Elasticsearch. Simple but has a fatal flaw — if one write succeeds and the other fails, data becomes inconsistent. The application must handle partial failures, which is complex and error-prone. Avoid this pattern.

Change Data Capture (CDC): Capture changes from the primary database's transaction log and apply them to Elasticsearch. Tools: Debezium (captures PostgreSQL WAL or MongoDB oplog and publishes to Kafka), Kafka Connect Elasticsearch Sink (consumes from Kafka and writes to Elasticsearch). This is the gold standard — it's asynchronous, reliable, and decouples the systems.

Application-level events: After a successful database write, publish an event to a message queue (Kafka, RabbitMQ). A consumer reads events and updates Elasticsearch. More flexible than CDC but requires application changes and careful handling of event ordering.

Periodic batch sync: Run a scheduled job that queries the primary database for changed records (using an updated_at timestamp) and bulk-indexes them into Elasticsearch. Simple but introduces latency (minutes to hours). Suitable for data that doesn't need real-time search freshness.

Consistency guarantees: all these patterns provide eventual consistency. The application must handle the lag — a user creates a record and immediately searches for it, but it hasn't been indexed yet. Solutions: read-your-writes consistency by querying the primary database for recently created records, or use Elasticsearch's refresh API to force immediate visibility (at a performance cost).

See our how Kafka works, ETL pipeline concepts, and system design interview guide.

Follow-up questions:

  • How would you handle a scenario where Elasticsearch and the primary database are out of sync?
  • What is the latency of each synchronization pattern, and how do you choose?
  • How do you handle schema changes in the primary database that affect the Elasticsearch mapping?

Common Mistakes in Elasticsearch Interviews

  1. Not understanding the difference between queries and filters — Placing exact-match conditions in query context wastes scoring computation and prevents caching. Always use filter context for non-scoring conditions.

  2. Over-sharding — Creating indices with too many primary shards (a common mistake with default settings) wastes cluster resources. Each shard has overhead. Right-size shards to 20-50GB.

  3. Ignoring mapping design — Using dynamic mappings in production leads to mapping explosions, incorrect field types, and wasted storage. Design explicit mappings.

  4. Treating Elasticsearch as a primary database — Elasticsearch is not designed for transactional workloads. It's a search and analytics engine. Always have a source of truth elsewhere.

  5. Not understanding the near real-time nature — Documents aren't searchable until the next refresh (default 1 second). This trips up candidates who expect instant consistency.

  6. Skipping relevance tuning — Default BM25 scoring rarely produces good search results for production use cases. Demonstrating knowledge of function scores, boosting, and synonyms shows real search experience.

How to Prepare

Week 1: Set up a local Elasticsearch cluster. Practice CRUD operations, mapping design, and the query DSL. Understand the difference between query and filter context.

Week 2: Implement a search application with autocomplete, faceted search, and relevance tuning. Use the _explain API to understand scoring.

Week 3: Study Elasticsearch operations — cluster health monitoring, ILM policies, rolling upgrades, and capacity planning.

Week 4: Study distributed search architecture. Understand shard routing, cross-cluster search, and the scatter-gather execution model.

For comprehensive preparation, see our system design interview guide and explore the learning paths. Check out our pricing plans for full access.
