System Design: Travel Recommendation Engine
Design a travel recommendation system that personalizes destination, accommodation, and activity suggestions based on user preferences, travel history, and collaborative filtering.
Requirements
Functional Requirements:
- Recommend destinations based on user travel history, searches, and saved places
- Surface accommodation and activity recommendations for a selected destination
- Collaborative filtering: "travelers like you also visited..."
- Trending destinations feed based on aggregate booking velocity
- User can provide explicit feedback (interested/not interested) to refine recommendations
- Recommendations update within 24 hours of significant new user activity
Non-Functional Requirements:
- Recommendations API responds in under 200ms at the 95th percentile
- Handle 100 million active users, each generating ~50 interaction events/month
- Model retraining pipeline completes within 4 hours on daily data
- Cold start: new users receive content-based recommendations within 3 interactions
- Recommendation diversity: results should not repeatedly surface the same few destinations
Scale Estimation
- Interaction events: 100 million users × 50 interactions/month = 5 billion events/month ≈ 1,929 events/second.
- Recommendation API calls: assuming 20 recommendation page views/user/month = 2 billion API calls/month ≈ 772 calls/second average, 5,000/second peak.
- Model data: the user-item interaction matrix for 100M users × 10M destinations/properties is sparse (most users interact with fewer than 100 items) and fits in ~50 GB as a sparse matrix.
- Precomputed recommendation vectors: 100 million users × 128 dimensions × 4 bytes ≈ 51 GB, storable in memory across a Redis cluster.
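For reference, the estimates above follow from simple arithmetic (a 30-day month is assumed):

```python
# Back-of-envelope numbers behind the estimates above.
SECONDS_PER_MONTH = 30 * 24 * 3600          # 2,592,000

users = 100_000_000
events_per_month = users * 50               # 5 billion interaction events
events_per_sec = events_per_month / SECONDS_PER_MONTH        # ~1,929/s

api_calls_per_month = users * 20            # 2 billion API calls
api_calls_per_sec = api_calls_per_month / SECONDS_PER_MONTH  # ~772/s average

embedding_gb = users * 128 * 4 / 1e9        # ~51.2 GB of user vectors
```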
High-Level Architecture
The recommendation engine operates on two timescales: offline batch (model training and candidate generation, runs nightly) and online serving (fetching and ranking pre-computed candidates, <200ms). A hybrid approach combines collaborative filtering ("users like you") with content-based filtering ("destinations matching your stated preferences") and contextual re-ranking (current season, trending, flight price signals).
The Offline Pipeline runs on Spark on EMR/Databricks. It reads the interaction log from S3 (clicks, saves, bookings, searches, explicit feedback), trains an Alternating Least Squares (ALS) collaborative filtering model producing 128-dimensional embeddings for users and items, and computes approximate nearest neighbors (ANN) using Faiss to generate top-500 candidate items per user. Candidates and user embeddings are written to Redis and a feature store (Feast or Tecton).
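The factorization at the heart of the offline job can be illustrated with a toy, dense NumPy version of one ALS sweep. This is a conceptual sketch only: the production job uses Spark's implicit-feedback ALS on sparse data, and `als_sweep`, `R`, `U`, `V` are illustrative names.

```python
import numpy as np

def als_sweep(R, U, V, reg=0.1):
    """One alternating least-squares pass: fix V and solve a ridge
    regression for each user row, then fix U and solve for each item."""
    k = U.shape[1]
    eye = reg * np.eye(k)
    for u in range(R.shape[0]):
        U[u] = np.linalg.solve(V.T @ V + eye, V.T @ R[u])
    for i in range(R.shape[1]):
        V[i] = np.linalg.solve(U.T @ U + eye, U.T @ R[:, i])
    return U, V

# Toy usage: factorize a 6x5 interaction matrix into rank-3 embeddings.
rng = np.random.default_rng(0)
R = rng.random((6, 5))
U, V = rng.random((6, 3)), rng.random((5, 3))
err_before = np.linalg.norm(R - U @ V.T)
for _ in range(20):
    U, V = als_sweep(R, U, V)
err_after = np.linalg.norm(R - U @ V.T)  # reconstruction error shrinks
```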
The Online Serving layer retrieves pre-computed candidates from Redis for a given user, applies real-time context (current search location, season, upcoming holidays, trending destinations from a separate pipeline), and runs a ranking model (a lightweight gradient boosted tree on ~30 features) to produce the final ordered recommendation list.
Core Components
Collaborative Filtering Pipeline
ALS training on 5 billion interactions with 100M users and 10M items runs on a 50-node Spark cluster in ~2 hours. The implicit-feedback variant (treating searches and clicks as weak positive signals and the absence of interaction as a weak negative) produces the user and item embedding matrices. Daily incremental updates that factorize only the new interactions (incremental ALS) run in ~30 minutes. Embeddings are pushed to Faiss (Facebook AI Similarity Search) for ANN lookup: given a user vector, find the 500 most similar item vectors in under a millisecond using HNSW indexing.
Content-Based Filtering
For new users with insufficient interaction history (<10 interactions), content-based recommendations are the primary signal. User profiles are enriched with stated preferences (beach vs. city, adventure vs. relaxation, budget level) collected during onboarding and inferred from browsing behavior. Destination content vectors are pre-computed from textual descriptions, categories, and structured attributes using a fine-tuned BERT model. Cosine similarity between user preference vector and destination content vectors produces content-based candidates.
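The cosine-similarity step described above is straightforward; a minimal NumPy sketch (`content_candidates` is an illustrative name):

```python
import numpy as np

def content_candidates(user_pref, item_vecs, k=50):
    """Rank destinations by cosine similarity between the user's
    preference vector and pre-computed content vectors."""
    u = user_pref / np.linalg.norm(user_pref)
    V = item_vecs / np.linalg.norm(item_vecs, axis=1, keepdims=True)
    sims = V @ u                 # cosine similarity per destination
    top = np.argsort(-sims)[:k]  # indices of the k best matches
    return top, sims[top]
```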
Real-Time Re-Ranking Service
The Re-Ranking Service receives 500 pre-computed candidates per user from Redis, fetches ~30 real-time features per candidate (current average price, 7-day weather forecast, trending score, days until the cheapest flight from a fare API), and scores each with a gradient-boosted ranking model (XGBoost, 100 trees). Features are served from a low-latency feature store (Redis for real-time features, pre-materialized features in DynamoDB). Ranking-model inference takes <10ms for 500 candidates; the total API latency budget is 200ms.
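The serving-time flow can be sketched independently of the model. Here `fetch_features` and `score` are stand-ins for the batched feature-store reads and the XGBoost model; both names are illustrative:

```python
def rerank(candidates, fetch_features, score, top_n=20):
    """Re-rank pre-computed candidates with real-time features.

    candidates: item ids pulled from Redis for this user.
    fetch_features: one batched call returning a feature vector per item.
    score: the ranking model (XGBoost in production; any callable here).
    """
    feats = fetch_features(candidates)  # one batched call, not per-item
    scored = sorted(((item, score(feats[item])) for item in candidates),
                    key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in scored[:top_n]]
```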
Database Design
User embeddings (128 float32 values, 512 bytes each) in Redis (user:{user_id} → embedding_bytes) — 51 GB total, distributed across a 4-node Redis cluster. Pre-computed top-500 candidates per user in a Redis list (user_recs:{user_id} → [item_id, score, ...]) refreshed nightly. Raw interaction events in S3 as Parquet (partitioned by date). Processed interaction matrix in Delta Lake on S3 for incremental Spark jobs. Item content features in DynamoDB (item:{item_id} → feature map) for fast online serving. Recommendation feedback (thumbs up/down) in PostgreSQL as a training signal, fed back into the daily pipeline.
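A sketch of the embedding read/write path (here `client` is anything with redis-py's `set`/`get` signature, e.g. `redis.Redis`; `pack`/`unpack` are illustrative helpers):

```python
import numpy as np

def pack(vec: np.ndarray) -> bytes:
    """128 float32 values -> 512 bytes for the Redis value."""
    return vec.astype(np.float32).tobytes()

def unpack(raw: bytes) -> np.ndarray:
    return np.frombuffer(raw, dtype=np.float32)

def store_embedding(client, user_id: int, vec: np.ndarray) -> None:
    client.set(f"user:{user_id}", pack(vec))

def load_embedding(client, user_id: int):
    raw = client.get(f"user:{user_id}")
    return None if raw is None else unpack(raw)
```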
API Design
- GET /v1/recommendations/destinations?user_id={}&context={} — Returns personalized ranked list of destination recommendations with explanation strings ("Popular with your travel style")
- GET /v1/recommendations/properties?user_id={}&destination_id={} — Returns accommodation recommendations for a specific destination based on user preferences
- POST /v1/recommendations/feedback — User submits explicit feedback (INTERESTED, NOT_INTERESTED, ALREADY_VISITED) for a recommendation item; stored and fed into next training cycle
- GET /v1/recommendations/trending — Returns trending destinations by booking velocity in the last 7 days; computed from a separate real-time aggregation pipeline
Scaling & Bottlenecks
The ALS training job is the pipeline bottleneck: 5 billion interactions × 128 dimensions require careful Spark partitioning to avoid data skew. Popular destinations (Paris, New York) have millions of interactions; less popular ones have tens. Frequency capping (sampling down to at most 1,000 interactions per item) prevents popular items from dominating the embedding space. The Faiss ANN index (an HNSW graph over 10M item vectors) fits in ~5 GB of RAM and serves queries in ~1ms — one per recommendation API call, manageable with a pool of 20 Faiss servers.
The online re-ranking service must fetch features for 500 candidates × 30 features in <150ms (leaving 50ms for model inference and overhead). Batch feature fetching (single DynamoDB BatchGetItem for 100 items) reduces round trips to 5 calls. Redis feature reads are pipelined. The ranking model is pre-loaded in memory on each serving pod; no model loading latency per request.
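The round-trip math above amounts to chunking the candidate list. A sketch, where `fetch_batch` stands in for a DynamoDB `BatchGetItem` call (which accepts at most 100 keys per request):

```python
def batched_fetch(item_ids, fetch_batch, batch_size=100):
    """Fetch features for all candidates in ceil(n/100) round trips
    instead of one call per item."""
    features = {}
    for i in range(0, len(item_ids), batch_size):
        features.update(fetch_batch(item_ids[i:i + batch_size]))
    return features
```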
Key Trade-offs
- ALS vs. deep learning for embeddings — deep learning (two-tower neural networks) produces better embeddings but requires GPU training infrastructure; ALS is CPU-trainable and produces competitive quality for travel use cases with simpler operationalization
- Pre-computation vs. real-time generation — pre-computing top-500 candidates nightly ensures <200ms serving latency but means recommendations are up to 24 hours stale; a hybrid (stale candidates + real-time re-ranking) balances freshness and cost
- Diversity vs. pure relevance — a purely relevance-optimized ranker shows the same Paris-London-Barcelona loop to European travelers repeatedly; a diversity re-ranking step (Maximal Marginal Relevance) injects variety at the cost of 3–5% CTR reduction
- Cold start strategy — new users need recommendations before the model has any signal; cascading fallbacks (stated preferences → demographic collaborative filtering → global trending) provide acceptable quality until behavioral data accumulates
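The diversity re-ranking trade-off above can be sketched as a greedy Maximal Marginal Relevance loop (a toy version; `lam` trades relevance against novelty and its value is an assumption):

```python
import numpy as np

def mmr(relevance, sim, lam=0.7, k=10):
    """Greedy MMR: each step picks the item with the best trade-off
    between model relevance and dissimilarity to already-picked items."""
    selected = [int(np.argmax(relevance))]
    rest = set(range(len(relevance))) - set(selected)
    while rest and len(selected) < k:
        best = max(rest, key=lambda i: lam * relevance[i]
                   - (1 - lam) * max(sim[i][j] for j in selected))
        selected.append(best)
        rest.discard(best)
    return selected
```

With `lam=1.0` this degenerates to pure relevance ranking; lowering `lam` pushes near-duplicate destinations (e.g. two very similar European city breaks) apart in the list.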