
System Design: Menu Management System

Design a scalable menu management system supporting real-time menu updates, modifier cascades, multi-channel publishing, and menu versioning for food delivery platforms.

15 min read · Updated Jan 15, 2025
system-design · menu-management · food-delivery · catalog

Requirements

Functional Requirements:

  • Restaurants create and manage hierarchical menus (categories, items, modifier groups, modifiers)
  • Support item-level availability toggling (86ing) that propagates to all ordering channels instantly
  • Multi-location menu management with inheritance and per-location overrides
  • Scheduled menu switching (breakfast menu 6-11 AM, lunch 11 AM-4 PM, dinner 4-10 PM)
  • Menu versioning with rollback capability
  • Nutrition information, allergen tags, and dietary labels per item

Non-Functional Requirements:

  • Support 1M restaurants with average 80 menu items each = 80M total items
  • Menu update propagation to all channels within 30 seconds
  • Menu read latency under 50ms for consumer-facing queries
  • 99.99% availability for menu reads; 99.9% for menu writes
  • Support 100K concurrent menu edits during peak hours

Scale Estimation

  • Modifier configurations: 1M restaurants × 80 items × 5 modifiers per item = 400M
  • Menu reads: every restaurant page view requires a full menu fetch; at 200M restaurant views/day that is ~2,300 menu reads/sec
  • Menu writes: 50K menu updates/day (item additions, price changes, availability toggles) ≈ 0.6 writes/sec for structural changes, but availability toggles (86ing) spike to 500/sec during peak hours as restaurants run out of ingredients
  • Storage: each menu document (full restaurant menu with all items, modifiers, and metadata) averages 150KB serialized as JSON; total menu catalog size is 1M × 150KB = 150GB
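The back-of-envelope arithmetic above can be checked in a few lines (all inputs are the figures stated in this section):

```python
# Scale estimation from the section above: 200M views/day, 50K structural
# updates/day, 150KB per menu document, 1M restaurants.
SECONDS_PER_DAY = 86_400

views_per_day = 200_000_000
reads_per_sec = views_per_day / SECONDS_PER_DAY        # ~2,315 reads/sec

updates_per_day = 50_000
writes_per_sec = updates_per_day / SECONDS_PER_DAY     # ~0.58 writes/sec

restaurants = 1_000_000
doc_size_kb = 150
catalog_gb = restaurants * doc_size_kb / 1_000_000     # 150 GB

print(round(reads_per_sec), round(writes_per_sec, 2), catalog_gb)
```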

High-Level Architecture

The menu management system separates the write path (menu editing) from the read path (menu serving) using a CQRS (Command Query Responsibility Segregation) pattern. The write path handles menu creation, editing, and versioning through a Menu Admin Service backed by PostgreSQL as the source of truth. Every menu change creates an immutable event in an event store, enabling full version history and rollback. The read path serves menu data from a denormalized read-optimized store (Redis for hot menus + CDN for static menu assets).

The publishing pipeline bridges write and read paths. When a menu change is committed, a Kafka event triggers the Menu Publisher Service, which: (1) validates the menu structure (no orphaned modifiers, valid price ranges, required fields present), (2) generates a denormalized menu document (flattening the hierarchical structure into a single JSON document optimized for client rendering), (3) writes the document to Redis with key menu:{restaurant_id}, (4) invalidates the CDN cache for the restaurant's menu URL, and (5) pushes availability changes to connected POS systems and third-party delivery platform integrations via webhooks.
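The five publishing steps can be sketched as follows. This is an illustrative outline, not the real service: the in-memory dicts stand in for Redis, the CDN, and the webhook transport, and all names (`publish_menu`, key formats aside from `menu:{restaurant_id}`) are hypothetical.

```python
# Sketch of the Menu Publisher Service's five steps, with in-memory stand-ins
# for Redis, the CDN cache, and webhook delivery.
import json

redis_store, cdn_cache, webhook_log = {}, {"menu-url/r42": "stale"}, []

def publish_menu(restaurant_id, menu):
    # (1) validate: every modifier group must reference an existing item
    item_ids = {i["id"] for i in menu["items"]}
    assert all(g["item_id"] in item_ids for g in menu["modifier_groups"]), \
        "orphaned modifier group"
    # (2) denormalize: attach modifier groups to their items in one document
    doc = {"restaurant_id": restaurant_id,
           "items": [dict(i, modifier_groups=[g for g in menu["modifier_groups"]
                                              if g["item_id"] == i["id"]])
                     for i in menu["items"]]}
    # (3) write the flattened document to the read store
    redis_store[f"menu:{restaurant_id}"] = json.dumps(doc)
    # (4) invalidate the CDN entry for this restaurant's menu URL
    cdn_cache.pop(f"menu-url/{restaurant_id}", None)
    # (5) notify downstream channels (POS, delivery partners) via webhook
    webhook_log.append(("menu.updated", restaurant_id))
    return doc

menu = {"items": [{"id": "i1", "name": "Burger", "price": 9.5}],
        "modifier_groups": [{"item_id": "i1", "name": "Size"}]}
publish_menu("r42", menu)
```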

For multi-location chains, a Template Service manages menu inheritance. A chain defines a master template menu at the brand level. Each location inherits the template and can apply overrides (different prices, disabled items, location-specific specials). When the template is updated, the change cascades to all locations that haven't overridden the affected field — this cascade is processed asynchronously via a Kafka consumer that iterates over all locations and regenerates their menu documents.
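The inheritance rule ("location override always wins", as the trade-offs section puts it) is simple to express. A minimal sketch, assuming a flat item map keyed by item ID and a hypothetical `disabled` override flag:

```python
# Template inheritance sketch: a location's menu is the brand template with
# per-location overrides merged on top; overridden fields always win.
def resolve_menu(template: dict, overrides: dict) -> dict:
    resolved = {}
    for item_id, item in template.items():
        merged = {**item, **overrides.get(item_id, {})}
        if not merged.get("disabled"):   # location disabled this item entirely
            resolved[item_id] = merged
    return resolved

template = {"burger": {"price": 9.99, "name": "Burger"},
            "shake":  {"price": 4.99, "name": "Shake"}}
overrides = {"burger": {"price": 10.99},     # location-specific price
             "shake":  {"disabled": True}}   # item disabled at this location
print(resolve_menu(template, overrides))     # burger at 10.99; shake removed
```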

Core Components

Menu Data Model & Versioning

The menu hierarchy is modeled as: Restaurant → Menu (time-based: breakfast, lunch, dinner) → Category (appetizers, entrees, drinks) → Item → Modifier Group (size, toppings, cooking preference) → Modifier (small/medium/large, each topping). Each entity has a version field incremented on every change. The event store records every mutation as an event: MenuItemCreated, MenuItemPriceChanged, ModifierGroupAdded, ItemAvailabilityToggled, etc. The current state is materialized by replaying events from the last snapshot (snapshots taken every 100 events per restaurant). This event sourcing approach provides a complete audit trail and enables rollback to any point in time by replaying events up to the desired timestamp.
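The replay-from-snapshot mechanic can be sketched concretely. Event names follow the section; the payload shapes and the `materialize` helper are illustrative assumptions — rollback is just replay with a cutoff version:

```python
# Event-sourcing sketch: state is rebuilt by replaying events on top of the
# latest snapshot; stopping replay early yields a historical (rollback) state.
def apply(state, event):
    etype, payload = event
    if etype == "MenuItemCreated":
        state[payload["item_id"]] = {"price": payload["price"], "available": True}
    elif etype == "MenuItemPriceChanged":
        state[payload["item_id"]]["price"] = payload["price"]
    elif etype == "ItemAvailabilityToggled":
        state[payload["item_id"]]["available"] = payload["is_available"]
    return state

def materialize(snapshot, events, up_to_version=None):
    state = dict(snapshot)
    for version, event in enumerate(events, start=1):
        if up_to_version is not None and version > up_to_version:
            break                     # rollback: stop replay at target version
        state = apply(state, event)
    return state

events = [("MenuItemCreated", {"item_id": "i1", "price": 9.5}),
          ("MenuItemPriceChanged", {"item_id": "i1", "price": 10.0}),
          ("ItemAvailabilityToggled", {"item_id": "i1", "is_available": False})]
current = materialize({}, events)                      # latest state
restored = materialize({}, events, up_to_version=2)    # state before the 86
```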

Real-Time Availability (86 System)

The 86 system handles the critical flow of marking items as unavailable when a restaurant runs out of an ingredient. When a restaurant toggles an item or modifier as unavailable, the update must propagate within 5 seconds to prevent customers from ordering items that cannot be fulfilled. The fast path bypasses the normal publishing pipeline: the availability toggle writes directly to Redis (updating a field in the menu document) and emits a WebSocket event to all connected clients currently viewing that restaurant's menu. The Kafka event is still emitted for async propagation to third-party platforms, POS systems, and the search index (which needs to filter out unavailable items). A separate availability status is maintained per location in Redis: avail:{restaurant_id}:{location_id} as a bitmap where each bit position corresponds to an item index.
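The per-location bitmap maps directly onto Redis's `SETBIT`/`GETBIT` commands. A sketch of the semantics, with a `bytearray` standing in for the Redis value (the key format follows the section; note Redis numbers bits most-significant-first within each byte):

```python
# Availability bitmap sketch mirroring Redis SETBIT/GETBIT: bit i is 1 when
# the item at index i is 86'd at this location.
def set_bit(bitmap: bytearray, index: int, value: int) -> None:
    byte, offset = divmod(index, 8)
    while len(bitmap) <= byte:        # Redis zero-extends on out-of-range writes
        bitmap.append(0)
    if value:
        bitmap[byte] |= 1 << (7 - offset)    # MSB-first, as in Redis
    else:
        bitmap[byte] &= ~(1 << (7 - offset))

def get_bit(bitmap: bytearray, index: int) -> int:
    byte, offset = divmod(index, 8)
    return (bitmap[byte] >> (7 - offset)) & 1 if byte < len(bitmap) else 0

avail = {"avail:r42:loc7": bytearray()}   # key -> bitmap, one per location
set_bit(avail["avail:r42:loc7"], 12, 1)   # 86 the item at index 12
```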

Multi-Channel Publishing

Menus are consumed by multiple channels: the native mobile app, the web app, third-party delivery platforms (DoorDash, Uber Eats, Grubhub via their respective APIs), Google Business listings, and in-store kiosks. Each channel has different format requirements and update mechanisms. The Menu Publisher Service generates channel-specific representations: a JSON document for the native app (rich, includes images and descriptions), a simplified XML feed for Google, and platform-specific payloads for each delivery partner's menu ingestion API. A Channel Adapter pattern encapsulates the transformation logic for each channel. Webhook delivery uses exponential backoff with dead-letter queuing for failed deliveries.
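The Channel Adapter pattern keeps per-channel transformation logic behind one interface. A toy sketch with two adapters — the payload formats here are simplified placeholders, not the real partner or Google feed schemas:

```python
# Channel Adapter sketch: one adapter per channel transforms the canonical
# menu document into that channel's payload format.
class NativeAppAdapter:
    def transform(self, doc):
        return doc                     # rich JSON: pass the document through

class GoogleFeedAdapter:
    def transform(self, doc):
        items = "".join(f"<item><name>{i['name']}</name>"
                        f"<price>{i['price']}</price></item>"
                        for i in doc["items"])
        return f"<menu>{items}</menu>" # simplified XML feed

ADAPTERS = {"native": NativeAppAdapter(), "google": GoogleFeedAdapter()}

def publish_to_channel(channel, doc):
    return ADAPTERS[channel].transform(doc)

doc = {"items": [{"name": "Burger", "price": 9.5}]}
```

Adding a new delivery partner then means adding one adapter class, with no change to the publisher itself.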

Database Design

The canonical menu data lives in PostgreSQL with a normalized schema. Core tables: menus (menu_id, restaurant_id, name, active_hours_json, version), categories (category_id, menu_id, name, sort_order, image_url), items (item_id, category_id, name, description, base_price, prep_time, calories, allergens_json, dietary_tags, image_url, is_available, version), modifier_groups (group_id, item_id, name, min_select, max_select, is_required), modifiers (modifier_id, group_id, name, price_delta, is_default, is_available). The event store is an append-only table: event_id, restaurant_id, event_type, payload (JSONB), version, created_at, created_by.

The read-optimized menu document in Redis is a denormalized JSON blob averaging 150KB per restaurant. For the 50K most popular restaurants (covering 80% of traffic), these documents are cached in Redis with no TTL (invalidated on write). The remaining 950K restaurants are loaded on-demand from PostgreSQL with a 10-minute Redis TTL. A CDN (Cloudflare) caches menu documents at the edge with a 60-second TTL for the static portions (items, descriptions) and bypass-cache headers for the availability bitmap.
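The two cache tiers (hot menus pinned with no TTL, long tail with a 10-minute TTL) can be sketched as a single read path. Function names and the in-memory dict are illustrative; `time.monotonic()` stands in for Redis key expiry:

```python
# Tiered read-path sketch: hot restaurants are cached indefinitely and
# invalidated on write; the long tail gets a 10-minute TTL.
import time

HOT_RESTAURANTS = {"r1"}      # the ~50K most popular restaurants
TTL_SECONDS = 600
cache = {}                    # restaurant_id -> (document, expires_at or None)

def load_from_postgres(restaurant_id):
    return {"restaurant_id": restaurant_id, "items": []}   # stub query

def get_menu(restaurant_id):
    entry = cache.get(restaurant_id)
    if entry and (entry[1] is None or entry[1] > time.monotonic()):
        return entry[0]                                    # cache hit
    doc = load_from_postgres(restaurant_id)                # miss: rebuild
    expires = (None if restaurant_id in HOT_RESTAURANTS
               else time.monotonic() + TTL_SECONDS)
    cache[restaurant_id] = (doc, expires)
    return doc
```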

API Design

  • GET /api/v1/restaurants/{restaurant_id}/menu?time={iso_timestamp} — Fetch the active menu for a restaurant at a specific time; returns the denormalized menu document with current availability
  • PUT /api/v1/restaurants/{restaurant_id}/menus/{menu_id}/items/{item_id} — Update a menu item; body contains changed fields only (partial update); returns new version number
  • POST /api/v1/restaurants/{restaurant_id}/items/{item_id}/availability — Toggle item availability (86/un-86); body contains is_available boolean; returns confirmation with propagation_status
  • POST /api/v1/restaurants/{restaurant_id}/menu/rollback — Rollback menu to a specific version; body contains target_version; returns diff of changes that will be reverted

Scaling & Bottlenecks

The menu publishing pipeline can bottleneck during mass updates — when a chain with 1,000 locations updates its master template, it triggers 1,000 menu document regenerations. Each regeneration involves reading the template, applying location overrides, generating the denormalized document, and writing to Redis. This cascade is handled by a priority queue: individual restaurant updates (user-initiated) get high priority, while cascade updates (template propagation) are processed in the background with a target completion time of 5 minutes for a full chain rollout.
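The two-tier queue can be sketched with a heap keyed on `(priority, sequence)` so that user-initiated jobs drain first and same-priority jobs stay FIFO. A minimal sketch; the job labels are hypothetical:

```python
# Priority-queue sketch: user-initiated publishes preempt background template
# cascades; the sequence counter keeps same-priority jobs in FIFO order.
import heapq, itertools

HIGH, LOW = 0, 1          # lower number pops first
seq = itertools.count()
queue = []

def enqueue(job, priority):
    heapq.heappush(queue, (priority, next(seq), job))

def drain():
    order = []
    while queue:
        _, _, job = heapq.heappop(queue)
        order.append(job)
    return order

for loc in ("loc1", "loc2", "loc3"):
    enqueue(f"cascade:{loc}", LOW)     # template rollout, background
enqueue("user-edit:r99", HIGH)         # restaurant-initiated update
```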

Redis memory is a concern with 150KB per menu × 50K hot restaurants = 7.5GB just for menu documents, plus the availability bitmaps. A Redis Cluster with 3 shards (each 16GB) handles this comfortably. The PostgreSQL menu tables see write contention on popular restaurants during peak editing hours (menu prep before dinner service). Row-level locking with optimistic concurrency control (version field check on update) prevents lost writes while allowing concurrent edits to different items in the same menu.
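The version-field check described above works like a conditional update (in SQL, roughly `UPDATE items SET ... WHERE item_id = $1 AND version = $2`). A minimal in-memory sketch of the compare-and-bump logic:

```python
# Optimistic concurrency sketch: an update applies only if the caller's
# version matches the stored version; a mismatch signals a lost update.
items = {"i1": {"price": 9.5, "version": 3}}

def update_item(item_id, fields, expected_version):
    row = items[item_id]
    if row["version"] != expected_version:
        return False               # stale read: caller must re-fetch and retry
    row.update(fields)
    row["version"] += 1            # bump version so concurrent writers fail
    return True

ok = update_item("i1", {"price": 10.5}, expected_version=3)     # succeeds
stale = update_item("i1", {"price": 11.0}, expected_version=3)  # rejected
```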

Key Trade-offs

  • Event sourcing over simple CRUD for menu changes: Event sourcing provides complete audit trail and rollback capability (critical for menu pricing disputes), but increases storage and adds complexity to the read path — materialized views with periodic snapshots mitigate the replay cost
  • Denormalized JSON documents over normalized API responses: Pre-computing the full menu document eliminates N+1 query problems and enables CDN caching, but any menu change requires regenerating the entire document — incremental updates to the JSON blob in Redis partially address this for availability toggles
  • Direct Redis write for 86 fast path over event-driven only: Writing availability changes directly to Redis ensures sub-5-second propagation to consumers, but creates a dual-write situation where Redis and PostgreSQL can diverge — a reconciliation job runs every minute to detect and fix inconsistencies
  • Template inheritance over full menu cloning for chains: Inheritance reduces data duplication and ensures brand consistency across locations, but adds complexity when overrides conflict with template changes — a merge strategy (location override always wins) provides predictable behavior
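The reconciliation job from the dual-write trade-off above amounts to a periodic diff with PostgreSQL as the source of truth. A sketch under that assumption, with dicts standing in for both stores:

```python
# Reconciliation sketch for the 86 fast path's dual-write risk: compare the
# availability flag in both stores and repair Redis from PostgreSQL (truth).
postgres = {"i1": True, "i2": False}    # item_id -> is_available (truth)
redis    = {"i1": True, "i2": True}     # diverged after a failed dual-write

def reconcile():
    fixed = []
    for item_id, truth in postgres.items():
        if redis.get(item_id) != truth:
            redis[item_id] = truth      # repair the read store
            fixed.append(item_id)
    return fixed
```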
