System Design: Expense Tracking App

Requirements

Functional Requirements:

Manual expense entry and receipt photo scanning with OCR data extraction
Automatic expense categorization with customizable categories and rules
Budget setting and tracking with alerts when approaching or exceeding limits
Group expense splitting (equal, percentage, exact amounts) with settlement tracking
Bank account and credit card linking for automatic transaction import
Multi-currency expense tracking with automatic FX conversion

Non-Functional Requirements:

Support 20M active users with offline-first mobile experience
Receipt OCR processing within 10 seconds of upload
Budget calculation updates in near-real-time (within 30 seconds of new expense)
99.9% availability for expense entry and viewing
Data sync across devices within 5 seconds when online

Scale Estimation

20M active users logging an average of 3 expenses/day = 60M expense entries/day = 694 writes/sec. Receipt uploads: 30% of expenses include a photo = 18M receipt images/day at average 3MB each = 54TB/day of image storage (before compression). OCR processing: 18M images/day = 208 images/sec requiring GPU inference. Bank sync: 10M linked accounts refreshed daily = 10M API calls to aggregators. Budget calculations: each new expense triggers budget recalculation for the user's active budgets (average 5 budgets per user) = 300M budget checks/day. Group settlements: 5M active groups with average 3 members = 15M settlement balance computations/day.

High-Level Architecture

The expense tracking app is built around an offline-first architecture where the mobile app maintains a complete local database (SQLite on iOS/Android) and syncs with the cloud backend when connectivity is available. This ensures users can log expenses anywhere, including offline environments (traveling, restaurants with poor connectivity). Sync uses a conflict-free replicated data type (CRDT) approach for expense records — last-write-wins for individual fields with tombstone markers for deletions.

The cloud backend follows a microservices architecture: Expense Service (CRUD for expense records), Receipt Service (image upload, OCR processing, data extraction), Budget Service (budget definitions and spending tracking), Group Service (group management, expense splitting, settlement tracking), Sync Service (manages bidirectional data sync between mobile clients and cloud), and Import Service (bank/card transaction import via Plaid).

The receipt processing pipeline is asynchronous: the mobile app uploads the receipt image to S3, sends the S3 key to the Receipt Service, which queues an OCR job on SQS. A fleet of GPU workers processes the queue, extracting merchant name, date, total amount, line items, and currency from the receipt. Extracted data is presented to the user for confirmation/correction, creating a feedback loop that improves the OCR model over time.

Core Components

Receipt OCR & Data Extraction

The Receipt Service processes uploaded receipt images through a multi-stage pipeline: (1) image preprocessing — deskewing, denoising, and contrast enhancement using OpenCV; (2) text detection — a CRAFT text detection model identifies text regions in the image; (3) text recognition — a TrOCR (Transformer-based OCR) model extracts text from detected regions; (4) field extraction — a fine-tuned layout-aware language model (LayoutLM) identifies semantic fields (merchant_name, date, total, tax, tip, line_items) from the recognized text based on spatial relationships. The pipeline achieves 92% accuracy on total amount extraction and 85% on merchant name. Incorrect extractions flagged by users during confirmation are fed back into the training pipeline for monthly model updates. The service runs on GPU instances (NVIDIA T4) processing 3 receipts/sec per GPU, with auto-scaling based on SQS queue depth.

Budget Engine

The Budget Engine manages user-defined spending budgets (e.g., "$500/month on Dining" or "$200/week on Entertainment"). Each budget is defined by: category (or set of categories), amount limit, period (weekly/monthly/yearly), and alert thresholds (warn at 80%, alert at 100%). When a new expense is recorded, the Budget Engine recalculates all affected budgets for that user. Current spending is computed by querying the sum of expenses matching the budget's category filter within the current period. To avoid recalculating from scratch each time, the engine maintains running totals in Redis: budget:{budget_id}:current_spend incremented atomically on each new expense and reset at period boundaries by a scheduled job. Alert evaluation triggers push notifications via FCM/APNS when thresholds are crossed.

Group Expense & Settlement Service

The Group Service implements Splitwise-style group expense splitting. When a group member adds a shared expense ($120 dinner split among 4 people), the service calculates each member's share (configurable: equal split = $30 each, or custom percentages/amounts). Settlement balances are maintained as a directed graph: each edge represents a debt between two members. The Simplify Debts algorithm reduces the number of required payments by computing net balances (Alice owes $30 to Bob and Bob owes $20 to Alice → Alice owes $10 to Bob). The minimum cash flow algorithm (using a greedy approach on net balances) minimizes the total number of transactions needed to settle all debts. Settlements are tracked as they occur (Venmo, cash, bank transfer) and debt edges are reduced accordingly.

Database Design

The cloud database is PostgreSQL. Core tables: expenses (expense_id UUID, user_id, amount NUMERIC(12,2), currency, category_id, merchant_name, description, expense_date, receipt_s3_key, ocr_data JSONB, source MANUAL/RECEIPT/BANK_IMPORT, created_at, updated_at, deleted_at), budgets (budget_id, user_id, name, categories INT[], amount_limit, period WEEKLY/MONTHLY/YEARLY, alert_thresholds JSONB, created_at), groups (group_id, name, created_by, currency, created_at), group_members (group_id, user_id, joined_at), group_expenses (expense_id, group_id, paid_by_user_id, total_amount, split_type EQUAL/PERCENTAGE/EXACT, splits JSONB).

The mobile SQLite database mirrors a subset of these tables (user's own expenses, their groups, and budgets). Sync metadata is tracked in a sync_log table: entity_type, entity_id, operation INSERT/UPDATE/DELETE, timestamp, synced BOOLEAN. The Sync Service uses this log to push/pull changes since the last sync checkpoint. Redis stores running budget totals and group settlement balances for fast access.

API Design

POST /v1/expenses — Create an expense; body contains amount, currency, category_id, merchant_name, date, receipt_image (base64 or S3 key), group_id (optional), split_details (optional); returns expense_id
POST /v1/receipts/scan — Upload a receipt image for OCR; returns extracted fields (merchant, date, amount, line_items) with confidence scores for user confirmation
GET /v1/budgets/{budget_id}/status — Current budget status including spent_amount, remaining, percentage_used, projected_end_of_period_spend
GET /v1/groups/{group_id}/balances — Simplified debt balances among group members showing who owes whom and how much

Scaling & Bottlenecks

Receipt OCR processing is the primary bottleneck. At 208 images/sec, the GPU fleet requires 70 T4 GPUs (3 images/sec/GPU). Cost optimization: batch processing during off-peak hours (night) for non-urgent receipts, and priority processing for user-initiated scans (within 10 seconds). Model inference is optimized using TensorRT for 2x throughput improvement. Image storage (54TB/day raw) is managed by compressing receipts to JPEG quality 80 (reducing to ~800KB average) and tiering to S3 Infrequent Access after 90 days.

The offline-first sync model creates complexity during conflict resolution. Two devices editing the same expense simultaneously (e.g., user changes category on phone while editing amount on tablet) require field-level merge. The CRDT approach uses last-write-wins per field with vector clocks for causality tracking. In practice, conflicts are rare (<0.01% of syncs) because users typically use one device at a time. The Sync Service processes 20M sync requests/day, each involving a delta comparison of the client's sync log against the server state.

Key Trade-offs

Offline-first with CRDT sync over online-only: Offline capability is essential for expense tracking (logging expenses at restaurants, while traveling), but adds significant complexity in sync logic and conflict resolution — the rarity of actual conflicts makes this worthwhile
GPU-based OCR over third-party OCR API: In-house OCR models provide better accuracy on receipts (domain-specific training data) and avoid per-image API costs at scale, but require ML engineering investment — at 18M images/day, API costs would be $180K/day vs. $5K/day for GPU fleet
Running budget totals in Redis over query-time aggregation: Atomic increments in Redis provide instant budget status checks, but introduce a consistency risk (Redis total may drift from PostgreSQL reality) — a nightly reconciliation job recalculates all budgets from source data
Simplified debts algorithm over direct debt tracking: Minimizing the number of settlement transactions improves UX, but can create non-intuitive payment flows (Alice paying Charlie when she only shared meals with Bob) — offering both simplified and detailed views