
System Design: Investment Portfolio Tracker

Design an investment portfolio tracking system handling real-time asset valuation, performance analytics, multi-asset class support, and tax lot tracking for millions of investor accounts.


Requirements

Functional Requirements:

  • Track holdings across multiple asset classes (stocks, bonds, ETFs, mutual funds, crypto, real estate)
  • Real-time portfolio valuation with live market data integration
  • Performance analytics: time-weighted return (TWR), money-weighted return (MWR/IRR), benchmark comparison
  • Tax lot tracking with cost basis methods (FIFO, LIFO, specific identification, average cost)
  • Aggregation of accounts across multiple brokerages via account linking
  • Dividend tracking, corporate action processing, and reinvestment (DRIP) handling

Non-Functional Requirements:

  • Support 5M user accounts with average 3 linked brokerage accounts each = 15M account connections
  • Portfolio valuation refresh within 5 seconds of market price change during trading hours
  • Historical performance data retained for 10+ years
  • 99.9% availability for portfolio viewing; 99.99% for data ingestion pipeline
  • Eventual consistency acceptable for analytics (up to 1-minute delay)

Scale Estimation

  • Positions: 5M users × 3 accounts × 15 holdings per account = 225M position rows to maintain
  • Market data: ~10,000 unique securities held across all users, each ticking roughly once per second during market hours (6.5 hours/day) = 10K price updates/sec
  • Portfolio revaluation: each price update touches an average of 5,000 user positions (popular stocks like AAPL are held by many users), so naive per-tick revaluation would be 10K × 5,000 = 50M position revaluations per second; revaluation must therefore be tiered and batched (see Scaling & Bottlenecks)
  • Account sync: 15M brokerage connections polled daily for transactions = 15M API calls/day to brokerage aggregators
  • Performance calculation: daily return computation for 5M accounts = 5M calculations at each market close
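
As a quick sanity check on these figures, a minimal back-of-envelope sketch (the per-user and per-security averages are the assumptions stated above):

    # Back-of-envelope scale check; the averages are assumptions from the estimates above.
    users = 5_000_000
    accounts_per_user = 3
    holdings_per_account = 15
    positions = users * accounts_per_user * holdings_per_account               # ~225M position rows

    securities = 10_000
    price_updates_per_sec = securities                                          # ~1 tick/sec per security
    avg_positions_per_security = 5_000
    naive_revals_per_sec = price_updates_per_sec * avg_positions_per_security   # 50M/sec if done per tick

    daily_syncs = users * accounts_per_user                                     # 15M aggregator calls/day
    syncs_per_sec = daily_syncs / 86_400                                        # ~174/sec if spread evenly

    print(f"{positions:,} positions, {naive_revals_per_sec:,} naive revals/sec, {syncs_per_sec:.0f} syncs/sec")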

High-Level Architecture

The portfolio tracker follows a CQRS (Command Query Responsibility Segregation) architecture. The write side handles data ingestion: brokerage account syncing, manual transaction entry, and corporate action processing. The read side serves portfolio views, performance analytics, and reporting. An event bus (Kafka) connects the two sides, ensuring that every transaction or price update is processed and reflected in the read models.

The ingestion pipeline connects to brokerage aggregators (Plaid, Yodlee, or direct broker APIs like Schwab, Fidelity) to pull holdings and transactions daily. A Transaction Reconciliation Service compares incoming brokerage data against the platform's internal records, detecting new trades, dividends, splits, and other corporate actions. Reconciled transactions update the Position Service (maintains current holdings and cost basis) and emit events consumed by the Analytics Service.

The Market Data Service subscribes to real-time price feeds (IEX Cloud, Polygon.io, or direct exchange feeds) and publishes price updates to Kafka. A Portfolio Valuation Service consumes price updates and recomputes portfolio values using a reverse index (security_id → list of users holding that security) to efficiently fan out price changes to affected portfolios. Recomputed portfolio values are written to Redis for real-time display and to TimescaleDB for historical time series.

Core Components

Position & Tax Lot Engine

The Position Engine maintains the authoritative record of what each user owns, including tax lot detail. Each acquisition (buy, transfer in, dividend reinvestment) creates a new tax lot with: acquisition_date, quantity, cost_basis, and lot_id. Each disposal (sell, transfer out) is matched against existing lots using the user's selected cost basis method: FIFO consumes the oldest lot first; specific identification lets the user choose which lots to sell for tax optimization. The engine also handles complex scenarios: wash sale detection (the IRS rule disallowing loss recognition when a substantially identical security is purchased within 30 days before or after the sale), stock splits (multiply the share quantity and divide the per-share cost basis; total basis is unchanged), and spin-offs (allocate cost basis proportionally between parent and spin-off based on fair market value). All lot mutations are recorded as immutable events for audit and recalculation.
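
To make the lot-matching logic concrete, a minimal FIFO sketch (field and function names are illustrative, not the production schema):

    from dataclasses import dataclass
    from datetime import date

    @dataclass
    class TaxLot:
        lot_id: str
        acquisition_date: date
        quantity: float           # shares remaining in this lot
        cost_per_share: float

    def match_fifo(lots: list[TaxLot], sell_qty: float, sell_price: float):
        """Consume lots oldest-first; return realized gain and per-lot share allocations."""
        realized_gain, allocations = 0.0, []
        for lot in sorted(lots, key=lambda l: l.acquisition_date):
            if sell_qty <= 0:
                break
            take = min(lot.quantity, sell_qty)
            realized_gain += take * (sell_price - lot.cost_per_share)
            allocations.append((lot.lot_id, take))
            lot.quantity -= take   # in production this mutation is recorded as an immutable event
            sell_qty -= take
        if sell_qty > 0:
            raise ValueError("sell quantity exceeds shares available across lots")
        return realized_gain, allocations

Specific identification would replace the sorted() ordering with the user's explicit lot selection; LIFO simply reverses the sort.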

Performance Analytics Service

The Analytics Service computes investment returns using two methodologies. Time-Weighted Return (TWR) eliminates the impact of cash flows (deposits/withdrawals) and measures pure investment performance: the portfolio history is split into sub-periods at each external cash flow, each sub-period return is end_value / start_value - 1, and the sub-period returns are chained as TWR = (1 + r1)(1 + r2)...(1 + rn) - 1. Money-Weighted Return (MWR/IRR) accounts for the timing and size of cash flows, showing the actual investor experience: it is the discount rate that sets the NPV of all cash flows (contributions, withdrawals, and ending value) to zero, found with an iterative solver such as Newton's method. Benchmark comparison overlays portfolio returns against user-selected benchmarks (S&P 500, 60/40 portfolio, custom blends). All calculations run as batch jobs at market close, writing results to pre-computed tables for instant dashboard loading.
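
A minimal sketch of both calculations (plain Python; the cash-flow sign convention is an assumption):

    def time_weighted_return(subperiod_returns: list[float]) -> float:
        """Chain sub-period returns: TWR = (1 + r1)(1 + r2)...(1 + rn) - 1."""
        growth = 1.0
        for r in subperiod_returns:
            growth *= 1.0 + r
        return growth - 1.0

    def money_weighted_return(cash_flows: list[tuple[float, float]], guess: float = 0.1) -> float:
        """Solve NPV(r) = sum(cf / (1 + r)**t) = 0 with Newton's method.
        cash_flows are (t_in_years, amount); contributions negative, withdrawals and ending value positive.
        """
        r = guess
        for _ in range(100):
            npv = sum(cf / (1 + r) ** t for t, cf in cash_flows)
            d_npv = sum(-t * cf / (1 + r) ** (t + 1) for t, cf in cash_flows)
            if abs(d_npv) < 1e-12:
                break
            step = npv / d_npv
            r -= step
            if abs(step) < 1e-9:
                break
        return r

For example, money_weighted_return([(0.0, -10_000), (1.0, 11_500)]) converges to roughly 0.15, i.e. a 15% annual return.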

Account Aggregation Service

The Account Aggregation Service connects to external brokerages to import holdings and transactions. It uses a polling-based sync model: each linked account is synced once daily (or on-demand via user trigger). The sync process: authenticate with brokerage via OAuth (Plaid Link flow) → pull current holdings (symbol, quantity, market_value) → pull recent transactions (trades, dividends, fees) since last sync → reconcile against internal position records. Reconciliation handles discrepancies: if the brokerage shows 100 shares of AAPL but internal records show 95, the service identifies the missing transaction (likely a 5-share purchase not yet imported) and creates a reconciliation event. Failed syncs (expired tokens, brokerage API downtime) trigger retry with exponential backoff and user notification after 3 failures.
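
A minimal sketch of the retry policy around a single sync attempt (sync_account, notify_user, and TransientSyncError are placeholders, not real aggregator APIs):

    import random
    import time

    class TransientSyncError(Exception):
        """Retryable failure: expired token being refreshed, brokerage API downtime, etc."""

    def sync_account(account_id: str) -> None:
        """Placeholder for the real pull: OAuth refresh, holdings fetch, transaction fetch, reconcile."""

    def notify_user(account_id: str, message: str) -> None:
        """Placeholder for the user-notification path."""

    MAX_FAILURES = 3

    def sync_with_backoff(account_id: str, base_delay: float = 30.0) -> bool:
        """Retry a brokerage sync with exponential backoff plus jitter; notify the user after 3 failures."""
        for attempt in range(MAX_FAILURES):
            try:
                sync_account(account_id)
                return True
            except TransientSyncError:
                time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
        notify_user(account_id, "We couldn't refresh this account; please re-link it.")
        return False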

Database Design

The primary database is PostgreSQL. Core tables:

  • positions (position_id, account_id, security_id, quantity NUMERIC(18,8), market_value, cost_basis, unrealized_gain_loss, last_priced_at)
  • tax_lots (lot_id, position_id, acquisition_date, quantity, remaining_quantity, cost_per_share, cost_basis_method, wash_sale_adjusted BOOLEAN)
  • transactions (transaction_id, account_id, security_id, type BUY/SELL/DIVIDEND/SPLIT/SPINOFF, quantity, price, amount, lot_id, transaction_date, settlement_date)

Real-time portfolio valuations are cached in Redis as sorted sets keyed by user_id with member=security_id and score=market_value; the portfolio total is computed by summing member scores (ZRANGE ... WITHSCORES), and individual holdings are read directly from the set. Historical portfolio values use TimescaleDB: portfolio_history (user_id, date, total_value, total_cost_basis, total_gain_loss, daily_return). A reverse index security_holders (security_id, user_id, position_id) kept in Redis enables efficient fan-out of price updates to affected portfolios.
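
A sketch of the Redis read/write path using redis-py (the portfolio:{user_id} and holders:{security_id} key names are assumptions):

    import redis

    r = redis.Redis()  # assumed connection; production would point at the valuation cache cluster

    def update_position_value(user_id: str, security_id: str, market_value: float) -> None:
        """Upsert one holding's market value in the user's portfolio sorted set."""
        r.zadd(f"portfolio:{user_id}", {security_id: market_value})

    def portfolio_total(user_id: str) -> float:
        """Sum member scores to get the current total portfolio value."""
        holdings = r.zrange(f"portfolio:{user_id}", 0, -1, withscores=True)
        return sum(score for _member, score in holdings)

    def holders_of(security_id: str) -> set[str]:
        """Reverse index lookup: which users hold a security, for price-update fan-out."""
        return {m.decode() for m in r.smembers(f"holders:{security_id}")}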

API Design

  • GET /v1/portfolio?account_id={id} — Get current portfolio holdings with real-time valuations, gain/loss, and allocation percentages (an example response shape is sketched after this list)
  • GET /v1/portfolio/performance?period=1Y&benchmark=SPX — Portfolio performance over specified period with TWR, MWR, and benchmark comparison chart data
  • GET /v1/portfolio/tax-lots?security_id={id} — View tax lots for a specific holding with cost basis, holding period, and unrealized gain/loss per lot
  • POST /v1/accounts/link — Initiate brokerage account linking; returns a Plaid Link token for the frontend to open the connection flow
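
A hypothetical response shape for the portfolio endpoint, with illustrative field names and values rather than a documented contract:

    example_portfolio_response = {
        "account_id": "acct_123",
        "as_of": "2025-01-15T15:30:05Z",
        "total_value": 125430.18,
        "total_cost_basis": 98210.00,
        "unrealized_gain_loss": 27220.18,
        "holdings": [
            {
                "security_id": "AAPL",
                "quantity": 50,
                "market_value": 11800.00,
                "cost_basis": 7450.00,
                "unrealized_gain_loss": 4350.00,
                "allocation_pct": 9.4,
            },
            # ... one entry per holding
        ],
    }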

Scaling & Bottlenecks

The portfolio revaluation fan-out is the primary scaling challenge. When AAPL's price updates (held by 500K users), 500K portfolio valuations must be recomputed. A naive approach would be 500K database reads + writes per price tick. The solution uses a tiered update strategy: Tier 1 (real-time, <5s) updates the Redis cache for users currently viewing their portfolio (tracked via WebSocket connection presence); Tier 2 (near-real-time, <60s) batch-updates all affected portfolios in PostgreSQL via a bulk UPDATE with a JOIN on the reverse index. This reduces the update from 500K individual queries to a single bulk operation.
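
A minimal sketch of the Tier 2 bulk revaluation using psycopg2 (the DSN is an assumption; table and column names follow the schema above; since positions already carries security_id, the sketch filters on it directly rather than joining the reverse index):

    import psycopg2  # assumed driver; any DB-API connection works the same way

    BULK_REVALUE_SQL = """
        UPDATE positions
        SET market_value = quantity * %(price)s,
            unrealized_gain_loss = quantity * %(price)s - cost_basis,
            last_priced_at = now()
        WHERE security_id = %(security_id)s
    """

    def bulk_revalue(security_id: str, new_price: float, dsn: str = "dbname=portfolio") -> int:
        """Tier 2: one statement revalues every affected position instead of 500K point updates."""
        with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
            cur.execute(BULK_REVALUE_SQL, {"security_id": security_id, "price": new_price})
            return cur.rowcount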

Account aggregation at 15M daily syncs requires careful rate management against brokerage APIs. Syncs are distributed throughout the day by a time-slotted scheduler (each account is assigned a sync slot based on a hash of its account_id), preventing a thundering herd against brokerage APIs at market open. Each aggregator connection runs through a rate limiter (a token bucket per brokerage) that respects the provider's API limits.
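
A sketch of the slot assignment and the per-brokerage limiter (the slot count and token-bucket parameters are assumptions):

    import hashlib
    import time

    SLOTS_PER_DAY = 288  # one slot every 5 minutes

    def sync_slot(account_id: str, slots: int = SLOTS_PER_DAY) -> int:
        """Deterministically spread accounts across the day by hashing the account id."""
        return int(hashlib.sha256(account_id.encode()).hexdigest(), 16) % slots

    class TokenBucket:
        """Per-brokerage rate limiter: refills at `rate` tokens/sec up to `capacity`."""
        def __init__(self, rate: float, capacity: float):
            self.rate, self.capacity = rate, capacity
            self.tokens, self.updated = capacity, time.monotonic()

        def allow(self) -> bool:
            now = time.monotonic()
            self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
            self.updated = now
            if self.tokens >= 1.0:
                self.tokens -= 1.0
                return True
            return False

An account assigned slot 42 syncs at minute 42 × 5 = 210 of the day, and each outbound call first checks allow() on that brokerage's bucket.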

Key Trade-offs

  • CQRS over shared read-write model: Separating the write path (transaction ingestion) from the read path (portfolio views) allows independent scaling and optimization — the trade-off is eventual consistency between write and read models (up to 1 minute)
  • Daily batch performance calculation over real-time: Computing TWR/MWR requires complete daily return series — batch calculation at market close ensures accuracy, but intraday returns shown to users are approximations
  • Tax lot immutability over mutable records: Storing every lot mutation as an event enables perfect recalculation and audit, but increases storage and query complexity — materialized current-state views bridge the gap
  • Polling-based account sync over webhook: Most brokerages don't support webhooks for account changes, so polling is the practical choice — the trade-off is a delay of up to 24 hours for reflecting external trades
