System Design: Fraud Detection System
Design a real-time fraud detection system for financial transactions using ML scoring, rule engines, behavioral analytics, and device fingerprinting to block fraudulent payments while minimizing false positives.
Requirements
Functional Requirements:
- Score every payment transaction in real-time for fraud risk before authorization
- Apply rule-based policies (velocity checks, amount thresholds, geo-fencing) configurable by risk analysts
- Behavioral analytics comparing current transaction against user's historical patterns
- Device fingerprinting and session risk scoring to detect account takeover
- Case management system for manual review of flagged transactions
- Feedback loop: fraud investigators mark transactions as confirmed fraud or false positive to retrain models
Non-Functional Requirements:
- Score 15,000 transactions/sec with p99 latency under 100ms (synchronous in the payment path)
- False positive rate below 0.5% to minimize legitimate customer friction
- Fraud detection rate above 95% for known fraud patterns
- Model update deployment within 4 hours of new fraud pattern detection
- 99.99% availability — fraud service downtime means either blocking all payments or letting fraud through
Scale Estimation
At 15,000 TPS, that is 1.3B transactions/day requiring fraud scoring. Each scoring request involves: feature extraction (20-30 features computed in real-time), model inference (an ensemble of specialized models combined by a meta-model), rule engine evaluation (200+ rules), and decision logging. Feature extraction queries user history (last 30 days of transactions = ~100 rows average per user), the device fingerprint database (~500M device records), and the IP reputation database. CPU-optimized model inference takes ~15ms and rule engine evaluation ~10ms, against a total latency budget of 100ms. The Feature Store must serve 15K lookups/sec with sub-10ms p99 latency. Decision logs produce 1.3B records/day = ~2TB/day at 1.5KB per record.
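These figures follow directly from the stated rates; a quick arithmetic check in plain Python, using no assumptions beyond the numbers above:

```python
# Back-of-the-envelope check for the estimates above.
TPS = 15_000
SECONDS_PER_DAY = 86_400
RECORD_BYTES = 1_500                          # ~1.5KB per decision log record

daily_txns = TPS * SECONDS_PER_DAY            # 1,296,000,000 ~= 1.3B/day
daily_log_bytes = daily_txns * RECORD_BYTES   # ~1.94e12 bytes ~= 2TB/day

print(f"{daily_txns / 1e9:.2f}B transactions/day")
print(f"{daily_log_bytes / 1e12:.2f}TB of decision logs/day")
```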
High-Level Architecture
The fraud detection system sits inline in the payment authorization path — the payment gateway calls the Fraud Service synchronously before sending the authorization request to the card network. The architecture has two planes: the Real-Time Scoring Plane (handles live transaction scoring) and the Offline Training Plane (handles model training, feature engineering, and analytics).
The Real-Time Scoring Plane receives a transaction scoring request and orchestrates three parallel evaluations: (1) the ML Scoring Service runs the transaction features through an ensemble model (gradient-boosted trees via XGBoost for tabular features + a neural network for sequence features like transaction history patterns); (2) the Rule Engine evaluates deterministic rules configured by fraud analysts (e.g., "block if >5 transactions in 1 minute from same card" or "flag if transaction country differs from cardholder country"); (3) the Device Intelligence Service checks the device fingerprint against known fraud device clusters. The results are combined by a Decision Aggregator that produces a final decision: APPROVE, DECLINE, or REVIEW (sent to manual queue).
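A minimal sketch of this fan-out, assuming stand-in stubs for the three component calls and illustrative decision thresholds; the real aggregator would call the ML, rule, and device services over RPC:

```python
import asyncio
import random

# Stand-in stubs for the three component services; in production these
# would be RPC clients for the ML Scoring, Rule Engine, and Device
# Intelligence services.
async def score_ml(txn):
    await asyncio.sleep(0.02)            # ~15-25ms model inference
    return ("ml", random.random())

async def evaluate_rules(txn):
    await asyncio.sleep(0.01)            # ~10ms rule evaluation
    return ("rules", 0.0)

async def check_device(txn):
    await asyncio.sleep(0.01)            # device-fingerprint lookup
    return ("device", 0.1)

async def score_transaction(txn, timeout_s=0.08):
    tasks = [asyncio.create_task(c(txn))
             for c in (score_ml, evaluate_rules, check_device)]
    # Components that miss the 80ms budget are cancelled and excluded;
    # whatever finished in time still yields a decision.
    done, pending = await asyncio.wait(tasks, timeout=timeout_s)
    for t in pending:
        t.cancel()
    signals = dict(t.result() for t in done if t.exception() is None)
    risk = max(signals.values(), default=0.0)
    if risk > 0.9:                       # thresholds are illustrative
        return "DECLINE", signals
    return ("REVIEW" if risk > 0.6 else "APPROVE"), signals

print(asyncio.run(score_transaction({"amount": 120.0})))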
The Offline Training Plane runs on Spark/Databricks. Daily batch jobs compute aggregate features (user spending patterns, merchant fraud rates, card velocity profiles) and write them to the Feature Store (Redis cluster for online serving, Delta Lake for offline training). Model retraining runs weekly on labeled data (confirmed fraud + confirmed legitimate transactions) with continuous evaluation against a holdout set. Challenger models are deployed via A/B testing with shadow scoring before full rollout.
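A sketch of what one daily batch-feature job might look like, assuming a Delta-enabled Spark session and a transactions table with user_id, amount, and ts columns (paths and column names are illustrative):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("daily-user-features").getOrCreate()

# Assumed input location and schema; adjust to the actual lake layout.
txns = spark.read.format("delta").load("/lake/transactions")

features = (
    txns.where(F.col("ts") >= F.date_sub(F.current_date(), 30))
        .groupBy("user_id")
        .agg(F.avg("amount").alias("avg_amount_30d"),
             F.count("*").alias("txn_count_30d"))
)

# Written to the offline store; a separate loader pushes rows into Redis.
features.write.mode("overwrite").format("delta").save("/lake/user_features")
```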
Core Components
Feature Store & Real-Time Feature Engine
The Feature Store is the backbone of the scoring system, serving pre-computed and real-time features at sub-10ms latency. Pre-computed features (user's average transaction amount, preferred merchants, typical transaction times) are calculated by daily Spark batch jobs and loaded into a Redis cluster keyed by user_id. Real-time features (transaction count in last 5 minutes, cumulative amount in last hour) are maintained by a Flink streaming job consuming transaction events from Kafka. Flink uses sliding-window aggregations and writes results to Redis with TTL-based expiration. The Feature Engine assembles a 30-dimensional feature vector for each transaction by querying Redis in a single pipelined round trip.
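A sketch of the assembly step, assuming the key patterns from Database Design below (a user:{user_id}:features hash plus velocity:{user_id}:{window} sorted sets) and illustrative field names; all reads share one pipelined round trip:

```python
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def assemble_features(user_id: str, now_ms: int) -> dict:
    pipe = r.pipeline(transaction=False)  # batch all reads in one round trip
    pipe.hgetall(f"user:{user_id}:features")                      # batch features
    pipe.zcount(f"velocity:{user_id}:5m", now_ms - 300_000, now_ms)
    pipe.zcount(f"velocity:{user_id}:1h", now_ms - 3_600_000, now_ms)
    batch, txns_5m, txns_1h = pipe.execute()
    return {**batch, "txn_count_5m": txns_5m, "txn_count_1h": txns_1h}
```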
ML Scoring Service
The scoring service runs an ensemble of models: (1) an XGBoost model trained on tabular features (amount, merchant category, time of day, velocity metrics) providing a calibrated fraud probability; (2) an LSTM neural network processing the user's last 50 transactions as a sequence to detect anomalous behavior shifts; (3) a graph neural network analyzing the transaction network (sender-receiver relationships) to detect fraud rings. Each model runs independently on dedicated CPU-optimized pods (ONNX Runtime for XGBoost, TensorFlow Serving for neural models). The ensemble combines scores using a logistic regression meta-model. Model serving infrastructure uses Kubernetes with autoscaling based on request queue depth, maintaining 3x headroom for traffic spikes.
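A sketch of the stacking step, with synthetic training data standing in for held-out base-model scores on labeled transactions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000
labels = rng.random(n) < 0.01                       # ~1% fraud base rate
# Fake base-model scores, noisily correlated with the label; in production
# these would be the XGBoost, LSTM, and GNN outputs on a holdout set.
scores = np.clip(labels[:, None] * 0.6 + rng.random((n, 3)) * 0.5, 0, 1)

meta = LogisticRegression()
meta.fit(scores, labels)

def ensemble_score(xgb_p: float, lstm_p: float, gnn_p: float) -> int:
    p = meta.predict_proba([[xgb_p, lstm_p, gnn_p]])[0, 1]
    return int(p * 1000)        # map to the API's 0-1000 fraud_score

print(ensemble_score(0.8, 0.7, 0.9))
```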
Rule Engine
The Rule Engine evaluates deterministic fraud rules written in a DSL (domain-specific language) by fraud analysts via a web-based rule editor. Rules are compiled into an optimized decision tree at deployment time for fast evaluation. Examples: velocity rules ("decline if card used at >3 different merchants in 10 minutes"), amount rules ("flag if transaction >10x user's average"), geo rules ("decline if transaction in high-risk country and card was used domestically 30 minutes ago"). Rules support A/B testing — new rules run in shadow mode (score but don't block) for 48 hours before enforcement. The rule engine evaluates 200+ rules in <10ms using short-circuit evaluation on the compiled decision tree.
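An illustrative sketch of post-compilation rule evaluation with short-circuiting; in this simplified form each compiled rule is a plain predicate over the feature dict rather than a node in the shared decision tree:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    rule_id: str
    action: str                         # DECLINE / FLAG / SHADOW
    predicate: Callable[[dict], bool]   # compiled from the DSL expression

RULES = [
    Rule("velocity_3_merchants_10m", "DECLINE",
         lambda f: f["distinct_merchants_10m"] > 3),
    Rule("amount_10x_average", "FLAG",
         lambda f: f["amount"] > 10 * f["avg_amount_30d"]),
]

def evaluate(features: dict) -> tuple[str, list[str]]:
    triggered = []
    for rule in RULES:                   # assumed sorted by priority
        if rule.predicate(features):
            triggered.append(rule.rule_id)
            if rule.action == "DECLINE":  # short-circuit on a hard decline
                return "DECLINE", triggered
    return ("REVIEW" if triggered else "APPROVE"), triggered

print(evaluate({"distinct_merchants_10m": 1, "amount": 900.0,
                "avg_amount_30d": 50.0}))
```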
Database Design
The Feature Store uses Redis Cluster (30 nodes) with two key patterns: user features stored as Redis hashes user:{user_id}:features containing 20+ pre-computed fields, and sliding window counters stored as sorted sets velocity:{user_id}:{window} with timestamp scores for real-time aggregation. TTL on velocity keys is set to 24 hours. The decision log uses Kafka as the primary store with a ClickHouse consumer for analytical queries. ClickHouse schema: transaction_id, user_id, merchant_id, amount, fraud_score, model_scores (Array Float32), rules_triggered (Array String), decision, latency_ms, timestamp. Partitioned by day with a 2-year retention policy.
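The write side of the velocity pattern might look like the following sketch (redis-py; member = transaction id, score = timestamp, stale entries trimmed and the 24-hour TTL refreshed on each write):

```python
import time
import redis

r = redis.Redis(decode_responses=True)

def record_transaction(user_id: str, txn_id: str) -> None:
    now_ms = int(time.time() * 1000)
    key = f"velocity:{user_id}:24h"
    pipe = r.pipeline(transaction=False)
    pipe.zadd(key, {txn_id: now_ms})                          # score = timestamp
    pipe.zremrangebyscore(key, 0, now_ms - 24 * 3600 * 1000)  # drop stale entries
    pipe.expire(key, 24 * 3600)                               # refresh 24h TTL
    pipe.execute()
```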
Labeled fraud cases are stored in PostgreSQL: case_id, transaction_id, reported_by (CUSTOMER/INTERNAL/CHARGEBACK), fraud_type (CARD_STOLEN/ACCOUNT_TAKEOVER/FRIENDLY_FRAUD), investigation_status, resolved_at, analyst_id. This labeled dataset is the training data source for model retraining.
API Design
- POST /v1/fraud/score — Synchronous scoring endpoint called by the payment gateway; body contains transaction details (amount, currency, merchant, card_token, device_fingerprint, ip_address); returns decision (APPROVE/DECLINE/REVIEW), fraud_score (0-1000), triggered_rules array, latency_ms
- POST /v1/fraud/feedback — Submit fraud confirmation or false positive; body contains transaction_id, label (FRAUD/LEGITIMATE), fraud_type, reporter
- GET /v1/fraud/cases?status=open&analyst={id}&page={n} — List fraud cases for the manual review queue with filtering
- PUT /v1/fraud/rules/{rule_id} — Create or update a fraud rule; body contains the rule DSL expression, action (DECLINE/FLAG/SHADOW), priority
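A hypothetical call to the scoring endpoint; the host and field values are placeholders matching the shapes listed above:

```python
import requests

resp = requests.post(
    "https://fraud.internal.example.com/v1/fraud/score",
    json={
        "amount": 249.99,
        "currency": "USD",
        "merchant": "m_48211",
        "card_token": "tok_example",
        "device_fingerprint": "fp_example",
        "ip_address": "203.0.113.7",
    },
    timeout=0.1,   # the caller enforces the 100ms budget as well
)
decision = resp.json()
# e.g. {"decision": "APPROVE", "fraud_score": 112,
#       "triggered_rules": [], "latency_ms": 43}
```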
Scaling & Bottlenecks
The 100ms latency budget is the binding constraint. Network calls to Redis for feature retrieval consume 5-10ms; model inference 15-25ms; rule evaluation 5-10ms; overhead (serialization, logging) 10-15ms. To stay within budget, all three scoring components (ML, rules, device) run in parallel with an 80ms timeout — if any component exceeds the timeout, its result is excluded and the remaining components make the decision. The Feature Store Redis cluster is the most latency-sensitive dependency — a Redis node failure increases p99 latency by 20ms due to cluster redirections. This is mitigated with read replicas and client-side caching of hot keys (top 10K most active users).
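A minimal sketch of the hot-key cache, assuming a short TTL keeps cached features acceptably fresh; a library such as cachetools.TTLCache provides equivalent behavior:

```python
import time

class HotKeyCache:
    """Bounded in-process cache for the hottest user feature keys."""

    def __init__(self, max_items: int = 10_000, ttl_s: float = 5.0):
        self.max_items, self.ttl_s = max_items, ttl_s
        self._d = {}                       # key -> (expires_at, value)

    def get(self, key):
        hit = self._d.get(key)
        if hit and hit[0] > time.monotonic():
            return hit[1]
        self._d.pop(key, None)             # expired or missing
        return None

    def put(self, key, value):
        if len(self._d) >= self.max_items:
            self._d.pop(next(iter(self._d)))   # evict oldest insertion
        self._d[key] = (time.monotonic() + self.ttl_s, value)
```

On a cache hit the Redis round trip is skipped entirely, which is what shields p99 latency from cluster redirections during node failures.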
Model serving scalability is managed by horizontal pod autoscaling on Kubernetes. XGBoost inference on CPU is efficient (~5ms per prediction), but the LSTM sequence model requires more compute (~15ms). To reduce latency, the LSTM model input is pre-computed: the user's recent transaction embedding is updated incrementally after each transaction rather than recomputing from the full sequence at scoring time.
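To make the incremental update concrete: the per-user hidden state is cached and advanced once per new transaction, so scoring reads a precomputed state instead of re-running the network over 50 transactions. The hand-rolled GRU-like cell and dimensions below are illustrative, not the production network:

```python
import numpy as np

D_IN, D_H = 16, 32                      # toy transaction-embedding / state sizes
rng = np.random.default_rng(1)
W_z, U_z = rng.standard_normal((D_H, D_IN)), rng.standard_normal((D_H, D_H))
W_h, U_h = rng.standard_normal((D_H, D_IN)), rng.standard_normal((D_H, D_H))

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def step(h_prev: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Advance the cached user state by one transaction embedding x."""
    z = sigmoid(W_z @ x + U_z @ h_prev)        # update gate
    h_cand = np.tanh(W_h @ x + U_h @ h_prev)   # candidate state
    return (1 - z) * h_prev + z * h_cand

# The O(sequence) work happens once per transaction, off the critical
# path; scoring only reads the cached state h.
h = np.zeros(D_H)
for _ in range(3):                      # three new transactions arrive
    h = step(h, rng.standard_normal(D_IN))
```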
Key Trade-offs
- Inline synchronous scoring over async post-authorization review: Scoring before authorization prevents fraud in real-time but adds latency to every legitimate payment — the 100ms budget minimizes this impact while still catching fraud before money moves
- Ensemble of specialized models over single large model: Multiple models capture different fraud signals (tabular anomalies, behavioral sequences, network patterns) but increase inference latency and operational complexity — the parallel execution and timeout strategy bounds the impact
- Shadow mode for new rules before enforcement: Prevents false-positive spikes from untested rules, but delays the response to new fraud patterns by 48 hours — critical rules can bypass shadow mode with manager approval
- Pre-computed features over real-time computation: Batch-computed features are cheaper and allow complex aggregations, but can be up to 24 hours stale — supplemented with real-time Flink-computed features for time-sensitive signals like velocity