Saga Pattern Explained: Managing Distributed Transactions Without Two-Phase Commit
Saga Pattern
The saga pattern manages a distributed transaction by breaking it into a sequence of local transactions, each paired with a compensating action that undoes its work if a later step fails. This removes the need for distributed locks or two-phase commit.
What It Really Means
In a monolith, a database transaction can atomically update orders, inventory, and payments in a single BEGIN...COMMIT. In a microservices architecture, the Order Service, Inventory Service, and Payment Service each own their own database. There is no single transaction that spans all three.
Two-phase commit (2PC) could coordinate them, but 2PC has serious problems: it holds locks on resources in every participating service until the protocol completes, a coordinator failure can leave all participants blocked, and it requires every participant to support the same commit protocol, which is often not the case across heterogeneous databases and message brokers.
The saga pattern, introduced by Hector Garcia-Molina and Kenneth Salem in 1987, takes a different approach. Instead of one big atomic transaction, it runs a sequence of smaller, local transactions. If step 3 of 5 fails, the saga runs compensating transactions for steps 2 and 1 to undo their effects. The end result is that either all steps succeed or the effects of the completed steps are compensated, which gives eventual consistency without distributed locks.
Sagas are used extensively in e-commerce (order processing), travel booking (flights + hotels + cars), financial services (money transfers), and any microservices system that needs cross-service data consistency.
How It Works in Practice
The Two Coordination Styles
Choreography (event-driven): Each service listens for events and performs its step, then emits the next event. No central coordinator.
Orchestration (central coordinator): A Saga Orchestrator tells each service what to do and handles failures.
Real-World: E-Commerce Order Processing
An order saga might have these steps:
| Step | Service | Action | Compensating Action |
|---|---|---|---|
| 1 | Order | Create order (PENDING) | Cancel order |
| 2 | Inventory | Reserve items | Release items |
| 3 | Payment | Authorize payment | Void authorization |
| 4 | Shipping | Schedule shipment | Cancel shipment |
| 5 | Order | Confirm order (CONFIRMED) | — (no compensation needed) |
If step 3 (payment) fails, the orchestrator runs the compensating actions for steps 2 and 1 in reverse order. The customer sees the order move to CANCELLED status.
Real-World: Travel Booking (Choreography)
A travel booking platform uses choreography:
- Flight Service books flight → emits FlightBooked
- Hotel Service hears FlightBooked, books hotel → emits HotelBooked
- Car Rental Service hears HotelBooked, books car → emits CarBooked
- Booking Service hears CarBooked, confirms trip → emits TripConfirmed
If car rental fails, Hotel Service hears CarBookingFailed and cancels the hotel. Flight Service hears HotelCancelled and cancels the flight. Each service only knows about the events it subscribes to.
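The event flow above can be sketched with a small in-process event bus. This is an illustrative sketch, not a production design: `EventBus`, `run_booking`, and the initial `TripRequested` event are hypothetical names introduced here, the `car_available` flag simply simulates the car rental failure, and a real system would publish through a message broker and not through synchronous function calls.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List, Tuple

Trip = Dict[str, str]
Handler = Callable[[Trip], None]


class EventBus:
    """Synchronous stand-in for a message broker (assumption for this sketch)."""

    def __init__(self) -> None:
        self.handlers: DefaultDict[str, List[Handler]] = defaultdict(list)
        self.history: List[str] = []

    def subscribe(self, event: str, handler: Handler) -> None:
        self.handlers[event].append(handler)

    def publish(self, event: str, trip: Trip) -> None:
        self.history.append(event)
        for handler in list(self.handlers[event]):
            handler(trip)


def run_booking(car_available: bool) -> Tuple[Trip, List[str]]:
    bus = EventBus()

    # Each handler knows only the event it reacts to and the event it emits.
    def book_flight(trip: Trip) -> None:
        trip["flight"] = "BOOKED"
        bus.publish("FlightBooked", trip)

    def book_hotel(trip: Trip) -> None:
        trip["hotel"] = "BOOKED"
        bus.publish("HotelBooked", trip)

    def book_car(trip: Trip) -> None:
        if car_available:
            trip["car"] = "BOOKED"
            bus.publish("CarBooked", trip)
        else:
            bus.publish("CarBookingFailed", trip)

    def confirm_trip(trip: Trip) -> None:
        trip["status"] = "CONFIRMED"
        bus.publish("TripConfirmed", trip)

    # Compensations are triggered by failure events, not by a coordinator.
    def cancel_hotel(trip: Trip) -> None:
        trip["hotel"] = "CANCELLED"
        bus.publish("HotelCancelled", trip)

    def cancel_flight(trip: Trip) -> None:
        trip["flight"] = "CANCELLED"

    bus.subscribe("TripRequested", book_flight)    # hypothetical start event
    bus.subscribe("FlightBooked", book_hotel)
    bus.subscribe("HotelBooked", book_car)
    bus.subscribe("CarBooked", confirm_trip)
    bus.subscribe("CarBookingFailed", cancel_hotel)
    bus.subscribe("HotelCancelled", cancel_flight)

    trip: Trip = {}
    bus.publish("TripRequested", trip)
    return trip, bus.history


trip, history = run_booking(car_available=False)
print(history)
print(trip)
```

Note that no single function describes the whole workflow. The saga's shape exists only in the subscriptions, which is what makes choreography decoupled but harder to trace.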
Implementation
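A minimal sketch of the orchestrated order saga from the table above, in Python. The names `SagaOrchestrator`, `Step`, `StepFailed`, and the `card_ok` flag are hypothetical, and an in-memory dictionary stands in for the calls a real orchestrator would make to the Order, Inventory, Payment, and Shipping services. It also assumes compensations themselves never fail.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Order = Dict[str, object]


class StepFailed(Exception):
    """Raised by an action to signal that the saga must be compensated."""


@dataclass
class Step:
    name: str
    action: Callable[[Order], None]
    compensation: Callable[[Order], None]


@dataclass
class SagaOrchestrator:
    steps: List[Step]
    log: List[str] = field(default_factory=list)

    def run(self, order: Order) -> str:
        completed: List[Step] = []
        for step in self.steps:
            try:
                step.action(order)
            except StepFailed:
                self.log.append(f"{step.name}: failed")
                # Compensate already-committed steps, most recent first.
                for done in reversed(completed):
                    done.compensation(order)
                    self.log.append(f"{done.name}: compensated")
                break
            completed.append(step)
            self.log.append(f"{step.name}: done")
        return str(order["status"])


def authorize_payment(order: Order) -> None:
    # `card_ok` is a test hook that simulates a declined payment.
    if not order["card_ok"]:
        raise StepFailed("payment declined")
    order["payment"] = "AUTHORIZED"


def build_order_saga() -> SagaOrchestrator:
    def set_key(key: str, value: str) -> Callable[[Order], None]:
        def apply(order: Order) -> None:
            order[key] = value
        return apply

    return SagaOrchestrator(steps=[
        Step("create_order", set_key("status", "PENDING"), set_key("status", "CANCELLED")),
        Step("reserve_items", set_key("inventory", "RESERVED"), set_key("inventory", "RELEASED")),
        Step("authorize_payment", authorize_payment, set_key("payment", "VOIDED")),
        Step("schedule_shipment", set_key("shipment", "SCHEDULED"), set_key("shipment", "CANCELLED")),
        Step("confirm_order", set_key("status", "CONFIRMED"), lambda order: None),
    ])


order: Order = {"card_ok": False}
saga = build_order_saga()
print(saga.run(order))  # prints "CANCELLED"
print(saga.log)
```

Only steps that actually completed are compensated, so a failure at step 3 releases the inventory and cancels the order but never touches shipping. A production orchestrator would also persist its progress so it can resume after a crash.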
Trade-offs
Advantages
- No distributed locks: Each service uses only local transactions — no cross-service locking
- High availability: Services are not blocked waiting for each other (unlike 2PC)
- Service autonomy: Each service owns its data and transactions independently
- Works across heterogeneous systems: Services can use different databases, languages, and protocols
Disadvantages
- Complexity: Designing compensating actions for every step is non-trivial (how do you "undo" sending an email?)
- No isolation: Intermediate states are visible to other transactions. A concurrent query might see the order as PENDING with inventory reserved but payment not yet authorized.
- Compensation can fail: If a compensating action fails, you need manual intervention or a dead letter queue
- Debugging difficulty: Tracing failures across multiple services and compensating actions requires good distributed tracing
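To make the first disadvantage concrete, here is a small sketch of what "undoing" a payment usually looks like: the compensation does not erase the charge, it appends an offsetting refund. The `Ledger` and `Entry` classes are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Entry:
    kind: str          # "charge" or "refund"
    amount_cents: int


class Ledger:
    """Append-only record of payment events (illustrative only)."""

    def __init__(self) -> None:
        self.entries: List[Entry] = []

    def charge(self, amount_cents: int) -> None:
        self.entries.append(Entry("charge", amount_cents))

    def refund(self, amount_cents: int) -> None:
        # Compensating action: offsets the charge but leaves it in the history.
        self.entries.append(Entry("refund", amount_cents))

    def balance_cents(self) -> int:
        return sum(
            e.amount_cents if e.kind == "charge" else -e.amount_cents
            for e in self.entries
        )


ledger = Ledger()
ledger.charge(5000)
ledger.refund(5000)
print(ledger.balance_cents())  # prints 0
print(len(ledger.entries))     # prints 2
```

The balance returns to zero, but the history now holds two entries. The customer still saw a charge, which is why a compensation is a semantic undo and not a true rollback.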
Common Misconceptions
- "Sagas provide ACID transactions" — Sagas do not provide isolation, and their atomicity is only eventual: every step either completes or is compensated, but during execution the system is in an intermediate state where some steps are committed and others are not. This is sometimes summarized as ACD (Atomicity, Consistency, Durability), that is, ACID without Isolation.
- "Choreography is always better than orchestration" — Choreography is simpler for 2-3 services but becomes unmanageable for complex workflows. With 10 services and conditional branching, orchestration with a central coordinator is far easier to understand, test, and debug.
- "Compensating actions perfectly undo the original action" — Some actions are not fully reversible. You cannot un-send a notification, un-ship a package, or un-charge a credit card (you can only issue a refund, which is different). Saga design must account for semantic reversibility.
- "Sagas replace 2PC entirely" — For cases requiring strict atomicity (e.g., transferring money between two accounts that must always balance), sagas introduce temporary inconsistency. Some domains genuinely need 2PC or similar strong consistency guarantees.
How This Appears in Interviews
The saga pattern is a high-frequency interview topic for microservices design:
- "Design an order processing system with multiple services" — this is the canonical saga question. Walk through the steps, compensating actions, and whether you choose choreography or orchestration. See our system design interview guide.
- "How do you handle distributed transactions in microservices?" — compare sagas, 2PC, and eventual consistency. Explain why sagas are preferred for most microservice architectures.
- "What happens if a compensating action fails?" — discuss retry with idempotency, dead letter queues, and manual intervention dashboards.
- "Choreography vs orchestration — when do you pick each?" — choreography for simple linear workflows between 2-3 services; orchestration for complex workflows with branching, parallel steps, or many services.
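For the compensation-failure question, one possible answer can be sketched in code: retry the compensation a bounded number of times, make the compensation idempotent so that retries are safe, and park it in a dead letter queue for manual handling if it still fails. All names here (`TransientError`, `Inventory`, `compensate_with_retry`) are hypothetical, the dead letter queue is a plain list, and the backoff delay a real system would add between attempts is omitted.

```python
from typing import Callable, List, Set, Tuple


class TransientError(Exception):
    """A failure worth retrying, such as a timeout or a lost acknowledgement."""


class Inventory:
    def __init__(self, stock: int) -> None:
        self.stock = stock
        self.released: Set[str] = set()

    def release(self, saga_id: str, quantity: int) -> None:
        # Idempotent: a repeated release for the same saga is a no-op.
        if saga_id in self.released:
            return
        self.stock += quantity
        self.released.add(saga_id)


def compensate_with_retry(
    saga_id: str,
    name: str,
    compensation: Callable[[str], None],
    dead_letters: List[Tuple[str, str]],
    max_attempts: int = 3,
) -> bool:
    """Return True if the compensation succeeded, False if it was dead-lettered."""
    for _ in range(max_attempts):
        try:
            compensation(saga_id)
            return True
        except TransientError:
            continue
    dead_letters.append((saga_id, name))  # left for manual intervention
    return False


inventory = Inventory(stock=5)
attempts = {"count": 0}


def flaky_release(saga_id: str) -> None:
    # The release is applied, but the first two acknowledgements are "lost".
    attempts["count"] += 1
    inventory.release(saga_id, 2)
    if attempts["count"] < 3:
        raise TransientError("ack lost")


dead_letters: List[Tuple[str, str]] = []
print(compensate_with_retry("saga-1", "release_items", flaky_release, dead_letters))
print(inventory.stock)  # prints 7, not 11: the retries did not release twice
```

Without the `released` check, each retried call would add the items back again. That is the reason retries and idempotency are usually mentioned together.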
Related Concepts
- Two-Phase Commit — The traditional alternative that sagas often replace
- Event Sourcing — A natural pairing with choreography-based sagas
- CQRS — Often combined with sagas in event-driven architectures
- Circuit Breaker — Protecting saga steps from failing downstream services
- Eventual Consistency — The consistency model sagas operate within