Publish-Subscribe Pattern Explained: Decoupling Producers from Consumers at Scale
How pub/sub works — topics, subscriptions, message ordering, at-least-once delivery, and real-world patterns with Kafka, SNS, and Google Pub/Sub.
Publish-Subscribe Pattern
The publish-subscribe (pub/sub) pattern is a messaging paradigm where message producers (publishers) send messages to a topic without knowledge of which consumers (subscribers) will receive them, enabling loose coupling between components.
What It Really Means
In a direct communication model, Service A calls Service B's API. Service A must know Service B's address, API contract, and availability. If Service C also needs the same data, Service A must call both. If Service B is down, Service A must handle the failure. Each new consumer requires changes to the producer.
Pub/sub eliminates this coupling. Service A publishes an event — "OrderPlaced" — to a topic. It does not know or care who consumes it. Service B subscribes to that topic and processes new orders. Service C subscribes to the same topic and updates analytics. Service D subscribes and sends confirmation emails. The publisher's code never changes when consumers are added or removed.
This is fundamentally different from a point-to-point queue. In a queue, each message is consumed by exactly one consumer. In pub/sub, each message is delivered to every subscriber. This fan-out behavior is what makes pub/sub powerful for event-driven architectures where multiple systems need to react to the same event.
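The contrast can be shown with a minimal in-memory sketch. This is not a real broker: the `Topic` and `WorkQueue` classes, the round-robin choice, and the service names are all hypothetical, invented only to illustrate fan-out versus single delivery.

```python
from collections import defaultdict
from itertools import cycle


class Topic:
    """Pub/sub: every subscriber receives every message."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def publish(self, message):
        # Fan-out: hand the message to each subscriber in turn.
        for handler in self._subscribers:
            handler(message)


class WorkQueue:
    """Point-to-point: each message goes to exactly one consumer."""

    def __init__(self, consumers):
        # Round-robin is one simple way to pick the single receiver.
        self._next_consumer = cycle(consumers)

    def send(self, message):
        next(self._next_consumer)(message)


received = defaultdict(list)


def make_handler(name):
    # Each handler just records what it was given, under its own name.
    return lambda message: received[name].append(message)


topic = Topic()
for service in ("orders", "analytics", "email"):
    topic.subscribe(make_handler(service))
topic.publish("OrderPlaced#1")   # all three services receive it

queue = WorkQueue([make_handler("worker-a"), make_handler("worker-b")])
queue.send("task-1")             # only worker-a receives it
queue.send("task-2")             # only worker-b receives it
```

Adding a fourth subscriber to `topic` requires no change to the code that calls `publish`, which is the decoupling described above.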
How It Works in Practice
Message Flow
Delivery Semantics
At-most-once: Message delivered zero or one times. Fast but lossy — acceptable for metrics or logging where occasional data loss is tolerable.
At-least-once: Message delivered one or more times. The most common guarantee. Subscribers must be idempotent because they may receive the same message more than once.
Exactly-once: Message delivered exactly one time. Extremely difficult in distributed systems. Kafka approaches this with idempotent, transactional producers that commit consumer offsets in the same transaction as the output records. The guarantee covers reads and writes inside Kafka, not side effects in external systems, and it comes at a performance cost.
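One way to see where the first two guarantees come from is the order of acknowledging and processing. The sketch below is a toy simulation, not any broker's actual protocol: `deliver` and `FlakyConsumer` are hypothetical names, and the "crash" is a raised exception.

```python
class FlakyConsumer:
    """Consumer that crashes once, either before or after its side effect."""

    def __init__(self, crash_after_side_effect: bool):
        self.crash_after_side_effect = crash_after_side_effect
        self.processed = []          # stands in for a real side effect
        self._has_crashed = False

    def _crash_once(self, reason: str):
        if not self._has_crashed:
            self._has_crashed = True
            raise RuntimeError(reason)

    def __call__(self, message):
        if not self.crash_after_side_effect:
            self._crash_once("crashed before processing")
        self.processed.append(message)
        if self.crash_after_side_effect:
            self._crash_once("crashed before the ack was sent")


def deliver(messages, consumer, ack_before_processing: bool, max_attempts: int = 3):
    """Redeliver each message until it is acknowledged or attempts run out."""
    for message in messages:
        for _ in range(max_attempts):
            acked = False
            try:
                if ack_before_processing:
                    acked = True         # at-most-once: ack first, then process
                    consumer(message)
                else:
                    consumer(message)    # at-least-once: process, then ack
                    acked = True
            except RuntimeError:
                pass                     # consumer crashed during this attempt
            if acked:
                break                    # acknowledged, so no redelivery


lossy = FlakyConsumer(crash_after_side_effect=False)
deliver(["m1", "m2"], lossy, ack_before_processing=True)
# lossy.processed == ["m2"]: m1 was acknowledged, then lost in the crash

duplicating = FlakyConsumer(crash_after_side_effect=True)
deliver(["m1", "m2"], duplicating, ack_before_processing=False)
# duplicating.processed == ["m1", "m1", "m2"]: m1 was redelivered after the crash
```

The second run is why at-least-once consumers need to be idempotent: the side effect for `m1` happened twice.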
Ordering Guarantees
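A common way to get ordering where it matters is to route all events for one entity to the same partition by key. Order then holds within that partition, not across partitions. The following is a minimal sketch under stated assumptions: CRC32 stands in for a broker's real partitioner (Kafka's default partitioner uses a different hash), and `partition_for` and the sample events are hypothetical.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    # A stable hash, so the same key always maps to the same partition.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


NUM_PARTITIONS = 3
events = [
    ("order-1", "OrderPlaced"),
    ("order-2", "OrderPlaced"),
    ("order-1", "OrderPaid"),
    ("order-2", "OrderCancelled"),
    ("order-1", "OrderShipped"),
]

# Each partition is an append-only list, like a simplified log.
partitions = {p: [] for p in range(NUM_PARTITIONS)}
for key, event_type in events:
    partitions[partition_for(key, NUM_PARTITIONS)].append((key, event_type))


def events_for(key: str) -> list:
    """Read back one key's events from the single partition that holds them."""
    partition = partitions[partition_for(key, NUM_PARTITIONS)]
    return [event_type for k, event_type in partition if k == key]
```

Here `events_for("order-1")` returns that order's events in publish order, because they all sit in one partition. Nothing is guaranteed about how events for `order-1` interleave with events for `order-2` if the two keys land in different partitions.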
Real System: E-commerce Event Bus
Implementation
Publishing events (Python with Kafka):
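A minimal sketch of what such a publisher might look like, assuming the kafka-python client and a broker at `localhost:9092`. The `orders` topic, the event fields, and the helper names are hypothetical.

```python
import json
import uuid
from datetime import datetime, timezone


def build_order_placed(order_id: str, customer_id: str, total_cents: int) -> dict:
    """Build an event envelope; the field names are illustrative only."""
    return {
        "event_id": str(uuid.uuid4()),   # lets consumers detect duplicates
        "event_type": "OrderPlaced",
        "occurred_at": datetime.now(timezone.utc).isoformat(),
        "order_id": order_id,
        "customer_id": customer_id,
        "total_cents": total_cents,
    }


def publish_event(producer, topic: str, event: dict):
    """Send the event keyed by order_id so one order's events share a partition."""
    return producer.send(topic, key=event["order_id"], value=event)


def make_producer(bootstrap_servers: str = "localhost:9092"):
    # Imported lazily so the helpers above work without kafka-python installed.
    from kafka import KafkaProducer

    return KafkaProducer(
        bootstrap_servers=bootstrap_servers,
        acks="all",  # wait for all in-sync replicas before confirming
        key_serializer=lambda k: k.encode("utf-8"),
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )


def main():
    # Requires a reachable Kafka broker.
    producer = make_producer()
    publish_event(producer, "orders", build_order_placed("order-42", "cust-7", 1999))
    producer.flush()  # block until buffered messages have been sent
```

The publisher names only a topic, never a consumer. Including an `event_id` in every event is a design choice that makes deduplication on the consuming side straightforward.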
Consuming events with idempotency:
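A minimal sketch of an idempotent consumer, assuming each event carries a unique `event_id` and assuming the kafka-python client. `ProcessedStore` is a hypothetical in-memory stand-in for a durable deduplication table; the topic and group names are also illustrative.

```python
import json


class ProcessedStore:
    """In-memory stand-in for a durable table of already-handled event IDs."""

    def __init__(self):
        self._seen = set()

    def seen(self, event_id: str) -> bool:
        return event_id in self._seen

    def mark(self, event_id: str) -> None:
        self._seen.add(event_id)


def handle_event(event: dict, store: ProcessedStore, side_effect) -> bool:
    """Run side_effect at most once per event_id; return False for duplicates."""
    event_id = event["event_id"]
    if store.seen(event_id):
        return False
    side_effect(event)
    # In a real system the side effect and this mark should commit together
    # (for example in one database transaction); here they are separate steps.
    store.mark(event_id)
    return True


def consume(store: ProcessedStore, side_effect, bootstrap_servers: str = "localhost:9092"):
    # Imported lazily so handle_event can be used without kafka-python installed.
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        "orders",
        bootstrap_servers=bootstrap_servers,
        group_id="email-service",
        enable_auto_commit=False,        # commit offsets manually, after processing
        auto_offset_reset="earliest",
        value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    )
    for record in consumer:
        handle_event(record.value, store, side_effect)
        consumer.commit()                # a crash before this line causes redelivery
```

Committing the offset only after processing gives at-least-once delivery, and the `event_id` check turns a redelivered message into a no-op.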
AWS SNS + SQS fan-out pattern:
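A minimal sketch of the wiring, assuming boto3 with AWS credentials and a region already configured. The topic and queue names and the helper functions are hypothetical, and error handling is omitted.

```python
import json


def sqs_policy_for_topic(queue_arn: str, topic_arn: str) -> str:
    """Queue policy that lets one SNS topic send messages to one SQS queue."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "sns.amazonaws.com"},
            "Action": "sqs:SendMessage",
            "Resource": queue_arn,
            "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
        }],
    })


def fan_out(sns, sqs, topic_name: str, queue_names: list) -> str:
    """Create a topic plus one subscribed queue per consumer; return the topic ARN."""
    topic_arn = sns.create_topic(Name=topic_name)["TopicArn"]
    for name in queue_names:
        queue_url = sqs.create_queue(QueueName=name)["QueueUrl"]
        queue_arn = sqs.get_queue_attributes(
            QueueUrl=queue_url, AttributeNames=["QueueArn"]
        )["Attributes"]["QueueArn"]
        # Without this policy, SNS is not allowed to write to the queue.
        sqs.set_queue_attributes(
            QueueUrl=queue_url,
            Attributes={"Policy": sqs_policy_for_topic(queue_arn, topic_arn)},
        )
        sns.subscribe(
            TopicArn=topic_arn,
            Protocol="sqs",
            Endpoint=queue_arn,
            Attributes={"RawMessageDelivery": "true"},  # skip the SNS JSON envelope
        )
    return topic_arn


def main():
    import boto3  # imported lazily; needs AWS credentials and a region

    sns, sqs = boto3.client("sns"), boto3.client("sqs")
    topic_arn = fan_out(sns, sqs, "order-events", ["analytics-queue", "email-queue"])
    sns.publish(TopicArn=topic_arn, Message=json.dumps({"event_type": "OrderPlaced"}))
```

Each consumer then polls only its own queue, so adding a consumer means adding a queue and a subscription, not changing the publisher.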
SNS handles fan-out (one message to many subscribers). SQS provides durability and retry (each subscriber processes at its own pace).
Trade-offs
Benefits:
- Publishers and subscribers are fully decoupled — add/remove consumers without changing producers
- Fan-out: one event triggers multiple independent processing paths
- Load leveling: subscribers process messages at their own rate
- Fault isolation: a slow subscriber does not block other subscribers
Costs:
- Message ordering is hard — most systems provide only partition-level ordering
- Debugging is harder — tracing a message through multiple subscribers requires distributed tracing
- Eventual consistency — subscribers process events asynchronously
- Duplicate messages are common — subscribers must be idempotent
When to use pub/sub:
- Multiple services need to react to the same event
- You want to add new consumers without modifying existing services
- Event-driven architectures with asynchronous processing
- Fan-out: notifications, analytics, audit logging
When to use point-to-point queues instead:
- Each message should be processed by exactly one consumer
- Work distribution (task queue pattern)
- Load balancing across worker instances
Common Misconceptions
- "Pub/sub guarantees message delivery": What is guaranteed depends on the delivery semantics in use. With at-most-once delivery, messages can be lost. With at-least-once delivery, which most pub/sub systems provide, messages can be duplicated. Build consumers to be idempotent.
- "Pub/sub and message queues are the same thing": Queues deliver each message to one consumer. Pub/sub delivers each message to every subscriber. They solve different problems.
- "Kafka is pub/sub": Kafka supports both patterns through consumer groups. Consumers in the same group split a topic's partitions between them, so within a group Kafka acts as a queue. Consumers in different groups each receive every message, so across groups it acts as pub/sub.
- "Pub/sub means real-time": There is always some latency: publishing, broker processing, delivery, subscriber processing. As rough figures, Kafka end-to-end latency is typically 5-50ms, and SNS/SQS can be 100ms-1s. Actual numbers depend on configuration and load.
How This Appears in Interviews
- "Design a notification system" — Classic pub/sub: events published to topics, subscribers for push notifications, email, SMS. Discuss fan-out, delivery guarantees, and retry.
- "How do microservices communicate asynchronously?" — Pub/sub for event-driven communication. Compare with request/response (synchronous) and point-to-point queues.
- "How do you ensure a message is processed exactly once?" — Explain why exactly-once is hard, at-least-once with idempotent consumers is the practical solution, and how Kafka's transactional API approaches exactly-once.
- "Design a real-time analytics pipeline" — Kafka topics for ingestion, consumer groups for parallel processing, fan-out to multiple analytics consumers.
Related Concepts
- Observer Pattern — pub/sub is the distributed systems version of the observer pattern
- Transactional Outbox Pattern — reliably publish events from a database transaction
- Serverless Architecture — serverless functions as event subscribers
- Bulkhead Pattern — isolate subscriber failures from affecting other subscribers
- Compare: Kafka vs RabbitMQ
- System Design Interview Guide