Software Architecture Patterns for Senior Engineers

Comprehensive guide to software architecture patterns — monoliths, microservices, event-driven, CQRS, DDD, and migration strategies with real case studies.

Tags: architecture, system-design, microservices, design-patterns


Architecture is the set of decisions that are expensive to change. Choosing between a monolith and microservices, between synchronous and asynchronous communication, between a shared database and per-service databases — these decisions shape your system's trajectory for years. Get them right, and your team ships quickly with confidence. Get them wrong, and you spend years paying down technical debt.

This guide is written for senior and staff engineers who need to make architecture decisions, defend them in design reviews, and communicate trade-offs to both technical and non-technical stakeholders. It covers the major architectural patterns, when to use each, real company case studies, and practical migration strategies. If you are preparing for a system design interview, this guide covers the architectural vocabulary and reasoning that interviewers expect at the senior level.

Table of Contents

  1. Monolithic Architecture
  2. Microservices Architecture
  3. Event-Driven Architecture
  4. CQRS — Command Query Responsibility Segregation
  5. Event Sourcing
  6. Hexagonal Architecture and Clean Architecture
  7. Domain-Driven Design
  8. Service Mesh
  9. API Gateway Patterns
  10. Backend for Frontend Pattern
  11. Strangler Fig Migration Pattern
  12. Sidecar Pattern
  13. How to Study This Material
  14. Related Resources

Monolithic Architecture

A monolith is a single deployable unit that contains all application logic. The user interface, business logic, and data access layer are built and deployed as one artifact — a single JAR, a single Docker image, a single binary.

Structure of a Well-Designed Monolith

A monolith does not have to be a "big ball of mud." A well-structured monolith uses internal module boundaries to separate concerns.

Each module owns its own API layer, business logic, and data access. Modules communicate through well-defined internal interfaces, not by reaching into each other's database tables.
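A minimal sketch of this structure in Python (module and class names are hypothetical): the payments module exposes a small public API, keeps its implementation private, and the orders module depends only on that API.

```python
class PaymentsApi:
    """Public interface of the payments module; the only entry point
    other modules are allowed to use."""

    def charge(self, order_id: str, amount_cents: int) -> bool:
        return _PaymentsInternal().process(order_id, amount_cents)


class _PaymentsInternal:
    """Private implementation. Other modules must not import this;
    tools like Packwerk or ArchUnit can enforce that rule."""

    def process(self, order_id: str, amount_cents: int) -> bool:
        # ... talk to the payments tables / provider here ...
        return amount_cents > 0


class OrdersModule:
    """The orders module depends only on the payments public API,
    never on payments tables or internal classes."""

    def __init__(self, payments: PaymentsApi) -> None:
        self.payments = payments

    def place_order(self, order_id: str, total_cents: int) -> str:
        paid = self.payments.charge(order_id, total_cents)
        return "confirmed" if paid else "payment_failed"
```

Here `OrdersModule(PaymentsApi()).place_order("o-1", 4200)` succeeds, while nothing outside the payments module ever touches `_PaymentsInternal`.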

When Monoliths Are the Right Choice

Startups and early-stage products: You do not know your domain boundaries yet. A monolith lets you iterate quickly, refactor across module boundaries easily, and avoid the operational overhead of managing multiple services.

Small teams (under 10 engineers): The coordination overhead of microservices (API versioning, distributed tracing, service discovery, deployment pipelines per service) outweighs the benefits when the team is small.

Simple domains: If your application has a straightforward domain with well-understood requirements and moderate scale, a monolith is simpler to build, test, deploy, and debug.

When Monoliths Become Painful

Deployment coupling: A one-line change in the payment module requires deploying the entire application, including the shipping, user, and order modules. This increases deployment risk and reduces deployment frequency.

Scaling limitations: You cannot scale individual modules independently. If the search module needs 10x the compute of the user profile module, you must scale the entire monolith.

Team coupling: As the team grows beyond 10-15 engineers, merge conflicts increase, CI pipelines slow down, and teams step on each other's code. Conway's Law starts to bite.

Technology lock-in: The entire application must use the same language, framework, and database. You cannot use Python for ML, Go for high-performance services, and Node.js for real-time features.

Case Study: Shopify's Modular Monolith

Shopify famously chose to stay with a monolith as they grew to one of the largest e-commerce platforms in the world. But it is not a traditional monolith — it is a modular monolith with strictly enforced module boundaries.

Shopify uses a tool called Packwerk to enforce dependency rules between modules. If the Orders module tries to import code from the Payments module's internal implementation (rather than its public API), the build fails. This gives them the organizational benefits of microservices (team ownership, clear boundaries) without the operational complexity.

Key decisions:

  • Modules communicate through well-defined interfaces, not database joins.
  • Each module has an explicit public API. Internal implementation details are private.
  • Dependency cycles between modules are forbidden.
  • The entire application still deploys as one unit, but individual modules can be tested and developed independently.

The Modular Monolith: Best of Both Worlds?

The modular monolith is increasingly recognized as a middle ground. It enforces the boundaries of microservices within a single deployable unit. You get:

  • Fast, in-process communication between modules (no network overhead)
  • Single deployment pipeline (no distributed systems complexity)
  • Clear module boundaries (enforced by linting tools or language features)
  • Easy refactoring across module boundaries (same codebase)
  • Straightforward debugging and tracing (single process)

The risk is that module boundaries erode over time without strict enforcement. Without tooling like Packwerk or ArchUnit, developers take shortcuts and create hidden dependencies.


Microservices Architecture

Microservices decompose an application into small, independently deployable services. Each service owns its data, runs in its own process, and communicates with other services over the network (HTTP, gRPC, or messaging).

Defining Characteristics

  1. Single responsibility: Each service handles one bounded context or business capability.
  2. Independent deployment: Services are deployed independently. Changing the payment service does not require redeploying the order service.
  3. Decentralized data management: Each service owns its database. No shared database.
  4. Polyglot technology: Each service can use the language, framework, and database best suited to its needs.
  5. Failure isolation: A failure in one service does not cascade to other services (with proper circuit breaking).

Service Boundaries: Getting the Decomposition Right

The hardest part of microservices is finding the right service boundaries. Decompose too coarsely, and you have a distributed monolith. Decompose too finely, and you have a distributed nightmare with excessive network calls and complex transaction management.

Guidelines for service boundaries:

Align with business capabilities: An e-commerce platform might have services for Users, Catalog, Orders, Payments, Shipping, Inventory, and Notifications. Each maps to a distinct business capability with its own data and lifecycle.

Use bounded contexts from DDD: If two concepts have different meanings in different parts of the business (e.g., "Customer" means different things to Sales and Support), they belong in different services.

Minimize synchronous dependencies: If Service A cannot function without a synchronous call to Service B, those services are too tightly coupled. Consider merging them or introducing asynchronous communication.

One team per service: A service should be owned by a single team that can develop, test, deploy, and operate it independently. If a service requires coordination between multiple teams, it is too large.

Communication Patterns

Synchronous (HTTP/gRPC):

  • Request-response pattern. The caller waits for a response.
  • Use for: queries that need an immediate answer, user-facing API calls.
  • Risk: cascading failures if downstream services are slow or unavailable.

Asynchronous (Events/Messages):

  • Fire-and-forget pattern. The producer publishes an event and moves on.
  • Use for: state changes that other services need to know about, long-running operations.
  • Benefit: temporal decoupling — the producer and consumer do not need to be available simultaneously.

Best practice: Default to asynchronous communication. Use synchronous communication only when you genuinely need an immediate response (e.g., user authentication, real-time pricing).
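The two styles can be sketched side by side. This toy example uses an in-process function and an in-memory queue purely for illustration; in production the synchronous call would be HTTP/gRPC with a timeout, and the queue would be Kafka or RabbitMQ.

```python
import queue

# Synchronous: the caller blocks until it has the answer.
def get_price(sku: str) -> float:           # hypothetical pricing service
    return {"sku-1": 9.99}.get(sku, 0.0)

price = get_price("sku-1")                  # caller waits for the response

# Asynchronous: the producer publishes a fact and moves on.
events: queue.Queue = queue.Queue()

def place_order(order_id: str) -> None:
    # ... persist the order, then announce what happened ...
    events.put({"type": "order_placed", "order_id": order_id})

place_order("o-42")                         # returns immediately

# A consumer, running later and independently, drains the channel:
event = events.get_nowait()
```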

The Distributed Monolith Anti-Pattern

A distributed monolith has all the complexity of microservices with none of the benefits. Signs you have a distributed monolith:

  • Services must be deployed together in a specific order.
  • A change in Service A requires a change in Service B.
  • Services share a database.
  • Services communicate synchronously for every operation.
  • There is a "god service" that orchestrates everything.

The distributed monolith is worse than a regular monolith because you have network latency, partial failures, and operational complexity with no independent deployability.

Case Study: Netflix's Microservices Journey

Netflix's migration from a monolithic Java application to microservices (2009-2012) is the most cited microservices case study. Key lessons:

  • The migration took 3 years. Netflix migrated incrementally using the strangler fig pattern, routing traffic to new microservices one endpoint at a time.
  • They built extensive infrastructure: service discovery (Eureka), load balancing (Ribbon), circuit breaking (Hystrix), API gateway (Zuul), distributed tracing, and chaos engineering (Chaos Monkey). Without this infrastructure, microservices would have been unmanageable.
  • They invested heavily in observability: with over 1,000 microservices, you cannot debug by reading logs. Netflix built Atlas for metrics, Edgar for distributed tracing, and extensive dashboarding.
  • Culture change was as important as technical change: each team owns their service end-to-end, including on-call. "You build it, you run it."

For a deep dive into Netflix's architecture, see our How Netflix Scales case study.

Case Study: Amazon's Two-Pizza Teams

Amazon decomposed their monolith into hundreds of services in the early 2000s, driven by Jeff Bezos's famous API mandate: every team must expose their functionality through APIs, with no other form of inter-team communication. This mandate forced service boundaries that aligned with team boundaries.

The "two-pizza team" rule (each team small enough to be fed by two pizzas) ensured services were small enough for a single team to own. This organizational structure drove the technical architecture — a direct example of Conway's Law.

Trade-offs Summary

| Aspect | Monolith | Microservices |
| --- | --- | --- |
| Development speed (early) | Faster | Slower (infrastructure overhead) |
| Development speed (late) | Slower (coupling) | Faster (independence) |
| Deployment risk | Higher (all-or-nothing) | Lower (per-service) |
| Operational complexity | Lower | Much higher |
| Debugging | Easier (single process) | Harder (distributed tracing needed) |
| Team independence | Lower | Higher |
| Technology flexibility | One stack | Polyglot |
| Data consistency | Easy (ACID transactions) | Hard (eventual consistency, sagas) |

For most organizations, the right path is: start with a monolith, enforce module boundaries, and extract services only when the monolith becomes a bottleneck for team productivity or scaling.


Event-Driven Architecture

Event-driven architecture (EDA) is a paradigm where the flow of the program is determined by events — significant changes in state. Services communicate by producing and consuming events rather than calling each other directly.

Core Concepts

Event: A record of something that happened. Events are facts — they are immutable and past tense. Examples: OrderPlaced, PaymentProcessed, UserRegistered.

Event Producer: The service that detects a state change and publishes an event.

Event Consumer: A service that reacts to events. A single event can have many consumers.

Event Channel: The infrastructure that transports events from producers to consumers (Kafka, RabbitMQ, Amazon SNS/SQS).

Event Types

Domain Events: Represent something meaningful in the business domain. OrderShipped, InvoiceGenerated, SubscriptionCancelled.

Integration Events: Events published for consumption by other services. They contain only the data needed by consumers and serve as the public API of a service.

Event Notifications: Thin events that notify consumers that something happened but contain minimal data. The consumer must call back to get details. Example: { "type": "order_placed", "order_id": "abc123" }.

Event-Carried State Transfer: Fat events that contain all the data a consumer needs, eliminating callback queries. Example: { "type": "order_placed", "order_id": "abc123", "customer_id": "456", "items": [...], "total": 99.50 }.

Patterns

Event Notification: Producers publish notifications; consumers react. Low coupling, but consumers may need to call back for details.

Event-Carried State Transfer: Events carry full state, so consumers do not need callbacks. Reduces coupling further, but events are larger and data duplication increases.

Event Sourcing: Instead of storing current state, store the sequence of events that led to the current state. The current state is derived by replaying events. Covered in detail in the next section.

CQRS with Events: Separate the write model (command side) from the read model (query side). Events propagate changes from the command side to the query side. Covered in detail in the CQRS section.

When to Use Event-Driven Architecture

Good fit:

  • Multiple consumers need to react to the same event (fan-out).
  • Temporal decoupling is needed (producer and consumer do not need to be available simultaneously).
  • Eventual consistency is acceptable.
  • Auditing and replay are requirements.
  • The system has complex workflows with many steps.

Poor fit:

  • Strong consistency is required across multiple services.
  • Simple request-response patterns dominate.
  • The team lacks experience with asynchronous debugging and eventual consistency.
  • The system is small enough that direct calls are simpler.

Real-World: Uber's Event-Driven Architecture

Uber processes billions of events per day. When a rider requests a ride, the following events flow through the system:

  1. RideRequested — triggers driver matching, ETA calculation, surge pricing.
  2. DriverAssigned — triggers rider notification, driver navigation.
  3. RideStarted — triggers billing initialization, real-time tracking.
  4. RideCompleted — triggers payment processing, driver payout, rating prompt.
  5. PaymentProcessed — triggers receipt generation, accounting.

Each event is consumed by multiple services independently. The ride service does not know or care about the billing service — it just publishes events. This decoupling allows teams to work independently and deploy at their own cadence.

Uber uses Apache Kafka as its central event bus, processing over 1 trillion messages per day across its clusters.

For more on message queue selection, see our Kafka vs RabbitMQ comparison.


CQRS — Command Query Responsibility Segregation

CQRS separates the model for reading data (queries) from the model for writing data (commands). Instead of a single model that handles both reads and writes, you have two models optimized for their respective operations.

Why CQRS?

In many systems, reads and writes have fundamentally different characteristics:

  • Reads are far more frequent than writes (often 100:1 or 1000:1).
  • Reads often need denormalized, pre-joined data (for display).
  • Writes need normalized, validated data (for consistency).
  • Reads may need to be served from different geographic locations.
  • Writes may need strong consistency, while reads can tolerate staleness.

With a single model, you compromise: the model is either optimized for reads (denormalized, fast queries) or writes (normalized, safe mutations), but not both.

How CQRS Works

  1. Command side: Handles mutations. Validates business rules, writes to the primary database, and publishes events.
  2. Event propagation: Changes from the write side are propagated to the read side via events (synchronously or asynchronously).
  3. Query side: Handles reads. Serves data from optimized read stores (denormalized tables, search indices, caches).
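The three steps above can be sketched with in-memory stores (all names are hypothetical; real systems would propagate events asynchronously through a broker):

```python
write_store: dict = {}      # command side: normalized source of truth
read_view: dict = {}        # query side: denormalized for display

def handle_place_order(order_id: str, customer: str, total: float) -> dict:
    """Command: validate business rules, persist, emit an event."""
    if total <= 0:
        raise ValueError("total must be positive")
    write_store[order_id] = {"customer": customer, "total": total}
    return {"type": "order_placed", "order_id": order_id,
            "customer": customer, "total": total}

def project(event: dict) -> None:
    """Event propagation: apply the event to the read model."""
    if event["type"] == "order_placed":
        read_view[event["order_id"]] = (
            f'{event["customer"]} owes ${event["total"]:.2f}')

def query_order_summary(order_id: str) -> str:
    """Query: served entirely from the optimized read store."""
    return read_view[order_id]

project(handle_place_order("o-1", "ada", 99.5))
```

After the projection runs, `query_order_summary("o-1")` answers from the read store without touching the write-side tables.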

When to Use CQRS

Good fit:

  • Read and write workloads have vastly different scaling requirements.
  • The read model needs to be in a different shape than the write model (e.g., materialized views, search indices).
  • You need to serve reads from multiple specialized stores (SQL for transactional queries, Elasticsearch for search, Redis for real-time dashboards).
  • Eventual consistency between reads and writes is acceptable.

Poor fit:

  • Simple CRUD applications where reads and writes are similar.
  • Small-scale applications where the complexity is not justified.
  • When strong read-after-write consistency is required everywhere.

Case Study: Stack Overflow's CQRS

Stack Overflow uses a CQRS-inspired architecture to serve 1.7 billion page views per month from a remarkably small infrastructure (9 web servers). The write path uses normalized SQL Server tables. The read path uses heavily denormalized cached data structures in Redis. Tags, user reputation, question scores, and other frequently read data are pre-computed and cached, avoiding expensive database joins on every page view.


Event Sourcing

Event sourcing stores the state of a system as a sequence of events rather than the current state. Instead of updating a row in a database, you append a new event that describes what happened. The current state is derived by replaying all events from the beginning.

How Event Sourcing Works

In the traditional, state-based approach, you store only the current value (for example `balance = 750`) and overwrite it on every change. In the event sourcing approach, you store each deposit and withdrawal as an immutable event and derive the balance by folding over the event sequence.
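For a bank account, the two approaches can be sketched as follows (event shapes and amounts are hypothetical):

```python
# State-based: one mutable value; history is lost on every update.
account = {"id": 123, "balance": 1000}
account["balance"] -= 250            # withdrawal overwrites prior state

# Event-sourced: an append-only log of facts; state is derived on demand.
events = [
    {"type": "account_opened",  "amount": 0},
    {"type": "money_deposited", "amount": 1000},
    {"type": "money_withdrawn", "amount": 250},
]

def current_balance(log: list) -> int:
    """Replay the log from the beginning to derive current state."""
    balance = 0
    for e in log:
        if e["type"] == "money_deposited":
            balance += e["amount"]
        elif e["type"] == "money_withdrawn":
            balance -= e["amount"]
    return balance
```

Both paths end at a balance of 750, but only the event log can explain how it got there.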

Benefits of Event Sourcing

  1. Complete audit trail: Every state change is recorded. You can answer questions like "what was the balance on April 22?" by replaying events up to that date.
  2. Temporal queries: Reconstruct the state at any point in time.
  3. Debugging: When something goes wrong, replay events to see exactly what happened.
  4. Event-driven integration: Events are first-class citizens — other services can consume them for their own purposes.
  5. No data loss: You never delete or overwrite data. Every change is an append.

Challenges of Event Sourcing

  1. Event schema evolution: As your system evolves, event schemas change. You need a strategy for versioning events (upcasting, lazy migration, or dual writes).
  2. Eventual consistency: The read model is typically updated asynchronously from the event store, so queries may return stale data.
  3. Performance: Replaying thousands of events to rebuild state is slow. Mitigate with snapshots — periodically save the current state so you only need to replay events since the last snapshot.
  4. Complexity: Event sourcing is a significant paradigm shift from CRUD. The learning curve is steep, and debugging an event-sourced system requires different mental models.
  5. Storage growth: Events are never deleted. For high-volume systems, storage grows continuously. Archiving and compaction strategies are needed.

Snapshots

To avoid replaying all events from the beginning, periodically create a snapshot of the current state:

  1. Replay events up to event N and compute the current state.
  2. Store the snapshot: { account_id: 123, balance: 750, snapshot_at_event: 4 }.
  3. To rebuild state, load the latest snapshot and replay only events after event N.
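A minimal sketch of that rebuild, assuming each event carries a sequence number (event shapes are hypothetical):

```python
events = [
    {"seq": 1, "type": "deposited", "amount": 500},
    {"seq": 2, "type": "deposited", "amount": 500},
    {"seq": 3, "type": "withdrawn", "amount": 250},
    {"seq": 4, "type": "deposited", "amount": 0},
    {"seq": 5, "type": "deposited", "amount": 100},
]

snapshot = {"account_id": 123, "balance": 750, "snapshot_at_event": 4}

def rebuild(snap: dict, log: list) -> int:
    """Start from the snapshot and replay only events after it."""
    balance = snap["balance"]
    for e in log:
        if e["seq"] <= snap["snapshot_at_event"]:
            continue                 # already folded into the snapshot
        if e["type"] == "deposited":
            balance += e["amount"]
        elif e["type"] == "withdrawn":
            balance -= e["amount"]
    return balance
```

Only event 5 is replayed here, no matter how long the full history is.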

Real-World: Event Sourcing in Financial Systems

Event sourcing is natural for financial systems because regulations require a complete audit trail. If you store balance = 750, you cannot explain how you got there. If you store every transaction as an event, you have a verifiable, append-only ledger.

Stripe uses event sourcing for its payment processing pipeline. Every payment state transition (created, authorized, captured, refunded) is an event. The current state of a payment is derived from the event sequence, and the events serve as the audit trail for compliance.

When to Use Event Sourcing

Good fit: Financial systems, audit-heavy domains, complex business workflows, systems that need temporal queries, integration with event-driven architecture.

Poor fit: Simple CRUD applications, systems where storage growth is a concern, teams without experience with event-driven patterns, domains where "current state" is the only thing that matters.


Hexagonal Architecture and Clean Architecture

Hexagonal architecture (also called ports and adapters) and clean architecture are related patterns that isolate business logic from infrastructure concerns. They make the application testable, flexible, and independent of frameworks, databases, and UI.

Hexagonal Architecture (Ports and Adapters)

The core idea: the business logic (domain) is at the center. It defines "ports" — interfaces that describe how the outside world interacts with it. "Adapters" implement those ports for specific technologies.

Inbound ports: Define how the application receives requests (HTTP, gRPC, CLI, message consumer).

Outbound ports: Define how the application interacts with external systems (database, cache, external API, message publisher).

The key rule: dependencies always point inward. The domain does not know about HTTP, PostgreSQL, or Kafka. It only knows about its ports (interfaces). Adapters implement those ports and handle the technology-specific details.

Clean Architecture

Robert C. Martin's clean architecture formalized this into concentric layers:

  1. Entities (innermost): Business objects and rules that are true across the entire enterprise.
  2. Use Cases: Application-specific business rules. Orchestrate entities to fulfill a specific user goal.
  3. Interface Adapters: Convert data between the use case format and the format needed by external agencies (controllers, presenters, gateways).
  4. Frameworks & Drivers (outermost): Web frameworks, databases, UI, external APIs.

The dependency rule: Source code dependencies can only point inward. Nothing in an inner circle can know about anything in an outer circle.

Code Example

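A minimal Python sketch of ports and adapters, assuming a hypothetical order-placement use case: the domain defines the `OrderRepository` port, and an in-memory adapter implements it. A PostgreSQL or DynamoDB adapter would look identical from the domain's point of view.

```python
from abc import ABC, abstractmethod

class OrderRepository(ABC):
    """Outbound port: the domain defines the interface it needs."""

    @abstractmethod
    def save(self, order_id: str, total_cents: int) -> None: ...

    @abstractmethod
    def get_total(self, order_id: str) -> int: ...


class PlaceOrder:
    """Use case (application core): depends only on the port, never on
    a concrete database, framework, or transport."""

    def __init__(self, repo: OrderRepository) -> None:
        self.repo = repo

    def execute(self, order_id: str, total_cents: int) -> str:
        if total_cents <= 0:
            raise ValueError("order total must be positive")
        self.repo.save(order_id, total_cents)
        return order_id


class InMemoryOrderRepository(OrderRepository):
    """Adapter: one technology-specific implementation of the port."""

    def __init__(self) -> None:
        self._rows: dict = {}

    def save(self, order_id: str, total_cents: int) -> None:
        self._rows[order_id] = total_cents

    def get_total(self, order_id: str) -> int:
        return self._rows[order_id]


repo = InMemoryOrderRepository()
placed = PlaceOrder(repo).execute("o-1", 2599)
```

Because `PlaceOrder` sees only the port, it can be unit-tested with the in-memory adapter and deployed with a real database adapter, with no change to the business logic.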

Trade-offs

Benefits:

  • Business logic is testable without any infrastructure (mock the ports).
  • Swap databases, frameworks, or external services without changing business logic.
  • Clear separation of concerns and dependency direction.

Costs:

  • More code and more abstractions (interfaces, adapters, mappers).
  • Can feel like over-engineering for simple CRUD applications.
  • Requires discipline to maintain the dependency rule as the codebase grows.

When to Use

Use hexagonal/clean architecture when your domain logic is complex and is the core value of your application. If your application is mostly CRUD with little business logic, the overhead of ports and adapters is not justified.


Domain-Driven Design

Domain-Driven Design (DDD) is a set of principles for building software that closely models the real-world domain. It is not an architecture pattern itself but a design philosophy that strongly influences architecture decisions, especially service boundaries in microservices.

Strategic DDD: Bounded Contexts

The most architecturally significant concept in DDD is the bounded context. A bounded context is a boundary within which a particular model is defined and applicable.

Example: The word "Product" means different things in different contexts:

  • In the Catalog context: name, description, images, categories.
  • In the Inventory context: SKU, warehouse location, quantity.
  • In the Shipping context: weight, dimensions, packaging requirements.

Each bounded context has its own model of "Product" optimized for its needs. Trying to create a single "Product" model that serves all contexts leads to a bloated, confused model.

Bounded contexts map directly to microservice boundaries. Each microservice owns one bounded context and its associated model and data.

Context Mapping

Bounded contexts interact with each other through explicit integration patterns:

  • Shared Kernel: Two contexts share a subset of the model. Changes must be coordinated. Use sparingly.
  • Customer-Supplier: One context (supplier) provides a service to another (customer). The customer can request changes.
  • Conformist: The downstream context conforms to the upstream context's model. No negotiation.
  • Anti-Corruption Layer (ACL): The downstream context translates the upstream model into its own terms. Protects the downstream model from upstream changes.
  • Published Language: A shared, well-documented data format (e.g., Protocol Buffers, JSON Schema) for inter-context communication.

Tactical DDD: Building Blocks

Entities: Objects with a unique identity that persists over time. Example: a User is identified by user_id regardless of name or email changes.

Value Objects: Objects without identity, defined by their attributes. Example: a Money object with amount and currency. Two Money(100, "USD") instances are interchangeable.

Aggregates: A cluster of entities and value objects that are treated as a single unit for data changes. The aggregate root is the entry point. Example: an Order aggregate contains OrderLine items. You can only modify order lines through the Order aggregate root.

Domain Events: Events that represent something significant that happened in the domain. OrderPlaced, PaymentFailed, InventoryReserved.

Repositories: Interfaces for persisting and retrieving aggregates. The domain defines the interface; the infrastructure provides the implementation.
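The value object and aggregate building blocks can be sketched in Python with dataclasses (the order/money shapes are illustrative, not from the original text):

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Money:
    """Value object: no identity; equality is by attributes."""
    amount: int        # in cents, to avoid float rounding
    currency: str

@dataclass
class OrderLine:
    """Entity living inside the Order aggregate."""
    sku: str
    price: Money

@dataclass
class Order:
    """Aggregate root: the only entry point for changes to the cluster."""
    order_id: str
    lines: list = field(default_factory=list)

    def add_line(self, sku: str, price: Money) -> None:
        # Invariants are enforced here, e.g. one currency per order.
        if self.lines and self.lines[0].price.currency != price.currency:
            raise ValueError("mixed currencies in one order")
        self.lines.append(OrderLine(sku, price))

    def total(self) -> Money:
        currency = self.lines[0].price.currency if self.lines else "USD"
        return Money(sum(l.price.amount for l in self.lines), currency)

order = Order("o-1")
order.add_line("sku-1", Money(2500, "USD"))
order.add_line("sku-2", Money(1000, "USD"))
```

Two `Money(100, "USD")` instances compare equal, while order lines can only be added through the `Order` root, which guards the aggregate's invariants.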

When DDD Adds Value

DDD is most valuable when:

  • The domain is complex and evolving.
  • You are building microservices and need to define service boundaries.
  • Multiple teams work on the same product and need clear ownership boundaries.
  • The business logic is the competitive advantage (not the technology).

DDD is overkill when:

  • The domain is simple (CRUD operations with little business logic).
  • The project is short-lived or a prototype.
  • The team does not have access to domain experts.

Service Mesh

A service mesh is a dedicated infrastructure layer that handles service-to-service communication. It provides traffic management, security (mTLS), observability, and resilience (retries, circuit breaking) without requiring changes to application code.

How a Service Mesh Works

A service mesh deploys a proxy (sidecar) alongside each service instance. All inbound and outbound traffic flows through the proxy. The proxies are configured by a central control plane.

Key Capabilities

Mutual TLS (mTLS): Every service-to-service call is encrypted and authenticated. The mesh handles certificate issuance, rotation, and validation automatically.

Traffic management: Canary deployments (route 5% of traffic to v2), traffic splitting, retries with exponential backoff, timeouts.

Observability: The proxies automatically collect metrics (latency, error rates, request volume), distributed traces, and access logs for every service-to-service call.

Resilience: Circuit breaking, rate limiting, and outlier detection without application code changes.

Istio vs. Linkerd

| Feature | Istio | Linkerd |
| --- | --- | --- |
| Proxy | Envoy (C++) | linkerd2-proxy (Rust) |
| Resource overhead | Higher (~50MB per sidecar) | Lower (~20MB per sidecar) |
| Complexity | High (many features, many CRDs) | Lower (focused feature set) |
| mTLS | Yes (complex configuration) | Yes (on by default) |
| Multi-cluster | Yes (complex setup) | Yes (simpler setup) |
| Learning curve | Steep | Moderate |

When You Do Not Need a Service Mesh

A service mesh adds significant operational complexity. You probably do not need one if:

  • You have fewer than 10-15 services.
  • Your team does not have Kubernetes expertise.
  • You can implement mTLS, retries, and observability at the application level.
  • You are not doing canary or traffic-splitting deployments.

For a detailed comparison, see our Service Mesh Evaluation: Istio, Linkerd, and When You Don't Need One.


API Gateway Patterns

An API gateway is a single entry point for all client requests. It handles cross-cutting concerns (authentication, rate limiting, request routing) so that individual services do not have to.

Core Responsibilities

  1. Request routing: Route requests to the appropriate backend service based on URL path, headers, or other criteria.
  2. Authentication and authorization: Validate tokens (JWT, OAuth), enforce API keys, check permissions.
  3. Rate limiting: Protect services from excessive traffic.
  4. Response aggregation: Combine responses from multiple services into a single response.
  5. Protocol translation: Accept REST from clients, forward as gRPC to services.
  6. Caching: Cache responses for frequently requested, rarely changing data.
  7. Logging and monitoring: Log all requests for observability.
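The first three responsibilities can be sketched together. This toy handler routes by path prefix and enforces a per-client quota; the service names, quota, and return strings are illustrative, and a real gateway would proxy the request over the network.

```python
ROUTES = {"/orders": "order-service", "/users": "user-service"}
MAX_REQUESTS = 2                      # toy quota for illustration
request_counts: dict = {}

def handle(client_id: str, path: str) -> str:
    # Rate limiting: reject clients over their quota.
    request_counts[client_id] = request_counts.get(client_id, 0) + 1
    if request_counts[client_id] > MAX_REQUESTS:
        return "429 Too Many Requests"
    # Request routing: pick the backend service by path prefix.
    for prefix, service in ROUTES.items():
        if path.startswith(prefix):
            return f"forwarded to {service}"
    return "404 Not Found"
```

The third request from the same client is rejected before any backend is touched, which is exactly the protection rate limiting is meant to provide.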

API Gateway vs. Load Balancer

A load balancer distributes traffic across instances of the same service (Layer 4 or Layer 7). An API gateway routes traffic to different services based on the request content and handles application-level concerns.

In practice, you use both: the API gateway handles request routing and cross-cutting concerns, and a load balancer distributes traffic within each service.

API Gateway Anti-Patterns

The "smart" gateway: Putting business logic in the gateway creates a bottleneck and tight coupling. The gateway should only handle cross-cutting concerns.

Single point of failure: The gateway is on the critical path for every request. It must be highly available, horizontally scalable, and independently deployable.

Gateway as a team bottleneck: If every service change requires a gateway configuration change, and the gateway is owned by a single team, that team becomes a bottleneck.

Real-World Implementations

  • Amazon API Gateway: Fully managed, serverless. Good for AWS-native architectures.
  • Kong: Open-source, plugin-based. Good for teams that need customization.
  • Netflix Zuul: Custom-built for Netflix's scale. Handles dynamic routing, load shedding, and security.
  • Envoy: Often used as both a sidecar proxy and an API gateway (edge proxy).

Backend for Frontend Pattern

The Backend for Frontend (BFF) pattern creates a separate backend service for each type of frontend (web, mobile, smart TV, etc.). Each BFF is optimized for its frontend's specific needs.

Why BFF?

Different frontends have different requirements:

  • Mobile: Low bandwidth, needs compact responses, tolerates aggressive caching.
  • Web: Full-featured, needs rich responses, supports complex interactions.
  • Smart TV: Very limited UI, needs pre-formatted responses, slow hardware.

A single API that serves all frontends either grows overly complex or forces compromises on every client.

Architecture
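Each client talks to its own BFF, which fans out to the shared downstream services and shapes the response for that client. A minimal sketch of a mobile BFF, with hypothetical downstream calls standing in for HTTP/gRPC requests:

```python
def fetch_user(user_id: str) -> dict:
    """Hypothetical user service call."""
    return {"id": user_id, "name": "Ada", "bio": "long text" * 100}

def fetch_orders(user_id: str) -> list:
    """Hypothetical order service call."""
    return [{"id": "o-1", "total": 99.5, "internal_notes": "vip"}]

def mobile_home_screen(user_id: str) -> dict:
    """Mobile BFF endpoint: aggregate several downstream calls into one
    compact payload, dropping fields the mobile client never renders."""
    user = fetch_user(user_id)
    orders = fetch_orders(user_id)
    return {
        "name": user["name"],
        "recent_orders": [{"id": o["id"], "total": o["total"]}
                          for o in orders],
    }
```

The web BFF would call the same downstream services but return the richer payload its client needs, which is the whole point of the pattern.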

Trade-offs

Benefits: Each frontend gets an API tailored to its needs. Frontend teams can own their BFF and iterate independently. Reduces over-fetching and under-fetching.

Costs: Code duplication between BFFs (mitigate with shared libraries). More services to deploy and maintain. Risk of BFF becoming a monolith over time.

Case Study: SoundCloud's BFF

SoundCloud adopted the BFF pattern to serve their web, iOS, and Android clients. Each BFF is owned by the corresponding frontend team. This eliminated the "API committee" bottleneck where backend engineers had to negotiate a single API that satisfied all frontends.

Alternative: GraphQL

GraphQL can serve as an alternative to BFF by allowing each frontend to query exactly the data it needs. However, GraphQL introduces its own complexity (schema management, N+1 query problems, authorization at the field level).


Strangler Fig Migration Pattern

The strangler fig pattern enables incremental migration from a legacy system to a new system without a risky "big bang" rewrite.

How It Works

  1. Place a proxy (or API gateway) in front of the legacy system.
  2. All traffic flows through the proxy to the legacy system initially.
  3. Build a new service that handles one specific capability.
  4. Route traffic for that capability from the proxy to the new service instead of the legacy system.
  5. Repeat for each capability until the legacy system has no traffic.
  6. Decommission the legacy system.
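
The proxy in steps 1-5 can start as little more than a routing table keyed by capability; migrating a capability means flipping one entry. A minimal sketch, with hypothetical capability names:

```python
# Minimal strangler-fig router: each capability is served by the legacy
# system until it is added to MIGRATED, at which point the proxy sends
# its traffic to the new service instead. Names are illustrative.

MIGRATED = {"search"}  # capabilities already handled by new services

def route(path):
    """Return which backend should handle a request for this path."""
    capability = path.strip("/").split("/")[0]  # "/search/q" -> "search"
    return "new-service" if capability in MIGRATED else "legacy"
```

In practice this lives in an API gateway or edge proxy configuration rather than application code, but the mechanism is the same: routing decisions per capability, changed one at a time.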

Key Challenges

Data synchronization: During migration, both systems may need to read and write the same data. Strategies:

  • New service reads from the legacy database temporarily.
  • Use Change Data Capture (CDC) to keep databases in sync.
  • Accept eventual consistency during the migration period.

Feature parity verification: How do you know the new service produces the same results as the legacy system?

  • Shadow traffic: send requests to both systems, compare responses, serve the legacy response.
  • Canary routing: send a small percentage of real traffic to the new service.
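
The shadow-traffic approach above can be sketched as: call both backends, log any divergence, and always return the legacy answer so users are never affected by bugs in the new service. Handler names are hypothetical stand-ins:

```python
import logging

def legacy_handler(req):
    return {"total": 100}   # stand-in for the legacy system

def new_handler(req):
    return {"total": 100}   # stand-in for the new service under test

def shadow(req):
    """Serve the legacy response; compare the new service on the side."""
    legacy_resp = legacy_handler(req)
    try:
        new_resp = new_handler(req)
        if new_resp != legacy_resp:
            logging.warning("divergence for %r: legacy=%r new=%r",
                            req, legacy_resp, new_resp)
    except Exception:
        # A crash in the new service must never break the user request.
        logging.exception("new service failed for %r", req)
    return legacy_resp
```

Once the divergence log stays quiet for a representative traffic window, the capability is a candidate for canary routing.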

Timeline: Strangler fig migrations take months to years. Maintain momentum by shipping small, incremental migrations. If the migration stalls, you end up maintaining two systems indefinitely — the worst outcome.

Case Study: Amazon's Strangler Fig

Amazon's migration from their monolithic C++ application to a service-oriented architecture used the strangler fig pattern over several years. They placed a request routing layer in front of the monolith and migrated one feature at a time. The monolith shrank over time as services absorbed its functionality.

For a detailed implementation guide, see our dedicated article on the strangler fig pattern.


Sidecar Pattern

The sidecar pattern deploys a helper process alongside the main application process. The sidecar handles cross-cutting concerns (logging, monitoring, networking, security) so the application does not have to.

How It Works

In Kubernetes, a pod can contain multiple containers. The sidecar container runs alongside the main application container, sharing the same network namespace and storage volumes.

A minimal example of a pod with an application container and a log-collection sidecar sharing a volume (image names are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-sidecar
spec:
  containers:
    - name: app
      image: my-app:1.0            # main application container
      volumeMounts:
        - name: logs
          mountPath: /var/log/app  # app writes its logs here
    - name: log-collector          # sidecar: ships logs off the node
      image: fluent/fluentd:v1.16
      volumeMounts:
        - name: logs
          mountPath: /var/log/app
          readOnly: true
  volumes:
    - name: logs
      emptyDir: {}                 # shared, pod-lifetime scratch volume
```

Common Sidecar Use Cases

  1. Service mesh proxy (Envoy, linkerd2-proxy): Handles mTLS, load balancing, circuit breaking.
  2. Log collection (Fluentd, Filebeat): Collects and forwards application logs.
  3. Configuration management (Consul agent): Fetches and watches configuration changes.
  4. Secret management (Vault agent): Injects secrets into the application.
  5. Database proxy (PgBouncer, ProxySQL): Connection pooling and query routing.

Trade-offs

Benefits: Language-agnostic (works with any application), separation of concerns (infrastructure team owns the sidecar, application team owns the app), consistent behavior across all services.

Costs: Resource overhead (each sidecar consumes CPU and memory), increased pod complexity, latency overhead (traffic goes through the sidecar), debugging complexity (is the issue in the app or the sidecar?).

Sidecar vs. Library

An alternative to the sidecar pattern is a shared library that provides the same functionality. The trade-off:

  • Sidecar: Language-agnostic, independently deployable, but adds latency and resource overhead.
  • Library: No latency overhead, no extra resources, but must be implemented for each language. Updating the library requires redeploying all services.

Use sidecars when you have polyglot services. Use libraries when all services use the same language and you need minimal latency.


How to Study This Material

Architecture patterns are not useful as abstract knowledge — they become valuable when you can recognize which pattern fits a given situation and articulate why.

Phase 1: Understand the Patterns (1-2 weeks)

Read through each pattern in this guide. For each one, make sure you can:

  1. Explain the pattern in 2-3 sentences.
  2. Draw the key diagram from memory.
  3. Name one real company that uses it and why.
  4. List the top 2 trade-offs.

Phase 2: Practice Pattern Selection (1-2 weeks)

Take a system design problem (e.g., "design a ride-sharing app") and reason through which architecture patterns apply:

  • Monolith or microservices? Why?
  • Synchronous or event-driven communication? Why?
  • CQRS? Event sourcing? Why or why not?
  • Service mesh? API gateway? BFF?

Practice with system design interview questions to build this skill.

Phase 3: Apply at Work (ongoing)

The best way to learn architecture is to make architecture decisions. Look for opportunities to:

  • Write or review design documents.
  • Lead a migration initiative (even a small one).
  • Introduce a new pattern where it fits (but be conservative — don't over-architect).

As you grow from senior to staff engineer, architecture decisions become your primary output.

Recommended Reading

  1. "Building Microservices" by Sam Newman — the definitive microservices book.
  2. "Domain-Driven Design" by Eric Evans — the original DDD book (dense but essential).
  3. "Designing Data-Intensive Applications" by Martin Kleppmann — covers the data layer in depth.
  4. "A Philosophy of Software Design" by John Ousterhout — principled approach to design decisions.

Related Resources


Technology Comparisons

  • Kafka vs RabbitMQ — choosing between messaging systems for event-driven architecture

