Zero-Downtime Deployments: Blue-Green, Canary, and Rolling Strategies
Deployment strategies for zero downtime with Kubernetes examples, database migration patterns, feature flags, and rollback procedures.
Akhil Sharma
March 9, 2026
Deploying without downtime isn't just about the deployment strategy — it's about how your application handles the transition between versions. Database migrations, in-flight requests, and connection draining all need consideration. The deployment strategy is the easy part.
Rolling Deployments
The default in Kubernetes. Old pods are gradually replaced with new pods. At any point during the deployment, both old and new versions serve traffic.
Key settings:
- maxUnavailable: 0 ensures capacity never drops below the desired replica count
- readinessProbe prevents traffic from hitting pods that aren't ready
- preStop hook gives in-flight requests time to complete before the pod shuts down
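As a rough illustration of how these settings fit together, the sketch below builds a Deployment manifest as a Python dict and prints it as JSON, which kubectl can apply. The image name, the /healthz path, the port, and the replica count are assumptions made for the example, and the preStop command assumes the container image ships a sleep binary.

```python
import json


def rolling_deployment(name, image, replicas=4, port=8080):
    """Build a Deployment manifest that rolls out without losing capacity."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "strategy": {
                "type": "RollingUpdate",
                # Start one extra pod at a time; never remove a ready pod early.
                "rollingUpdate": {"maxSurge": 1, "maxUnavailable": 0},
            },
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "terminationGracePeriodSeconds": 30,
                    "containers": [{
                        "name": name,
                        "image": image,
                        "ports": [{"containerPort": port}],
                        # Traffic is only routed once this probe succeeds.
                        # The /healthz path is an assumed endpoint.
                        "readinessProbe": {
                            "httpGet": {"path": "/healthz", "port": port},
                            "periodSeconds": 5,
                        },
                        # Delay SIGTERM so in-flight requests can finish.
                        "lifecycle": {
                            "preStop": {"exec": {"command": ["sleep", "10"]}},
                        },
                    }],
                },
            },
        },
    }


# Hypothetical image reference, used only for illustration.
manifest = rolling_deployment("order-service", "example.com/order-service:v2")
print(json.dumps(manifest, indent=2))
```

Since kubectl accepts JSON as well as YAML, the output could be piped into kubectl apply -f - to try it against a test cluster.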
Risk: Both versions serve traffic simultaneously during the rollout. If v2 has a breaking change, some users see v2 while others see v1. This requires backward-compatible changes.
Blue-Green Deployments
Run two identical environments. "Blue" is the current production environment. "Green" is the new version. Once green passes health checks, switch all traffic from blue to green.
With Kubernetes Services, each environment is a separate Deployment whose pods carry a version label. Switch traffic by updating the Service selector from version: blue to version: green. Roll back by switching the selector back.
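The selector switch can be sketched as follows. This is a minimal example that assumes the pods are labeled with app and version, and that the Service is named order-service; the ports are also assumptions. It only builds the manifest and the kubectl patch command rather than running anything against a cluster.

```python
import json


def service_manifest(name, active_version, port=80, target_port=8080):
    """Service that sends all traffic to pods of one color."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            # Only pods matching BOTH labels receive traffic.
            "selector": {"app": name, "version": active_version},
            "ports": [{"port": port, "targetPort": target_port}],
        },
    }


def switch_command(name, to_version):
    """kubectl command that repoints the Service at the other color.

    The default strategic merge patch only changes the version key,
    so the app label in the selector is left as it is.
    """
    patch = {"spec": {"selector": {"version": to_version}}}
    return ["kubectl", "patch", "service", name, "-p", json.dumps(patch)]


blue = service_manifest("order-service", "blue")
cutover = switch_command("order-service", "green")
rollback = switch_command("order-service", "blue")
```

Passing cutover or rollback to subprocess.run would perform the switch; both are the same one-line patch, which is why rollback is as fast as the cutover.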
Advantage: Instant rollback. The old version is still running — just switch the selector back.
Disadvantage: Requires double the infrastructure during deployment. With 4 replicas, you need 8 running during the transition.
Canary Deployments
Route a small percentage of traffic to the new version. Monitor error rates and latency. Gradually increase traffic if metrics are healthy. Roll back if they degrade.
Argo Rollouts can automate the canary. Configured with a 99% success-rate threshold, it promotes the canary automatically while the success rate stays above 99%, and rolls back if it drops below.
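The promote-or-roll-back decision can be sketched in a few lines. This is not the Argo Rollouts API, only an illustration of the logic: the traffic steps and the success_rate_at callback are hypothetical, and a real controller would pause at each step and query a metrics backend.

```python
from typing import Callable, Sequence, Tuple


def run_canary(
    steps: Sequence[int],
    success_rate_at: Callable[[int], float],
    threshold: float = 0.99,
) -> Tuple[str, int]:
    """Walk through traffic weights (percent) and decide the outcome.

    Returns ("promoted", 100) if every step stays at or above the
    threshold, or ("rolled_back", weight) for the first step that fails.
    """
    for weight in steps:
        # In practice: shift `weight` percent of traffic, wait, then measure.
        rate = success_rate_at(weight)
        if rate < threshold:
            return ("rolled_back", weight)
    return ("promoted", 100)


# Healthy release: the success rate holds at every step.
healthy = run_canary([5, 25, 50], lambda weight: 0.999)

# Bad release: errors appear once a quarter of traffic hits the canary.
degraded = run_canary([5, 25, 50], lambda weight: 0.999 if weight < 25 else 0.95)
```

In the degraded case only 25% of traffic ever reached the bad version, which is the point of stepping up gradually.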
Database Migrations Without Downtime
The deployment strategy is the easy part. Database migrations are where zero-downtime deployments actually break.
The problem: During a rolling deployment, both v1 and v2 code run simultaneously. If v2 requires a schema change, v1 code might break against the new schema.
The solution: Expand-Contract pattern.
Phase 1: Expand (backward-compatible)
Add new columns/tables without removing or renaming existing ones. Both v1 and v2 work with the expanded schema.
Phase 2: Migrate Data
Backfill the new column with data from existing columns. On large tables, do this in batches so a single update does not hold locks for long.
Phase 3: Contract (remove old)
After all instances are running v2 and the new column is populated, remove the old column in a future deployment.
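The three phases can be walked through end to end with an in-memory SQLite database. The users table and the name to full_name change are hypothetical, chosen only to illustrate the pattern; a production migration would run each phase as a separate deployment with the equivalent statements for its own database.

```python
import sqlite3


def columns(conn, table):
    """Return the column names of a table."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


def expand(conn):
    # Phase 1: add the new column as nullable. v1 code never notices it.
    conn.execute("ALTER TABLE users ADD COLUMN full_name TEXT")


def backfill(conn, batch_size=2):
    """Phase 2: copy name into full_name in small batches."""
    total = 0
    while True:
        cur = conn.execute(
            "UPDATE users SET full_name = name WHERE id IN "
            "(SELECT id FROM users WHERE full_name IS NULL LIMIT ?)",
            (batch_size,),
        )
        conn.commit()  # short transactions keep lock times low
        if cur.rowcount == 0:
            return total
        total += cur.rowcount


def contract(conn):
    # Phase 3: only after no running code reads the old column.
    conn.execute("ALTER TABLE users DROP COLUMN name")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO users (name) VALUES (?)", [("Ada",), ("Grace",), ("Linus",)]
)
conn.commit()

expand(conn)
# The query v1 code would run still works against the expanded schema.
v1_rows = conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()
backfilled = backfill(conn)
# DROP COLUMN needs SQLite 3.35 or newer; skip the contract step otherwise.
if sqlite3.sqlite_version_info >= (3, 35, 0):
    contract(conn)
```

While v1 and v2 run side by side, v2 would also need to write both columns so that rows created after the backfill stay consistent; that dual write is omitted here for brevity.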
Rules:
- Never rename a column in a single deployment. Add the new column (expand), deploy, migrate data, then drop the old column (contract) in a later deployment.
- Never add a NOT NULL column without a DEFAULT in a single step. Add it as nullable first, backfill, then add the constraint.
- Never drop a column that running code still reads.
Graceful Shutdown
When a pod is terminated, in-flight requests must complete before the process exits.
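The Kubernetes pod lifecycle for graceful shutdown: the pod is marked Terminating and removed from the Service endpoints, the preStop hook runs, and then SIGTERM is sent to the container. The application should finish in-flight requests and exit. If it is still running when the grace period expires (terminationGracePeriodSeconds, 30 seconds by default, which also covers the preStop hook), it receives SIGKILL.
The sleep 10 in the preStop hook is critical. Endpoint removal happens in parallel with the shutdown sequence, so there's a race between the pod being removed from endpoints and the load balancer updating its target list. Without the sleep, the load balancer might still send requests to a pod that's already shutting down.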
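On the application side, handling SIGTERM might look roughly like the sketch below. The GracefulServer class and its method names are hypothetical and not tied to any web framework; the example only shows the three pieces involved: failing readiness once shutdown starts, tracking in-flight requests, and draining with a timeout that should stay below the pod's grace period.

```python
import signal
import threading
import time


class GracefulServer:
    def __init__(self, drain_timeout=20.0):
        self.drain_timeout = drain_timeout
        self.shutting_down = threading.Event()
        self._in_flight = 0
        self._cond = threading.Condition()

    def ready(self):
        """Readiness check: report not ready once shutdown has begun."""
        return not self.shutting_down.is_set()

    def handle(self, work):
        """Run one request while counting it as in flight."""
        with self._cond:
            self._in_flight += 1
        try:
            return work()
        finally:
            with self._cond:
                self._in_flight -= 1
                self._cond.notify_all()

    def on_sigterm(self, signum=None, frame=None):
        # Keep the handler tiny: just flip the flag.
        self.shutting_down.set()

    def drain(self):
        """Wait for in-flight requests; True if all finished in time."""
        deadline = time.monotonic() + self.drain_timeout
        with self._cond:
            while self._in_flight > 0:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    return False
                self._cond.wait(remaining)
            return True


def install_sigterm_handler(server):
    # signal.signal must be called from the main thread.
    signal.signal(signal.SIGTERM, server.on_sigterm)
```

A main loop would call install_sigterm_handler at startup, stop accepting new connections once shutting_down is set, call drain(), and then exit.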
Feature Flags for Deployment Safety
Decouple deployment from release. Deploy v2 code to production but keep new features behind flags. Enable features gradually after deployment is verified.
This lets you:
- Deploy code changes without user-facing impact
- Enable features for specific users (internal team, beta users)
- Instantly disable a broken feature without a rollback deployment
- Run A/B tests on the same deployment
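A flag check that supports these uses could be as small as the sketch below. The FeatureFlag class is hypothetical and stands in for whatever flag service or library is in use; it assumes string user IDs and uses a stable hash so a given user always lands in the same rollout bucket.

```python
import hashlib


class FeatureFlag:
    def __init__(self, name, enabled=True, allow_users=(), rollout_percent=0):
        self.name = name
        self.enabled = enabled            # global kill switch
        self.allow_users = set(allow_users)  # e.g. internal team, beta users
        self.rollout_percent = rollout_percent  # 0 to 100

    def _bucket(self, user_id):
        """Map a user to a stable bucket in the range 0..99."""
        digest = hashlib.sha256(f"{self.name}:{user_id}".encode()).hexdigest()
        return int(digest, 16) % 100

    def is_enabled(self, user_id):
        if not self.enabled:
            return False  # disabled instantly, no redeploy needed
        if user_id in self.allow_users:
            return True
        return self._bucket(user_id) < self.rollout_percent


# Deployed but dark: only the allowlisted user sees the feature.
flag = FeatureFlag("new-checkout", allow_users={"alice"}, rollout_percent=0)
alice_sees_it = flag.is_enabled("alice")
bob_sees_it = flag.is_enabled("bob")
```

Raising rollout_percent gradually exposes more users without a deployment, and setting enabled to False turns the feature off for everyone, including allowlisted users.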
Combining feature flags with canary deployments is the safest approach. The canary validates infrastructure stability (new code doesn't crash), and feature flags control feature exposure independently.
Rollback Checklist
When a deployment goes wrong:
- Automated rollback: Argo Rollouts or Flagger detect metric degradation and roll back automatically
- Manual rollback: kubectl rollout undo deployment/order-service (Kubernetes keeps previous ReplicaSets)
- Verify rollback: Check that error rates return to baseline after rollback
- Database state: If a migration ran, is the old code compatible with the new schema? (This is why expand-contract matters)
- Post-mortem: Why did the canary not catch the issue? Was the analysis template checking the right metrics?
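The manual rollback and verification steps could be scripted along these lines. This is a sketch under assumptions: the 10% tolerance and the way error rates are obtained are made up for illustration, and the run parameter exists so the commands can be exercised without a cluster.

```python
import subprocess


def rollback_commands(deployment, timeout="120s"):
    """kubectl commands to undo a rollout and wait for it to settle."""
    target = f"deployment/{deployment}"
    return [
        ["kubectl", "rollout", "undo", target],
        ["kubectl", "rollout", "status", target, f"--timeout={timeout}"],
    ]


def rollback(deployment, run=subprocess.run):
    """Run the rollback; check=True raises if any command fails."""
    for cmd in rollback_commands(deployment):
        run(cmd, check=True)


def recovered(current_error_rate, baseline_error_rate, tolerance=0.1):
    """True if the error rate is back within tolerance of the baseline."""
    return current_error_rate <= baseline_error_rate * (1 + tolerance)
```

Calling rollback("order-service") runs both commands in order; recovered would then be fed error rates from whatever monitoring system is in place.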
Zero-downtime deployments are a system property, not a deployment strategy choice. The strategy (rolling, blue-green, canary) is one piece. Backward-compatible database migrations, graceful shutdown, readiness probes, and feature flags are equally important. Get all of them right, and deployments become routine. Miss any one, and you're one bad deploy away from an outage.