CI/CD Lesson 34 – Canary Deployment | Dataplexa
Section IV · Lesson 34

Canary Deployments

In this lesson

Canary Mechanics · Traffic Splitting · Metrics-Based Promotion · Automated Rollback · Canary vs Blue-Green

Canary deployment is a release strategy that routes a small, controlled percentage of real production traffic to a new version of an application while the majority of traffic continues to the stable version — validating the new version's behaviour with actual users before committing to a full rollout. Unlike blue-green deployment, which switches all traffic at once after pre-switch verification, canary deployment uses production traffic itself as the validation signal. If the canary — the small slice of traffic on the new version — shows elevated error rates, increased latency, or degraded business metrics, the rollout is halted and traffic returns to the stable version. If the canary is healthy, traffic gradually increases until the rollout is complete.

The Canary Mechanics — Gradual Exposure with Real Traffic

The name comes from the historical practice of sending a canary into a coal mine before miners entered — if the canary showed distress, the miners knew the air was dangerous. In deployment terms, the canary is a small subset of users who experience the new version first. Their behaviour and the application's behaviour serving them are the signal that determines whether the rollout proceeds.

A typical canary progression runs in stages: 1% of traffic, then 5%, then 25%, then 50%, then 100%. At each stage the deployment pauses, observability metrics are evaluated against defined thresholds, and the rollout either advances to the next stage or rolls back entirely. The pause duration at each stage — called the bake time — gives the monitoring system time to collect meaningful signal before the decision is made.
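The staged progression above can be sketched as a simple control loop. This is an illustrative sketch, not a real controller: the stage list, the bake time, and the `canary_error_rate` and `set_canary_weight` placeholders all stand in for the observability platform and traffic-splitting API an actual rollout tool would call.

```python
import time

STAGES = [1, 5, 25, 50, 100]   # traffic percentages for the canary (illustrative)
BAKE_SECONDS = 600             # 10-minute bake time per stage (illustrative)
ERROR_RATE_THRESHOLD = 0.01    # roll back if canary error rate reaches 1%

def canary_error_rate() -> float:
    """Placeholder for a query to the observability platform."""
    return 0.002  # assume a healthy canary for this sketch

def set_canary_weight(percent: int) -> None:
    """Placeholder for the traffic-splitting API (load balancer, mesh, ...)."""
    print(f"canary weight -> {percent}%")

def run_rollout(bake_seconds: int = BAKE_SECONDS) -> str:
    for percent in STAGES:
        set_canary_weight(percent)
        if percent == 100:
            return "promoted"            # full rollout reached
        time.sleep(bake_seconds)         # bake: let metrics accumulate
        if canary_error_rate() >= ERROR_RATE_THRESHOLD:
            set_canary_weight(0)         # automatic rollback to stable
            return "rolled-back"
    return "promoted"

print(run_rollout(bake_seconds=0))  # bake skipped here so the sketch runs instantly
```

Note that the rollback path sets the weight straight back to 0% rather than stepping down gradually: once the canary is known to be unhealthy, there is no reason to keep any users on it.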

The Restaurant New Menu Analogy

A restaurant testing a new dish does not replace every item on the menu simultaneously and wait to see if customers complain. They introduce the new dish to a small number of tables first — observe the reaction, listen to feedback, watch for returns to the kitchen. If those tables are happy, the dish rolls out to the whole restaurant. If not, the kitchen adjusts or pulls it. Canary deployment is the same discipline applied to software: expose a small cohort to the change first, measure their experience, and let the data decide whether to continue the rollout.

Traffic Splitting — Implementation Options

Traffic splitting can be implemented at several layers of the infrastructure stack. The right choice depends on the granularity of traffic splitting needed, whether user stickiness is required — some scenarios require the same user to always hit the same version — and what infrastructure is already in place.

Traffic Splitting Implementation Options

Load balancer weights
AWS ALB weighted target groups, GCP Cloud Load Balancing, or NGINX upstream weights route a percentage of requests to the canary instances. Simple to configure, works at the request level. No user stickiness — a user may hit stable on one request and canary on the next.
Kubernetes traffic splitting
Service mesh and progressive delivery tools — Istio, Linkerd, Argo Rollouts (which drives a mesh or ingress controller) — split traffic between Kubernetes deployments with fine-grained control. Supports header-based routing (send all internal users to canary), user-based stickiness, and metrics-based automatic promotion.
DNS-based splitting
Weighted DNS records route a percentage of all traffic to a canary endpoint. Coarse-grained — DNS TTLs mean changes take time to propagate — but useful for regional canary deployments where a whole region is the canary cohort.
Application-level routing
The application itself decides which version of a code path to execute based on user attributes — user ID hash, cohort membership, or a feature flag. Most flexible and most complex. Used when the canary must be consistent per user or per account.
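The application-level option above is often implemented as a deterministic hash of the user ID, so the same user always lands in the same cohort. A minimal sketch, assuming a 5% canary share (the percentage and function name are illustrative):

```python
import hashlib

CANARY_PERCENT = 5  # share of users routed to the canary (assumed value)

def is_canary_user(user_id: str, canary_percent: int = CANARY_PERCENT) -> bool:
    """Deterministically bucket a user into 0..99 and compare to the canary share.

    Hashing (rather than random choice per request) gives per-user stickiness:
    the same user_id always maps to the same bucket, so a user never flips
    between stable and canary from one request to the next.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < canary_percent

# The same user always gets the same answer, across requests and across servers:
assert is_canary_user("user-42") == is_canary_user("user-42")
```

Because the bucket depends only on the user ID, any server can compute the routing decision without shared state, and raising the canary share from 5% to 25% keeps the original 5% of users on the canary rather than reshuffling everyone.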

Metrics-Based Promotion — Letting Data Decide

The defining feature of a mature canary deployment is metrics-based promotion: the decision to advance to the next traffic percentage — or to roll back — is made automatically based on observed metrics, not by a human watching a dashboard. The pipeline queries the observability platform after each bake period and compares the canary's metrics against the stable version's baseline. If the canary's error rate is within threshold and latency has not regressed, the pipeline advances the rollout. If any metric breaches its threshold, the pipeline triggers an automatic rollback to 0% canary traffic.
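The promotion decision itself reduces to a pure comparison of canary metrics against the stable baseline. A minimal sketch, assuming two illustrative thresholds (an absolute 1% error-rate ceiling and at most 10% p99 latency regression versus stable; both numbers are assumptions, not recommendations):

```python
def promotion_decision(canary: dict, stable: dict,
                       max_error_rate: float = 0.01,
                       max_latency_regression: float = 1.10) -> str:
    """Decide 'advance' or 'rollback' from observed metrics alone.

    No human judgement is involved: the canary either satisfies every
    threshold against the stable baseline, or the rollout is reverted.
    """
    if canary["error_rate"] > max_error_rate:
        return "rollback"
    if canary["p99_latency_ms"] > stable["p99_latency_ms"] * max_latency_regression:
        return "rollback"
    return "advance"

# Healthy canary: within the error budget and the latency envelope.
decision = promotion_decision(
    {"error_rate": 0.004, "p99_latency_ms": 210},
    {"error_rate": 0.003, "p99_latency_ms": 200},
)
print(decision)  # advance
```

Comparing against the stable version's live baseline, rather than a fixed historical number, keeps the check honest when overall traffic patterns shift — if latency rises for both versions because of load, only a regression relative to stable counts against the canary.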

Automated Canary Pipeline — Argo Rollouts

# argo-rollout.yaml — defines the canary strategy for the API deployment
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: api
spec:
  # selector, replicas, and pod template omitted for brevity
  strategy:
    canary:
      steps:
        - setWeight: 5                    # Step 1 — route 5% of traffic to canary
        - pause: {duration: 10m}          # Bake for 10 minutes
        - analysis:                       # Evaluate metrics before proceeding
            templates:
              - templateName: error-rate-check
        - setWeight: 25                   # Step 2 — advance to 25% if analysis passes
        - pause: {duration: 10m}
        - analysis:
            templates:
              - templateName: error-rate-check
        - setWeight: 100                  # Step 3 — full rollout if all checks pass
---
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: error-rate-check
spec:
  metrics:
    - name: error-rate
      interval: 1m
      successCondition: result[0] < 0.01  # Fail if error rate exceeds 1%
      provider:
        prometheus:
          address: http://prometheus:9090
          query: |
            sum(rate(http_requests_total{status=~"5..",version="canary"}[5m]))
            /
            sum(rate(http_requests_total{version="canary"}[5m]))

What just happened?

Argo Rollouts manages the entire canary progression automatically. It routes 5% of traffic to the new version, waits 10 minutes, then queries Prometheus to verify the canary's error rate is below 1%. If it passes, it advances to 25%, bakes again, checks again, then completes the rollout to 100%. If the error rate exceeds 1% at any stage, Argo Rollouts automatically sets the canary weight back to 0% and alerts the team — no human intervention required to stop a bad deployment.

Canary vs Blue-Green — Choosing the Right Strategy

Blue-green and canary are complementary strategies, not competing alternatives. They answer different questions. Blue-green answers: "Is this version fundamentally healthy?" — verified against a pre-production copy of the environment before any real user sees it. Canary answers: "Does this version behave correctly with real production traffic at scale?" — validated by routing actual users to it incrementally.

Canary vs Blue-Green — Decision Reference

Use canary when:
You need production traffic as the validation signal — synthetic tests cannot replicate real user behaviour
You want automated, metrics-driven promotion with no manual steps once the rollout begins
The change has a user impact that only manifests at scale — a 0.5% error rate increase that affects 1,000 users is meaningful, but a synthetic test with 10 requests will never detect it
You have mature observability with well-defined SLOs that can serve as automatic promotion gates

Use blue-green when:
You need zero production exposure before validation — the change is high-risk and no real users should see a broken version
You want a clean, instant switch with an immediate full rollback option — a load balancer change rather than a traffic weight adjustment
Your deployment includes a schema migration that must be fully applied before the application starts — canary requires both versions to be schema-compatible simultaneously
You want to test the full deployed version in isolation before any traffic switch — e.g. a regulatory requirement for pre-production sign-off
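The scale argument in the canary column can be made concrete with a quick expected-count calculation; the request volumes below are assumptions chosen for illustration.

```python
# A 0.5 percentage-point error-rate increase is invisible in a tiny synthetic
# test but unmistakable at production volume (numbers are illustrative).
def expected_extra_errors(requests: int, rate_increase: float = 0.005) -> float:
    """Expected number of additional errors produced by the regression."""
    return requests * rate_increase

print(expected_extra_errors(10))       # ~0.05: almost certainly zero observed errors
print(expected_extra_errors(200_000))  # ~1000: a clear, measurable signal
```

This is why a canary stage at even 1% of a high-traffic service accumulates enough requests during the bake period to expose regressions that no pre-production test suite would ever see.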

Warning: A Canary Without Defined Metrics Thresholds Is Just a Slow Rollout

Canary deployment without pre-defined success and failure metrics is not a safety mechanism — it is a deployment process that takes longer. If the decision to advance from 5% to 25% is made by a human looking at a dashboard and deciding it "looks okay," the canary is providing false confidence. The value of canary deployment is the automated, objective comparison of the canary's metrics against the stable baseline. Without explicit thresholds — error rate below X%, p99 latency below Yms, conversion rate within Z% of baseline — the canary cannot make an automated promotion or rollback decision, and the safety guarantee disappears. Define thresholds before implementing canary, not after.
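"Define thresholds before implementing canary" can mean something as simple as declaring them as data the pipeline enforces. A minimal sketch covering the three threshold types named above; every number and name here is an illustrative placeholder, not a recommendation.

```python
# Thresholds declared up front, as data the pipeline can enforce mechanically.
THRESHOLDS = {
    "error_rate_max": 0.01,        # error rate below 1%
    "p99_latency_ms_max": 300.0,   # p99 latency below 300 ms
    "conversion_min_ratio": 0.98,  # conversion within 2% of baseline
}

def canary_is_healthy(canary: dict, baseline: dict, t: dict = THRESHOLDS) -> bool:
    """Objective health check: every threshold must hold, or the canary fails."""
    return (
        canary["error_rate"] < t["error_rate_max"]
        and canary["p99_latency_ms"] < t["p99_latency_ms_max"]
        and canary["conversion_rate"]
            >= baseline["conversion_rate"] * t["conversion_min_ratio"]
    )

healthy = canary_is_healthy(
    {"error_rate": 0.003, "p99_latency_ms": 240.0, "conversion_rate": 0.051},
    {"conversion_rate": 0.052},
)
print(healthy)  # True: all three thresholds hold
```

Once the thresholds live in version-controlled data rather than in an engineer's head, the promotion decision is reproducible and reviewable, which is the entire difference between a canary and a slow rollout.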

Key Takeaways from This Lesson

Canary uses production traffic as the validation signal — real users on the new version produce the most accurate signal about production behaviour, catching issues that synthetic pre-production tests cannot replicate at scale.
Bake time at each traffic stage is essential — advancing through traffic percentages too quickly defeats the purpose. The bake period gives the monitoring system time to accumulate statistically meaningful signal before the promotion decision is made.
Metrics thresholds must be defined before the canary starts — automated promotion and rollback only work if the system knows what "healthy" means. Define error rate, latency, and business metric thresholds explicitly as pre-conditions of the deployment.
Argo Rollouts automates the entire canary lifecycle — traffic progression, metric evaluation, promotion, and rollback can all be managed automatically without human intervention at each stage, making canary deployment operationally sustainable at high deployment frequency.
Canary and blue-green are complementary, not competing — blue-green verifies fundamental health before any user sees the new version; canary validates behaviour with real production traffic. Many mature organisations use both together.

Teacher's Note

Start your canary at 1%, not 10% — the cost of a bad deployment hitting 1% of users is an order of magnitude lower than 10%, and the signal from 1% is sufficient to detect most critical regressions within a reasonable bake time.

Practice Questions

Answer in your own words — then check against the expected answer.

1. What is the term for the pause period at each traffic percentage stage of a canary deployment — the window during which the monitoring system collects metrics from the canary before the automated decision to advance or roll back is made?



2. What Kubernetes-native tool manages the full canary deployment lifecycle — traffic weight progression, Prometheus metric evaluation at each stage, automatic promotion on success, and automatic rollback on metric threshold breach — without requiring pipeline intervention at each step?



3. What is the practice — central to mature canary deployment — where the decision to advance a rollout to the next traffic percentage is made automatically by comparing the canary's observed metrics against pre-defined thresholds, rather than by a human reviewing dashboards?



Lesson Quiz

1. A change introduces a subtle performance regression that only manifests under the specific traffic patterns of real production users — it passes all pre-production tests. Which deployment strategy would catch this before it affects all users, and why?


2. A team implements canary deployment by routing 5% of traffic to the new version, having an engineer watch Datadog for 10 minutes, and then manually advancing to 100% if it "looks okay." Why does this approach not deliver the safety guarantee of a true canary deployment?


3. A deployment includes a migration that drops a database column the previous application version still reads from. The team is deciding between canary and blue-green. Which strategy is more appropriate for this release, and why?


Up Next · Lesson 35

Feature Flags

Canary uses traffic splitting to control exposure. Feature flags go further — decoupling deployment from release entirely, so code ships to production before users ever see the feature.