CI/CD Lesson 35 – Feature Flags | Dataplexa
Section IV · Lesson 35

Feature Flags

In this lesson

Deployment vs Release · Flag Types · Flag Management · CI/CD Integration · Flag Debt

Feature flags (also called feature toggles or feature switches) are conditional code paths that allow specific features, behaviours, or experiments to be enabled or disabled at runtime, without deploying new code. In its simplest form, a feature flag is a boolean condition the application evaluates: if the flag is on, the new code path executes; if it is off, the old code path executes. This runtime control decouples two operations that are often conflated: deployment (shipping code to production) and release (making a feature visible to users). Code that is deployed but flag-disabled is already in production, can be exercised there by enabling the flag for a limited audience, and can be released to all users at any time by flipping a switch, with no new deployment required.

Deployment vs Release — The Core Distinction

Most teams treat deployment and release as the same event. A change is merged, the pipeline runs, the artifact reaches production, and the feature is live. Feature flags break this coupling deliberately. Deployment becomes a technical operation — getting tested code into production safely — and release becomes a product decision — determining when and to whom a feature is visible. These two decisions have different owners, different timelines, and different risk profiles.

Decoupling them produces several immediate benefits for CI/CD. Trunk-based development becomes practical — as discussed in Lesson 11, developers commit unfinished features directly to main behind a disabled flag rather than maintaining long-lived feature branches. Release risk drops — turning a flag on for 1% of users, monitoring for 24 hours, and then enabling it for everyone is far safer than deploying to 100% of users with no incremental validation. Product teams can control launch timing — an engineering team can ship the code weeks before a marketing launch date, with the flag enabling the feature at the exact moment the campaign goes live.

The Dimmer Switch Analogy

A standard light switch has two states: fully on or fully off. A dimmer switch has a continuous spectrum — 0%, 10%, 50%, 100% — and can be adjusted at any time without rewiring the circuit. Feature flags are the dimmer switch for software features. A simple boolean flag is the on/off switch — useful but limited. A percentage rollout flag is the dimmer — gradually increasing exposure while monitoring the effect. A user-segment flag is a programmable dimmer — different brightness for different rooms. The wiring (the deployed code) is the same in every case; only the control state changes.
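The three kinds of control in the analogy can be sketched in a few lines. This is a minimal illustration, not how any particular flag platform works: the rule shapes (`boolean`, `percentage`, `segment`), the `isEnabled` and `bucketFor` functions, and the choice of FNV-1a hashing are all assumptions made for the example. The idea it shows is that a stable hash of the flag key and user ID places each user in a fixed bucket from 0 to 99, so raising the percentage only ever adds users and never removes anyone who already has the feature.

```javascript
// Hypothetical sketch: three flag rule kinds evaluated against one user.
// Rule shapes and function names are illustrative, not a real SDK's API.

// FNV-1a 32-bit hash: a simple, stable, non-cryptographic string hash.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Map a user to a stable bucket in [0, 100) for a given flag.
function bucketFor(flagKey, userId) {
  return fnv1a(`${flagKey}:${userId}`) % 100;
}

function isEnabled(flagKey, rule, user) {
  switch (rule.kind) {
    case 'boolean':     // on/off switch
      return rule.on;
    case 'percentage':  // dimmer: users in buckets below the threshold are in
      return bucketFor(flagKey, user.id) < rule.percent;
    case 'segment':     // programmable dimmer: state depends on the user's segment
      return rule.segments.includes(user.plan);
    default:
      return false;     // unknown rule kinds fail closed
  }
}
```

Including the flag key in the hash input means a user who lands in an early bucket for one flag is not automatically an early adopter for every other flag.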

Flag Types — Four Categories With Different Lifespans

Not all feature flags are created equal. Using the wrong flag type for a use case — or treating all flags as permanent configuration — leads to the flag debt problem discussed later in this lesson. Understanding the four categories helps teams make deliberate decisions about flag lifespan and ownership from the moment a flag is created.

Feature Flag Categories — Type, Lifespan, and Purpose

🚀
Release flags — short-lived
Control the rollout of a specific feature. Disabled during development and deployment, enabled incrementally for users, then removed once the feature is fully live and stable. Lifespan: days to weeks. These are the most common flag type and should have a defined removal date from the moment they are created.
🧪
Experiment flags — short-lived
A/B test two different implementations. User cohorts are randomly assigned to flag-on or flag-off. Metrics are compared across cohorts. The winning implementation is kept; the flag is removed. Lifespan: the duration of the experiment. Owned by the product team, not engineering.
🔧
Ops flags — medium-lived
Control operational behaviour — rate limiting levels, cache TTL strategies, circuit breaker thresholds. Enabled or disabled in response to production conditions without requiring a deployment. Lifespan: months. Should be reviewed periodically and documented with their operational purpose.
⚙️
Permission flags — long-lived
Gate premium features by user plan, role, or account. Enterprise users get the flag enabled; free users do not. Lifespan: indefinite — these are effectively configuration, not temporary flags. Should be managed through a proper entitlement system rather than a release flag tool.
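One way to make lifespan a deliberate decision is to record the flag type at creation and reject short-lived flags that have no removal date. The sketch below assumes a hypothetical in-house registry: `defineFlag`, the field names, and the rule that only release and experiment flags require a removal date are illustrative choices, not features of any particular tool.

```javascript
// Hypothetical flag registry entry builder. Names and rules are illustrative.
const FLAG_TYPES = new Map([
  ['release',    { lifespan: 'days to weeks',       requiresRemovalDate: true }],
  ['experiment', { lifespan: 'experiment duration', requiresRemovalDate: true }],
  ['ops',        { lifespan: 'months',              requiresRemovalDate: false }],
  ['permission', { lifespan: 'indefinite',          requiresRemovalDate: false }],
]);

function defineFlag({ key, type, owner, removeBy = null }) {
  const policy = FLAG_TYPES.get(type);
  if (!policy) throw new Error(`Unknown flag type: ${type}`);
  if (!owner) throw new Error(`Flag ${key} needs an owner`);
  // Short-lived flag types must state up front when they will be removed.
  if (policy.requiresRemovalDate && !removeBy) {
    throw new Error(`${type} flag ${key} needs a removal date`);
  }
  return { key, type, owner, removeBy, expectedLifespan: policy.lifespan };
}
```

With this in place, `defineFlag({ key: 'new-checkout-flow', type: 'release', owner: 'payments' })` throws, while the same call with a `removeBy` date succeeds.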

CI/CD Integration — Flags in the Pipeline and in Tests

Feature flags interact with CI/CD pipelines in two important ways: how flags are evaluated at runtime, and how tests handle code that sits behind a flag. Both require deliberate design to avoid the flag becoming a source of test blind spots or pipeline complexity.

Runtime evaluation means the flag state is determined at request time — not at build time or deploy time — by querying a flag service or reading a configuration value. Tools like LaunchDarkly, Unleash, and Flagsmith provide SDK-based flag evaluation that integrates with any language. The flag state can be changed instantly without redeploying, and different users can see different flag states simultaneously. This is what makes gradual rollout and A/B testing possible.

For testing, code behind a flag must be tested in both the flag-on and flag-off states. A CI pipeline that only tests the flag-off code path ships untested code to production every time the flag is enabled. The convention is to write tests that explicitly set the flag state for the duration of the test, using the flag SDK's test context or a test double, so that both paths are covered. Integration and end-to-end tests should run in a flag-on configuration to verify the full feature before the flag is enabled in production.

Feature Flag Evaluation and Testing Pattern

// Application code: the flag is evaluated at request time
async function renderCheckout(user, cart) {
  const newCheckoutEnabled = await flagClient.variation(
    'new-checkout-flow',     // Flag key
    user,                    // User context: enables per-user targeting
    false                    // Default value if the flag service is unavailable
  );

  if (newCheckoutEnabled) {
    return renderNewCheckout(cart);
  }
  return renderLegacyCheckout(cart);
}

// Test coverage: both flag states must be tested in CI.
// setFlagForTest stands in for whatever override your flag SDK's
// test context or test double provides.
describe('checkout flow', () => {
  describe('with new-checkout-flow flag OFF', () => {
    beforeEach(() => flagClient.setFlagForTest('new-checkout-flow', false));
    it('renders the legacy checkout', () => { /* ... */ });
  });

  describe('with new-checkout-flow flag ON', () => {
    beforeEach(() => flagClient.setFlagForTest('new-checkout-flow', true));
    it('renders the new checkout', () => { /* ... */ });
    it('processes payment correctly', () => { /* ... */ });
    it('sends the confirmation email', () => { /* ... */ });
  });
});

What just happened?

The application evaluates the flag at runtime against the user context — different users can get different flag states simultaneously. The test suite explicitly sets the flag state in both configurations and asserts the correct behaviour in each. The pipeline runs all tests in both states on every PR, ensuring that enabling the flag in production activates code that has been tested in CI rather than shipped untested behind a disabled switch.
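The `setFlagForTest` call in the tests is not tied to a specific SDK. If your flag client has no built-in test mode, a small in-memory test double can play the same role. The sketch below is one hypothetical shape for it: `createTestFlagClient`, `reset`, and `checkoutVariant` are names invented for illustration, and the double mirrors only the `variation(key, user, defaultValue)` call shape used in the application code.

```javascript
// Hypothetical in-memory test double for a flag client.
function createTestFlagClient() {
  const overrides = new Map();
  return {
    // Force a flag to a fixed state for the current test.
    setFlagForTest(key, value) {
      overrides.set(key, value);
    },
    // Clear all overrides (for example in afterEach) so tests stay independent.
    reset() {
      overrides.clear();
    },
    // Same call shape as the application code: resolves to the override,
    // or to the caller's default when no override has been set.
    async variation(key, user, defaultValue) {
      return overrides.has(key) ? overrides.get(key) : defaultValue;
    },
  };
}

// Code under test receives the client as a parameter so tests can inject the double.
async function checkoutVariant(flagClient, user) {
  const enabled = await flagClient.variation('new-checkout-flow', user, false);
  return enabled ? 'new' : 'legacy';
}
```

Passing the client in as an argument, rather than importing a global, is the design choice that makes both flag states easy to cover: each test builds its own double and sets exactly the state it needs.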

Flag Debt — The Long-Term Risk of Flags Without Discipline

Flag debt is the accumulated burden of feature flags that were never removed after their purpose was served. A codebase with fifty stale flags — each wrapping a conditional branch that has been always-on for eighteen months — is harder to understand, harder to test, and slower to build than one without them. The conditional branches remain, the tests cover both paths for code that will never be off, and new developers must understand which flags are active before they can reason about any code path that touches a flagged feature.

The practices that prevent flag debt are simple but require discipline: every flag has an owner and a removal date set at creation, flags appear in the team's backlog as technical debt items the moment they are fully enabled, and a periodic flag audit removes stale flags before they accumulate. Many feature flag platforms can surface stale flags or alert on them, for example by notifying the owner when a flag has been fully enabled for more than N days with no scheduled removal. This is the equivalent of a dependency vulnerability scanner, applied to flags.
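A periodic flag audit can be as simple as a script that scans flag metadata and reports overdue entries. The sketch below assumes each flag record carries `type`, `owner`, `removeBy`, and `fullyEnabledSince` fields and that 30 days is the team's threshold. The record shape, the `findStaleFlags` name, and the threshold are all hypothetical; a real audit would read this data from wherever the team tracks its flags.

```javascript
// Hypothetical flag audit. The record fields and threshold are assumptions.
const DAY_MS = 24 * 60 * 60 * 1000;

function findStaleFlags(flags, now, maxFullyEnabledDays = 30) {
  return flags
    // Permission flags are long-lived by design, so the audit skips them.
    .filter((flag) => flag.type !== 'permission')
    .map((flag) => {
      const reasons = [];
      if (flag.removeBy && new Date(flag.removeBy) < now) {
        reasons.push('past removal date');
      }
      if (flag.fullyEnabledSince && !flag.removeBy) {
        const days = Math.floor((now - new Date(flag.fullyEnabledSince)) / DAY_MS);
        if (days > maxFullyEnabledDays) {
          reasons.push(`fully enabled for ${days} days with no removal date`);
        }
      }
      return { key: flag.key, owner: flag.owner, reasons };
    })
    .filter((entry) => entry.reasons.length > 0);
}
```

Run on a schedule in the pipeline, a report like this can fail the build or open a ticket for each owner, which turns the removal date from a good intention into something that is enforced.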

Warning: Code Behind a Flag Is Not Tested Code Until Both Flag States Are Tested

The most dangerous misconception about feature flags is that deploying code behind a disabled flag is the same as deploying tested code. It is not. If the CI pipeline only tests the flag-off code path — because the flag is disabled in the test environment by default — then enabling the flag in production activates untested code. The first time that code path executes in production is the first time it has ever run against real data, real dependencies, and real edge cases. This is not a deployment safety mechanism; it is deferred testing with production users as the test suite. Every pipeline run must test both flag states for every flagged code path, and the flag must be enabled in the integration test suite before it is enabled in production.

Key Takeaways from This Lesson

Feature flags decouple deployment from release — code ships to production on the engineering team's timeline; the feature becomes visible to users on the product team's timeline. These are different decisions with different owners.
Flags enable trunk-based development — unfinished features are committed to main behind a disabled flag rather than living on long-lived feature branches, keeping integration risk low and feedback loops tight.
Both flag states must be tested in CI — a pipeline that only tests flag-off code ships untested code to production when the flag is enabled. Test both paths explicitly on every PR, and enable the flag in integration tests before enabling it in production.
Flags have four types with different lifespans — release flags (days to weeks), experiment flags (experiment duration), ops flags (months), and permission flags (indefinite). Treating all flags as permanent configuration is the root cause of flag debt.
Flag debt must be actively managed — every flag needs an owner, a purpose, and a removal date. Stale flags that were never cleaned up after a feature launched are technical debt that accumulates silently and makes the codebase progressively harder to reason about.

Teacher's Note

When you create a feature flag, immediately open a second ticket titled "Remove [flag name]" and put it in the backlog — the act of creating the removal ticket forces the conversation about when the flag should be cleaned up before anyone forgets it exists.

Practice Questions

Answer in your own words — then check against the expected answer.

1. What is the term for the accumulated burden of feature flags that were never removed after their purpose was served — stale conditional branches that make the codebase harder to understand, harder to test, and slower to build over time?



2. What flag type randomly assigns user cohorts to a flag-on or flag-off state, compares metrics across cohorts to determine which implementation performs better, and is removed once the winning implementation is chosen — owned by the product team and scoped to the duration of the test?



3. What is the most common flag type — disabled during development and deployment, enabled incrementally for users during rollout, and removed once the feature is fully live and stable — which should have a defined removal date set at the moment of creation?



Lesson Quiz

1. A marketing campaign launches on December 1st. The engineering team finishes the feature code on November 15th. How do feature flags allow the team to deploy safely on November 15th while ensuring users see the feature exactly on December 1st?


2. A team deploys a new checkout flow behind a disabled flag. Their CI pipeline only runs tests in the flag-off configuration. The flag is enabled in production and immediately causes errors. What testing failure caused this?


3. A SaaS product uses a feature flag to give enterprise-tier users access to an advanced analytics dashboard that free-tier users cannot see. The flag has been in production for two years with no planned removal. What flag type is this, and what does that classification imply about how it should be managed?


Up Next · Lesson 36

Infrastructure as Code in CI/CD

The pipeline deploys the application. Infrastructure as Code ensures the environment it deploys into is version-controlled, reproducible, and managed with the same discipline as the application code itself.