CI/CD Course
Feature Flags
Feature flags (also called feature toggles or feature switches) are conditional code paths that allow specific features, behaviours, or experiments to be enabled or disabled at runtime, without deploying new code. In its simplest form, a feature flag is a boolean condition the application evaluates: if the flag is on, the new code path executes; if it is off, the old code path executes. This runtime control decouples two operations that are often conflated: deployment (shipping code to production) and release (making a feature visible to users). Code that is deployed but flag-disabled is already in production, can be verified there by enabling the flag for a small internal audience, and can be released to users at any time by flipping a switch, with no new deployment required.
Deployment vs Release — The Core Distinction
Most teams treat deployment and release as the same event. A change is merged, the pipeline runs, the artifact reaches production, and the feature is live. Feature flags break this coupling deliberately. Deployment becomes a technical operation — getting tested code into production safely — and release becomes a product decision — determining when and to whom a feature is visible. These two decisions have different owners, different timelines, and different risk profiles.
Decoupling them produces several immediate benefits for CI/CD. Trunk-based development becomes practical — as discussed in Lesson 11, developers commit unfinished features directly to main behind a disabled flag rather than maintaining long-lived feature branches. Release risk drops — turning a flag on for 1% of users, monitoring for 24 hours, and then enabling it for everyone is far safer than deploying to 100% of users with no incremental validation. Product teams can control launch timing — an engineering team can ship the code weeks before a marketing launch date, with the flag enabling the feature at the exact moment the campaign goes live.
The Dimmer Switch Analogy
A standard light switch has two states: fully on or fully off. A dimmer switch has a continuous spectrum — 0%, 10%, 50%, 100% — and can be adjusted at any time without rewiring the circuit. Feature flags are the dimmer switch for software features. A simple boolean flag is the on/off switch — useful but limited. A percentage rollout flag is the dimmer — gradually increasing exposure while monitoring the effect. A user-segment flag is a programmable dimmer — different brightness for different rooms. The wiring (the deployed code) is the same in every case; only the control state changes.
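The three control styles in the analogy can be sketched in a few lines. This is a minimal illustration, not how any particular flag platform implements targeting: the flag shape (`enabled`, `segments`, `rolloutPercent`), the `user.plan` field, and the helper names are all assumptions made for the example. Hashing the flag key together with the user ID gives each user a stable bucket, so a user who sees the feature at 10% still sees it at 50%.

```javascript
const { createHash } = require('crypto');

// Map a (flagKey, userId) pair to a stable bucket in the range [0, 100).
function bucketFor(flagKey, userId) {
  const digest = createHash('sha256').update(`${flagKey}:${userId}`).digest();
  // Interpret the first 4 bytes as an unsigned 32-bit integer and scale it.
  return (digest.readUInt32BE(0) / 0x100000000) * 100;
}

// Hypothetical flag shape: { enabled, segments, rolloutPercent }.
function isEnabled(flag, flagKey, user) {
  // On/off switch: a disabled flag is off for everyone.
  if (!flag.enabled) return false;
  // Programmable dimmer: listed segments always get the feature.
  if (flag.segments && flag.segments.includes(user.plan)) return true;
  // Dimmer: everyone else is admitted according to their stable bucket.
  return bucketFor(flagKey, user.id) < flag.rolloutPercent;
}
```

Raising `rolloutPercent` only ever adds users, because each user's bucket never changes for a given flag key.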
Flag Types — Four Categories With Different Lifespans
Not all feature flags are created equal. Using the wrong flag type for a use case — or treating all flags as permanent configuration — leads to the flag debt problem discussed later in this lesson. Understanding the four categories helps teams make deliberate decisions about flag lifespan and ownership from the moment a flag is created.
Feature Flag Categories — Type, Lifespan, and Purpose
CI/CD Integration — Flags in the Pipeline and in Tests
Feature flags interact with CI/CD pipelines in two important ways: how flags are evaluated at runtime, and how tests handle code that sits behind a flag. Both require deliberate design to avoid the flag becoming a source of test blind spots or pipeline complexity.
Runtime evaluation means the flag state is determined at request time — not at build time or deploy time — by querying a flag service or reading a configuration value. Tools like LaunchDarkly, Unleash, and Flagsmith provide SDK-based flag evaluation that integrates with any language. The flag state can be changed instantly without redeploying, and different users can see different flag states simultaneously. This is what makes gradual rollout and A/B testing possible.
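To make the idea of request-time evaluation concrete, here is a small sketch that uses an in-memory store as a stand-in for a remote flag service. `InMemoryFlagStore` and `evaluateFlag` are hypothetical names invented for this example and do not correspond to the API of LaunchDarkly, Unleash, or Flagsmith. The sketch shows two properties described above: changing the stored value changes behaviour with no redeploy, and a failed lookup falls back to a safe default.

```javascript
// Hypothetical in-memory store standing in for a remote flag service.
class InMemoryFlagStore {
  constructor(initial = {}) {
    this.flags = new Map(Object.entries(initial));
    this.available = true; // set to false to simulate an outage
  }

  set(key, value) {
    this.flags.set(key, value);
  }

  get(key) {
    if (!this.available) throw new Error('flag service unavailable');
    return this.flags.get(key);
  }
}

// Evaluate on every call; return the default for unknown keys or failures.
function evaluateFlag(store, key, defaultValue) {
  try {
    const value = store.get(key);
    return value === undefined ? defaultValue : value;
  } catch {
    return defaultValue;
  }
}
```

Choosing `false` as the default for a release flag means an outage of the flag service degrades to the old code path rather than exposing an unreleased feature.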
For testing, code behind a flag must be tested in both the flag-on and flag-off states. A CI pipeline that only tests the flag-off code path exposes users to untested code the moment the flag is enabled. The convention is to write tests that explicitly set the flag state for the duration of the test, using the flag SDK's test context or a test double, so that both paths are covered. Integration and end-to-end tests should run in a flag-on configuration to verify the full feature before the flag is enabled in production.
Feature Flag Evaluation and Testing Pattern
// Application code (inside an async request handler): flag evaluated at request time.
// flagClient stands for whichever flag SDK client the application uses.
const newCheckoutEnabled = await flagClient.variation(
  'new-checkout-flow', // Flag key
  user,                // User context, enables per-user targeting
  false                // Default value if the flag service is unavailable
);
if (newCheckoutEnabled) {
  return renderNewCheckout(cart);
} else {
  return renderLegacyCheckout(cart);
}

// Test coverage: both flag states must be tested in CI.
// setFlagForTest is an illustrative helper; use the test hook your flag SDK provides.
describe('checkout flow', () => {
  describe('with new-checkout-flow flag OFF', () => {
    beforeEach(() => flagClient.setFlagForTest('new-checkout-flow', false));
    it('renders the legacy checkout', () => { /* ... */ });
  });

  describe('with new-checkout-flow flag ON', () => {
    beforeEach(() => flagClient.setFlagForTest('new-checkout-flow', true));
    it('renders the new checkout', () => { /* ... */ });
    it('processes payment correctly', () => { /* ... */ });
    it('sends the confirmation email', () => { /* ... */ });
  });
});
What just happened?
The application evaluates the flag at runtime against the user context — different users can get different flag states simultaneously. The test suite explicitly sets the flag state in both configurations and asserts the correct behaviour in each. The pipeline runs all tests in both states on every PR, ensuring that enabling the flag in production activates code that has been tested in CI rather than shipped untested behind a disabled switch.
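One possible way to back the `setFlagForTest` calls in the pattern above is a small test double that records overrides and returns them from `variation`. This is a sketch under assumptions: `createTestFlagClient`, `reset`, and `checkoutVariant` are hypothetical names, and a real SDK may ship its own test fixtures that should be preferred where available.

```javascript
// Hypothetical test double exposing the two methods used in the example above.
function createTestFlagClient() {
  const overrides = new Map();
  return {
    // Force a flag to a known value for the duration of a test.
    setFlagForTest(key, value) {
      overrides.set(key, value);
    },
    // Clear all overrides, for example in an afterEach hook.
    reset() {
      overrides.clear();
    },
    // Same call shape as the application code: key, user context, default.
    async variation(key, _user, defaultValue) {
      return overrides.has(key) ? overrides.get(key) : defaultValue;
    },
  };
}

// Code under test receives the client as a parameter so tests can inject the double.
async function checkoutVariant(flagClient, user) {
  const enabled = await flagClient.variation('new-checkout-flow', user, false);
  return enabled ? 'new' : 'legacy';
}
```

Passing the client in as an argument, rather than importing a global, is what lets the same function run against the real SDK in production and the double in CI.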
Flag Debt — The Long-Term Risk of Flags Without Discipline
Flag debt is the accumulated burden of feature flags that were never removed after their purpose was served. A codebase with fifty stale flags — each wrapping a conditional branch that has been always-on for eighteen months — is harder to understand, harder to test, and slower to build than one without them. The conditional branches remain, the tests cover both paths for code that will never be off, and new developers must understand which flags are active before they can reason about any code path that touches a flagged feature.
The practices that prevent flag debt are simple but require discipline: every flag has an owner and a removal date set at creation, flags appear in the team's backlog as technical debt items the moment they are fully enabled, and a periodic flag audit removes stale flags before they accumulate. Many feature flag platforms can flag or alert on stale flags, notifying the owner when a flag has been fully enabled for more than N days with no scheduled removal. This is the equivalent of a dependency vulnerability scanner, applied to flags.
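A periodic audit of this kind can be sketched as a small function over flag metadata. The record shape (`key`, `owner`, `removeBy`, `fullyEnabledSince`) and the function name `findStaleFlags` are assumptions for illustration; a real audit would read this data from whatever flag platform or registry the team uses.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical flag record: { key, owner, removeBy, fullyEnabledSince }.
// Dates are ISO strings; removeBy is null when no removal is scheduled, and
// fullyEnabledSince is null while the rollout is still partial.
function findStaleFlags(flags, now, maxDaysFullyEnabled) {
  return flags
    .filter((flag) => {
      // Stale if the agreed removal date has already passed.
      const pastRemovalDate =
        flag.removeBy !== null && new Date(flag.removeBy) < now;
      // Stale if fully enabled for more than N days with no scheduled removal.
      const enabledTooLong =
        flag.removeBy === null &&
        flag.fullyEnabledSince !== null &&
        (now - new Date(flag.fullyEnabledSince)) / DAY_MS > maxDaysFullyEnabled;
      return pastRemovalDate || enabledTooLong;
    })
    .map((flag) => ({ key: flag.key, owner: flag.owner }));
}
```

Run on a schedule in CI, a check like this could open a ticket for each returned owner or fail a nightly job once the list grows past an agreed threshold.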
Warning: Code Behind a Flag Is Not Tested Code Until Both Flag States Are Tested
The most dangerous misconception about feature flags is that deploying code behind a disabled flag is the same as deploying tested code. It is not. If the CI pipeline only tests the flag-off code path — because the flag is disabled in the test environment by default — then enabling the flag in production activates untested code. The first time that code path executes in production is the first time it has ever run against real data, real dependencies, and real edge cases. This is not a deployment safety mechanism; it is deferred testing with production users as the test suite. Every pipeline run must test both flag states for every flagged code path, and the flag must be enabled in the integration test suite before it is enabled in production.
Key Takeaways from This Lesson
Teacher's Note
When you create a feature flag, immediately open a second ticket titled "Remove [flag name]" and put it in the backlog — the act of creating the removal ticket forces the conversation about when the flag should be cleaned up before anyone forgets it exists.
Practice Questions
Answer in your own words — then check against the expected answer.
1. What is the term for the accumulated burden of feature flags that were never removed after their purpose was served — stale conditional branches that make the codebase harder to understand, harder to test, and slower to build over time?
2. What flag type randomly assigns user cohorts to a flag-on or flag-off state, compares metrics across cohorts to determine which implementation performs better, and is removed once the winning implementation is chosen — owned by the product team and scoped to the duration of the test?
3. What is the most common flag type — disabled during development and deployment, enabled incrementally for users during rollout, and removed once the feature is fully live and stable — which should have a defined removal date set at the moment of creation?
Lesson Quiz
1. A marketing campaign launches on December 1st. The engineering team finishes the feature code on November 15th. How do feature flags allow the team to deploy safely on November 15th while ensuring users see the feature exactly on December 1st?
2. A team deploys a new checkout flow behind a disabled flag. Their CI pipeline only runs tests in the flag-off configuration. The flag is enabled in production and immediately causes errors. What testing failure caused this?
3. A SaaS product uses a feature flag to give enterprise-tier users access to an advanced analytics dashboard that free-tier users cannot see. The flag has been in production for two years with no planned removal. What flag type is this, and what does that classification imply about how it should be managed?