Section IV · Lesson 39
CI/CD Anti-Patterns
In this lesson
Pipeline Anti-Patterns
Testing Anti-Patterns
Deployment Anti-Patterns
Security Anti-Patterns
Organisational Anti-Patterns
An anti-pattern is a response to a recurring problem that appears reasonable — sometimes even sophisticated — but consistently produces worse outcomes than simpler alternatives. CI/CD anti-patterns are particularly insidious because they often emerge from good intentions: a shared integration environment created to improve coverage, a manual deployment approval added to increase safety, changes batched into a large, infrequent release to reduce perceived risk. Each decision made sense in isolation. Each one, over time, undermined the delivery system it was meant to improve. Knowing anti-patterns by name is what allows teams to recognise them early — before they become load-bearing assumptions that are expensive to remove.
Pipeline Anti-Patterns
Pipeline Anti-Patterns — Name, Symptom, and Fix
🐢
The 45-Minute Pipeline
Symptom: Developers stop waiting for CI results, merge speculatively, and treat green as a formality. Cause: Tests run serially, no caching, E2E tests on every PR push. Fix: Parallelise independent jobs, add dependency caching, move E2E tests to the merge-to-main gate only. A PR pipeline over 10 minutes is already at risk.
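Restructured along these lines, the same suite can usually come in well under 10 minutes. A sketch in GitHub Actions syntax (job names, the Node version, and the npm scripts are illustrative): independent jobs run in parallel by default, setup-node provides dependency caching, and the E2E job is gated to the merge to main.

```yaml
name: ci
on:
  pull_request:
  push:
    branches: [main]

jobs:
  lint:                        # independent jobs run in parallel by default
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm           # built-in dependency caching
      - run: npm ci && npm run lint

  unit-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm
      - run: npm ci && npm test

  e2e:
    # E2E runs only at the merge-to-main gate, not on every PR push
    if: github.ref == 'refs/heads/main'
    needs: [lint, unit-tests]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm run test:e2e
```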
📋
Copy-Paste Pipeline YAML
Symptom: A security patch requires 30 pull requests across 30 repositories. Engineers dread Node.js version updates. Cause: Pipeline logic duplicated across repositories with no centralisation. Fix: Reusable workflows owned by a platform team. Every service calls the centralised workflow; improvements propagate automatically.
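One way this looks in GitHub Actions: the platform team owns a `workflow_call` workflow, and each service's pipeline shrinks to a call. The org name, repository, and file paths below are illustrative.

```yaml
# Central workflow owned by the platform team, e.g. in a repo
# "my-org/workflows" at .github/workflows/node-ci.yml (names illustrative)
on:
  workflow_call:
    inputs:
      node-version:
        type: string
        default: '20'

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ inputs.node-version }}
          cache: npm
      - run: npm ci && npm test
---
# Each service's own pipeline becomes a one-job caller. Bumping the
# Node.js version is now one change in the central repo, not 30 PRs.
name: ci
on: [push]
jobs:
  ci:
    uses: my-org/workflows/.github/workflows/node-ci.yml@v1
    with:
      node-version: '20'
```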
🎭
The Vanity Pipeline
Symptom: The pipeline always goes green, but production incidents remain frequent. Engineers trust the green build without question. Cause: Tests exist for coverage metrics, not correctness. Weak assertions, mocked dependencies, no integration tests. Fix: Audit test quality, introduce real service containers, enforce meaningful coverage thresholds with strong assertion standards.
🌊
Rebuilding at Every Stage
Symptom: Staging and production occasionally behave differently despite both being "green". Build times are multiplied across the pipeline. Cause: The artifact is rebuilt from source at the test stage, the staging deploy stage, and the production deploy stage separately. Fix: Build once, upload to the artifact registry, download the same artifact at every subsequent stage.
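A minimal build-once sketch, assuming a Docker-based deploy (the registry path and `deploy.sh` script are illustrative, and the build job would also need a registry login step): the image is built and pushed once, tagged with the commit SHA, and every later stage references that exact tag.

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Build exactly once, tagged with the commit SHA
      - run: docker build -t ghcr.io/my-org/app:${{ github.sha }} .
      - run: docker push ghcr.io/my-org/app:${{ github.sha }}

  deploy-staging:
    needs: build
    runs-on: ubuntu-latest
    steps:
      # Deploy the already-built image; no rebuild from source
      - run: ./deploy.sh staging ghcr.io/my-org/app:${{ github.sha }}

  deploy-production:
    needs: deploy-staging
    runs-on: ubuntu-latest
    environment: production
    steps:
      # The artifact reaching production is byte-for-byte the one tested
      - run: ./deploy.sh production ghcr.io/my-org/app:${{ github.sha }}
```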
🔇
Silent Pipeline Failures
Symptom: A pipeline failure on the main branch sits unnoticed for hours while developers stack new commits on a broken foundation. Cause: No alerting on main branch failures; developers check results manually and inconsistently. Fix: Immediate Slack or email notification on main branch failure, with the SHA, author, failing job, and a direct link to the run.
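One way to wire this up, using the `slackapi/slack-github-action` action (the webhook secret and the watched job names are illustrative): a trailing job fires only when a main-branch run fails, posting the SHA, author, and a direct link to the run.

```yaml
  notify-failure:
    # Fires only when a watched job fails on the main branch
    if: failure() && github.ref == 'refs/heads/main'
    needs: [build, test]       # the jobs this alert watches (names illustrative)
    runs-on: ubuntu-latest
    steps:
      - uses: slackapi/slack-github-action@v2
        with:
          webhook: ${{ secrets.SLACK_WEBHOOK_URL }}
          webhook-type: incoming-webhook
          payload: |
            text: "main is red: ${{ github.sha }} by ${{ github.actor }} — ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"
```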
The Broken Window Analogy
A broken window left unrepaired signals that nobody is maintaining the building — and invites further damage. A flaky test left unaddressed signals that the pipeline's failures are not to be taken seriously — and invites developers to bypass other pipeline controls. CI/CD anti-patterns have the same property: each one that is tolerated lowers the standard for what is acceptable, making the next compromise easier. A team that tolerates a 20-minute pipeline will soon tolerate a 40-minute pipeline. A team that tolerates one flaky test will soon tolerate ten. The standard must be enforced continuously, not restored occasionally.
Testing Anti-Patterns
Testing Anti-Patterns — Name, Symptom, and Fix
🎲
The Flaky Test Graveyard
Symptom: "Just re-run it" is a normal part of the development workflow. Engineers no longer treat a red build as urgent. Cause: Flaky tests tolerated rather than fixed. Each one individually seems minor; collectively they destroy pipeline trust. Fix: Zero-tolerance policy. Every flaky test is quarantined immediately and tracked as a P1 bug until resolved.
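Quarantine can be mechanical rather than honorary. A sketch for a pytest suite (the `flaky` marker is a convention you would register in pytest configuration, not a built-in): quarantined tests are excluded from the gating run but still executed non-blocking, so the open P1 ticket can be verified against real runs.

```yaml
  unit-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install -r requirements.txt
      # Gating run excludes quarantined tests, so a red build means a real failure
      - run: pytest -m "not flaky"
      # Quarantined tests still run, visibly but non-blocking,
      # while each one is tracked as a P1 bug until fixed
      - run: pytest -m "flaky"
        continue-on-error: true
```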
🔺
The Inverted Test Pyramid
Symptom: The CI suite takes 35 minutes and fails intermittently due to browser driver issues, network timeouts, and timing problems. Cause: The suite is dominated by E2E tests and has few unit or integration tests — the pyramid is inverted. Fix: Invest in a strong unit test base. E2E tests should cover critical paths only, not every feature.
📊
Coverage Theatre
Symptom: 95% coverage, but production bugs slip through regularly. Teams argue about coverage percentage rather than test quality. Cause: Tests exercise code paths without making meaningful assertions — coverage measures lines touched, not correctness verified. Fix: Audit assertion quality. A test that calls a function without asserting the output is noise, not signal.
🎭
Testing Against Mocks Instead of Reality
Symptom: All tests pass in CI but the application breaks on staging when it connects to a real database. Cause: Integration tests run against in-memory SQLite or hand-rolled mocks rather than real PostgreSQL. The mocks do not replicate constraint behaviour, query planner differences, or connection handling. Fix: Service containers with real database engines in CI, as covered in Lesson 27.
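In GitHub Actions, replacing the mock with the real engine is a service container declaration. A sketch assuming a Python suite (the credentials and test paths are illustrative): the integration job gets a real PostgreSQL 16 instance, health-checked before tests start.

```yaml
  integration-tests:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:16          # same engine as production, not SQLite
        env:
          POSTGRES_PASSWORD: test
        ports:
          - 5432:5432
        options: >-
          --health-cmd "pg_isready"
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
    steps:
      - uses: actions/checkout@v4
      - run: pip install -r requirements.txt
      - run: pytest tests/integration
        env:
          DATABASE_URL: postgresql://postgres:test@localhost:5432/postgres
```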
Deployment Anti-Patterns
Deployment Anti-Patterns — Name, Symptom, and Fix
🏔️
The Big Bang Release
Symptom: Monthly deployments with 80+ changes. Release nights are high-stress events. When something breaks, nobody knows which of the 80 changes caused it. Cause: Fear of deployment leading to infrequency, which ironically increases deployment risk further. Fix: Deploy smaller batches more often. Deployment risk falls as deployment frequency rises: small, frequent releases are easier to test, easier to diagnose, and easier to roll back.
🤞
Deploy and Pray
Symptom: After deploying, the team watches error dashboards nervously for 20 minutes, not sure what to look for. Incidents are discovered by users rather than monitoring. Cause: No deployment markers, no defined success metrics, no automated post-deployment checks. Fix: Smoke tests, deployment markers, and defined metric thresholds that auto-alert when breached.
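A sketch of what "deploy and verify" looks like in a pipeline (the deploy script, health endpoint, and especially the monitoring API are hypothetical; real marker APIs vary by vendor): the job fails, and therefore alerts, if the smoke test does not pass.

```yaml
  deploy:
    runs-on: ubuntu-latest
    steps:
      - run: ./deploy.sh production     # illustrative deploy script
      # Smoke test: the job fails loudly if the service is not healthy
      - run: >
          curl --fail --retry 5 --retry-delay 10 --retry-all-errors
          https://example.com/healthz
      # Deployment marker so dashboards can correlate metric changes
      # with releases (endpoint is a stand-in for your monitoring vendor)
      - run: |
          curl -X POST "https://monitoring.example.com/v1/markers" \
            -H "Authorization: Bearer ${{ secrets.MONITORING_TOKEN }}" \
            -d '{"title": "deploy ${{ github.sha }}"}'
```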
🎭
Staging Theatre
Symptom: "It passed staging" is used to explain why the production incident was a surprise. The staging environment uses SQLite, mocked external services, and runs on a single container. Cause: Staging is structurally different from production — passing staging provides false confidence. Fix: Minimise staging/production differences. Use the same database engine, same infrastructure configuration, same external service sandbox accounts.
🚪
The Paper Rollback
Symptom: A production incident requires rollback. The team realises nobody has ever tested the rollback procedure. The previous artifact was cleaned up by the retention policy. The rollback takes 4 hours. Cause: Rollback exists as a document, not a rehearsed procedure. Fix: Test rollback on a schedule. Keep production-deployed artifacts for at least 30 days. A rollback that has been rehearsed is fast; one that has not is chaos.
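Rehearsal can be scheduled like any other job. A hypothetical drill workflow (the cron, target environment, and `rollback.sh` are illustrative): it exercises the real rollback path monthly against staging, proving both that the procedure works and that the previous artifact still exists.

```yaml
name: rollback-drill
on:
  schedule:
    - cron: '0 6 1 * *'       # first day of each month
  workflow_dispatch:           # and on demand

jobs:
  drill:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Whatever your real rollback does, this job proves it still works
      - run: ./rollback.sh staging --to previous
```

Pairing this with `retention-days: 30` (or more) on `actions/upload-artifact` keeps previously deployed artifacts from being cleaned up before a rollback could need them.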
Security and Organisational Anti-Patterns
Security and Organisational Anti-Patterns — Name, Symptom, and Fix
🔑
The Permanent Master Key
Symptom: One AWS access key is used by all pipelines across all environments. When it is compromised, every service is simultaneously at risk. When it expires, every pipeline breaks at once. Cause: Long-lived static credentials shared broadly. Fix: OIDC for cloud auth, separate IAM roles per pipeline, least-privilege scoping per job.
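For AWS, the OIDC pattern looks like the sketch below (the account ID, role name, region, and bucket are illustrative, and the role must already trust GitHub's OIDC provider and be scoped to this repository): the job requests a short-lived token at run time, so there is no static key to leak or rotate.

```yaml
  deploy:
    runs-on: ubuntu-latest
    permissions:
      id-token: write          # allow the job to request an OIDC token
      contents: read
    steps:
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          # Short-lived credentials via OIDC; no stored access key.
          # One narrowly scoped role per pipeline, not one master key.
          role-to-assume: arn:aws:iam::123456789012:role/app-deploy-prod
          aws-region: eu-west-1
      # Least privilege: this role only needs to write these assets
      - run: aws s3 sync ./dist s3://my-app-assets
```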
🏢
CI/CD as an Operations Silo
Symptom: One person or one team owns "the pipeline." Developers submit deployment tickets. Changes to the pipeline require a request and a wait. Cause: CI/CD treated as infrastructure rather than a shared engineering discipline. Fix: Pipeline code lives in the application repository. Developers own their pipelines. Platform teams provide templates, not gatekeeping.
🛒
Tooling Without Culture
Symptom: The organisation buys a CI/CD platform, Kubernetes, feature flag tooling, and a deployment dashboard. Deployment frequency does not change. Fear of production persists. Cause: DevOps culture — shared ownership, psychological safety, blameless postmortems — was not adopted alongside the tools. Fix: Tools enable culture; they do not create it. Address the cultural prerequisites first.
🚫
The Emergency Bypass
Symptom: Every team has an undocumented way to skip the pipeline in an emergency. Over time, more things become emergencies. The bypass is used more than the pipeline. Cause: The pipeline is too slow or too painful for emergency response — so a bypass was created. Fix: Make the pipeline fast enough that a bypass is never justified. An emergency hotfix pipeline that takes 3 minutes and skips only E2E tests — not security scans, not smoke tests — is a legitimate fast path that does not undermine the safety model.
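A legitimate fast path can be an explicit, documented workflow rather than a folk secret. A sketch (branch pattern, scripts, and endpoint are illustrative): unit tests, a security scan, and a post-deploy smoke test all still run; only the E2E suite is skipped.

```yaml
name: hotfix
on:
  push:
    branches: [hotfix/**]

jobs:
  fast-path:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm test               # unit tests still run
      - run: npm audit --audit-level=high     # security scan still runs
      - run: ./deploy.sh production
      - run: curl --fail https://example.com/healthz   # smoke test still runs
      # The only thing skipped relative to the normal pipeline is the E2E suite
```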
Warning: Anti-Patterns Are Easier to Prevent Than to Remove
Every anti-pattern on this list started as a decision that made sense given the constraints of the moment. The shared integration environment was built because someone needed cross-service testing. The bypass was created because a deployment was genuinely urgent. The duplicated YAML was copy-pasted because there was no time to build a reusable workflow. Anti-patterns become dangerous when they stop being temporary solutions and become permanent architecture. The moment a workaround is built, the team should schedule its removal. The moment a bypass is created, the team should schedule a sprint to make it unnecessary. Anti-patterns that are named and tracked are managed; anti-patterns that are accepted as "just how we do things" become load-bearing assumptions that cost ten times as much to remove later.
Key Takeaways from This Lesson
✓
Anti-patterns emerge from good intentions — a shared environment for better coverage, a bypass for faster emergencies, a long release cycle to reduce risk. Understanding the intent behind each anti-pattern is what allows teams to address the underlying need without the destructive side effect.
✓
Slow pipelines and flaky tests are the two most destructive anti-patterns — both train developers to distrust or bypass the pipeline, which undermines every other quality and safety control the pipeline provides.
✓
Staging theatre and the paper rollback are the most dangerous deployment anti-patterns — both create false confidence. A staging environment that does not resemble production and a rollback procedure that has never been tested are security blankets, not safety nets.
✓
Tools without culture produce tooling anti-patterns — buying a CI/CD platform does not produce DevOps culture. Shared ownership, blameless postmortems, and psychological safety are the prerequisites; tools are the enablers.
✓
Anti-patterns must be named, tracked, and scheduled for removal — a workaround that is not tracked will become permanent. The moment an anti-pattern is identified, create a ticket to remove it. The moment it is accepted as "just how things work," it becomes ten times more expensive to change.
Teacher's Note
Walk through this lesson's anti-pattern list with your team and mark every one you currently have — then rank them by cost and fix the top two in the next sprint. Naming a problem in a meeting is worth less than fixing one in a pull request.
Practice Questions
Answer in your own words — then check against the expected answer.
Up Next · Lesson 40
Mini Project
The final lesson — apply everything from this course to a complete, realistic CI/CD scenario. Build the pipeline, handle the edge cases, and ship it.