CI/CD Lesson 39 – CI/CD Anti-Patterns | Dataplexa
Section IV · Lesson 39

CI/CD Anti-Patterns

In this lesson

Pipeline Anti-Patterns
Testing Anti-Patterns
Deployment Anti-Patterns
Security Anti-Patterns
Organisational Anti-Patterns

An anti-pattern is a response to a recurring problem that appears reasonable — sometimes even sophisticated — but consistently produces worse outcomes than simpler alternatives. CI/CD anti-patterns are particularly insidious because they often emerge from good intentions: a shared integration environment created to improve coverage, a manual deployment approval added to increase safety, changes batched into a single large release to reduce perceived risk. Each decision made sense in isolation. Each one, over time, undermined the delivery system it was meant to improve. Knowing anti-patterns by name is what allows teams to recognise them early — before they become load-bearing assumptions that are expensive to remove.

Pipeline Anti-Patterns

Pipeline Anti-Patterns — Name, Symptom, and Fix

🐢
The 45-Minute Pipeline
Symptom: Developers stop waiting for CI results, merge speculatively, and treat green as a formality. Cause: Tests run serially, no caching, E2E tests on every PR push. Fix: Parallelise independent jobs, add dependency caching, move E2E tests to the merge-to-main gate only. A PR pipeline over 10 minutes is already at risk.
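The fix above can be sketched as a GitHub Actions workflow. This is a minimal sketch, not the lesson's own pipeline: the job names, Node.js toolchain, and npm scripts are assumptions for illustration. The key structural moves are real — jobs without a `needs` dependency run in parallel, `setup-node` restores a dependency cache, and the expensive E2E job is conditioned on the main branch.

```yaml
name: ci
on:
  pull_request:
  push:
    branches: [main]

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm            # dependency cache restored between runs
      - run: npm ci && npm run lint

  unit-tests:                    # no `needs:` — runs in parallel with lint
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm
      - run: npm ci && npm test

  e2e:                           # expensive suite: merge-to-main gate only
    if: github.ref == 'refs/heads/main'
    needs: [lint, unit-tests]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm run test:e2e
```

With this shape, a PR run only pays for lint plus unit tests, which keeps the feedback loop well under the 10-minute risk threshold the lesson names.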
📋
Copy-Paste Pipeline YAML
Symptom: A security patch requires 30 pull requests across 30 repositories. Engineers dread Node.js version updates. Cause: Pipeline logic duplicated across repositories with no centralisation. Fix: Reusable workflows owned by a platform team. Every service calls the centralised workflow; improvements propagate automatically.
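On GitHub Actions, the centralised-workflow fix looks like this on the caller side. The organisation, repository path, and inputs below are hypothetical; the mechanism — `jobs.<id>.uses` pointing at a versioned reusable workflow — is the real feature.

```yaml
# In each service repository: a thin caller, not a copy of the logic.
name: ci
on: [push, pull_request]

jobs:
  ci:
    # Hypothetical platform-team repository; pin to a tag so upgrades
    # are deliberate rather than surprising.
    uses: example-org/platform-workflows/.github/workflows/node-ci.yml@v2
    with:
      node-version: "20"
    secrets: inherit
```

A Node.js version bump or a new security-scanning step is now one pull request in `platform-workflows`, not thirty across the fleet.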
🎭
The Vanity Pipeline
Symptom: The pipeline always goes green, but production incidents remain frequent. Engineers trust the green build without question. Cause: Tests exist for coverage metrics, not correctness. Weak assertions, mocked dependencies, no integration tests. Fix: Audit test quality, introduce real service containers, enforce meaningful coverage thresholds with strong assertion standards.
🌊
Rebuilding at Every Stage
Symptom: Staging and production occasionally behave differently despite both being "green". Build times are multiplied across the pipeline. Cause: The artifact is rebuilt from source at the test stage, the staging deploy stage, and the production deploy stage separately. Fix: Build once, upload to the artifact registry, download the same artifact at every subsequent stage.
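A minimal build-once sketch, assuming a Docker image and GitHub Actions artifacts (in practice the image would usually be pushed to a container registry; `upload-artifact`/`download-artifact` keep the example self-contained). The image name is hypothetical.

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Build exactly once, tagged with the commit SHA.
      - run: docker build -t myapp:${{ github.sha }} .
      - run: docker save myapp:${{ github.sha }} -o image.tar
      - uses: actions/upload-artifact@v4
        with:
          name: app-image
          path: image.tar

  deploy-staging:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/download-artifact@v4
        with:
          name: app-image
      # Load the same bytes that were built and tested — no rebuild.
      - run: docker load -i image.tar
      # deploy steps would follow
```

Every later stage downloads the same artifact, so "green on staging" and "deployed to production" refer to identical bytes.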
🔇
Silent Pipeline Failures
Symptom: A pipeline failure on the main branch sits unnoticed for hours while developers stack new commits on a broken foundation. Cause: No alerting on main branch failures; developers check results manually and inconsistently. Fix: Immediate Slack or email notification on main branch failure, with the SHA, author, failing job, and a direct link to the run.
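The alerting fix can be attached to an existing workflow as a final job. The upstream job names and the Slack webhook secret are assumptions; `if: failure()` combined with `needs` fires when any listed job failed, and the GitHub expression context supplies the SHA, author, and run link the lesson asks for.

```yaml
  notify-on-failure:
    if: failure() && github.ref == 'refs/heads/main'
    needs: [lint, unit-tests, e2e]   # hypothetical upstream jobs
    runs-on: ubuntu-latest
    steps:
      - name: Post failure details to Slack
        run: |
          curl -X POST "$SLACK_WEBHOOK_URL" \
            -H 'Content-Type: application/json' \
            -d '{"text":"main is red: ${{ github.sha }} by ${{ github.actor }}. Run: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"}'
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
```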

The Broken Window Analogy

A broken window left unrepaired signals that nobody is maintaining the building — and invites further damage. A flaky test left unaddressed signals that the pipeline's failures are not to be taken seriously — and invites developers to bypass other pipeline controls. CI/CD anti-patterns have the same property: each one that is tolerated lowers the standard for what is acceptable, making the next compromise easier. A team that tolerates a 20-minute pipeline will soon tolerate a 40-minute pipeline. A team that tolerates one flaky test will soon tolerate ten. The standard must be enforced continuously, not restored occasionally.

Testing Anti-Patterns

Testing Anti-Patterns — Name, Symptom, and Fix

🎲
The Flaky Test Graveyard
Symptom: "Just re-run it" is a normal part of the development workflow. Engineers no longer treat a red build as urgent. Cause: Flaky tests tolerated rather than fixed. Each one individually seems minor; collectively they destroy pipeline trust. Fix: Zero-tolerance policy. Every flaky test is quarantined immediately and tracked as a P1 bug until resolved.
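One way to implement the quarantine mechanically — the tag convention and the `--grep`/`--invert` flags assume a Mocha-style test runner and are illustrative, not universal — is to exclude quarantined tests from the blocking job and run them in a separate, non-blocking job so they stay visible while the P1 fix is in flight:

```yaml
  unit-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      # Blocking suite excludes anything tagged @quarantine: must stay green.
      - run: npm test -- --grep "@quarantine" --invert

  quarantined:
    runs-on: ubuntu-latest
    continue-on-error: true   # visible in the run summary, never blocks merge
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm test -- --grep "@quarantine"
```

The quarantine job is a holding pen, not a graveyard: each test in it should map to an open P1 ticket, and the goal is always an empty quarantine.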
🔺
The Inverted Test Pyramid
Symptom: The CI suite takes 35 minutes and fails intermittently due to browser driver issues, network timeouts, and timing problems. Cause: The suite is dominated by E2E tests and has few unit or integration tests — the pyramid is inverted. Fix: Invest in a strong unit test base. E2E tests should cover critical paths only, not every feature.
📊
Coverage Theatre
Symptom: 95% coverage, but production bugs slip through regularly. Teams argue about coverage percentage rather than test quality. Cause: Tests exercise code paths without making meaningful assertions — coverage measures lines touched, not correctness verified. Fix: Audit assertion quality. A test that calls a function without asserting the output is noise, not signal.
🎭
Testing Against Mocks Instead of Reality
Symptom: All tests pass in CI but the application breaks on staging when it connects to a real database. Cause: Integration tests run against in-memory SQLite or hand-rolled mocks rather than real PostgreSQL. The mocks do not replicate constraint behaviour, query planner differences, or connection handling. Fix: Service containers with real database engines in CI, as covered in Lesson 27.
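GitHub Actions service containers make the fix concrete: a real PostgreSQL engine runs alongside the test job. The database credentials and npm script are placeholders; the `services` block, health-check options, and port mapping are the actual mechanism.

```yaml
  integration-tests:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:16        # the real engine, not SQLite or a mock
        env:
          POSTGRES_PASSWORD: test
        ports: ["5432:5432"]
        options: >-
          --health-cmd "pg_isready -U postgres"
          --health-interval 5s
          --health-timeout 5s
          --health-retries 10
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm run test:integration
        env:
          DATABASE_URL: postgres://postgres:test@localhost:5432/postgres
```

Constraint violations, transaction semantics, and connection behaviour are now exercised against the same engine production uses.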

Deployment Anti-Patterns

Deployment Anti-Patterns — Name, Symptom, and Fix

🏔️
The Big Bang Release
Symptom: Monthly deployments with 80+ changes. Release nights are high-stress events. When something breaks, nobody knows which of the 80 changes caused it. Cause: Fear of deployment leading to infrequency, which ironically increases deployment risk further. Fix: Deploy smaller batches more often. Deployment frequency is inversely correlated with deployment risk — not positively.
🤞
Deploy and Pray
Symptom: After deploying, the team watches error dashboards nervously for 20 minutes, not sure what to look for. Incidents are discovered by users rather than monitoring. Cause: No deployment markers, no defined success metrics, no automated post-deployment checks. Fix: Smoke tests, deployment markers, and defined metric thresholds that auto-alert when breached.
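A post-deployment job can encode both halves of the fix. The health endpoint, monitoring URL, and marker payload below are hypothetical — most monitoring platforms expose some deployment-event API, but the exact call varies by vendor.

```yaml
  post-deploy-checks:
    needs: deploy-production        # hypothetical deploy job
    runs-on: ubuntu-latest
    steps:
      # Smoke test: fail the run if the service does not come up healthy.
      - run: curl --fail --retry 5 --retry-delay 10 https://example.com/healthz
      # Deployment marker: annotate monitoring so error graphs line up
      # with deploys instead of requiring nervous dashboard-watching.
      - run: |
          curl -X POST "$MONITORING_URL/deploy-markers" \
            -H "Authorization: Bearer $MONITORING_TOKEN" \
            -d '{"service":"myapp","sha":"${{ github.sha }}"}'
        env:
          MONITORING_URL: ${{ secrets.MONITORING_URL }}
          MONITORING_TOKEN: ${{ secrets.MONITORING_TOKEN }}
```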
🎭
Staging Theatre
Symptom: "It passed staging" is used to explain why the production incident was a surprise. The staging environment uses SQLite, mocked external services, and runs on a single container. Cause: Staging is structurally different from production — passing staging provides false confidence. Fix: Minimise staging/production differences. Use the same database engine, same infrastructure configuration, same external service sandbox accounts.
🚪
The Paper Rollback
Symptom: A production incident requires rollback. The team realises nobody has ever tested the rollback procedure. The previous artifact was cleaned up by the retention policy. The rollback takes 4 hours. Cause: Rollback exists as a document, not a rehearsed procedure. Fix: Test rollback on a schedule. Keep production-deployed artifacts for at least 30 days. A rollback that has been rehearsed is fast; one that has not is chaos.
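Both halves of the fix can be expressed in workflow configuration. The first fragment pins artifact retention past the default; the second is a separate scheduled workflow that rehearses the rollback against staging. The rollback script, its flags, and the schedule are assumptions for illustration.

```yaml
# Fragment 1 — in the release workflow: keep deployed artifacts 30 days
# so the retention policy cannot delete your rollback target.
#   - uses: actions/upload-artifact@v4
#     with:
#       name: release-${{ github.sha }}
#       path: image.tar
#       retention-days: 30

# Fragment 2 — a separate workflow that rehearses the rollback on a schedule.
name: rollback-rehearsal
on:
  schedule:
    - cron: "0 6 * * 1"            # weekly, Mondays 06:00 UTC
jobs:
  rehearse:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Hypothetical script: redeploys the previous release to staging
      # and verifies the service comes back healthy.
      - run: ./scripts/rollback.sh --target staging --previous
```

A rehearsal that runs weekly turns "the rollback document" into a procedure someone has actually executed within the last seven days.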

Security and Organisational Anti-Patterns

Security and Organisational Anti-Patterns — Name, Symptom, and Fix

🔑
The Permanent Master Key
Symptom: One AWS access key is used by all pipelines across all environments. When it is compromised, every service is simultaneously at risk. When it expires, every pipeline breaks at once. Cause: Long-lived static credentials shared broadly. Fix: OIDC for cloud auth, separate IAM roles per pipeline, least-privilege scoping per job.
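On GitHub Actions with AWS, the OIDC fix looks like this. The role ARN and bucket are hypothetical; the `id-token: write` permission and the `configure-aws-credentials` action are the real mechanism — the job exchanges a short-lived OIDC token for temporary credentials, so no static access key is stored anywhere.

```yaml
  deploy:
    runs-on: ubuntu-latest
    permissions:
      id-token: write            # allow this job to request an OIDC token
      contents: read
    steps:
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          # One narrowly scoped role per pipeline, not one shared master key.
          role-to-assume: arn:aws:iam::123456789012:role/myapp-deploy
          aws-region: eu-west-1
      # Example least-privilege action this role would permit:
      - run: aws s3 sync ./dist s3://myapp-releases/
```

Compromise of one pipeline now exposes one narrowly scoped role, and there is no long-lived key to rotate or leak.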
🏢
CI/CD as an Operations Silo
Symptom: One person or one team owns "the pipeline." Developers submit deployment tickets. Changes to the pipeline require a request and a wait. Cause: CI/CD treated as infrastructure rather than a shared engineering discipline. Fix: Pipeline code lives in the application repository. Developers own their pipelines. Platform teams provide templates, not gatekeeping.
🛒
Tooling Without Culture
Symptom: The organisation buys a CI/CD platform, Kubernetes, feature flag tooling, and a deployment dashboard. Deployment frequency does not change. Fear of production persists. Cause: DevOps culture — shared ownership, psychological safety, blameless postmortems — was not adopted alongside the tools. Fix: Tools enable culture; they do not create it. Address the cultural prerequisites first.
🚫
The Emergency Bypass
Symptom: Every team has an undocumented way to skip the pipeline in an emergency. Over time, more things become emergencies. The bypass is used more than the pipeline. Cause: The pipeline is too slow or too painful for emergency response — so a bypass was created. Fix: Make the pipeline fast enough that a bypass is never justified. An emergency hotfix pipeline that takes 3 minutes and skips only E2E tests — not security scans, not smoke tests — is a legitimate fast path that does not undermine the safety model.
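A legitimate fast path, sketched as a separate workflow: triggered deliberately, auditable in the run history, and skipping only E2E tests. The scripts and scan command are placeholders; the point is what is still present — unit tests, a security scan, a deploy, and a smoke test — and what is deliberately absent.

```yaml
name: hotfix
on:
  workflow_dispatch:               # deliberate, logged, auditable trigger

jobs:
  build-and-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm test              # unit tests still run
      - run: npm audit --audit-level=high    # security scan still runs
      # E2E job deliberately omitted — the only stage the fast path skips.

  deploy:
    needs: build-and-scan
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/deploy.sh production  # hypothetical deploy script
      # Smoke test still runs: a hotfix that breaks production is no fix.
      - run: curl --fail --retry 5 https://example.com/healthz
```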

Warning: Anti-Patterns Are Easier to Prevent Than to Remove

Every anti-pattern on this list started as a decision that made sense given the constraints of the moment. The shared integration environment was built because someone needed cross-service testing. The bypass was created because a deployment was genuinely urgent. The duplicated YAML was copy-pasted because there was no time to build a reusable workflow. Anti-patterns become dangerous when they stop being temporary solutions and become permanent architecture. The moment a workaround is built, the team should schedule its removal. The moment a bypass is created, the team should schedule a sprint to make it unnecessary. Anti-patterns that are named and tracked are managed; anti-patterns that are accepted as "just how we do things" become load-bearing assumptions that cost ten times as much to remove later.

Key Takeaways from This Lesson

Anti-patterns emerge from good intentions — a shared environment for better coverage, a bypass for faster emergencies, a long release cycle to reduce risk. Understanding the intent behind each anti-pattern is what allows teams to address the underlying need without the destructive side effect.
Slow pipelines and flaky tests are the two most destructive anti-patterns — both train developers to distrust or bypass the pipeline, which undermines every other quality and safety control the pipeline provides.
Staging theatre and the paper rollback are the most dangerous deployment anti-patterns — both create false confidence. A staging environment that does not resemble production and a rollback procedure that has never been tested are security blankets, not safety nets.
Tools without culture produce tooling anti-patterns — buying a CI/CD platform does not produce DevOps culture. Shared ownership, blameless postmortems, and psychological safety are the prerequisites; tools are the enablers.
Anti-patterns must be named, tracked, and scheduled for removal — a workaround that is not tracked will become permanent. The moment an anti-pattern is identified, create a ticket to remove it. The moment it is accepted as "just how things work," it becomes ten times more expensive to change.

Teacher's Note

Walk through this lesson's anti-pattern list with your team and mark every one you currently have — then rank them by cost and fix the top two in the next sprint. Naming a problem in a meeting is worth less than fixing one in a pull request.

Practice Questions

Answer in your own words — then check against the expected answer.

1. What is the name of the anti-pattern where a staging environment is structurally different from production — using a different database engine, mocked external services, or single-container infrastructure — giving the team false confidence that passing staging means the release is safe for production?

Expected answer: Staging Theatre.

2. What testing anti-pattern describes a test suite dominated by slow, brittle end-to-end tests with very few unit or integration tests — producing a pipeline that is slow, flaky, and provides poor signal about where failures originate?

Expected answer: The Inverted Test Pyramid.

3. What deployment anti-pattern accumulates weeks or months of changes into a single large release — where the deployment risk grows with each added change, rollback identification becomes nearly impossible, and release events are high-stress ceremonies rather than routine operations?

Expected answer: The Big Bang Release.

Lesson Quiz

1. A team has 94% test coverage and a nearly always-green CI pipeline, but ships production bugs at the same rate as before they introduced testing. A consultant reviews the tests and finds they call functions without asserting return values. Which anti-pattern is this?


2. A development team must submit a deployment ticket to the platform team and wait up to 48 hours for their changes to reach production. The pipeline is owned and maintained exclusively by the platform team. Which anti-pattern does this describe?


3. A team creates an undocumented emergency deployment bypass that skips all pipeline stages. Initially used once a month, it is now used multiple times per week because "everything is urgent." What long-term consequence does this illustrate?


Up Next · Lesson 40

Mini Project

The final lesson — apply everything from this course to a complete, realistic CI/CD scenario. Build the pipeline, handle the edge cases, and ship it.