CI/CD Lesson 39 – CI/CD Anti-Patterns | Dataplexa
Section IV · Lesson 39

CI/CD Anti-Patterns

In this lesson

Pipeline Anti-Patterns
Testing Anti-Patterns
Deployment Anti-Patterns
Security Anti-Patterns
Organisational Anti-Patterns

An anti-pattern is a response to a recurring problem that appears reasonable — sometimes even sophisticated — but consistently produces worse outcomes than simpler alternatives. CI/CD anti-patterns are particularly insidious because they often emerge from good intentions: a shared integration environment created to improve coverage, a manual deployment approval added to increase safety, changes batched into a single large release to reduce perceived risk. Each decision made sense in isolation. Each one, over time, undermined the delivery system it was meant to improve. Knowing anti-patterns by name is what allows teams to recognise them early — before they become load-bearing assumptions that are expensive to remove.

Pipeline Anti-Patterns

Pipeline Anti-Patterns — Name, Symptom, and Fix

🐢
The 45-Minute Pipeline
Symptom: Developers stop waiting for CI results, merge speculatively, and treat green as a formality. Cause: Tests run serially, no caching, E2E tests on every PR push. Fix: Parallelise independent jobs, add dependency caching, move E2E tests to the merge-to-main gate only. A PR pipeline over 10 minutes is already at risk.
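The fix above can be sketched as a GitHub Actions workflow. This is a minimal sketch, not the lesson's own pipeline: the job names, Node.js toolchain, and npm scripts are assumptions for illustration. The key structural moves are real — jobs without a `needs` dependency run in parallel, `setup-node` restores a dependency cache, and the expensive E2E job is conditioned on the main branch.

```yaml
name: ci
on:
  pull_request:
  push:
    branches: [main]

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm            # dependency cache restored between runs
      - run: npm ci && npm run lint

  unit-tests:                    # no `needs:` — runs in parallel with lint
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm
      - run: npm ci && npm test

  e2e:                           # expensive suite: merge-to-main gate only
    if: github.ref == 'refs/heads/main'
    needs: [lint, unit-tests]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm run test:e2e
```

With this shape, a PR run only pays for lint plus unit tests, which keeps the feedback loop well under the 10-minute risk threshold the lesson names.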
📋
Copy-Paste Pipeline YAML
Symptom: A security patch requires 30 pull requests across 30 repositories. Engineers dread Node.js version updates. Cause: Pipeline logic duplicated across repositories with no centralisation. Fix: Reusable workflows owned by a platform team. Every service calls the centralised workflow; improvements propagate automatically.
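On GitHub Actions, the centralised-workflow fix looks like this on the caller side. The organisation, repository path, and inputs below are hypothetical; the mechanism — `jobs.<id>.uses` pointing at a versioned reusable workflow — is the real feature.

```yaml
# In each service repository: a thin caller, not a copy of the logic.
name: ci
on: [push, pull_request]

jobs:
  ci:
    # Hypothetical platform-team repository; pin to a tag so upgrades
    # are deliberate rather than surprising.
    uses: example-org/platform-workflows/.github/workflows/node-ci.yml@v2
    with:
      node-version: "20"
    secrets: inherit
```

A Node.js version bump or a new security-scanning step is now one pull request in `platform-workflows`, not thirty across the fleet.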
🎭
The Vanity Pipeline
Symptom: The pipeline always goes green, but production incidents remain frequent. Engineers trust the green build without question. Cause: Tests exist for coverage metrics, not correctness. Weak assertions, mocked dependencies, no integration tests. Fix: Audit test quality, introduce real service containers, enforce meaningful coverage thresholds with strong assertion standards.
🌊
Rebuilding at Every Stage
Symptom: Staging and production occasionally behave differently despite both being "green". Build times are multiplied across the pipeline. Cause: The artifact is rebuilt from source at the test stage, the staging deploy stage, and the production deploy stage separately. Fix: Build once, upload to the artifact registry, download the same artifact at every subsequent stage.
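A minimal build-once sketch, assuming a Docker image and GitHub Actions artifacts (in practice the image would usually be pushed to a container registry; `upload-artifact`/`download-artifact` keep the example self-contained). The image name is hypothetical.

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Build exactly once, tagged with the commit SHA.
      - run: docker build -t myapp:${{ github.sha }} .
      - run: docker save myapp:${{ github.sha }} -o image.tar
      - uses: actions/upload-artifact@v4
        with:
          name: app-image
          path: image.tar

  deploy-staging:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/download-artifact@v4
        with:
          name: app-image
      # Load the same bytes that were built and tested — no rebuild.
      - run: docker load -i image.tar
      # deploy steps would follow
```

Every later stage downloads the same artifact, so "green on staging" and "deployed to production" refer to identical bytes.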
🔇
Silent Pipeline Failures
Symptom: A pipeline failure on the main branch sits unnoticed for hours while developers stack new commits on a broken foundation. Cause: No alerting on main branch failures; developers check results manually and inconsistently. Fix: Immediate Slack or email notification on main branch failure, with the SHA, author, failing job, and a direct link to the run.
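The alerting fix can be attached to an existing workflow as a final job. The upstream job names and the Slack webhook secret are assumptions; `if: failure()` combined with `needs` fires when any listed job failed, and the GitHub expression context supplies the SHA, author, and run link the lesson asks for.

```yaml
  notify-on-failure:
    if: failure() && github.ref == 'refs/heads/main'
    needs: [lint, unit-tests, e2e]   # hypothetical upstream jobs
    runs-on: ubuntu-latest
    steps:
      - name: Post failure details to Slack
        run: |
          curl -X POST "$SLACK_WEBHOOK_URL" \
            -H 'Content-Type: application/json' \
            -d '{"text":"main is red: ${{ github.sha }} by ${{ github.actor }}. Run: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"}'
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
```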

The Broken Window Analogy

A broken window left unrepaired signals that nobody is maintaining the building — and invites further damage. A flaky test left unaddressed signals that the pipeline's failures are not to be taken seriously — and invites developers to bypass other pipeline controls. CI/CD anti-patterns have the same property: each one that is tolerated lowers the standard for what is acceptable, making the next compromise easier. A team that tolerates a 20-minute pipeline will soon tolerate a 40-minute pipeline. A team that tolerates one flaky test will soon tolerate ten. The standard must be enforced continuously, not restored occasionally.

Testing Anti-Patterns

Testing Anti-Patterns — Name, Symptom, and Fix

🎲
The Flaky Test Graveyard
Symptom: "Just re-run it" is a normal part of the development workflow. Engineers no longer treat a red build as urgent. Cause: Flaky tests tolerated rather than fixed. Each one individually seems minor; collectively they destroy pipeline trust. Fix: Zero-tolerance policy. Every flaky test is quarantined immediately and tracked as a P1 bug until resolved.
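One way to implement the quarantine mechanically — the tag convention and the `--grep`/`--invert` flags assume a Mocha-style test runner and are illustrative, not universal — is to exclude quarantined tests from the blocking job and run them in a separate, non-blocking job so they stay visible while the P1 fix is in flight:

```yaml
  unit-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      # Blocking suite excludes anything tagged @quarantine: must stay green.
      - run: npm test -- --grep "@quarantine" --invert

  quarantined:
    runs-on: ubuntu-latest
    continue-on-error: true   # visible in the run summary, never blocks merge
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm test -- --grep "@quarantine"
```

The quarantine job is a holding pen, not a graveyard: each test in it should map to an open P1 ticket, and the goal is always an empty quarantine.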
🔺
The Inverted Test Pyramid
Symptom: The CI suite takes 35 minutes and fails intermittently due to browser driver issues, network timeouts, and timing problems. Cause: The suite is dominated by E2E tests and has few unit or integration tests — the pyramid is inverted. Fix: Invest in a strong unit test base. E2E tests should cover critical paths only, not every feature.
📊
Coverage Theatre
Symptom: 95% coverage, but production bugs slip through regularly. Teams argue about coverage percentage rather than test quality. Cause: Tests exercise code paths without making meaningful assertions — coverage measures lines touched, not correctness verified. Fix: Audit assertion quality. A test that calls a function without asserting the output is noise, not signal.
🎭
Testing Against Mocks Instead of Reality
Symptom: All tests pass in CI but the application breaks on staging when it connects to a real database. Cause: Integration tests run against in-memory SQLite or hand-rolled mocks rather than real PostgreSQL. The mocks do not replicate constraint behaviour, query planner differences, or connection handling. Fix: Service containers with real database engines in CI, as covered in Lesson 27.
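GitHub Actions service containers make the fix concrete: a real PostgreSQL engine runs alongside the test job. The database credentials and npm script are placeholders; the `services` block, health-check options, and port mapping are the actual mechanism.

```yaml
  integration-tests:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:16        # the real engine, not SQLite or a mock
        env:
          POSTGRES_PASSWORD: test
        ports: ["5432:5432"]
        options: >-
          --health-cmd "pg_isready -U postgres"
          --health-interval 5s
          --health-timeout 5s
          --health-retries 10
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm run test:integration
        env:
          DATABASE_URL: postgres://postgres:test@localhost:5432/postgres
```

Constraint violations, transaction semantics, and connection behaviour are now exercised against the same engine production uses.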

Deployment Anti-Patterns

Deployment Anti-Patterns — Name, Symptom, and Fix

🏔️
The Big Bang Release
Symptom: Monthly deployments with 80+ changes. Release nights are high-stress events. When something breaks, nobody knows which of the 80 changes caused it. Cause: Fear of deployment leading to infrequency, which ironically increases deployment risk further. Fix: Deploy smaller batches more often. Deployment frequency is inversely correlated with deployment risk — not positively.
🤞
Deploy and Pray
Symptom: After deploying, the team watches error dashboards nervously for 20 minutes, not sure what to look for. Incidents are discovered by users rather than monitoring. Cause: No deployment markers, no defined success metrics, no automated post-deployment checks. Fix: Smoke tests, deployment markers, and defined metric thresholds that auto-alert when breached.
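A post-deployment job can encode both halves of the fix. The health endpoint, monitoring URL, and marker payload below are hypothetical — most monitoring platforms expose some deployment-event API, but the exact call varies by vendor.

```yaml
  post-deploy-checks:
    needs: deploy-production        # hypothetical deploy job
    runs-on: ubuntu-latest
    steps:
      # Smoke test: fail the run if the service does not come up healthy.
      - run: curl --fail --retry 5 --retry-delay 10 https://example.com/healthz
      # Deployment marker: annotate monitoring so error graphs line up
      # with deploys instead of requiring nervous dashboard-watching.
      - run: |
          curl -X POST "$MONITORING_URL/deploy-markers" \
            -H "Authorization: Bearer $MONITORING_TOKEN" \
            -d '{"service":"myapp","sha":"${{ github.sha }}"}'
        env:
          MONITORING_URL: ${{ secrets.MONITORING_URL }}
          MONITORING_TOKEN: ${{ secrets.MONITORING_TOKEN }}
```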
🎭
Staging Theatre
Symptom: "It passed staging" is used to explain why the production incident was a surprise. The staging environment uses SQLite, mocked external services, and runs on a single container. Cause: Staging is structurally different from production — passing staging provides false confidence. Fix: Minimise staging/production differences. Use the same database engine, same infrastructure configuration, same external service sandbox accounts.
🚪
The Paper Rollback
Symptom: A production incident requires rollback. The team realises nobody has ever tested the rollback procedure. The previous artifact was cleaned up by the retention policy. The rollback takes 4 hours. Cause: Rollback exists as a document, not a rehearsed procedure. Fix: Test rollback on a schedule. Keep production-deployed artifacts for at least 30 days. A rollback that has been rehearsed is fast; one that has not is chaos.
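Both halves of the fix can be expressed in workflow configuration. The first fragment pins artifact retention past the default; the second is a separate scheduled workflow that rehearses the rollback against staging. The rollback script, its flags, and the schedule are assumptions for illustration.

```yaml
# Fragment 1 — in the release workflow: keep deployed artifacts 30 days
# so the retention policy cannot delete your rollback target.
#   - uses: actions/upload-artifact@v4
#     with:
#       name: release-${{ github.sha }}
#       path: image.tar
#       retention-days: 30

# Fragment 2 — a separate workflow that rehearses the rollback on a schedule.
name: rollback-rehearsal
on:
  schedule:
    - cron: "0 6 * * 1"            # weekly, Mondays 06:00 UTC
jobs:
  rehearse:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Hypothetical script: redeploys the previous release to staging
      # and verifies the service comes back healthy.
      - run: ./scripts/rollback.sh --target staging --previous
```

A rehearsal that runs weekly turns "the rollback document" into a procedure someone has actually executed within the last seven days.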

Security and Organisational Anti-Patterns

Security and Organisational Anti-Patterns — Name, Symptom, and Fix

🔑
The Permanent Master Key
Symptom: One AWS access key is used by all pipelines across all environments. When it is compromised, every service is simultaneously at risk. When it expires, every pipeline breaks at once. Cause: Long-lived static credentials shared broadly. Fix: OIDC for cloud auth, separate IAM roles per pipeline, least-privilege scoping per job.
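On GitHub Actions with AWS, the OIDC fix looks like this. The role ARN and bucket are hypothetical; the `id-token: write` permission and the `configure-aws-credentials` action are the real mechanism — the job exchanges a short-lived OIDC token for temporary credentials, so no static access key is stored anywhere.

```yaml
  deploy:
    runs-on: ubuntu-latest
    permissions:
      id-token: write            # allow this job to request an OIDC token
      contents: read
    steps:
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          # One narrowly scoped role per pipeline, not one shared master key.
          role-to-assume: arn:aws:iam::123456789012:role/myapp-deploy
          aws-region: eu-west-1
      # Example least-privilege action this role would permit:
      - run: aws s3 sync ./dist s3://myapp-releases/
```

Compromise of one pipeline now exposes one narrowly scoped role, and there is no long-lived key to rotate or leak.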
🏢
CI/CD as an Operations Silo
Symptom: One person or one team owns "the pipeline." Developers submit deployment tickets. Changes to the pipeline require a request and a wait. Cause: CI/CD treated as infrastructure rather than a shared engineering discipline. Fix: Pipeline code lives in the application repository. Developers own their pipelines. Platform teams provide templates, not gatekeeping.
🛒
Tooling Without Culture
Symptom: The organisation buys a CI/CD platform, Kubernetes, feature flag tooling, and a deployment dashboard. Deployment frequency does not change. Fear of production persists. Cause: DevOps culture — shared ownership, psychological safety, blameless postmortems — was not adopted alongside the tools. Fix: Tools enable culture; they do not create it. Address the cultural prerequisites first.
🚫
The Emergency Bypass
Symptom: Every team has an undocumented way to skip the pipeline in an emergency. Over time, more things become emergencies. The bypass is used more than the pipeline. Cause: The pipeline is too slow or too painful for emergency response — so a bypass was created. Fix: Make the pipeline fast enough that a bypass is never justified. An emergency hotfix pipeline that takes 3 minutes and skips only E2E tests — not security scans, not smoke tests — is a legitimate fast path that does not undermine the safety model.
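A legitimate fast path, sketched as a separate workflow: triggered deliberately, auditable in the run history, and skipping only E2E tests. The scripts and scan command are placeholders; the point is what is still present — unit tests, a security scan, a deploy, and a smoke test — and what is deliberately absent.

```yaml
name: hotfix
on:
  workflow_dispatch:               # deliberate, logged, auditable trigger

jobs:
  build-and-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm test              # unit tests still run
      - run: npm audit --audit-level=high    # security scan still runs
      # E2E job deliberately omitted — the only stage the fast path skips.

  deploy:
    needs: build-and-scan
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/deploy.sh production  # hypothetical deploy script
      # Smoke test still runs: a hotfix that breaks production is no fix.
      - run: curl --fail --retry 5 https://example.com/healthz
```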

Warning: Anti-Patterns Are Easier to Prevent Than to Remove

Every anti-pattern on this list started as a decision that made sense given the constraints of the moment. The shared integration environment was built because someone needed cross-service testing. The bypass was created because a deployment was genuinely urgent. The duplicated YAML was copy-pasted because there was no time to build a reusable workflow. Anti-patterns become dangerous when they stop being temporary solutions and become permanent architecture. The moment a workaround is built, the team should schedule its removal. The moment a bypass is created, the team should schedule a sprint to make it unnecessary. Anti-patterns that are named and tracked are managed; anti-patterns that are accepted as "just how we do things" become load-bearing assumptions that cost ten times as much to remove later.

Key Takeaways from This Lesson

Anti-patterns emerge from good intentions — a shared environment for better coverage, a bypass for faster emergencies, a long release cycle to reduce risk. Understanding the intent behind each anti-pattern is what allows teams to address the underlying need without the destructive side effect.
Slow pipelines and flaky tests are the two most destructive anti-patterns — both train developers to distrust or bypass the pipeline, which undermines every other quality and safety control the pipeline provides.
Staging theatre and the paper rollback are the most dangerous deployment anti-patterns — both create false confidence. A staging environment that does not resemble production and a rollback procedure that has never been tested are security blankets, not safety nets.
Tools without culture produce tooling anti-patterns — buying a CI/CD platform does not produce DevOps culture. Shared ownership, blameless postmortems, and psychological safety are the prerequisites; tools are the enablers.
Anti-patterns must be named, tracked, and scheduled for removal — a workaround that is not tracked will become permanent. The moment an anti-pattern is identified, create a ticket to remove it. The moment it is accepted as "just how things work," it becomes ten times more expensive to change.

Teacher's Note

Walk through this lesson's anti-pattern list with your team and mark every one you currently have — then rank them by cost and fix the top two in the next sprint. Naming a problem in a meeting is worth less than fixing one in a pull request.

Practice Questions

Answer in your own words — then check against the expected answer.

1. What is the name of the anti-pattern where a staging environment is structurally different from production — using a different database engine, mocked external services, or single-container infrastructure — giving the team false confidence that passing staging means the release is safe for production?

Expected answer: Staging Theatre.

2. What testing anti-pattern describes a test suite dominated by slow, brittle end-to-end tests with very few unit or integration tests — producing a pipeline that is slow, flaky, and provides poor signal about where failures originate?

Expected answer: The Inverted Test Pyramid.

3. What deployment anti-pattern accumulates weeks or months of changes into a single large release — where the deployment risk grows with each added change, rollback identification becomes nearly impossible, and release events are high-stress ceremonies rather than routine operations?

Expected answer: The Big Bang Release.

Lesson Quiz

1. A team has 94% test coverage and a nearly always-green CI pipeline, but ships production bugs at the same rate as before they introduced testing. A consultant reviews the tests and finds they call functions without asserting return values. Which anti-pattern is this?


2. A development team must submit a deployment ticket to the platform team and wait up to 48 hours for their changes to reach production. The pipeline is owned and maintained exclusively by the platform team. Which anti-pattern does this describe?


3. A team creates an undocumented emergency deployment bypass that skips all pipeline stages. Initially used once a month, it is now used multiple times per week because "everything is urgent." What long-term consequence does this illustrate?


Up Next · Lesson 40

Mini Project

The final lesson — apply everything from this course to a complete, realistic CI/CD scenario. Build the pipeline, handle the edge cases, and ship it.