CI/CD Lesson 36 – Infrastructure as code in CI/CD | Dataplexa
Section IV · Lesson 36

Infrastructure as Code in CI/CD

In this lesson

IaC Fundamentals · Terraform in the Pipeline · Plan, Apply, Destroy · State Management · IaC Security

Infrastructure as Code (IaC) is the practice of defining, provisioning, and managing cloud infrastructure — servers, networks, databases, load balancers, DNS records, IAM roles — through version-controlled code files rather than through a web console or manual CLI commands. When infrastructure is defined as code, it gains all the properties of any other software artefact: it can be reviewed, tested, versioned, rolled back, and reproduced exactly. A team that provisions infrastructure through a CI/CD pipeline — running Terraform, Pulumi, or AWS CDK as pipeline steps — applies the same discipline to their environments that they apply to their applications, ensuring that no infrastructure change reaches production without going through the same review and verification process as a code change.

IaC and CI/CD — Two Systems That Belong Together

The application pipeline and the infrastructure pipeline are often treated as separate concerns — the application team owns the CI/CD pipeline, and a separate operations or platform team manages infrastructure through a different process. This separation creates an environment parity gap: the application is deployed to environments that may differ from what the code was tested against, because infrastructure changes are not subject to the same review and promotion process as application changes.

Combining IaC with CI/CD closes this gap. Infrastructure changes go through a pull request, are reviewed, run a terraform plan to preview the effect, and are applied through an automated pipeline after approval — just like application code. Staging and production are defined from the same version-controlled Terraform, with the only differences being scale parameters and environment-specific values managed through Terraform variables. This is what makes environment parity (as covered in Lesson 19) achievable in practice.
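As a sketch of how those environment-specific values are typically separated from the shared topology (the variable names and values here are illustrative assumptions, not from the lesson):

```hcl
# variables.tf — scale parameters shared by every environment
variable "instance_count" {
  type        = number
  description = "Number of application instances"
}

variable "instance_type" {
  type        = string
  description = "Compute instance type"
}

# staging.tfvars — same topology, smaller scale:
#   instance_count = 1
#   instance_type  = "t3.small"
#
# production.tfvars — identical resources, production scale:
#   instance_count = 4
#   instance_type  = "m5.large"
```

The pipeline then selects the file with `terraform plan -var-file=staging.tfvars` (or the production equivalent), so the resource definitions themselves never fork per environment.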

The Architect's Blueprint Analogy

A building constructed without blueprints depends entirely on the memory and skill of the workers who built it — nobody else can reproduce it, nobody can safely modify it, and when something goes wrong there is no reference for what it was supposed to look like. Infrastructure managed through a web console has exactly this property: it exists, it runs, but the knowledge of how it was configured lives in someone's head or in a screenshot folder. IaC is the blueprint: every component is explicitly specified, the specification is version-controlled, and anyone with access to the repository can understand exactly how the infrastructure is built and reproduce it from scratch.

Terraform — Plan Before Apply

Terraform is the most widely adopted IaC tool. It defines infrastructure resources in HCL (HashiCorp Configuration Language) files, maintains a state file that tracks the current state of provisioned resources, and produces a plan — a diff between the desired state in code and the actual state in the cloud — before making any changes. The plan is the safety mechanism that makes infrastructure changes reviewable: an engineer can see exactly what will be created, modified, or destroyed before a single API call is made.

In a CI/CD pipeline, Terraform runs in two phases. terraform plan runs on every pull request, generating the plan as a comment on the PR so reviewers can see the infrastructure impact of the code change alongside the code change itself. terraform apply runs after the PR is approved and merged, applying the planned changes to the target environment. This split ensures that infrastructure changes are never applied without a human having reviewed the plan — even in a fully automated pipeline.
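The same two-phase flow can be exercised locally before wiring it into CI; a command sketch (directory name assumed):

```shell
cd infrastructure/
terraform init                # Download providers, connect to the state backend
terraform plan -out=tfplan    # Compute and save the diff — no changes made yet
terraform show tfplan         # Review exactly what will be created, changed, destroyed
terraform apply tfplan        # Apply precisely the reviewed plan, nothing else
```

Applying the saved plan file, rather than re-planning at apply time, guarantees that what was reviewed is what gets executed.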

Terraform CI/CD Pipeline — GitHub Actions

on:
  pull_request:
    paths: ['infrastructure/**']      # Only trigger on infrastructure code changes
  push:
    branches: [main]
    paths: ['infrastructure/**']

jobs:
  plan:
    if: github.event_name == 'pull_request'
    runs-on: ubuntu-latest
    permissions:
      contents: read
      id-token: write                  # For OIDC auth to AWS
      pull-requests: write             # To post the plan as a PR comment

    steps:
      - uses: actions/checkout@v4

      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.TF_PLAN_ROLE }}  # Read-only role for planning
          aws-region: eu-west-1

      - uses: hashicorp/setup-terraform@v3

      - name: Terraform init
        run: terraform init
        working-directory: infrastructure/

      - name: Terraform plan
        id: plan
        run: terraform plan -no-color -out=tfplan
        working-directory: infrastructure/

      - name: Post plan to PR
        uses: actions/github-script@v7
        env:
          PLAN: ${{ steps.plan.outputs.stdout }}  # Pass via env — interpolating directly into the
                                                  # script breaks on backticks in the plan output
        with:
          script: |
            const plan = process.env.PLAN;
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: `## Terraform Plan\n\`\`\`\n${plan}\n\`\`\``
            });

  apply:
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment: production            # Requires manual approval before apply
    permissions:
      contents: read
      id-token: write

    steps:
      - uses: actions/checkout@v4

      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.TF_APPLY_ROLE }}  # Write role — only for apply
          aws-region: eu-west-1

      - uses: hashicorp/setup-terraform@v3

      - run: terraform init
        working-directory: infrastructure/

      - run: terraform plan -out=tfplan
        working-directory: infrastructure/

      - run: terraform apply tfplan    # Applies exactly the plan generated above — no interactive prompt
        working-directory: infrastructure/

What just happened?

Two jobs handle the two phases of the Terraform workflow. The plan job runs on PRs — it uses a read-only IAM role, generates the plan, and posts it as a PR comment so reviewers can see the infrastructure impact. The apply job runs only after merge to main and uses a more privileged write role — but only after a manual approval through the GitHub environment protection gate. The split between read-only planning and write-capable applying follows least privilege: the pipeline holds elevated permissions only for the shortest possible window.

State Management — The Source of Truth for Infrastructure

Terraform tracks the current state of provisioned infrastructure in a state file — a JSON record of every resource Terraform manages, its current configuration, and its cloud provider IDs. The state file is what enables Terraform to compute diffs and know what to create, update, or destroy. In a CI/CD pipeline where multiple jobs may run concurrently, the state file must be stored in a remote backend — an S3 bucket with DynamoDB locking, Terraform Cloud, or equivalent — so that all pipeline runs access the same state and concurrent runs are serialised through a lock mechanism.
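A remote backend of this kind is declared once in the Terraform configuration; a minimal sketch for S3 with DynamoDB locking (bucket and table names are hypothetical — the DynamoDB table needs a string partition key named LockID):

```hcl
terraform {
  backend "s3" {
    bucket         = "example-terraform-state"    # Hypothetical state bucket
    key            = "production/terraform.tfstate"
    region         = "eu-west-1"
    dynamodb_table = "terraform-locks"            # Lock table serialising concurrent runs
    encrypt        = true                         # Encryption at rest for the state object
  }
}
```

With this in place, every pipeline run that calls terraform init reads and writes the same state, and a second concurrent run blocks on the lock instead of corrupting it.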

State file security is critical. The state file contains the full configuration of every managed resource — including sensitive values like generated passwords, private key material, and database connection strings that Terraform writes into state after provisioning. The state backend must enforce encryption at rest, strict access control (only the pipeline's IAM role should be able to read and write it), and versioning so that a corrupted state file can be recovered from a previous version. Never commit the state file to version control.

IaC Security — Policy as Code and Drift Detection

IaC pipelines introduce a new category of security concern: infrastructure misconfigurations that reach production through the automated apply pipeline. A Terraform module that opens an S3 bucket to public access, removes an important security group rule, or creates an IAM role with excessive permissions can cause a security incident just as surely as a vulnerable application dependency. Policy as Code tools — Checkov, tfsec, Trivy (for IaC) — scan Terraform plans and configurations for known misconfigurations before apply, failing the pipeline on critical findings the same way application security scanners fail on high-severity CVEs.
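Such a gate can be sketched as an extra step in the plan job, before terraform plan runs — the step layout here is an assumption; Checkov exits non-zero when any policy check fails, which fails the pipeline:

```yaml
      - name: Policy as Code scan
        run: |
          pip install checkov              # IaC misconfiguration scanner
          checkov -d infrastructure/       # Non-zero exit on any failed check fails the job
```

Running the scan before plan means a publicly readable S3 bucket or an overly broad IAM policy never even reaches the review stage.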

Drift detection is the practice of regularly comparing the actual state of cloud infrastructure against the declared state in Terraform code, and alerting when they diverge. Drift occurs when someone makes a manual change through the AWS console or CLI — the type of change that IaC is designed to prevent. A scheduled pipeline job that runs terraform plan nightly and alerts if the plan is non-empty is the simplest form of drift detection. More mature approaches use dedicated drift detection tools that continuously monitor the infrastructure and block manual changes through AWS SCPs or equivalent guardrails.
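The simplest form described above can be sketched as a scheduled workflow, reusing the read-only plan role from the pipeline earlier in this lesson (the cron time is arbitrary, and alerting here is just the job failing):

```yaml
on:
  schedule:
    - cron: '0 3 * * *'                 # Nightly drift check

jobs:
  drift:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      id-token: write
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.TF_PLAN_ROLE }}   # Read-only role is sufficient
          aws-region: eu-west-1
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init
        working-directory: infrastructure/
      - name: Detect drift
        # -detailed-exitcode: exit 0 = no changes, 2 = plan is non-empty (drift).
        # A non-zero exit fails the job, which surfaces as an alert.
        run: terraform plan -detailed-exitcode
        working-directory: infrastructure/
```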

Warning: Running terraform apply Without a Plan Review Is Equivalent to Deploying Without Code Review

A pipeline that runs terraform apply directly on merge — without generating a plan on the PR and requiring a human to review it — skips the one step that makes infrastructure changes safe. A refactoring of a Terraform module that looks clean in a code review can still produce a plan that destroys and recreates a production database. The code review cannot catch this; only the plan can. Always run terraform plan on the PR, post it as a comment, and require the reviewer to explicitly confirm the plan is safe before approving the merge. The apply should never surprise the reviewer — they should already know exactly what is about to happen.

Key Takeaways from This Lesson

IaC brings the same discipline to environments that CI/CD brings to applications — version control, code review, automated verification, and reproducible provisioning replace manual console clicks with a traceable, auditable process.
Plan on PR, apply on merge — terraform plan runs on every pull request and posts the diff to the PR for review. terraform apply runs only after approval and merge, using a more privileged role scoped to the apply operation alone.
Remote state with locking is non-negotiable in CI/CD — concurrent pipeline runs against local state produce corruption. S3 with DynamoDB locking, or Terraform Cloud, serialises access and prevents conflicts.
Policy as Code scans Terraform before apply — tools like Checkov and tfsec detect infrastructure misconfigurations in the pipeline before they reach the cloud, catching open S3 buckets and excessive IAM permissions the same way SAST catches vulnerable code patterns.
Drift detection catches manual infrastructure changes — a scheduled plan that alerts on non-empty output detects console-made changes that bypass the IaC pipeline, maintaining the integrity of infrastructure-as-code as the single source of truth.

Teacher's Note

The first time a Terraform plan on a PR shows that a refactor would destroy and recreate a production RDS instance, the team will never again merge an infrastructure PR without reading the plan — that single experience is worth more than any amount of process documentation.

Practice Questions

Answer in your own words — then check against the expected answer.

1. What is the practice of regularly comparing the actual state of cloud infrastructure against the declared state in IaC code, and alerting when they diverge — typically implemented as a scheduled pipeline job that runs terraform plan and alerts if the output is non-empty?



2. What Terraform feature — implemented using an S3 bucket with DynamoDB locking, or Terraform Cloud — stores the state file centrally and serialises concurrent pipeline runs through a lock mechanism, preventing state corruption when multiple jobs run simultaneously?



3. What is the name of the open-source Policy as Code tool — often run as a pipeline step before terraform apply — that scans Terraform configurations and plans for known infrastructure misconfigurations such as publicly accessible S3 buckets or overly permissive IAM roles?



Lesson Quiz

1. A team refactors a Terraform module. The code change looks clean in review — no resources are obviously being removed. After merging and applying, their production database is destroyed and recreated, causing 45 minutes of downtime. What pipeline step would have prevented this?


2. A security review of a Terraform pipeline finds that the state file is stored in an S3 bucket with public read access. Why is this a critical security finding beyond just being bad practice?


3. A scheduled nightly pipeline runs terraform plan against an environment where no Terraform code has changed. The plan output shows that a security group rule will be added. What does this indicate?


Up Next · Lesson 37

CI/CD in Cloud Environments

Most CI/CD pipelines deploy to cloud infrastructure. Lesson 37 covers the cloud-specific patterns, services, and authentication models that shape how pipelines interact with AWS, GCP, and Azure.