CI/CD Course
Pipeline as Code
In this lesson
Pipeline as Code (PaC) is the practice of defining CI/CD pipeline configuration entirely in version-controlled files — rather than through a graphical UI or a database-backed configuration system — so that the pipeline is subject to the same engineering discipline as the application it builds. Every change to the pipeline is a code change: it goes through a pull request, gets reviewed, is tracked in Git history, and can be rolled back. A pipeline defined in a UI is invisible to version control, unreviewed, and unauditable. A pipeline defined as code is a first-class software artefact.
Pipeline as Code vs UI-Based Configuration
Older CI platforms — notably early Jenkins — defined pipelines through web interfaces. A team member would click through forms to configure build steps, set up triggers, and define post-build actions. This worked at small scale but created serious problems as pipelines grew: changes were made without review, there was no history of who changed what and when, the configuration could not be tested before deployment, and a pipeline lost to a server failure had to be rebuilt by hand from memory or documentation.
Modern platforms — GitHub Actions, GitLab CI, CircleCI, Bitbucket Pipelines — define pipelines as YAML files committed to the repository. The pipeline lives alongside the code it builds. A PR that changes the application can also change the pipeline in the same commit. The pipeline's history is Git's history. This is not a cosmetic improvement; it is a fundamental shift in how the delivery system is governed and maintained.
The Building Permit Analogy
A building constructed without permits, inspections, or recorded drawings is a legal and structural liability — nobody can verify it was built correctly, and changes to it carry unknown risks. A building with full documentation, inspections at every stage, and a registered set of plans is auditable and maintainable. A UI-configured pipeline is the building without permits. A pipeline-as-code definition is the documented, inspected, registered structure — every change reviewed, every modification recorded, every version recoverable.
GitHub Actions Anatomy — The Structure of a Workflow File
A GitHub Actions workflow is a YAML file stored in .github/workflows/. Understanding its structure is essential for writing pipelines that are readable, maintainable, and correct. Every workflow has four top-level concerns: when it runs, what it runs on, what permissions it has, and what jobs it contains.
Annotated Workflow File — Every Key Explained
```yaml
name: CI Pipeline                  # Display name shown in GitHub Actions UI

on:                                # Trigger definition — when does this workflow run?
  push:
    branches: [main]               # On every push to main
  pull_request:
    branches: [main]               # On PRs targeting main
    paths:
      - 'src/**'                   # Only if files in src/ changed (path filtering)
  workflow_dispatch:               # Allow manual trigger from the GitHub UI

permissions:                       # Minimum permissions for the entire workflow
  contents: read                   # Read the repo — no write access needed
  packages: write                  # Write to GitHub Container Registry

env:                               # Workflow-level environment variables
  NODE_VERSION: '20'
  REGISTRY: ghcr.io

jobs:
  build:                           # Job ID — referenced by other jobs in needs:
    name: Build Application        # Human-readable job name in the UI
    runs-on: ubuntu-latest         # Runner type — GitHub-hosted Ubuntu
    outputs:                       # Values this job exposes to downstream jobs
      image-tag: ${{ steps.meta.outputs.tags }}
    steps:
      - name: Check out source
        uses: actions/checkout@v4  # Always pin actions to a specific version

      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: ${{ env.NODE_VERSION }}
          cache: 'npm'

      - name: Install dependencies
        run: npm ci                # Clean install — respects lock file

      - name: Build
        run: npm run build

      - name: Set image metadata
        id: meta                   # Step ID — used to reference outputs
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ github.repository }}
          tags: |                  # Tag with commit SHA for traceability
            type=sha
```
What just happened?
Every structural element of a workflow file is visible and purposeful: triggers define exactly when the pipeline runs and for which paths, permissions are declared minimally at the workflow level, environment variables are centralised rather than scattered, job outputs carry the image tag to downstream jobs, and every action is pinned to a version tag for reproducibility.
Triggers and Contexts — Controlling When and With What
The on: block is one of the most powerful and most misused parts of a workflow file. Overly broad triggers — running on every push to every branch — produce unnecessary pipeline runs, waste runner minutes, and slow down the feedback teams actually care about. Precise triggers produce a pipeline that runs exactly when needed and nowhere else.
Common Trigger Patterns and Their Use Cases
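As an illustration, here are a few trigger shapes that cover most day-to-day needs — the branch patterns, cron schedule, and input names are placeholders, not prescriptions:

```yaml
on:
  # Validate every PR that targets main or a release branch
  pull_request:
    branches: [main, 'release/**']

  # Build and publish only when a version tag is pushed
  push:
    tags: ['v*.*.*']

  # Scheduled run — e.g. nightly dependency audits or long test suites
  schedule:
    - cron: '0 3 * * *'        # 03:00 UTC daily

  # Manual runs with a typed input, e.g. for on-demand deployments
  workflow_dispatch:
    inputs:
      environment:
        description: 'Target environment'
        type: choice
        options: [staging, production]
```

Each of these runs the pipeline for a distinct, deliberate reason; none of them fires on incidental pushes to feature branches.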
Contexts are the runtime data available to workflow steps — information about the trigger event, the repository, the runner, and the job. The github context exposes the commit SHA, the branch name, the actor who triggered the run, and the event type. The secrets context exposes encrypted secret values. The env context exposes environment variables. Understanding contexts is what separates a pipeline that works for one scenario from one that adapts correctly to every scenario it was designed for.
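A sketch of how contexts drive conditional behaviour — the job, the secret name, and the deploy script are illustrative assumptions:

```yaml
jobs:
  deploy:
    runs-on: ubuntu-latest
    # github context: run only for pushes to main, never for PR events
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    steps:
      - name: Show run metadata
        run: |
          echo "Commit: ${{ github.sha }}"
          echo "Branch: ${{ github.ref_name }}"
          echo "Actor:  ${{ github.actor }}"
          echo "Event:  ${{ github.event_name }}"
      - name: Deploy
        env:
          DEPLOY_TOKEN: ${{ secrets.DEPLOY_TOKEN }}  # secrets context — masked in logs
        run: ./scripts/deploy.sh                     # hypothetical deploy script
```

The `if:` expression is evaluated from context data before the job starts, so the deploy job never even schedules a runner for pull request events.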
Composite Actions — Reusable Step Groups
Composite actions are a lighter-weight alternative to reusable workflows — they group a sequence of steps into a named, callable unit that can be referenced from any workflow in the same repository or from other repositories. Where reusable workflows are full jobs, composite actions are step groups: they run within the calling job's runner rather than on their own runner, and they do not require secrets to be explicitly passed.
A common use case is grouping the standard setup steps — checkout, setup-node, cache restore, clean install — into a composite action called setup-environment. Every job that needs those four steps calls the composite action in one line rather than repeating all four steps. When the Node version changes, one file changes. When a caching strategy improves, one file improves. The rest of the workflow files are untouched.
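Under those assumptions, a setup-environment composite action might look like this — the file path and the input default are illustrative:

```yaml
# .github/actions/setup-environment/action.yml
name: Setup Environment
description: Check out source, install Node.js, restore cache, clean install
inputs:
  node-version:
    description: 'Node.js version to install'
    default: '20'
runs:
  using: composite            # Steps run inside the calling job's runner
  steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-node@v4
      with:
        node-version: ${{ inputs.node-version }}
        cache: 'npm'          # setup-node handles the npm cache restore
    - run: npm ci
      shell: bash             # 'shell' is required for run steps in composite actions
```

A job then replaces the four repeated steps with a single `- uses: ./.github/actions/setup-environment`, optionally overriding `node-version` via `with:`.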
Pipeline Quality — Treating the Pipeline Like Production Code
If pipeline code is treated as a first-class software artefact, it deserves the same quality practices applied to application code. This means several things in practice: pipeline changes go through code review just like application code; YAML linting tools like actionlint run in CI to catch syntax errors and logic problems before they reach the main branch; and pipelines are tested — by running them in a branch environment before merging — rather than discovered to be broken only when they land on main.
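One way to wire actionlint into CI, sketched here using its published Docker image — the workflow name and path filter are assumptions:

```yaml
# .github/workflows/lint-pipelines.yml — runs whenever workflow files change
name: Lint Workflows
on:
  pull_request:
    paths: ['.github/workflows/**']
permissions:
  contents: read
jobs:
  actionlint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run actionlint
        run: |
          docker run --rm -v "$PWD:/repo" --workdir /repo \
            rhysd/actionlint:latest -color
```

Because the trigger is path-filtered, this job only spends runner minutes when a workflow file is actually touched.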
Pipeline Quality Checklist
- Pin action versions: actions/checkout@v4, not actions/checkout@main. A floating tag can be updated by the action author and change behaviour without your knowledge. Pinning to a version or SHA ensures reproducibility.
- Declare minimal permissions: start from permissions: contents: read at the workflow level and grant additional permissions only to the specific jobs that need them. The default GitHub Actions token has more permissions than most jobs require.
- Lint the pipeline: run actionlint in CI to catch type errors, missing required inputs, and invalid expressions before the pipeline reaches main. A broken pipeline discovered on main blocks every subsequent deployment.
- Centralise configuration: keep versions and shared values in env: blocks or workflow inputs — not scattered as hardcoded strings throughout the steps. A hardcoded value that appears in eight places requires eight changes.
- Review pipeline changes like application code: a run: step with curl | bash deserves the same scrutiny as a PR that adds a new payment endpoint.
Warning: Using Floating Action Tags Is a Supply Chain Vulnerability
Referencing a GitHub Action with a mutable tag — uses: some-org/some-action@main or uses: some-org/some-action@v1 — means the action author can push new code to that tag and change what runs in your pipeline without any change to your workflow file. This is a software supply chain attack vector: a compromised action repository can exfiltrate secrets, modify build outputs, or inject malicious code into your artifacts. Pin all third-party actions to a specific commit SHA — uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 — or at minimum to an immutable version tag where the maintainer enforces tag immutability.
Key Takeaways from This Lesson
Triggers in the on: block should be as specific as the pipeline's actual purpose requires.
Floating tags like @main allow action authors to change what runs in your pipeline without any change to your workflow file, creating a supply chain attack surface.
A new run: step in a workflow file has the same security and reliability implications as new application code. Review it accordingly.
Teacher's Note
Run actionlint on your workflow files before you push — it catches type mismatches, undefined expressions, and missing inputs that GitHub Actions will silently mishandle at runtime, often in ways that are hard to debug from logs alone.
Practice Questions
Answer in your own words — then check against the expected answer.
1. What GitHub Actions feature groups a sequence of steps — such as checkout, setup-node, cache restore, and clean install — into a named, callable unit that runs within the calling job's runner and can be referenced from any workflow in the same or another repository?
2. What is the name of the static analysis tool that lints GitHub Actions workflow YAML files — catching type errors, missing required inputs, and invalid expressions before the pipeline reaches the main branch?
3. What GitHub Actions trigger type enables a workflow to be run manually from the GitHub UI or API — used for on-demand deployments, rollbacks, and operational tasks that should not run automatically on push or PR events?
Lesson Quiz
1. A workflow uses uses: third-party/deploy-action@main. A security engineer flags this as a supply chain risk. What is the specific threat?
2. A team discovers that someone changed a deployment step in Jenkins three months ago — but there is no record of who made the change, what it replaced, or why. What practice would have prevented this problem?
3. A developer is deciding between a composite action and a reusable workflow for centralising a four-step setup sequence. What is the key architectural difference between them that should inform the choice?
Up Next · Lesson 23
Secrets Management in CI/CD
Pipelines need credentials to do their work — API keys, deployment tokens, database passwords. Secrets management is the discipline of supplying those credentials safely, without ever exposing them in logs, files, or version control.