CI/CD Lesson 27 – CI/CD With Containers | Dataplexa
Section III · Lesson 27

CI/CD with Containers

In this lesson

Containers & the Pipeline · Container Images as Artifacts · Service Containers · Container Security in CI · Multi-Stage Builds

Containers are the packaging format that changed both how software runs and how CI/CD pipelines are structured. A container bundles an application with everything it needs to run — runtime, dependencies, configuration — into a portable, isolated unit that behaves identically on a developer's laptop, a CI runner, and a production server. For CI/CD, containers are simultaneously the most common artifact format, a tool for running isolated pipeline steps, a mechanism for spinning up test dependencies, and the unit of deployment in modern infrastructure. Understanding how containers fit into every layer of the pipeline is essential for working in any contemporary engineering organisation.

Containers in Three Pipeline Roles

Containers appear in CI/CD pipelines in three distinct roles, each serving a different purpose. Conflating these roles produces confusion about what a container is doing at any given point in the pipeline.

Three Pipeline Roles for Containers

📦
The artifact — what gets built and deployed
The build stage produces a Docker image: a layered, versioned, immutable package of the application. This image is pushed to a registry, pulled by the test stage, promoted through environments, and ultimately deployed to production. The image is the artifact — everything covered in Lesson 14 applies directly.
🔧
The runner environment — what pipeline steps run inside
GitHub Actions jobs can declare a container: key, causing all steps in that job to run inside a specified Docker image rather than directly on the runner's operating system. This makes the build environment explicit and reproducible — the same container image used locally can be specified in the pipeline, eliminating "works on my machine" environment differences.
🗄️
The service dependency — what integration tests run against
GitHub Actions supports service containers — Docker images declared under the services: key that run as sidecar processes alongside the job, providing real database, cache, or message queue instances for integration tests. A PostgreSQL container running in the pipeline is indistinguishable to the application from a real PostgreSQL server, making integration tests genuine rather than mocked.
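The second role — running pipeline steps inside a container — is declared with the container: key on a job. A minimal sketch (the job name and test command are illustrative, not from a specific project):

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    container: node:20-alpine        # Every step below runs inside this image,
                                     # not directly on the runner's OS
    steps:
      - uses: actions/checkout@v4
      - run: node --version          # Reports the container's Node.js version
      - run: npm ci && npm test      # Same toolchain as running node:20-alpine locally
```

Because the image name is explicit in the workflow file, a developer can reproduce the exact CI environment locally with `docker run -it node:20-alpine sh`.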

The Shipping Container Analogy

Before standardised shipping containers, every cargo shipment required custom handling at every port — different cranes, different storage, different documentation. The standardised container changed logistics permanently: it loads the same way onto a truck, a ship, or a train, and every port handles it identically. Software containers do the same for application deployment. The same image runs on a developer's Mac, a Linux CI runner, and a Kubernetes cluster in GCP — the infrastructure layer becomes interchangeable because the container provides a consistent interface at every stage.

Multi-Stage Builds — Lean Production Images

A naive Docker build includes everything needed to compile the application in the final image — build tools, compilers, test frameworks, development dependencies. This produces images that are large, slow to pull, and carry a much larger attack surface than necessary. Multi-stage builds solve this by separating the build environment from the runtime environment within a single Dockerfile.

The first stage installs all build dependencies and compiles the application. The second stage starts from a minimal base image and copies only the compiled output from the first stage — no build tools, no source code, no development dependencies. A Node.js application that requires 800MB of node_modules to build might produce a runtime image under 100MB. A Go binary requires nothing beyond the compiled executable and a minimal OS layer. Smaller images mean faster pulls, faster deployment, and a smaller CVE surface area.

Multi-Stage Dockerfile and Pipeline Integration

# Stage 1 — builder: full Node.js environment with all dev dependencies
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci                              # Install all dependencies including devDependencies
COPY src/ ./src/
RUN npm run build                       # Compile TypeScript, bundle, optimise

# Stage 2 — runtime: minimal image with only what production needs
FROM node:20-alpine AS runtime
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev                   # Production dependencies only — no build tools
COPY --from=builder /app/dist ./dist    # Copy compiled output from builder stage only
# Source code and devDependencies never reach this image
USER node                               # Run as non-root for security
EXPOSE 3000
CMD ["node", "dist/server.js"]
# GitHub Actions — build, scan, and push the multi-stage image
jobs:
  build-and-push:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - uses: actions/checkout@v4

      - name: Build image (multi-stage)
        run: |
          # Build only the runtime stage of the multi-stage Dockerfile
          docker build \
            --target runtime \
            -t ghcr.io/myorg/app:${{ github.sha }} .

      - name: Scan image for vulnerabilities
        uses: aquasecurity/trivy-action@0.16.0          # Scan before pushing
        with:
          image-ref: ghcr.io/myorg/app:${{ github.sha }}
          exit-code: '1'                                # Fail pipeline on critical CVEs
          severity: 'CRITICAL,HIGH'

      - name: Push to registry
        run: docker push ghcr.io/myorg/app:${{ github.sha }}

What just happened?

The Dockerfile uses two stages — the builder stage compiles the application, and the runtime stage copies only the compiled output into a minimal image. The pipeline builds only the runtime stage, scans the resulting image for known CVEs before it reaches the registry, and fails the pipeline on any critical or high-severity finding. A clean, small, scanned image is what gets pushed — nothing else.

Service Containers — Real Dependencies for Integration Tests

Integration tests that run against mocked databases or in-memory substitutes test the application code in isolation from the dependencies it actually uses in production. Service containers in GitHub Actions spin up real dependency instances — PostgreSQL, Redis, RabbitMQ — as Docker containers that run alongside the job and are destroyed when it completes. The application connects to them exactly as it would in production, using a real TCP connection to a real database engine.

Integration Tests with Service Containers — GitHub Actions

jobs:
  integration-tests:
    runs-on: ubuntu-latest

    services:
      postgres:                                         # Real PostgreSQL instance
        image: postgres:16
        env:
          POSTGRES_USER: testuser
          POSTGRES_PASSWORD: testpass
          POSTGRES_DB: testdb
        ports:
          - 5432:5432                                   # Expose to the runner's localhost
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5                            # Wait until PostgreSQL is ready

      redis:                                            # Real Redis instance
        image: redis:7-alpine
        ports:
          - 6379:6379                                   # Expose to the runner's localhost
        options: >-
          --health-cmd "redis-cli ping"
          --health-interval 10s

    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - name: Run integration tests
        env:
          DATABASE_URL: postgresql://testuser:testpass@localhost:5432/testdb
          REDIS_URL: redis://localhost:6379
        run: npm run test:integration

What just happened?

Two real service containers — PostgreSQL 16 and Redis 7 — started alongside the job, with their ports mapped so the runner can reach them on localhost. Health checks ensure the pipeline waits for both to be ready before running the tests. The application connects to them using real connection strings, running genuine SQL queries against a real database engine and real cache commands against a real Redis instance. No mocks, no in-memory substitutes — the integration tests are actually integrated.

Container Security in the CI Pipeline

Every Docker image pushed to a registry is a potential production artifact — it deserves security scrutiny before it leaves the pipeline. Container image scanning analyses the layers of an image for known CVEs in installed packages, deprecated base images, and misconfigurations before the image reaches any deployment environment. Tools like Trivy, Grype, and Snyk Container integrate directly into GitHub Actions as pipeline steps.

Beyond vulnerability scanning, container security in the pipeline includes several other practices: always run containers as a non-root user (the USER directive in the Dockerfile); use minimal base images like alpine or distroless to reduce the installed package surface; never embed secrets in image layers (they persist in the layer history even if deleted in a later layer); and pin base image versions to a digest rather than a mutable tag to prevent silent base image changes between builds.
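Digest pinning looks like this in a Dockerfile — the sha256 value below is a placeholder, not a real digest; resolve the actual digest for the image you have verified (for example with `docker buildx imagetools inspect node:20-alpine`):

```dockerfile
# Mutable tag — the image this resolves to can silently change between builds:
# FROM node:20-alpine

# Digest-pinned — byte-for-byte the same base image on every build.
# (Placeholder digest — substitute the real one for the image you verified.)
FROM node:20-alpine@sha256:<digest-of-the-verified-image>

WORKDIR /app
USER node                            # Non-root, per the practice above
```

The tag after the image name is kept purely as a human-readable label; once a digest is present, the digest alone determines which image is pulled.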

Warning: Secrets Embedded in Image Layers Are Permanently Exposed

A Dockerfile that runs RUN curl -H "Authorization: $API_KEY" ... embeds the API key value into the image layer — even if a subsequent RUN command deletes it, the key exists in the layer history and is recoverable by anyone who can pull the image. Build arguments passed with --build-arg appear in the image metadata. The only safe pattern for credentials during a Docker build is to use multi-stage builds where the secret is used in the builder stage and never copied to the runtime stage, or to use Docker BuildKit's secret mounts (RUN --mount=type=secret) which are never written to any layer. Never use environment variables or build args to pass secrets into Dockerfile RUN commands.
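The BuildKit secret-mount pattern mentioned above looks like this — the secret id api_key is illustrative, and the URL reuses the example from the warning:

```dockerfile
# syntax=docker/dockerfile:1
FROM node:20-alpine AS builder
WORKDIR /app
# The secret is mounted as a file for this one RUN command only —
# it is never written to any image layer or to the build cache
RUN --mount=type=secret,id=api_key \
    curl -H "Authorization: Bearer $(cat /run/secrets/api_key)" \
         https://internal-api/config -o config.json
```

Build with `docker build --secret id=api_key,src=./api_key.txt .` — BuildKit supplies the file at /run/secrets/api_key during that step and discards it afterwards, so neither `docker history` nor the image metadata ever contains the value.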

Key Takeaways from This Lesson

Containers serve three distinct pipeline roles — as the artifact that gets built and deployed, as the environment that pipeline steps run inside, and as service dependencies that integration tests run against. Each role requires different pipeline configuration.
Multi-stage builds produce lean, secure production images — separating the build environment from the runtime environment means build tools, source code, and development dependencies never reach the image that runs in production.
Service containers give integration tests real dependencies — a PostgreSQL container running in the pipeline is a real database engine. Tests that connect to it are genuinely integrated, catching failures that mocked dependencies would miss entirely.
Scan images before pushing to the registry — a CVE found in the pipeline costs nothing to fix. A CVE found in a running production container costs an emergency patching cycle, a re-scan, a re-deploy, and potential regulatory reporting.
Secrets must never be embedded in image layers — they persist in layer history regardless of subsequent deletion and are recoverable by anyone who can pull the image. Use BuildKit secret mounts or multi-stage isolation instead.

Teacher's Note

Run docker history --no-trunc your-image:tag on any existing production image and read every layer command — if you see an API key, a password, or any credential in that output, it is in the image and must be treated as compromised immediately.

Practice Questions

Answer in your own words — then check against the expected answer.

1. What Dockerfile technique uses multiple FROM statements to separate the build environment from the runtime environment — ensuring that compilers, build tools, and development dependencies never appear in the final production image?



2. What GitHub Actions feature runs Docker images as sidecar processes alongside a pipeline job — providing real PostgreSQL, Redis, or RabbitMQ instances that integration tests can connect to exactly as they would in production?



3. What is the name of the open-source container image scanning tool — integrated directly into GitHub Actions via an official action — that analyses image layers for known CVEs and can fail the pipeline on critical or high-severity findings before the image reaches the registry?



Lesson Quiz

1. A Dockerfile uses RUN curl -H "Authorization: $API_KEY" https://internal-api/config and then immediately runs RUN unset API_KEY. A security reviewer flags this. What is the problem?


2. A team's integration tests currently use an in-memory SQLite mock instead of PostgreSQL. They switch to a PostgreSQL service container in their pipeline. What category of bug does this change allow the pipeline to catch that it previously could not?


3. A Node.js application requires 600MB of node_modules including TypeScript and webpack to build, but only 80MB of production dependencies to run. A multi-stage Dockerfile is introduced. What determines the size of the final production image?


Up Next · Lesson 28

CI/CD with Docker

Lesson 27 introduced containers in the pipeline. Lesson 28 goes deeper into Docker specifically — Dockerfile best practices, layer caching strategy, BuildKit, and the full build-tag-push workflow in GitHub Actions.