Docker Lesson 40 – Docker Performance Optimization | Dataplexa
Section IV · Lesson 40

Docker Performance Optimization

A team's CI pipeline ran Docker builds for eleven minutes per commit. Fifteen developers pushing multiple times a day meant the build queue was the bottleneck for the entire engineering organisation. Nobody was waiting for tests. Nobody was waiting for code review. Everyone was waiting for Docker. The builds were not doing eleven minutes of work — they were doing three minutes of work and spending eight minutes downloading the same npm packages on every single run because one line in the Dockerfile was in the wrong order.

Docker performance optimization is almost entirely about understanding the layer cache — what busts it, what preserves it, and how to structure every Dockerfile so that the work that changes rarely sits at the top and the work that changes constantly sits at the bottom. The other techniques — multi-stage builds, image size reduction, BuildKit parallelism — compound these gains. This lesson covers each of them with before-and-after numbers.

Slow vs Fast Builds

Unoptimized build — 11 minutes

  • Source code copied before dependency install — cache busts on every change
  • Dev dependencies bundled into production image
  • No .dockerignore — gigabytes sent to the daemon every build
  • Single-stage build — compiler and test tooling in the final image
  • BuildKit disabled — no parallel stage execution
  • Base image pulled fresh on every CI run
  • 1.4 GB final image — forty seconds to pull on every deploy

Optimized build — 90 seconds

  • Dependency manifests copied first — cache survives code-only changes
  • Production dependencies only in final image
  • .dockerignore trims build context to under 3 MB
  • Multi-stage build — tooling discarded, runtime only in final image
  • BuildKit enabled — stages run in parallel where possible
  • Registry cache used — layers reused across CI runs
  • 94 MB final image — pulls in under 4 seconds
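
The sub-3 MB build context comes from a .dockerignore file. As a sketch for a typical Node.js project — adjust the entries to match your repository:

```
# Dependencies — reinstalled inside the image, never sent in the context
node_modules

# VCS history and local environment files
.git
.env*

# Build output and test artifacts — regenerated during the build
dist
coverage
```

Every entry excluded here is data that never leaves the client, never crosses the socket to the daemon, and never risks an accidental COPY into the image.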

The Archaeology Layer Analogy

An archaeologist excavating a dig site works from the top down — disturbing the surface layer exposes only what's beneath it, leaving deeper layers untouched. Docker's layer cache works the same way but in reverse: disturbing a layer invalidates everything below it, leaving everything above untouched. If you change a line near the top of the Dockerfile, every subsequent layer rebuilds from scratch. If you change a line near the bottom, only the final layers rebuild. The strategy: put the stable bedrock at the top — base image, system packages, dependency installation — and put the constantly-changing topsoil at the bottom: your application source code. Dig in the right order and most of the excavation is already done.

Optimization 1 — Layer Cache Ordering

The single highest-impact change in any Dockerfile: copy dependency manifests before source code. When Docker reaches a COPY instruction, it compares the file contents against the cache. If the file changed, the cache is invalidated from that point downward. If package.json hasn't changed, npm install is served from cache in milliseconds — regardless of how many source files changed below it.

# SLOW — cache busts on every code change
FROM node:18-alpine
WORKDIR /app
COPY . .
# Copies ALL files including source code.
# Every code change → this layer rebuilds → npm install re-runs.
# npm install takes 45 seconds. It re-runs on every single commit.
RUN npm install --omit=dev
EXPOSE 3000
CMD ["node", "server.js"]

# ────────────────────────────────────────────────────────

# FAST — dependency layer cached independently of source code
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
# Copy ONLY the dependency manifest — changes rarely.
# package.json unchanged → next layer is cache hit → npm install skipped.
RUN npm install --omit=dev
# This layer is cached as long as package.json doesn't change.
# A code-only change never reaches this layer.
COPY . .
# Source code copied AFTER install. Cache for npm install is unaffected.
EXPOSE 3000
CMD ["node", "server.js"]
# SLOW — after a one-line code change:
docker build -t payment-api:slow .
[+] Building 52.3s
 => COPY . .                         ← cache MISS — any file changed
 => RUN npm install --omit=dev       ← 48.1s — downloads all packages again
 => EXPOSE 3000
Total: 52.3 seconds. Every commit. Every time.

# FAST — same one-line code change:
docker build -t payment-api:fast .
[+] Building 2.1s
 => CACHED COPY package*.json ./     ← cache HIT — package.json unchanged
 => CACHED RUN npm install           ← 0ms — served from cache
 => COPY . .                         ← 0.4s — only this layer rebuilt
 => EXPOSE 3000
Total: 2.1 seconds. 25× faster. Same output image.

What just happened?

Moving two lines changed the build from 52 seconds to 2 seconds for every code-only commit — a 25× improvement with zero change to the final image. The dependency layer is now cached independently of the source code. It rebuilds only when package.json changes — which happens when you add or remove packages, not when you fix a bug or add a feature. For a team of fifteen developers making ten commits a day, this is the difference between eleven minutes and ninety seconds of CI time per commit.

Optimization 2 — Multi-Stage Builds for Size

Every tool installed during the build process — compilers, test runners, linters, build utilities — adds to the image size if the build is single-stage. A multi-stage build uses one stage to do all the heavy build work and a second minimal stage that receives only the compiled output. The builder stage is discarded entirely — it never appears in the final image, never gets pushed to the registry, and never gets pulled during deployment.

# syntax=docker/dockerfile:1

# Stage 1 — builder: full toolchain, compiles everything
FROM node:18-alpine AS builder
WORKDIR /build
COPY package*.json ./
RUN npm install
# Install ALL dependencies including build tools, TypeScript compiler, etc.
COPY . .
RUN npm run build
# Compile TypeScript → JavaScript output in /build/dist/
# This stage will be discarded — it never appears in the registry.

# Stage 2 — production: minimal runtime, no build tooling
FROM node:18-alpine AS production
WORKDIR /app
COPY package*.json ./
RUN npm install --omit=dev
# Re-install production dependencies only — no compiler, no type checker.
COPY --from=builder /build/dist ./dist
# Copy ONLY the compiled output from the builder stage.
# Everything else in the builder — source .ts files, devDependencies,
# TypeScript compiler, test fixtures — is discarded here.
RUN addgroup -S appgroup && adduser -S appuser -G appgroup && \
    chown -R appuser:appgroup /app
USER appuser
EXPOSE 3000
CMD ["node", "dist/server.js"]
docker images payment-api

REPOSITORY    TAG          SIZE
payment-api   single       1.41GB   ← single-stage: TypeScript + all deps + source
payment-api   multi        94MB     ← multi-stage: JS output + prod deps only

# What's in the single-stage image that shouldn't be:
# - typescript compiler:          87MB
# - @types/* packages:            120MB
# - ts-node, ts-jest, eslint:     210MB
# - Source .ts files:             8MB
# - Test fixtures and mocks:      45MB
# Total unnecessary content:      470MB

# Pull time difference:
# 1.41 GB image: ~42 seconds on a 300 Mbps connection
#   94 MB image: ~3 seconds — 14× faster deploy startup
# Vulnerability scan: 1.41 GB has 3× more packages to scan — 3× more CVE surface.

What just happened?

The multi-stage build produced a 94 MB image from a build process that used a 1.41 GB builder. The TypeScript compiler, type definitions, test tooling, and source files were all discarded in the transition from builder to production stage. The final image contains only what the process needs at runtime — the compiled JavaScript and production node_modules. An attacker who compromises the production container finds no compiler to run arbitrary code with, no test framework to exploit, and no source code to read.

Optimization 3 — BuildKit Parallelism

BuildKit is Docker's modern build engine — enabled by default in Docker 23+ and available in earlier versions via environment variable. Its key performance feature: it analyses the Dockerfile's stage dependency graph and executes independent stages in parallel. A multi-stage build where the test stage and the production stage are both derived from the same base runs both simultaneously — not sequentially.

# syntax=docker/dockerfile:1
# BuildKit reads this comment and enables advanced features.

FROM node:18-alpine AS base
WORKDIR /app
COPY package*.json ./
RUN npm install

# ── These two stages are independent — BuildKit runs them in PARALLEL ──

FROM base AS test
# Test stage — runs linting and unit tests during the build.
# If tests fail, the build fails before the production image is created.
COPY . .
RUN npm run lint && npm test
# BuildKit starts this stage at the same time as the production stage below.

FROM base AS production
# Production stage — builds the final image.
# Runs IN PARALLEL with the test stage — no waiting.
RUN npm install --omit=dev
COPY . .
RUN addgroup -S appgroup && adduser -S appuser -G appgroup && \
    chown -R appuser:appgroup /app
USER appuser
EXPOSE 3000
CMD ["node", "server.js"]
# Enable BuildKit on older Docker versions:
DOCKER_BUILDKIT=1 docker build --target production -t payment-api:v1.0.0 .

# Docker 23+ — BuildKit is on by default, no env var needed:
docker build --target production -t payment-api:v1.0.0 .

# BuildKit cache mount — speeds up package managers inside RUN:
# Caches the npm/pip cache directory between builds on the same host.
RUN --mount=type=cache,target=/root/.npm \
    npm install --omit=dev
# /root/.npm is the npm cache directory.
# With a cache mount, npm only downloads packages not already in the cache.
# On subsequent builds, even fresh package installs are dramatically faster.
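
The same cache-mount pattern applies to other package managers. As a sketch for Python, mounting pip's default cache directory (path assumed to be pip's Linux default) preserves downloaded wheels across builds:

```dockerfile
# Cache pip downloads between builds — previously fetched wheels are reused,
# so only new or updated packages are downloaded.
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt
```

The mount exists only during the RUN instruction — the cache contents never become part of any image layer.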
# Without BuildKit — sequential stage execution:
docker build --target production .
[+] Building 94.2s
 => [base] npm install              18.4s
 => [test] npm run lint && test     32.1s   ← waits for base, then runs
 => [production] npm install        18.3s   ← waits for test to finish
 => [production] COPY . .            0.4s
Total: 94.2 seconds

# With BuildKit — parallel stage execution:
DOCKER_BUILDKIT=1 docker build --target production .
[+] Building 52.7s
 => [base] npm install              18.4s
 => [test] npm run lint && test     32.1s   ─┐ run simultaneously
 => [production] npm install        18.3s   ─┘ after base completes
 => [production] COPY . .            0.4s
Total: 52.7s — test and production stages overlapped.
44% faster. Same guarantees. Tests still gate the production build.

Optimization 4 — Registry Cache in CI

On a developer's laptop, the layer cache persists between builds — packages installed yesterday are cached today. On a CI server, every pipeline run typically starts from scratch — no cache, no history, full rebuild. The fix is registry-based cache: BuildKit pushes the cache to the container registry alongside the image, and pulls it back on the next run. The CI runner gets the same cache benefits a local machine has.

# CI pipeline script — GitHub Actions, GitLab CI, or any runner:
GIT_SHA=$(git rev-parse --short HEAD)
IMAGE="acmecorp/payment-api"
CACHE_IMAGE="${IMAGE}:cache"

# Build with registry cache — import previous cache, export new cache:
docker buildx build \
  --cache-from type=registry,ref=${CACHE_IMAGE} \
  --cache-to   type=registry,ref=${CACHE_IMAGE},mode=max \
  --target production \
  --tag ${IMAGE}:${GIT_SHA} \
  --push \
  .
# --cache-from  → pull cached layers from the registry before building
#                 CI runner reuses the same layers as the previous run
# --cache-to    → push updated cache back to the registry after building
#                 mode=max caches all layers, not just the final stage
# --push        → push the final image to the registry in the same command

# First CI run (cold cache): 94 seconds — downloads everything
# Second CI run (warm cache, code change only): 8 seconds
# Second CI run (warm cache, no change): 3 seconds
# First CI run — cold cache:
[+] Building 94.1s
 => importing cache from acmecorp/payment-api:cache   0.3s (no cache yet)
 => [base] FROM node:18-alpine                        3.2s
 => [base] COPY package*.json                         0.1s
 => [base] RUN npm install                           48.3s  ← full download
 => [production] COPY . .                             0.4s
 => pushing cache to registry                         8.1s

# Second CI run — warm cache, one file changed:
[+] Building 7.8s
 => importing cache from acmecorp/payment-api:cache   1.2s  ← cache pulled
 => CACHED [base] FROM node:18-alpine                 0ms
 => CACHED [base] COPY package*.json                  0ms
 => CACHED [base] RUN npm install                     0ms   ← from registry cache
 => [production] COPY . .                             0.4s  ← only rebuild
 => pushing updated image                             4.1s
# 94 seconds → 7.8 seconds on a stateless CI runner. 12× faster.

What just happened?

The CI runner pulled the layer cache from the registry at the start of the build. The npm install layer was already in the cache — identical package.json, identical result — so it was served from the registry cache in 0ms instead of running for 48 seconds. The only layer that actually rebuilt was the source code copy. Total: 7.8 seconds on a stateless runner that has never seen this project before. The cache travels with the image in the registry — any runner anywhere in the world that pulls the cache gets the same benefit.
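
In GitHub Actions, the same buildx flags are commonly expressed through the docker/build-push-action inputs. A sketch, assuming a docker/setup-buildx-action step has already run earlier in the job, with image and cache refs as placeholders:

```yaml
# Sketch of a GitHub Actions build step with registry cache.
# Substitute your own image name and registry.
- name: Build and push with registry cache
  uses: docker/build-push-action@v5
  with:
    target: production
    tags: acmecorp/payment-api:${{ github.sha }}
    cache-from: type=registry,ref=acmecorp/payment-api:cache
    cache-to: type=registry,ref=acmecorp/payment-api:cache,mode=max
    push: true
```

The cache-from and cache-to inputs map one-to-one onto the buildx flags shown above — the action is a thin wrapper around the same command.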

Optimization 5 — Image Size Reduction

Image size directly affects pull time during deployments, vulnerability scan duration, and storage costs at scale. The three levers: choose a minimal base image, combine related RUN instructions into one, and clean up package manager caches in the same layer that created them.

# Base image choices — same Node.js runtime, very different sizes:
FROM node:18              # 991MB — Debian full, all tools
FROM node:18-slim         # 245MB — Debian slim, fewer packages
FROM node:18-alpine       #  91MB — Alpine Linux, minimal
FROM gcr.io/distroless/nodejs18-debian11  # 164MB — no shell, no package manager

# For Node.js: node:18-alpine is the standard production choice.
# For Python: python:3.11-slim (not alpine — C extension builds need glibc).
# For Go:     distroless/static (binary needs nothing — not even a runtime).

# Combining RUN instructions — each RUN creates a layer:
# SLOW (three layers, package cache stays in image):
RUN apt-get update
RUN apt-get install -y curl
RUN rm -rf /var/lib/apt/lists/*

# FAST (one layer, cache deleted in the same instruction that created it):
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl && \
    rm -rf /var/lib/apt/lists/*
# --no-install-recommends  → skip recommended packages — often 50-200MB of extras
# rm -rf /var/lib/apt/lists → delete the package index downloaded by apt-get update
#                             if deleted in a SEPARATE RUN, the index layer persists
#                             must be in the SAME RUN to actually reduce image size
# Image size comparison — same application, different base images:
docker images payment-api

REPOSITORY    TAG          SIZE
payment-api   node18       1012MB   ← node:18 (Debian full)
payment-api   node18slim    267MB   ← node:18-slim
payment-api   node18alpine   94MB   ← node:18-alpine  ← production standard
payment-api   distroless    164MB   ← distroless/nodejs18 (no shell)

# Impact of combining RUN instructions (Python example):
# Three separate RUN apt-get commands:   312MB
# One combined RUN with cache cleanup:   198MB
# Savings: 114MB — 37% smaller — from one Dockerfile change.

# dive — tool for inspecting what's in each layer:
docker run --rm -it \
  -v /var/run/docker.sock:/var/run/docker.sock \
  wagoodman/dive payment-api:node18alpine
# Shows per-layer breakdown — identifies which layer is responsible
# for unexpected size. Essential for diagnosing image bloat.

The Complete Optimization Scenario

The scenario: You're inheriting a CI pipeline that builds a Node.js TypeScript service in eleven minutes per commit. You apply all five optimizations in sequence. Here's the before and after for each step — and the total time saved across a team of fifteen developers over one week.

# Starting point — measure the baseline:
time docker build --no-cache -t payment-api:before .
# Result: 11m 18s  |  Image size: 1.41 GB

# Step 1 — add .dockerignore:
printf 'node_modules\n.git\n.env*\ncoverage\ndist\n' > .dockerignore
time docker build --no-cache -t payment-api:step1 .
# Result: 10m 02s  |  Build context: 847MB → 2.4MB  (-76s from context transfer)

# Step 2 — reorder COPY instructions (package.json before source):
# Edit Dockerfile: move COPY package*.json ./ before COPY . .
# (Subsequent builds with code-only change — cache now hits on npm install)
time docker build -t payment-api:step2 .
# Result: 1m 54s   |  npm install served from cache  (-8m 08s)

# Step 3 — multi-stage build (discard TypeScript compiler):
# Separate builder and production stages in Dockerfile
time docker build --target production -t payment-api:step3 .
# Result: 1m 44s   |  Image size: 1.41GB → 94MB  (-10s from smaller push)

# Step 4 — enable BuildKit + parallel stages:
time DOCKER_BUILDKIT=1 docker build --target production -t payment-api:step4 .
# Result: 1m 08s   |  Test and production stages run in parallel  (-36s)

# Step 5 — registry cache in CI:
# Add --cache-from and --cache-to to the CI build command
# Result: 0m 52s   |  Warm cache — only changed layers rebuild  (-16s)

# ─────────────────────────────────────────────────────────
# Before: 11m 18s  |  After: 0m 52s  |  13× faster
# Team of 15, 10 commits/day, 5 days/week:
# Before: 15 × 10 × 5 × 678s = 141.25 hours of CI time per week
# After:  15 × 10 × 5 × 52s  =  10.83 hours of CI time per week
# Saved:  130 hours of engineering wait time per week.
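
The weekly totals above can be sanity-checked with shell arithmetic:

```shell
# Weekly CI seconds: 15 developers × 10 commits/day × 5 days × build time
before=$((15 * 10 * 5 * 678))   # 678s = 11m 18s per build → 508500s (141.25h)
after=$((15 * 10 * 5 * 52))     # 52s per build → 39000s (10.83h)
saved=$((before - after))
# Integer hours — the fractional parts account for the .25 and .83 above.
echo "before: $((before / 3600))h  after: $((after / 3600))h  saved: $((saved / 3600))h"
```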

What just happened?

Five changes to the Dockerfile and CI script reduced build time from eleven minutes to fifty-two seconds — a 13× improvement. The largest single gain was layer cache ordering: moving two lines in the Dockerfile saved over eight minutes per build. Every other optimization added on top. The image shrank from 1.41 GB to 94 MB — fifteen times smaller, with a correspondingly smaller attack surface, faster pulls, and lower registry storage costs. None of these changes affected what the running application does. They only changed how it gets built and delivered.

Optimization impact — ranked by effort vs gain

#  Optimization                                             Effort   Time saved
1  Layer cache ordering — copy manifest before source       2 min    8m 08s per build
2  .dockerignore — exclude node_modules and .git            5 min    76s per build
3  Multi-stage build — discard toolchain from final image   30 min   10s + 15× smaller image
4  BuildKit + parallel stages                               10 min   36s per build
5  Registry cache in CI                                     45 min   16s per build on CI

Teacher's Note

Do optimization 1 first — always. It takes two minutes, requires no new tools, and saves more time than every other optimization combined. Most Docker builds in production today are still doing optimization 1 wrong. Add a .dockerignore second — another five minutes, dramatic context size reduction. Everything else — BuildKit, registry cache, multi-stage builds — builds on top of these two foundations. Start with what costs nothing and gives the most.

Practice Questions

1. In a Node.js Dockerfile, what file must be copied before the application source code to ensure the dependency installation layer is cached independently of code changes?



2. To pull a previously saved layer cache from the container registry at the start of a CI build — so the stateless runner can reuse cached layers — which docker buildx build flag is used?



3. When running apt-get install inside a Dockerfile, which flag prevents apt from installing recommended packages — often reducing image size by 50–200 MB?



Quiz

1. A Dockerfile has COPY . . followed by RUN npm install. Developers complain that every build takes 45 seconds even for a one-line bug fix. What is the cause?


2. A Dockerfile runs RUN apt-get update && apt-get install -y curl then in the next line RUN rm -rf /var/lib/apt/lists/* to clean up. The image is still larger than expected. Why?


3. A multi-stage Dockerfile has a test stage and a production stage, both derived from the same base stage. How does BuildKit reduce total build time compared to the classic builder?


Up Next · Lesson 41

Docker Security Hardening

Builds are fast and images are lean — now the deeper security question: what happens after a vulnerability is exploited? Security hardening is the set of controls that limit what an attacker can do once they're inside a container — AppArmor profiles, seccomp filters, rootless Docker, and the runtime flags that make compromise survivable.