Docker Lesson 37 – Dockerizing Backend Applications | Dataplexa
Section IV · Lesson 37

Dockerizing Backend Applications

A senior engineer spent an afternoon containerizing a Python service. The image built fine. It ran fine locally. In production it crashed immediately — ModuleNotFoundError on a package that was clearly in requirements.txt. The cause: the Dockerfile ran pip install as root, then switched to a non-root user who had no permission to read the installed packages. One line wrong. Four hours of debugging. The fix was eleven characters.

Dockerizing a backend application correctly is not just writing a Dockerfile that builds. It's understanding how each language runtime behaves inside a container — where it installs packages, how it finds them, what it expects from the filesystem, and how to make all of that work when the process is running as a non-root user with a read-only filesystem. This lesson covers Node.js, Python, and Go — three runtimes, three sets of gotchas, one consistent set of principles.

The Wrong Way vs The Right Way

Common first attempt

  • Single-stage image — dev and prod tooling mixed together
  • All source files copied before installing dependencies
  • Every change busts the dependency cache — slow rebuilds
  • Running as root — security risk from Lesson 32
  • No .dockerignore — node_modules and .git copied into the image
  • Entire application in one fat layer — hard to debug, slow to push
  • No health check — orchestrators can't tell if the app is ready

Production-ready image

  • Multi-stage build — dev stage for tooling, prod stage for runtime
  • Dependency files copied first — cache busts only on dep changes
  • Source code copied after deps — fast rebuilds on code changes
  • Non-root user with correct file ownership
  • .dockerignore excludes everything the image doesn't need
  • Minimal base image — Alpine or distroless where possible
  • Health check defined — orchestrators know when the app is ready

The Mise en Place Analogy

Professional chefs call it mise en place — everything in its place before cooking begins. Ingredients are measured and arranged in the order they'll be used. A Dockerfile follows the same discipline: instructions that change rarely go at the top (base image, system packages, dependency installation), instructions that change often go at the bottom (source code copy). When you change a line, Docker rebuilds from that line downward — everything above it is served from cache instantly. Copy your package.json before your source code, and a code change never re-runs npm install. That's the difference between a 30-second rebuild and a 3-second one.
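The ordering discipline can be sketched in two contrasting fragments. This is a hypothetical Node.js service used only to illustrate cache behavior; the full production Dockerfiles appear later in this lesson:

```dockerfile
# Cache-hostile ordering: ANY file change invalidates the COPY layer,
# so npm install re-runs on every build.
FROM node:18-alpine
WORKDIR /app
COPY . .
RUN npm install

# Cache-friendly ordering: npm install re-runs only when package*.json
# changes. A code-only change rebuilds from the final COPY onward.
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
```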

The .dockerignore File

Before writing a single Dockerfile instruction, create a .dockerignore file. Every file sent to the Docker Daemon during a build is called the build context. Without a .dockerignore, that context includes node_modules (potentially gigabytes), .git (full commit history), test fixtures, local .env files, and anything else sitting in the project directory. This slows every build and can accidentally bake sensitive files into the image.

# .dockerignore — applies to ALL languages, adjust as needed
.git
.gitignore
.dockerignore

# Dependencies — rebuilt inside the image, never copied from host
node_modules
__pycache__
*.pyc
*.pyo
.venv
venv

# Test and development files
*.test.js
*.spec.js
tests/
__tests__/

# Local environment and secrets — never in the image
.env
.env.*
*.local

# Build output and caches
dist/
build/
.cache/
coverage/
.pytest_cache/

# Editor and OS files
.vscode/
.idea/
*.swp
.DS_Store
Thumbs.db

# Documentation
README.md
docs/
*.md

# Build context size — without vs with .dockerignore:

# Without .dockerignore:
docker build .
Sending build context to Docker daemon  847.3MB
# 847 MB — node_modules, .git history, test fixtures all included

# With .dockerignore:
docker build .
Sending build context to Docker daemon  2.41MB
# 2.4 MB — only source files and package manifests
# Build starts 350× faster. Smaller context = less to transfer, less to scan.

Dockerizing a Node.js Application

Node.js is the most common backend runtime in Docker. The key decisions: use Alpine as the base image for a small footprint, copy package.json before source code to preserve the dependency cache, install only production dependencies in the final stage, and ensure the non-root user owns the application directory before the USER instruction switches away from root.

# syntax=docker/dockerfile:1
FROM node:18-alpine AS base
WORKDIR /app
COPY package*.json ./
# Copy only the dependency manifest first.
# As long as package.json doesn't change, this layer is cached —
# the npm install step below will never re-run on a code-only change.

# ────────────────────────────────────────────────────────
FROM base AS development
RUN npm install
# Install all dependencies including devDependencies.
# Source code is NOT copied — it will be mounted as a volume.
EXPOSE 3000
CMD ["npm", "run", "dev"]

# ────────────────────────────────────────────────────────
FROM base AS production
RUN npm install --omit=dev
# Production dependencies only — no test runners, type checkers, or build tools.
# --omit=dev is the modern equivalent of --production.

COPY . .
# Source code copied AFTER npm install so code changes don't bust the dep cache.

RUN addgroup -S appgroup && \
    adduser  -S appuser -G appgroup && \
    chown -R appuser:appgroup /app
# Create non-root user and transfer ownership of /app BEFORE switching.
# If you chown after USER, the command runs as appuser and may lack permission
# to chown files owned by root — a common and confusing failure mode.

USER appuser

HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
  CMD wget -qO- http://localhost:3000/health || exit 1
# Orchestrators (Compose, Swarm, Kubernetes) use this to know when the app
# is ready to receive traffic. Without it, they assume ready on start.

EXPOSE 3000
CMD ["node", "server.js"]

# First build — everything runs:
docker build --target production -t payment-api:v1.0.0 .
[+] Building 34.2s (11/11) FINISHED
 => [base 1/2] FROM node:18-alpine
 => [base 2/2] COPY package*.json ./            ← 0.1s
 => [production 1/3] RUN npm install --omit=dev ← 28.4s (downloads packages)
 => [production 2/3] COPY . .                   ← 0.2s
 => [production 3/3] RUN addgroup -S ...        ← 0.3s

# Second build after a code change — deps cached:
docker build --target production -t payment-api:v1.0.1 .
[+] Building 1.8s (11/11) FINISHED
 => CACHED [base 2/2] COPY package*.json ./
 => CACHED [production 1/3] RUN npm install     ← 0ms — served from cache
 => [production 2/3] COPY . .                   ← 0.2s — only this re-ran
 => [production 3/3] RUN addgroup -S ...        ← 0.3s
# 34 seconds → 1.8 seconds. Same result. Layer cache at work.

docker images payment-api
REPOSITORY    TAG      SIZE
payment-api   v1.0.0   91.4MB   ← Alpine + prod deps only

What just happened?

The first build took 34 seconds — most of that was downloading npm packages. The second build, triggered by a code change, took 1.8 seconds because the npm install layer was served from cache. The key: package.json was copied before source code. If they'd been in the same COPY . . instruction, every code change would have re-run npm install. The production image is 91 MB — Alpine base, production node_modules, application code. No dev tooling. No test files. Nothing extra.
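One concrete consumer of that HEALTHCHECK is Docker Compose. The sketch below assumes a hypothetical downstream service (checkout-worker) that must not start until the API is actually ready, not merely started:

```yaml
services:
  payment-api:
    image: payment-api:v1.0.0
    ports:
      - "3000:3000"
    # The image's own HEALTHCHECK drives this service's health state.

  checkout-worker:
    image: checkout-worker:v1.0.0   # hypothetical dependent service
    depends_on:
      payment-api:
        # Wait for (healthy), not just (started). Without a HEALTHCHECK
        # in the image, Compose can only wait for the process to exist.
        condition: service_healthy
```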

Dockerizing a Python Application

Python adds one critical complexity: virtual environments. By default, pip install puts packages into a system site-packages directory that may conflict with the base image's own Python packages, and the console scripts it creates may not be on the non-root user's PATH. The correct pattern: create a virtual environment explicitly, install packages into it, and prepend its bin directory to PATH — so the non-root user's python resolves to the venv interpreter and finds its packages reliably, regardless of what else is installed in the base image.

# syntax=docker/dockerfile:1
FROM python:3.11-slim AS base
# python:3.11-slim — Debian-based, smaller than the full image, has pip.
# Avoid python:3.11-alpine for Python — many packages require C extensions
# that need build tools not present in Alpine, causing pip install to fail.

WORKDIR /app

# Create a virtual environment in a predictable location.
# This isolates pip installs from the system Python completely.
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
# Add the venv bin to PATH so `python` and `pip` always resolve to the venv.
# This ENV persists through all subsequent RUN, CMD, and ENTRYPOINT instructions
# and is inherited by the running container — non-root user included.

# ────────────────────────────────────────────────────────
FROM base AS builder
# Separate stage for building — keeps build tools out of the final image.
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
# --no-cache-dir prevents pip from writing a local cache — keeps the layer lean.
# Packages are installed into /opt/venv because PATH points there.

# ────────────────────────────────────────────────────────
FROM base AS production
# Copy only the installed venv from the builder stage — not the build tools.
COPY --from=builder /opt/venv /opt/venv

COPY . .

RUN addgroup --system appgroup && \
    adduser  --system --ingroup appgroup appuser && \
    chown -R appuser:appgroup /app /opt/venv
# Transfer ownership of both the app directory AND the venv.
# Without chowning /opt/venv, appuser can't read the installed packages.
# This was the bug described in the lesson introduction.

USER appuser

HEALTHCHECK --interval=30s --timeout=5s --start-period=15s --retries=3 \
  CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" \
  || exit 1

EXPOSE 8000
CMD ["python", "-m", "uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
# 0.0.0.0 is required — the default 127.0.0.1 only accepts connections from
# within the container. Forgetting this is the most common Python/uvicorn gotcha.

docker build --target production -t payment-service:v1.0.0 .
[+] Building 41.3s (13/13) FINISHED
 => [base] FROM python:3.11-slim
 => [base] RUN python -m venv /opt/venv
 => [builder] COPY requirements.txt ./
 => [builder] RUN pip install --no-cache-dir -r requirements.txt   ← 36s
 => [production] COPY --from=builder /opt/venv /opt/venv           ← venv only
 => [production] COPY . .
 => [production] RUN addgroup --system ...

docker images payment-service
REPOSITORY        TAG      SIZE
payment-service   v1.0.0   198MB
# python:3.11-slim base + venv with FastAPI + uvicorn. No build tools in image.

# Verify the non-root user can find packages:
docker run --rm payment-service:v1.0.0 python -c "import fastapi; print('ok')"
ok
# /opt/venv/bin is on PATH for appuser — packages found. No ModuleNotFoundError.

What just happened?

The virtual environment was created in /opt/venv and added to PATH via an ENV instruction — making it available to every subsequent instruction and to the running process regardless of which user is active. The builder stage installed all packages; the production stage copied only the completed venv, leaving build tools behind. Ownership of /opt/venv was transferred to appuser before the USER switch. The non-root user can read installed packages cleanly — the bug from the opening story is eliminated by a single chown.
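The 0.0.0.0 binding rule in the CMD above is independent of uvicorn. Here is a minimal standard-library sketch of a /health endpoint bound to all interfaces; the handler name and port are illustrative:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    """Tiny /health endpoint, like the one a HEALTHCHECK would probe."""

    def do_GET(self):
        if self.path == "/health":
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok")
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        pass  # silence per-request logging for probe traffic

# "0.0.0.0" listens on all interfaces, so Docker's port mapping can reach it.
# "127.0.0.1" would accept connections only from inside the container itself.
server = HTTPServer(("0.0.0.0", 8000), HealthHandler)
```

Calling server.serve_forever() then runs it; the same bind-address rule is what uvicorn's --host flag controls.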

Dockerizing a Go Application

Go is the most container-friendly language. It compiles to a single static binary with no runtime dependencies — meaning the final production image can be scratch or distroless: an image with no shell, no package manager, no operating system utilities — just the binary and nothing else. The result is an image measured in single-digit megabytes, with the smallest possible attack surface of any language runtime.

# syntax=docker/dockerfile:1
FROM golang:1.22-alpine AS builder
# golang:1.22-alpine — full Go toolchain for compilation.
# This stage will NOT appear in the final image — it's discarded after build.

WORKDIR /build

# Copy dependency manifests first — cache the module download step.
COPY go.mod go.sum ./
RUN go mod download
# Downloads all dependencies into the module cache.
# Re-runs only when go.mod or go.sum changes — not on code changes.

COPY . .

# Compile the binary — CGO disabled for a fully static binary.
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 \
    go build \
    -ldflags="-w -s" \
    -o /app/payment-api \
    ./cmd/server
# CGO_ENABLED=0   → no C bindings — fully static binary, no libc dependency
# GOOS=linux      → target Linux regardless of build machine OS
# GOARCH=amd64    → target x86_64 — change for ARM deployments
# -ldflags="-w -s" → strip debug info and symbol table → smaller binary
# -o /app/payment-api → output path for the compiled binary

# ────────────────────────────────────────────────────────
FROM gcr.io/distroless/static:nonroot AS production
# distroless/static — contains ONLY:
# • CA certificates (for HTTPS calls)
# • /etc/passwd and /etc/group (for non-root user)
# • timezone data
# Nothing else. No shell. No curl. No apt. No sh.
# An attacker who gets code execution has almost no tools to work with.

COPY --from=builder /app/payment-api /payment-api
# Copy only the compiled binary from the builder stage.
# The entire Go toolchain, source code, and module cache are discarded.

# distroless:nonroot already defaults to uid 65532 (nonroot user).
# No USER instruction needed — it's baked into the base image.

HEALTHCHECK --interval=30s --timeout=5s --start-period=5s --retries=3 \
  CMD ["/payment-api", "--health-check"]
# No shell available — HEALTHCHECK must use exec form, not shell form.
# The binary itself handles the --health-check flag.

EXPOSE 8080
ENTRYPOINT ["/payment-api"]

docker build --target production -t payment-api-go:v1.0.0 .
[+] Building 28.4s (10/10) FINISHED
 => [builder] FROM golang:1.22-alpine
 => [builder] COPY go.mod go.sum ./
 => [builder] RUN go mod download          ← 18.2s first run, cached after
 => [builder] COPY . .
 => [builder] RUN CGO_ENABLED=0 ... go build ← 7.1s compilation
 => [production] FROM gcr.io/distroless/static:nonroot
 => [production] COPY --from=builder /app/payment-api /payment-api

docker images payment-api-go
REPOSITORY       TAG      SIZE
payment-api-go   v1.0.0   8.3MB
# 8.3 MB total. The Go binary is 7.8 MB. The distroless base adds 0.5 MB.
# For comparison: the same service in Node.js (Alpine) = 91 MB.
# Go distroless = 11× smaller. Pulls in under 2 seconds on any connection.

# Verify there is no shell:
docker run --rm -it payment-api-go:v1.0.0 sh
docker: Error response from daemon: OCI runtime create failed: exec: "sh": executable file not found in $PATH: unknown
# No shell. No bash. No tools. Just the binary. Attack surface: near zero.

What just happened?

The entire Go toolchain — compiler, module cache, source code — was used in the builder stage and then completely discarded. The production image received only the compiled binary and a distroless base that weighs 0.5 MB. The result is an 8.3 MB image with no shell, no package manager, and no OS utilities. An attacker who achieves code execution inside this container has almost nothing to work with — there are no tools to download more tools, no shell to run commands in, and no package manager to install anything with. This is what maximum container hardening looks like in practice.

Language Comparison — Image Sizes and Trade-offs

Same payment service — three runtimes

Language | Image size | Base image        | Key gotcha
Node.js  | 91 MB      | node:18-alpine    | Copy package.json before source — else every code change re-runs npm install
Python   | 198 MB     | python:3.11-slim  | Use a venv at /opt/venv and chown it to the non-root user — else ModuleNotFoundError
Go       | 8.3 MB     | distroless/static | Set CGO_ENABLED=0 for a fully static binary — else the binary links to libc, which distroless lacks

A Complete Production Scenario

The scenario: You're containerizing a Node.js payment API for the first time on a team that has been running it bare-metal. You need the image to build fast in CI, run as a non-root user, expose a health check, and be as small as possible. Here's the complete file set from zero to a production-ready image.

# Step 1 — create .dockerignore before writing a single Dockerfile line
cat > .dockerignore << 'EOF'
.git
node_modules
.env
.env.*
dist/
coverage/
*.test.js
README.md
EOF

# Step 2 — build the production image
docker build --target production -t acmecorp/payment-api:a3f2c8d .

# Step 3 — verify the image before pushing
# Check the size:
docker images acmecorp/payment-api:a3f2c8d
# Check who the process runs as:
docker run --rm acmecorp/payment-api:a3f2c8d whoami
appuser
# Check the health check is configured:
docker inspect acmecorp/payment-api:a3f2c8d \
  --format '{{json .Config.Healthcheck}}' | python3 -m json.tool

# Step 4 — run it and confirm health:
docker run -d --name payment-api -p 3000:3000 acmecorp/payment-api:a3f2c8d
docker ps
CONTAINER ID   NAME          STATUS
a1b2c3d4e5f6   payment-api   Up 12 seconds (health: starting)
# Wait 30 seconds (start-period):
docker ps
CONTAINER ID   NAME          STATUS
a1b2c3d4e5f6   payment-api   Up 45 seconds (healthy)

# Step 5 — push to registry
docker push acmecorp/payment-api:a3f2c8d
docker images acmecorp/payment-api:a3f2c8d
REPOSITORY               TAG        SIZE
acmecorp/payment-api     a3f2c8d    91.4MB

docker inspect acmecorp/payment-api:a3f2c8d \
  --format '{{json .Config.Healthcheck}}' | python3 -m json.tool
{
  "Test": ["CMD-SHELL", "wget -qO- http://localhost:3000/health || exit 1"],
  "Interval": 30000000000,
  "Timeout": 5000000000,
  "StartPeriod": 10000000000,
  "Retries": 3
}
# Durations are reported in nanoseconds — 30000000000 ns = 30 s.

# All checks pass:
# ✓ 91 MB — Alpine + prod deps only
# ✓ Running as appuser — not root
# ✓ Health check configured and passing
# ✓ Ready to push and deploy

Teacher's Note

Write your .dockerignore before your Dockerfile — every time, without exception. It's a ten-second step that prevents a category of build problems that are frustrating to diagnose. Then structure your Dockerfile from least-changing to most-changing: base image, system packages, dependency install, source copy. Get those two habits right and 80% of Dockerization problems disappear before they start.

Practice Questions

1. In a Node.js Dockerfile, which file should be copied into the image before running npm install — so that a code-only change does not bust the dependency layer cache?



2. When compiling a Go binary for a distroless container image, which environment variable must be set to produce a fully static binary with no libc dependency?



3. A Python uvicorn server starts successfully inside the container but is unreachable from outside it. The most likely cause is that uvicorn is bound to 127.0.0.1 by default. What host address must it bind to instead in order to accept connections from outside the container?



Quiz

1. A Python container starts correctly as root but crashes with ModuleNotFoundError after the Dockerfile switches to a non-root user. What is the cause and fix?


2. A developer notices their docker build command takes 45 seconds before a single Dockerfile instruction runs. The output shows Sending build context to Docker daemon 680MB. What is happening?


3. A team is choosing between Node.js and Go for a new microservice that will run in many containers across a fleet. From a container image size and security perspective, what is the key difference?


Up Next · Lesson 38

Dockerizing Databases

Backend applications containerized — now the harder problem: databases. Stateful containers need volumes, initialization scripts, health checks, and backup strategies that stateless services don't. Postgres, MySQL, and Redis each have their own patterns for getting this right.