Docker Course
Dockerizing Databases
A startup ran Postgres in a Docker container for six months. No volumes. The data lived inside the container's writable layer. One Tuesday morning an engineer ran docker-compose down to restart the stack after a config change. The database came back up clean — because it was a fresh container with no data. Six months of user records, transaction history, and account balances: gone. The command completed in four seconds. The recovery took three weeks and was never complete.
Databases are fundamentally different from application containers. Application containers are stateless — you can destroy and recreate them freely. Databases are stateful — data must survive container restarts, replacements, and host reboots. This lesson covers how to containerize Postgres, MySQL, and Redis correctly: persistent volumes, initialisation scripts, health checks with readiness detection, and the backup patterns that prevent the story above from happening to you.
Stateless vs Stateful Containers
Stateless — application containers
- No data to preserve — destroy and recreate freely
- Any replica can handle any request
- docker rm and docker run are safe operations
- Scaling is trivial — add more containers
- Health check: is the process responding?
- Failure: restart it. Data: unaffected.
Stateful — database containers
- Data must survive restarts, recreation, host reboots
- Replicas require replication setup — not just adding containers
- docker rm without a volume is a data-loss event
- Scaling requires careful coordination
- Health check: is the database ready to accept queries?
- Failure: recover carefully. Data: must be preserved.
The Safety Deposit Box Analogy
A database container without a volume is like keeping your valuables in a hotel room safe — not the bank's safety deposit box. The hotel room is convenient and feels secure, but when the hotel is renovated (the container is replaced), everything in that safe is gone. A named volume is the safety deposit box: the bank vault persists regardless of what happens to the room you accessed it from. The container is the room. The data is the valuables. The volume is the vault. Always put the valuables in the vault.
Dockerizing PostgreSQL
Postgres is the most common relational database in containerised stacks. The official image handles initialisation automatically — any .sql or .sh files mounted into /docker-entrypoint-initdb.d/ are executed on first startup. The critical requirements: a named volume for /var/lib/postgresql/data, a health check that tests actual query readiness (not just process presence), and secrets injected at runtime.
version: "3.8"
services:
db:
image: postgres:15-alpine
# Use a specific version tag — never postgres:latest in production.
# Alpine variant: smaller footprint, same functionality.
environment:
- POSTGRES_DB=payment_db
- POSTGRES_USER=payment_user
- POSTGRES_PASSWORD=${DB_PASSWORD}
# Password from .env — never hardcoded. Lesson 33.
volumes:
- pgdata:/var/lib/postgresql/data
# Named volume — data persists across container restarts and replacements.
# This is the single most important line for any database container.
- ./init:/docker-entrypoint-initdb.d/
# Mount initialisation scripts — executed once on first startup only.
# Useful for schema creation, seed data, extension installation.
ports:
- "5432:5432"
# Expose only for development. Remove this line in production —
# the database should only be reachable by containers on the same network.
healthcheck:
test: ["CMD-SHELL", "pg_isready -U $$POSTGRES_USER -d $$POSTGRES_DB"]
# pg_isready checks whether Postgres is ready to accept connections.
# This is database-specific — it fails if the process is up but not yet
# accepting queries (common during startup and crash recovery).
# The $$ escapes the variables so they are expanded inside the container,
# where POSTGRES_USER and POSTGRES_DB are set — not by Compose on the host,
# where they don't exist.
interval: 10s
timeout: 5s
retries: 5
start_period: 30s
# start_period: give Postgres 30s to initialise before health checks begin.
# Without this, the first few checks fail and the container is marked
# unhealthy before it's had a chance to start.
restart: unless-stopped
logging:
driver: json-file
options:
max-size: "20m"
max-file: "3"
deploy:
resources:
limits:
memory: 1G
reservations:
memory: 512M
volumes:
pgdata:
# Named volume managed by Docker. Persists until explicitly deleted with
# docker volume rm pgdata — not affected by docker-compose down.
# init/01-schema.sql — executed once on first Postgres startup
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
-- Install UUID generation extension before creating tables.
CREATE TABLE IF NOT EXISTS payments (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
user_id UUID NOT NULL,
amount NUMERIC(12, 2) NOT NULL,
currency CHAR(3) NOT NULL DEFAULT 'USD',
status VARCHAR(20) NOT NULL DEFAULT 'pending',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX IF NOT EXISTS idx_payments_user_id ON payments (user_id);
CREATE INDEX IF NOT EXISTS idx_payments_status ON payments (status);
-- init/02-seed.sql — executed second (files run in alphabetical order)
INSERT INTO payments (user_id, amount, currency, status)
VALUES
('a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11', 99.99, 'USD', 'completed'),
('b0eebc99-9c0b-4ef8-bb6d-6bb9bd380a22', 149.50, 'GBP', 'pending')
ON CONFLICT DO NOTHING;
-- ON CONFLICT DO NOTHING prevents errors if seed runs on an already-populated DB.
docker compose up -d
[+] Running 2/2
✔ Volume "pgdata" Created
✔ Container postgres-db Started
# Watch health status during startup:
docker ps
CONTAINER ID NAME STATUS
a1b2c3d4e5f6 postgres-db Up 8s (health: starting)
# After 30s start_period + successful pg_isready:
docker ps
CONTAINER ID NAME STATUS
a1b2c3d4e5f6 postgres-db Up 42s (healthy)
# Verify init scripts ran:
docker exec postgres-db psql -U payment_user -d payment_db \
-c "SELECT COUNT(*) FROM payments;"
count
-------
2
(1 row)
# Schema created. Seed data inserted. Both init files ran in alphabetical order.
# Simulate a container replacement — data must survive:
docker compose down # container removed — pgdata volume untouched
docker compose up -d # new container — mounts the same pgdata volume
docker exec postgres-db psql -U payment_user -d payment_db \
-c "SELECT COUNT(*) FROM payments;"
count
-------
2
(1 row)
# Data is still there. The container changed. The volume did not.
What just happened?
The named volume pgdata persisted through a full docker compose down and up cycle. The container was destroyed and recreated — the data was not. The initialisation scripts ran exactly once on first startup; on the second startup Postgres detected existing data in the volume and skipped them. The health check used pg_isready — which tests actual connection readiness, not just process presence — so dependent services don't attempt connections while Postgres is still replaying its write-ahead log.
Dockerizing MySQL
MySQL follows the same pattern as Postgres: the initialisation directory is identical (/docker-entrypoint-initdb.d/), but the health check command uses mysqladmin ping instead of pg_isready. MySQL also separates the root password from the application user password — MYSQL_ROOT_PASSWORD is mandatory, and without it (or one of its documented alternatives) the container refuses to start.
version: "3.8"
services:
db:
image: mysql:8.0
environment:
- MYSQL_ROOT_PASSWORD=${MYSQL_ROOT_PASSWORD}
# Root password — required. Used only for administrative operations.
- MYSQL_DATABASE=payment_db
- MYSQL_USER=payment_user
- MYSQL_PASSWORD=${DB_PASSWORD}
# Application user — this is what the app connects as, not root.
# The official image creates this user automatically on first startup.
volumes:
- mysqldata:/var/lib/mysql
# Named volume for MySQL data directory.
- ./init:/docker-entrypoint-initdb.d/
healthcheck:
test: ["CMD", "mysqladmin", "ping",
"-h", "localhost",
"-u", "payment_user",
"--password=${DB_PASSWORD}"]
# mysqladmin ping — tests whether MySQL is accepting connections.
# Tests with the application user, not root.
interval: 10s
timeout: 5s
retries: 5
start_period: 40s
# MySQL takes longer to initialise than Postgres on first startup —
# give it 40s before health checks begin.
restart: unless-stopped
deploy:
resources:
limits:
memory: 1G
volumes:
mysqldata:
Dockerizing Redis
Redis is an in-memory store — by default, it holds everything in RAM and writes nothing to disk. A container restart loses all data. For a cache this is acceptable. For a job queue, session store, or rate limiter it is not. The fix is Redis persistence — either RDB snapshots, append-only file (AOF), or both — combined with a named volume so the persistence files survive container replacement.
version: "3.8"
services:
redis:
image: redis:7-alpine
command: >
redis-server
--requirepass ${REDIS_PASSWORD}
--appendonly yes
--appendfsync everysec
--maxmemory 256mb
--maxmemory-policy allkeys-lru
--save 900 1
--save 300 10
--save 60 10000
# --requirepass → password authentication — from Lesson 33
# --appendonly yes → AOF persistence: every write logged to disk
# --appendfsync everysec → flush to disk once per second — balanced mode
# "always" = safest, "no" = fastest, "everysec" = default
# --maxmemory 256mb → hard cap on memory usage
# --maxmemory-policy allkeys-lru → when full, evict least-recently-used keys
# correct for a cache — wrong for a job queue
# --save 900 1 → RDB snapshot if ≥1 key changed in 900 seconds
# --save 300 10 → RDB snapshot if ≥10 keys changed in 300 seconds
# --save 60 10000 → RDB snapshot if ≥10000 keys changed in 60 seconds
volumes:
- redisdata:/data
# /data is where Redis writes both AOF and RDB files.
healthcheck:
test: ["CMD", "redis-cli",
"-a", "${REDIS_PASSWORD}",
"ping"]
# redis-cli ping returns PONG when Redis is ready.
# Must include -a for password-protected instances.
interval: 10s
timeout: 3s
retries: 3
start_period: 10s
restart: unless-stopped
deploy:
resources:
limits:
memory: 300M
# Slightly above maxmemory to allow Redis overhead without OOM kill.
volumes:
redisdata:
# Write a key, restart the container, verify persistence:
docker exec redis redis-cli -a ${REDIS_PASSWORD} SET session:u_8821 "active" EX 3600
OK
docker exec redis redis-cli -a ${REDIS_PASSWORD} GET session:u_8821
"active"
# Restart the container — AOF file persists in the volume:
docker compose restart redis
docker exec redis redis-cli -a ${REDIS_PASSWORD} GET session:u_8821
"active"
# Session survived the restart. AOF replayed on startup — data restored from log.
# Check persistence files on the volume:
docker exec redis ls -lh /data
drwx------ 2 redis redis 4.0K appendonlydir
-rw-r--r-- 1 redis redis 92 dump.rdb
# Redis 7 stores the AOF as multiple files inside appendonlydir/ —
# both the AOF and RDB persistence layers are present.
What just happened?
The session key survived a container restart because AOF persistence was enabled and the AOF file lives in a named volume. On startup, Redis replayed the append-only log and restored all data written since the last restart. The maxmemory-policy allkeys-lru setting means Redis behaves correctly as a cache under memory pressure — it evicts the least recently used keys rather than refusing writes or crashing. The memory limit on the container is set slightly above maxmemory to give Redis room for its internal overhead without hitting the OOM killer.
Waiting for the Database — depends_on with Condition
The most common failure in a multi-container stack: the application starts, attempts a database connection, and crashes because the database container is running but not yet accepting queries. depends_on by itself only waits for the container to start — not for it to be healthy. The fix is condition: service_healthy, which holds the dependent service until the database's health check passes.
version: "3.8"
services:
api:
image: acmecorp/payment-api:${GIT_SHA}
depends_on:
db:
condition: service_healthy
# Do not start the API until db reports healthy via pg_isready.
# Without this, the API starts in parallel with Postgres and
# almost certainly attempts a connection before Postgres is ready.
redis:
condition: service_healthy
# Same for Redis — wait for redis-cli ping to succeed.
environment:
- DB_HOST=db
- REDIS_HOST=redis
ports:
- "3000:3000"
db:
image: postgres:15-alpine
healthcheck:
test: ["CMD-SHELL", "pg_isready -U payment_user -d payment_db"]
interval: 10s
timeout: 5s
retries: 5
start_period: 30s
volumes:
- pgdata:/var/lib/postgresql/data
environment:
- POSTGRES_DB=payment_db
- POSTGRES_USER=payment_user
- POSTGRES_PASSWORD=${DB_PASSWORD}
redis:
image: redis:7-alpine
command: redis-server --requirepass ${REDIS_PASSWORD}
healthcheck:
test: ["CMD", "redis-cli", "-a", "${REDIS_PASSWORD}", "ping"]
interval: 10s
timeout: 3s
retries: 3
start_period: 10s
volumes:
- redisdata:/data
volumes:
pgdata:
redisdata:
docker compose up
[+] Running 3/3
✔ Container redis Started
✔ Container postgres-db Started
✗ Container payment-api Waiting ← API is held — db not yet healthy
# 35 seconds later:
✔ Container postgres-db Healthy ← pg_isready passed
✔ Container redis Healthy ← redis-cli ping returned PONG
✔ Container payment-api Started ← API now starts — both deps healthy
payment-api | Server listening on :3000
payment-api | Database connected — pool size 10
payment-api | Redis connected — session store ready
# Clean startup. No connection errors. No retry logic needed in the application.
What just happened?
The API container was held for 35 seconds while Postgres initialised and passed its health check. The API then started with both dependencies confirmed ready — no connection errors on startup, no retry logic required in application code. This is the correct way to sequence a multi-container stack. Without condition: service_healthy, the API would start within milliseconds of Postgres, hit a connection refused error, and either crash or require complex retry logic inside the application itself.
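Where condition: service_healthy is not available — plain docker run, or an orchestrator that ignores Compose conditions — the same effect can be approximated with a small polling wrapper. A minimal sketch in POSIX sh (the function name, timeout, and the example readiness command are illustrative assumptions, not part of any tool):

```shell
#!/bin/sh
# wait_for CMD [TIMEOUT_SECONDS] — run a readiness command once per second
# until it succeeds (return 0) or the timeout expires (return 1).
wait_for() {
  cmd=$1
  timeout=${2:-30}
  elapsed=0
  until $cmd >/dev/null 2>&1; do
    elapsed=$((elapsed + 1))
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
  done
  return 0
}

# Example: block until Postgres answers, then hand off to the app process.
# wait_for "pg_isready -h db -U payment_user" 60 && exec node server.js
```

This is the same idea service_healthy implements for you — prefer the Compose condition when it is available, and reach for a wrapper like this only when it is not.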
Backup — The One Thing Volumes Don't Do
Named volumes prevent data loss from container replacement. They do not prevent data loss from disk failure, accidental deletion, or corruption. A volume on a single host is a single point of failure. Production databases need backups — separate from the host, automated, and regularly tested. The following patterns work for any database container.
# Postgres — pg_dump to a compressed backup file
docker exec postgres-db pg_dump \
-U payment_user \
-d payment_db \
-F c \
-f /tmp/backup.dump
# -F c → custom format (compressed, supports selective restore)
# -f → output file path inside the container
# Copy the backup off the container and off the host:
docker cp postgres-db:/tmp/backup.dump ./backups/payment_db_$(date +%Y%m%d).dump
aws s3 cp ./backups/payment_db_$(date +%Y%m%d).dump \
s3://acmecorp-backups/postgres/
# Restore from backup — to a new or existing container:
docker exec -i postgres-db pg_restore \
-U payment_user \
-d payment_db \
--clean \
--if-exists \
< ./backups/payment_db_20240115.dump
# --clean → drop existing objects before restoring
# --if-exists → emit DROP ... IF EXISTS, so --clean doesn't error on a fresh DB
# Redis — BGSAVE triggers a background RDB snapshot immediately:
docker exec redis redis-cli -a ${REDIS_PASSWORD} BGSAVE
Background saving started
# Copy the RDB file off the container:
docker cp redis:/data/dump.rdb ./backups/redis_$(date +%Y%m%d).rdb
aws s3 cp ./backups/redis_$(date +%Y%m%d).rdb \
s3://acmecorp-backups/redis/
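Local backup copies accumulate and need rotation. A minimal retention sketch in POSIX sh (the directory layout, *.dump pattern, and keep-count are assumptions — adjust them to your own backup naming):

```shell
#!/bin/sh
# prune_backups DIR [KEEP] — keep the newest KEEP *.dump files in DIR,
# delete the rest. Relies on `ls -t` sorting by modification time,
# newest first; assumes backup filenames contain no newlines.
prune_backups() {
  dir=$1
  keep=${2:-14}
  ls -1t "$dir"/*.dump 2>/dev/null | tail -n +"$((keep + 1))" |
  while IFS= read -r old; do
    rm -f -- "$old"
  done
}

# Example: keep the 14 most recent Postgres dumps.
# prune_backups ./backups 14
```

Run it after each upload to object storage — the off-host copy is the real backup; the local files are only a convenience.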
The Complete Database Setup Checklist
Before any database container goes to production
- Version pinned: postgres:15-alpine, not postgres:latest — reproducible images
- Named volume mounted on the data directory — data survives container replacement
- Database-specific health check: pg_isready, mysqladmin ping, or redis-cli ping — not a generic TCP check
- Secrets injected from .env — never hardcoded in the Compose file
- Container memory limit set slightly above maxmemory to absorb overhead
- Backups: pg_dump or BGSAVE on a schedule, copied off-host, tested for restore
A Volume Is Not a Backup
A named volume protects your data from container replacement. It does not protect it from docker volume rm, disk failure, accidental host deletion, or filesystem corruption. If your only copy of production data is in a Docker volume on a single host, you do not have a backup — you have a single point of failure with extra steps. Schedule automated pg_dump exports, copy them to object storage (S3, GCS), and test restore at least monthly. A backup you have never restored is a backup you cannot trust.
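Short of that monthly test restore, a cheap automated sanity check catches empty or truncated dump files on the day they are produced. A minimal sketch (the "PGDMP" magic-byte check applies only to pg_dump's custom format; the function name and messages are illustrative):

```shell
#!/bin/sh
# check_backup FILE — fail if the file is missing, empty, or does not start
# with "PGDMP", the magic bytes of a pg_dump custom-format archive.
# A smoke test only — it does not replace an actual test restore.
check_backup() {
  f=$1
  if [ ! -s "$f" ]; then
    echo "FAIL: $f missing or empty"
    return 1
  fi
  if ! head -c 5 "$f" | grep -q "PGDMP"; then
    echo "FAIL: $f does not look like a pg_dump custom archive"
    return 1
  fi
  echo "OK: $f"
}

# Example: check_backup ./backups/payment_db_20240115.dump
```

Wire this into the same cron job that runs pg_dump, and alert on a non-zero exit — a backup pipeline that fails silently is worse than none, because it removes the urgency to fix it.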
Teacher's Note
For development: use Docker containers for databases freely — they're fast to spin up, isolated, and easy to reset with fresh data. For production on a single server: containers work well with named volumes and automated backups. For production at scale with high availability requirements: consider managed database services (RDS, Cloud SQL, ElastiCache) rather than self-managed containers — the operational overhead of running a highly-available database cluster in containers is significant and rarely justified when managed alternatives exist.
Practice Questions
1. SQL and shell scripts mounted into which directory are automatically executed by the official Postgres and MySQL images on first container startup?
2. To prevent an application container from starting until its database dependency has passed its health check — not just started — which depends_on condition must be set?
3. To enable Redis persistence so that data survives container restarts — by logging every write operation to a file — which Redis configuration option must be enabled?
Quiz
1. A Postgres container is running with a named volume pgdata. An engineer runs docker compose down then docker compose up. What happens to the database data?
2. A Redis container is used as a session store. It has a named volume but no persistence configuration. The container is restarted after a deploy. What happens to the sessions?
3. An API service has depends_on: db in its Compose file but keeps failing on startup with a database connection error. The Postgres container is running. What is the cause?
Up Next · Lesson 39
Docker for Microservices
Databases containerized correctly — now the architecture question: how does Docker change when you go from one backend service to ten? Microservices introduce service discovery, inter-container networking, shared infrastructure, and deployment coordination that a single-service setup never has to deal with.