Docker Course
Dockerizing Databases
A startup ran Postgres in a Docker container for six months. No volumes. The data lived inside the container's writable layer. One Tuesday morning an engineer ran docker-compose down to restart the stack after a config change. The database came back up clean — because it was a fresh container with no data. Six months of user records, transaction history, and account balances: gone. The command completed in four seconds. The recovery took three weeks and was never complete.
Databases are fundamentally different from application containers. Application containers are stateless — you can destroy and recreate them freely. Databases are stateful — data must survive container restarts, replacements, and host reboots. This lesson covers how to containerize Postgres, MySQL, and Redis correctly: persistent volumes, initialisation scripts, health checks with readiness detection, and the backup patterns that prevent the story above from happening to you.
Stateless vs Stateful Containers
Stateless — application containers
- No data to preserve — destroy and recreate freely
- Any replica can handle any request
- docker rm and docker run are safe operations
- Scaling is trivial — add more containers
- Health check: is the process responding?
- Failure: restart it. Data: unaffected.
Stateful — database containers
- Data must survive restarts, recreation, host reboots
- Replicas require replication setup — not just adding containers
- docker rm without a volume is a data-loss event
- Scaling requires careful coordination
- Health check: is the database ready to accept queries?
- Failure: recover carefully. Data: must be preserved.
The Safety Deposit Box Analogy
A database container without a volume is like keeping your valuables in a hotel room safe — not the bank's safety deposit box. The hotel room is convenient and feels secure, but when the hotel is renovated (the container is replaced), everything in that safe is gone. A named volume is the safety deposit box: the bank vault persists regardless of what happens to the room you accessed it from. The container is the room. The data is the valuables. The volume is the vault. Always put the valuables in the vault.
Dockerizing PostgreSQL
Postgres is the most common relational database in containerised stacks. The official image handles initialisation automatically — any .sql or .sh files mounted into /docker-entrypoint-initdb.d/ are executed on first startup. The critical requirements: a named volume for /var/lib/postgresql/data, a health check that tests actual query readiness (not just process presence), and secrets injected at runtime.
version: "3.8"
services:
db:
image: postgres:15-alpine
# Use a specific version tag — never postgres:latest in production.
# Alpine variant: smaller footprint, same functionality.
environment:
- POSTGRES_DB=payment_db
- POSTGRES_USER=payment_user
- POSTGRES_PASSWORD=${DB_PASSWORD}
# Password from .env — never hardcoded. Lesson 33.
volumes:
- pgdata:/var/lib/postgresql/data
# Named volume — data persists across container restarts and replacements.
# This is the single most important line for any database container.
- ./init:/docker-entrypoint-initdb.d/
# Mount initialisation scripts — executed once on first startup only.
# Useful for schema creation, seed data, extension installation.
ports:
- "5432:5432"
# Expose only for development. Remove this line in production —
# the database should only be reachable by containers on the same network.
healthcheck:
test: ["CMD-SHELL", "pg_isready -U $$POSTGRES_USER -d $$POSTGRES_DB"]
# pg_isready checks whether Postgres is ready to accept connections.
# This is database-specific — it fails if the process is up but not yet
# accepting queries (common during startup and crash recovery).
# The $$ escapes the variables so they are expanded inside the container,
# where POSTGRES_USER and POSTGRES_DB are set — not by Compose on the host,
# where they don't exist.
interval: 10s
timeout: 5s
retries: 5
start_period: 30s
# start_period: give Postgres 30s to initialise before health checks begin.
# Without this, the first few checks fail and the container is marked
# unhealthy before it's had a chance to start.
restart: unless-stopped
logging:
driver: json-file
options:
max-size: "20m"
max-file: "3"
deploy:
resources:
limits:
memory: 1G
reservations:
memory: 512M
volumes:
pgdata:
# Named volume managed by Docker. Persists until explicitly deleted with
# docker volume rm pgdata — not affected by docker-compose down.
# init/01-schema.sql — executed once on first Postgres startup
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
-- Install UUID generation extension before creating tables.
CREATE TABLE IF NOT EXISTS payments (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
user_id UUID NOT NULL,
amount NUMERIC(12, 2) NOT NULL,
currency CHAR(3) NOT NULL DEFAULT 'USD',
status VARCHAR(20) NOT NULL DEFAULT 'pending',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX IF NOT EXISTS idx_payments_user_id ON payments (user_id);
CREATE INDEX IF NOT EXISTS idx_payments_status ON payments (status);
-- init/02-seed.sql — executed second (files run in alphabetical order)
INSERT INTO payments (user_id, amount, currency, status)
VALUES
('a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11', 99.99, 'USD', 'completed'),
('b0eebc99-9c0b-4ef8-bb6d-6bb9bd380a22', 149.50, 'GBP', 'pending')
ON CONFLICT DO NOTHING;
-- ON CONFLICT DO NOTHING prevents errors if seed runs on an already-populated DB.
docker compose up -d
[+] Running 2/2
✔ Volume "pgdata" Created
✔ Container postgres-db Started
# Watch health status during startup:
docker ps
CONTAINER ID NAME STATUS
a1b2c3d4e5f6 postgres-db Up 8s (health: starting)
# After 30s start_period + successful pg_isready:
docker ps
CONTAINER ID NAME STATUS
a1b2c3d4e5f6 postgres-db Up 42s (healthy)
# Verify init scripts ran:
docker exec postgres-db psql -U payment_user -d payment_db \
-c "SELECT COUNT(*) FROM payments;"
count
-------
2
(1 row)
# Schema created. Seed data inserted. Both init files ran in alphabetical order.
# Simulate a container replacement — data must survive:
docker compose down # container removed — pgdata volume untouched
docker compose up -d # new container — mounts the same pgdata volume
docker exec postgres-db psql -U payment_user -d payment_db \
-c "SELECT COUNT(*) FROM payments;"
count
-------
2
(1 row)
# Data is still there. The container changed. The volume did not.
What just happened?
The named volume pgdata persisted through a full docker compose down and up cycle. The container was destroyed and recreated — the data was not. The initialisation scripts ran exactly once on first startup; on the second startup Postgres detected existing data in the volume and skipped them. The health check used pg_isready — which tests actual connection readiness, not just process presence — so dependent services don't attempt connections while Postgres is still replaying its write-ahead log.
Dockerizing MySQL
MySQL follows the same pattern as Postgres: the initialisation directory is identical (/docker-entrypoint-initdb.d/), but the health check command uses mysqladmin ping instead of pg_isready. MySQL also separates the root password from the application user password — MYSQL_ROOT_PASSWORD is mandatory, and without it (or one of its documented alternatives) the container refuses to start.
version: "3.8"
services:
db:
image: mysql:8.0
environment:
- MYSQL_ROOT_PASSWORD=${MYSQL_ROOT_PASSWORD}
# Root password — required. Used only for administrative operations.
- MYSQL_DATABASE=payment_db
- MYSQL_USER=payment_user
- MYSQL_PASSWORD=${DB_PASSWORD}
# Application user — this is what the app connects as, not root.
# The official image creates this user automatically on first startup.
volumes:
- mysqldata:/var/lib/mysql
# Named volume for MySQL data directory.
- ./init:/docker-entrypoint-initdb.d/
healthcheck:
test: ["CMD", "mysqladmin", "ping",
"-h", "localhost",
"-u", "payment_user",
"--password=${DB_PASSWORD}"]
# mysqladmin ping — tests whether MySQL is accepting connections.
# Tests with the application user, not root.
interval: 10s
timeout: 5s
retries: 5
start_period: 40s
# MySQL takes longer to initialise than Postgres on first startup —
# give it 40s before health checks begin.
restart: unless-stopped
deploy:
resources:
limits:
memory: 1G
volumes:
mysqldata:
Dockerizing Redis
Redis is an in-memory store — by default, it holds everything in RAM and writes nothing to disk. A container restart loses all data. For a cache this is acceptable. For a job queue, session store, or rate limiter it is not. The fix is Redis persistence — either RDB snapshots, append-only file (AOF), or both — combined with a named volume so the persistence files survive container replacement.
version: "3.8"
services:
redis:
image: redis:7-alpine
command: >
redis-server
--requirepass ${REDIS_PASSWORD}
--appendonly yes
--appendfsync everysec
--maxmemory 256mb
--maxmemory-policy allkeys-lru
--save 900 1
--save 300 10
--save 60 10000
# --requirepass → password authentication — from Lesson 33
# --appendonly yes → AOF persistence: every write logged to disk
# --appendfsync everysec → flush to disk once per second — balanced mode
# "always" = safest, "no" = fastest, "everysec" = default
# --maxmemory 256mb → hard cap on memory usage
# --maxmemory-policy allkeys-lru → when full, evict least-recently-used keys
# correct for a cache — wrong for a job queue
# --save 900 1 → RDB snapshot if ≥1 key changed in 900 seconds
# --save 300 10 → RDB snapshot if ≥10 keys changed in 300 seconds
# --save 60 10000 → RDB snapshot if ≥10000 keys changed in 60 seconds
volumes:
- redisdata:/data
# /data is where Redis writes both AOF and RDB files.
healthcheck:
test: ["CMD", "redis-cli",
"-a", "${REDIS_PASSWORD}",
"ping"]
# redis-cli ping returns PONG when Redis is ready.
# Must include -a for password-protected instances.
interval: 10s
timeout: 3s
retries: 3
start_period: 10s
restart: unless-stopped
deploy:
resources:
limits:
memory: 300M
# Slightly above maxmemory to allow Redis overhead without OOM kill.
volumes:
redisdata:
# Write a key, restart the container, verify persistence:
docker exec redis redis-cli -a ${REDIS_PASSWORD} SET session:u_8821 "active" EX 3600
OK
docker exec redis redis-cli -a ${REDIS_PASSWORD} GET session:u_8821
"active"
# Restart the container — AOF file persists in the volume:
docker compose restart redis
docker exec redis redis-cli -a ${REDIS_PASSWORD} GET session:u_8821
"active"
# Session survived the restart. AOF replayed on startup — data restored from log.
# Check persistence files on the volume:
docker exec redis ls -lh /data
drwx------ 2 redis redis 4.0K appendonlydir
-rw-r--r-- 1 redis redis 92 dump.rdb
# Redis 7 stores the AOF as multiple files inside appendonlydir/ —
# both the AOF and RDB persistence layers are present.
What just happened?
The session key survived a container restart because AOF persistence was enabled and the AOF file lives in a named volume. On startup, Redis replayed the append-only log and restored all data written since the last restart. The maxmemory-policy allkeys-lru setting means Redis behaves correctly as a cache under memory pressure — it evicts the least recently used keys rather than refusing writes or crashing. The memory limit on the container is set slightly above maxmemory to give Redis room for its internal overhead without hitting the OOM killer.
Waiting for the Database — depends_on with Condition
The most common failure in a multi-container stack: the application starts, attempts a database connection, and crashes because the database container is running but not yet accepting queries. depends_on by itself only waits for the container to start — not for it to be healthy. The fix is condition: service_healthy, which holds the dependent service until the database's health check passes.
version: "3.8"
services:
api:
image: acmecorp/payment-api:${GIT_SHA}
depends_on:
db:
condition: service_healthy
# Do not start the API until db reports healthy via pg_isready.
# Without this, the API starts in parallel with Postgres and
# almost certainly attempts a connection before Postgres is ready.
redis:
condition: service_healthy
# Same for Redis — wait for redis-cli ping to succeed.
environment:
- DB_HOST=db
- REDIS_HOST=redis
ports:
- "3000:3000"
db:
image: postgres:15-alpine
healthcheck:
test: ["CMD-SHELL", "pg_isready -U payment_user -d payment_db"]
interval: 10s
timeout: 5s
retries: 5
start_period: 30s
volumes:
- pgdata:/var/lib/postgresql/data
environment:
- POSTGRES_DB=payment_db
- POSTGRES_USER=payment_user
- POSTGRES_PASSWORD=${DB_PASSWORD}
redis:
image: redis:7-alpine
command: redis-server --requirepass ${REDIS_PASSWORD}
healthcheck:
test: ["CMD", "redis-cli", "-a", "${REDIS_PASSWORD}", "ping"]
interval: 10s
timeout: 3s
retries: 3
start_period: 10s
volumes:
- redisdata:/data
volumes:
pgdata:
redisdata:
docker compose up
[+] Running 3/3
✔ Container redis Started
✔ Container postgres-db Started
✗ Container payment-api Waiting ← API is held — db not yet healthy
# 35 seconds later:
✔ Container postgres-db Healthy ← pg_isready passed
✔ Container redis Healthy ← redis-cli ping returned PONG
✔ Container payment-api Started ← API now starts — both deps healthy
payment-api | Server listening on :3000
payment-api | Database connected — pool size 10
payment-api | Redis connected — session store ready
# Clean startup. No connection errors. No retry logic needed in the application.
What just happened?
The API container was held for 35 seconds while Postgres initialised and passed its health check. The API then started with both dependencies confirmed ready — no connection errors on startup, no retry logic required in application code. This is the correct way to sequence a multi-container stack. Without condition: service_healthy, the API would start within milliseconds of Postgres, hit a connection refused error, and either crash or require complex retry logic inside the application itself.
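Where condition: service_healthy is not available — plain docker run, or an orchestrator that ignores Compose conditions — the same effect can be approximated with a small polling wrapper. A minimal sketch in POSIX sh (the function name, timeout, and the example readiness command are illustrative assumptions, not part of any tool):

```shell
#!/bin/sh
# wait_for CMD [TIMEOUT_SECONDS] — run a readiness command once per second
# until it succeeds (return 0) or the timeout expires (return 1).
wait_for() {
  cmd=$1
  timeout=${2:-30}
  elapsed=0
  until $cmd >/dev/null 2>&1; do
    elapsed=$((elapsed + 1))
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
  done
  return 0
}

# Example: block until Postgres answers, then hand off to the app process.
# wait_for "pg_isready -h db -U payment_user" 60 && exec node server.js
```

This is the same idea service_healthy implements for you — prefer the Compose condition when it is available, and reach for a wrapper like this only when it is not.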
Backup — The One Thing Volumes Don't Do
Named volumes prevent data loss from container replacement. They do not prevent data loss from disk failure, accidental deletion, or corruption. A volume on a single host is a single point of failure. Production databases need backups — separate from the host, automated, and regularly tested. The following patterns work for any database container.
# Postgres — pg_dump to a compressed backup file
docker exec postgres-db pg_dump \
-U payment_user \
-d payment_db \
-F c \
-f /tmp/backup.dump
# -F c → custom format (compressed, supports selective restore)
# -f → output file path inside the container
# Copy the backup off the container and off the host:
docker cp postgres-db:/tmp/backup.dump ./backups/payment_db_$(date +%Y%m%d).dump
aws s3 cp ./backups/payment_db_$(date +%Y%m%d).dump \
s3://acmecorp-backups/postgres/
# Restore from backup — to a new or existing container:
docker exec -i postgres-db pg_restore \
-U payment_user \
-d payment_db \
--clean \
--if-exists \
< ./backups/payment_db_20240115.dump
# --clean → drop existing objects before restoring
# --if-exists → emit DROP ... IF EXISTS, so --clean doesn't error on a fresh DB
# Redis — BGSAVE triggers a background RDB snapshot immediately:
docker exec redis redis-cli -a ${REDIS_PASSWORD} BGSAVE
Background saving started
# Copy the RDB file off the container:
docker cp redis:/data/dump.rdb ./backups/redis_$(date +%Y%m%d).rdb
aws s3 cp ./backups/redis_$(date +%Y%m%d).rdb \
s3://acmecorp-backups/redis/
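Local backup copies accumulate and need rotation. A minimal retention sketch in POSIX sh (the directory layout, *.dump pattern, and keep-count are assumptions — adjust them to your own backup naming):

```shell
#!/bin/sh
# prune_backups DIR [KEEP] — keep the newest KEEP *.dump files in DIR,
# delete the rest. Relies on `ls -t` sorting by modification time,
# newest first; assumes backup filenames contain no newlines.
prune_backups() {
  dir=$1
  keep=${2:-14}
  ls -1t "$dir"/*.dump 2>/dev/null | tail -n +"$((keep + 1))" |
  while IFS= read -r old; do
    rm -f -- "$old"
  done
}

# Example: keep the 14 most recent Postgres dumps.
# prune_backups ./backups 14
```

Run it after each upload to object storage — the off-host copy is the real backup; the local files are only a convenience.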
The Complete Database Setup Checklist
Before any database container goes to production
- Version pinned: postgres:15-alpine, not postgres:latest — reproducible images
- Named volume mounted on the data directory — data survives container replacement
- Database-specific health check: pg_isready, mysqladmin ping, or redis-cli ping — not a generic TCP check
- Secrets injected from .env — never hardcoded in the Compose file
- Container memory limit set slightly above maxmemory to absorb overhead
- Backups: pg_dump or BGSAVE on a schedule, copied off-host, tested for restore
A Volume Is Not a Backup
A named volume protects your data from container replacement. It does not protect it from docker volume rm, disk failure, accidental host deletion, or filesystem corruption. If your only copy of production data is in a Docker volume on a single host, you do not have a backup — you have a single point of failure with extra steps. Schedule automated pg_dump exports, copy them to object storage (S3, GCS), and test restore at least monthly. A backup you have never restored is a backup you cannot trust.
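Short of that monthly test restore, a cheap automated sanity check catches empty or truncated dump files on the day they are produced. A minimal sketch (the "PGDMP" magic-byte check applies only to pg_dump's custom format; the function name and messages are illustrative):

```shell
#!/bin/sh
# check_backup FILE — fail if the file is missing, empty, or does not start
# with "PGDMP", the magic bytes of a pg_dump custom-format archive.
# A smoke test only — it does not replace an actual test restore.
check_backup() {
  f=$1
  if [ ! -s "$f" ]; then
    echo "FAIL: $f missing or empty"
    return 1
  fi
  if ! head -c 5 "$f" | grep -q "PGDMP"; then
    echo "FAIL: $f does not look like a pg_dump custom archive"
    return 1
  fi
  echo "OK: $f"
}

# Example: check_backup ./backups/payment_db_20240115.dump
```

Wire this into the same cron job that runs pg_dump, and alert on a non-zero exit — a backup pipeline that fails silently is worse than none, because it removes the urgency to fix it.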
Teacher's Note
For development: use Docker containers for databases freely — they're fast to spin up, isolated, and easy to reset with fresh data. For production on a single server: containers work well with named volumes and automated backups. For production at scale with high availability requirements: consider managed database services (RDS, Cloud SQL, ElastiCache) rather than self-managed containers — the operational overhead of running a highly-available database cluster in containers is significant and rarely justified when managed alternatives exist.
Practice Questions
1. SQL and shell scripts mounted into which directory are automatically executed by the official Postgres and MySQL images on first container startup?
2. To prevent an application container from starting until its database dependency has passed its health check — not just started — which depends_on condition must be set?
3. To enable Redis persistence so that data survives container restarts — by logging every write operation to a file — which Redis configuration option must be enabled?
Quiz
1. A Postgres container is running with a named volume pgdata. An engineer runs docker compose down then docker compose up. What happens to the database data?
2. A Redis container is used as a session store. It has a named volume but no persistence configuration. The container is restarted after a deploy. What happens to the sessions?
3. An API service has depends_on: db in its Compose file but keeps failing on startup with a database connection error. The Postgres container is running. What is the cause?
Up Next · Lesson 39
Docker for Microservices
Databases containerized correctly — now the architecture question: how does Docker change when you go from one backend service to ten? Microservices introduce service discovery, inter-container networking, shared infrastructure, and deployment coordination that a single-service setup never has to deal with.