Docker Lesson 34 – Resource Limits | Dataplexa
Section III · Lesson 34

Resource Limits

A fintech startup ran twelve microservices on a single production host. One night a bug introduced an infinite loop in the notification service — it pegged one CPU core at 100% and started allocating memory in a tight loop. Within four minutes, the host was unresponsive. All twelve services — including payments — went down simultaneously. The bug was a three-line fix. The outage lasted two hours. Without resource limits, one container's problem is every container's problem.

By default, Docker places no ceiling on how much CPU or memory a container can consume; a single container can take everything the host has. Resource limits are the enforcement layer that turns that default into a guarantee: each container gets its share, none can starve the others, and a runaway process hits a wall instead of becoming a production incident.

Unlimited vs Limited Containers

Default — no limits set

  • One container can consume 100% of host CPU
  • A memory leak can exhaust all host RAM
  • Linux OOM killer terminates random processes — including the Docker Daemon
  • Every other container on the host suffers
  • No visibility into what each container is actually consuming
  • Impossible to capacity plan or right-size the host

With resource limits enforced

  • Each container has a guaranteed CPU ceiling
  • Memory exhaustion kills only the offending container
  • OOM kill is contained — the Daemon and other containers are unaffected
  • Other services continue running normally during an incident
  • docker stats gives meaningful utilisation percentages
  • Predictable resource usage enables accurate host sizing

Memory Limits

Memory is the more critical limit to set. When a process leaks memory or allocates without bound, the Linux kernel's Out-Of-Memory (OOM) killer activates and starts terminating processes to reclaim memory. Without a container-level limit, the OOM killer picks victims across the entire host — it may kill the Docker Daemon itself. With a memory limit, the kernel kills only the process inside the offending container and Docker restarts it according to the restart policy. The blast radius stays contained.

docker run -d \
  --name payment-api \
  --restart unless-stopped \
  --memory 512m \
  --memory-swap 512m \
  -p 3000:3000 \
  payment-api:v1.2.0
# --restart unless-stopped → without a restart policy (the default is "no"),
#                       an OOM-killed main process leaves the container stopped
# --memory 512m       → hard limit: container cannot exceed 512 MB of RAM
#                       if it tries, the Linux OOM killer terminates the process
#                       Docker then restarts the container per its restart policy
# --memory-swap 512m  → set swap equal to --memory to disable swap entirely
#                       swap == memory means 0 MB of swap is available
#                       prevents the process from silently spilling onto disk
#                       and masking a memory problem for hours before it crashes
# Confirm limits are applied:
docker inspect payment-api | grep -A4 '"Memory"'
"Memory": 536870912,
"MemorySwap": 536870912,
# 536870912 bytes = 512 MB — limit is confirmed.
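The inspect value is just 512 × 1024 × 1024 expressed in bytes; a one-line shell check confirms the conversion:

```shell
# 512 MiB in bytes — should match the "Memory" value reported by docker inspect
echo $((512 * 1024 * 1024))
```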

# Simulate OOM — allocate memory without bound from inside the container
# (tail buffers /dev/zero into RAM until it hits the 512 MB ceiling):
docker exec payment-api sh -c "tail /dev/zero"
sh: Killed

# The allocation was OOM-killed the instant it crossed the limit. When the
# leak is in the container's main process, the container exits and the
# restart policy brings it straight back:
docker ps
CONTAINER ID   NAME          STATUS
a1b2c3d4e5f6   payment-api   Up 3 seconds
# The container restarted. The host is unaffected. Other containers kept running.

What just happened?

The container exceeded its 512 MB memory limit. Linux OOM-killed the process inside the container. Docker's restart policy brought the container back within seconds. Every other container on the host continued running without interruption. Without the memory limit, that same leak would have consumed all available host RAM, triggered a host-wide OOM event, and potentially taken down every service — including the Docker Daemon itself.

CPU Limits

CPU limits work differently from memory limits — exceeding a CPU limit does not kill the process. Instead, the Linux kernel throttles it: the container is still running, but its CPU access is capped. A container at its CPU ceiling simply slows down rather than crashes. This is the correct behaviour for a busy service. The two flags you need are --cpus for the ceiling and --cpu-shares for relative priority when the host is under load.

docker run -d \
  --name payment-api \
  --cpus 1.5 \
  --cpu-shares 512 \
  -p 3000:3000 \
  payment-api:v1.2.0
# --cpus 1.5       → hard ceiling: the container can use at most 1.5 CPU cores
#                    on a 4-core host, this means 37.5% of total CPU capacity
#                    a spinning infinite loop hits this ceiling and is throttled —
#                    it cannot consume more, but other containers are unaffected
# --cpu-shares 512 → relative weight when the host is under contention
#                    default is 1024 — setting 512 gives this container half the
#                    CPU time of a default-weight container when all containers
#                    are competing simultaneously
#                    has no effect when the host has idle CPU capacity
# Confirm CPU limits:
docker inspect payment-api | grep -E '"NanoCpus"|"CpuShares"'
"NanoCpus": 1500000000,
"CpuShares": 512,
# NanoCpus: 1,500,000,000 = 1.5 CPU cores. Limit confirmed.
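A shares value only means something as a ratio against the other containers competing at that moment. As an illustrative calculation (a hypothetical two-container host, not output from the example above), a 512-share container contending with one default 1024-share container gets about a third of the CPU time:

```shell
# share of contended CPU = own shares / sum of all competing containers' shares
awk 'BEGIN { printf "%.1f%%\n", 512 / (512 + 1024) * 100 }'
```

Remember this ratio only applies under full contention; with idle CPU available, the 512-share container can still use up to its `--cpus` ceiling.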

# Simulate a CPU spike — run a tight loop inside the container:
docker exec -d payment-api sh -c "while true; do :; done"

# Observe that the container is capped at ~1.5 cores:
docker stats payment-api --no-stream
CONTAINER ID   NAME          CPU %    MEM USAGE / LIMIT   MEM %
a1b2c3d4e5f6   payment-api   149.8%   210MiB / 512MiB     41.0%
# CPU is pegged at ~150% (1.5 cores out of 4) — throttled exactly at the limit.
# The other 2.5 cores remain fully available to all other containers on the host.

What just happened?

The infinite loop tried to consume every available CPU cycle on the host. The --cpus 1.5 limit throttled it at exactly 1.5 cores — the container kept running, slowed by throttling, while the remaining 2.5 cores stayed completely available to every other container. The runaway process was fully contained without killing anything or causing any interruption to other services.

Setting Limits in Docker Compose

Running docker run with flags works for single containers, but in practice most multi-service deployments use Docker Compose. Resource limits belong in the Compose file — declared once, applied consistently across every deployment.

version: "3.8"

services:
  api:
    image: payment-api:v1.2.0
    deploy:
      resources:
        limits:
          cpus: "1.5"
          # Hard ceiling — container is throttled if it exceeds this
          memory: 512M
          # Hard ceiling — OOM kill if exceeded
        reservations:
          cpus: "0.5"
          # Guaranteed minimum CPU — under contention this service is
          # prioritised rather than starved
          memory: 256M
          # Guaranteed minimum RAM — in Swarm mode the scheduler will not
          # place this container on a node that cannot provide at least
          # this much free memory
    ports:
      - "3000:3000"

  db:
    image: postgres:15-alpine
    deploy:
      resources:
        limits:
          cpus: "2.0"
          memory: 1G
        reservations:
          cpus: "1.0"
          memory: 512M
    volumes:
      - pgdata:/var/lib/postgresql/data

  redis:
    image: redis:7-alpine
    deploy:
      resources:
        limits:
          cpus: "0.5"
          memory: 128M
        reservations:
          cpus: "0.1"
          memory: 64M

volumes:
  pgdata:
docker compose up -d

[+] Running 3/3
 ✔ Container redis         Started
 ✔ Container postgres-db   Started
 ✔ Container payment-api   Started

# Verify limits are active across all containers:
docker stats --no-stream
CONTAINER ID   NAME          CPU %   MEM USAGE / LIMIT    MEM %
a1b2c3d4e5f6   payment-api   2.4%    198MiB / 512MiB      38.7%
b2c3d4e5f6a7   postgres-db   0.8%    312MiB / 1024MiB     30.5%
c3d4e5f6a7b8   redis         0.1%    8MiB / 128MiB         6.3%
# Each container shows its enforced memory ceiling in the LIMIT column.
# Total allocated: 1664 MiB maximum — fits cleanly on a 4 GB host.
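The capacity claim is easy to verify with shell arithmetic: sum the three memory limits and compare against a 4 GB (4096 MiB) host. The host size is the example figure from the text, not a measured value:

```shell
# worst-case committed memory across the three services' limits
total=$((512 + 1024 + 128))
echo "${total} MiB"
awk -v t="$total" 'BEGIN { printf "%.1f%% of a 4096 MiB host\n", t / 4096 * 100 }'
```

At roughly 40% of host RAM committed, the host retains headroom for the kernel, the Docker daemon, and temporary spikes.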

limits vs reservations — what each does

  limits.memory        → Hard ceiling. Exceeding it triggers OOM kill — the container
                         is terminated and restarted per its restart policy.
  limits.cpus          → Hard ceiling. Exceeding it causes throttling — the process
                         slows but does not crash or restart.
  reservations.memory  → Soft guarantee. Docker will not schedule this container on a
                         host that cannot provide at least this much free memory.
  reservations.cpus    → Soft guarantee. The scheduler prioritises this container's
                         CPU access under contention — it will not be starved.

Monitoring with docker stats

docker stats is the built-in real-time view of resource consumption across all running containers. Without limits, the LIMIT column shows the total host memory — useless for spotting problems. With limits, it shows each container's individual ceiling and how close it is to hitting it. It's the first tool to reach for during an incident.

# Live stream of all running containers (refreshes every second):
docker stats

# Single snapshot — useful in scripts and CI checks:
docker stats --no-stream

# Filter to a specific container:
docker stats payment-api --no-stream

# Custom output format — machine-readable for monitoring pipelines:
docker stats --no-stream \
  --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.MemPerc}}"

# Key signals to watch:
# MEM % approaching 80-90%  → container close to OOM kill threshold
# CPU % sustained at limit   → throttling is occurring — service is degraded
# NET I/O spike              → unexpected traffic or a data leak
# BLOCK I/O spike            → unexpected disk activity — check logs or tmp growth
CONTAINER ID   NAME          CPU %   MEM USAGE / LIMIT    MEM %   NET I/O         BLOCK I/O
a1b2c3d4e5f6   payment-api   3.2%    201MiB / 512MiB      39.3%   14.5MB / 8.2MB  0B / 0B
b2c3d4e5f6a7   postgres-db   1.1%    318MiB / 1024MiB     31.1%   2.1MB / 1.4MB   142MB / 88MB
c3d4e5f6a7b8   redis         0.1%    8MiB / 128MiB         6.3%   980kB / 720kB   0B / 0B

# MEM % is the most critical column.
# A container at 85% of its memory limit is one traffic burst away from OOM kill.
# Without limits, the LIMIT column shows total host RAM — percentages are meaningless.
# With limits, every percentage is actionable.
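That MEM % signal is simple to script. A sketch of the idea: in production the input would come from `docker stats --no-stream --format "{{.Name}} {{.MemPerc}}"`; here a few sample lines are piped in so the threshold logic runs without a daemon, and the 80% threshold and the figures are illustrative:

```shell
# flag any container above 80% of its memory limit
echo "payment-api 86.2%
postgres-db 31.1%
redis 6.3%" | awk '$2 + 0 > 80 { print "ALERT: " $1 " at " $2 " of its memory limit" }'
```

The `$2 + 0` coerces the percentage string (e.g. "86.2%") to a number, so the comparison works without stripping the `%` sign first.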

What just happened?

docker stats shows each container's real consumption against its enforced limit. Without limits set, the LIMIT column would display total host RAM — making percentages meaningless and problems invisible until it's too late. With limits, a container at 85% memory is an early warning. At 95%, it's a pager alert. The number is only actionable because there's a ceiling to measure against.

How to Choose Your Limits

Setting limits too low causes OOM kills under normal load. Setting them too high wastes the purpose of having limits. The correct approach: run without limits first, measure under realistic load with docker stats, then set limits based on observed peak consumption with headroom built in.

# Step 1 — run with no limits and observe peak consumption under realistic load
docker stats --no-stream --format \
  "{{.Name}}: CPU={{.CPUPerc}} MEM={{.MemUsage}}"

# Example peaks observed during a load test:
# payment-api: CPU=62%   MEM=287MiB / host-total
# postgres-db: CPU=48%   MEM=611MiB / host-total
# redis:       CPU=4%    MEM=22MiB  / host-total

# Step 2 — set limits at observed peak + ~50% headroom
# payment-api peak RAM: 287 MiB  → limit: 430M
# postgres-db peak RAM: 611 MiB  → limit: 900M
# redis       peak RAM:  22 MiB  → limit:  64M  (×1.5 is 33M; rounded up to a practical floor)

# Step 3 — re-run the load test with limits active and confirm:
# (a) No OOM kills occur during normal peak traffic
# (b) A runaway process does hit the ceiling without affecting other containers
# (c) docker stats shows sensible percentages — not 95%+ under normal load

# Rule of thumb:
# memory limit  = observed peak × 1.5
# cpu limit     = observed peak cores × 1.5  (round to nearest 0.25)
# reservation   = observed average (not peak) × 0.8
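The rule of thumb is mechanical enough to script. A sketch using the payment-api figures from the load test above — the 190 MiB average and the 0.62-core peak are assumed illustrative numbers, not values measured earlier:

```shell
peak_mem_mib=287      # observed peak RAM from the load test
avg_mem_mib=190       # assumed average RAM (hypothetical figure)
peak_cpu_cores=0.62   # assumed peak CPU, expressed in cores

echo "memory limit:       $(( peak_mem_mib * 3 / 2 ))M"    # peak × 1.5
awk -v p="$peak_cpu_cores" \
  'BEGIN { printf "cpu limit:          %.2f cores\n", int(p * 1.5 / 0.25 + 0.5) * 0.25 }'
echo "memory reservation: $(( avg_mem_mib * 4 / 5 ))M"     # average × 0.8
```

The awk line rounds the CPU figure to the nearest 0.25 cores, matching the rule of thumb; the memory arithmetic uses integer ratios (× 3/2 and × 4/5) so it stays in plain shell.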

The Resource Limits Checklist

Before any container goes to production

Memory limit set — every container has an explicit --memory or limits.memory value
Swap disabled — --memory-swap equals --memory to prevent silent disk spill
CPU ceiling set — --cpus or limits.cpus prevents runaway processes from starving the host
Reservations set — scheduler knows the minimum resources each service requires to run correctly
Limits based on measured data — set after a load test, not guessed at deployment time
Headroom built in — limits at ~1.5× observed peak to absorb traffic spikes without OOM kills
docker stats monitored — memory % alerts configured before containers approach their ceilings
Restart policy set — restart: unless-stopped in Compose so OOM-killed containers recover automatically

Teacher's Note

Set memory limits first — that is the one that prevents a single container from taking down the entire host. CPU limits are important but the failure mode (throttling) is far less catastrophic than OOM. Start with broad limits based on your best estimate, load test, then tighten. Never leave a production container running without a memory limit. It's the container equivalent of running without a circuit breaker.

Practice Questions

1. To disable swap for a container — preventing it from silently spilling excess memory onto disk — which flag must be set to the same value as --memory?



2. When a container exceeds its --cpus limit, the process is not killed. What happens to it instead?



3. Which Docker CLI command provides a real-time view of CPU and memory consumption across all running containers, showing each container's usage against its enforced limit?



Quiz

1. A host runs eight containers with no resource limits. One container develops a memory leak. What is the worst-case outcome?


2. In a Docker Compose file, what is the practical difference between limits and reservations under deploy.resources?


3. A team is deploying a new service and needs to set its memory limit. They have never run it in production before. What is the correct process?


Up Next · Lesson 35

Logging & Monitoring

Resource limits contain the blast radius of a failure — but you still need to know when that failure happened and why. Logging and monitoring are how you find out: what a container was doing before it crashed, what's degraded right now, and what pattern predicts the next incident.