Kubernetes Course
Resource Requests and Limits
Getting resource requests and limits wrong is one of the most common causes of Kubernetes production incidents: the cluster runs out of capacity and new Pods can't schedule, a runaway container eats all the CPU and starves its neighbours, or a Pod keeps getting OOM-killed because someone set the memory limit too low. This lesson fixes all of that.
Requests vs Limits — Two Different Things
Most people treat requests and limits as one setting and set them to the same value. That's often wrong. They serve different purposes and affect your cluster in entirely different ways.
A resource request is a scheduling promise. When you set requests.cpu: 250m, you're telling the scheduler "I need a node with at least 250 millicores free." The scheduler uses requests — not limits, not actual usage — to decide where to place your Pod. Once placed, the container is guaranteed at least that many resources.
A resource limit is a runtime ceiling. It's enforced by the Linux kernel via cgroups after the Pod is running. If a container tries to use more CPU than its limit, it gets throttled. If it tries to use more memory than its limit, it gets killed. The scheduler doesn't care about limits at placement time — only requests matter for scheduling.
The one-sentence version
Requests = what the scheduler uses to place your Pod. Limits = what the kernel uses to constrain your container at runtime. They're used at different times by different parts of the system.
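In manifest terms, the two knobs sit side by side in a single resources block. The values below are illustrative, not a recommendation:

```yaml
resources:
  requests:          # read by the scheduler at placement time
    cpu: "250m"
    memory: "128Mi"
  limits:            # enforced by the kernel (cgroups) at runtime
    cpu: "500m"
    memory: "256Mi"
```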
CPU and Memory Units
Before writing manifests, get the units straight; they trip up almost everyone the first time.
| Resource | Unit | Meaning | Examples |
|---|---|---|---|
| CPU | m (millicores) | 1/1000th of one CPU core. 1000m = 1 full core. | 100m, 250m, 500m, 2000m |
| CPU | decimal | Whole or fractional CPU cores. 0.5 = 500m. | "0.25", "0.5", "1", "2" |
| Memory | Mi (mebibytes) | 1 Mi = 1,048,576 bytes. Binary units, preferred in Kubernetes. | 128Mi, 256Mi, 512Mi, 1Gi |
| Memory | Gi (gibibytes) | 1 Gi = 1,073,741,824 bytes. 1Gi ≠ 1GB. | 1Gi, 2Gi, 4Gi, 8Gi |
| Memory | M / G (megabytes/gigabytes) | 1 M = 1,000,000 bytes. Decimal units, valid but prefer Mi/Gi. | 128M, 256M, 1G |
⚠️ The Mi vs M confusion has caused real incidents
Setting memory: 1G gives 1,000,000,000 bytes. Setting memory: 1Gi gives 1,073,741,824 bytes — about 7% more. When engineers copy manifests and switch between M and Mi without thinking, the memory limit ends up slightly different from what was intended. Always use Mi and Gi — they match how the Linux kernel and JVM report memory.
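The gap is easy to verify with shell arithmetic; this is a quick sketch, nothing Kubernetes-specific:

```shell
# 1Gi (binary) vs 1G (decimal), in bytes
gib=$((1024 * 1024 * 1024))   # 1Gi = 1073741824
gb=$((1000 * 1000 * 1000))    # 1G  = 1000000000
echo $((gib - gb))            # prints 73741824 (~70Mi of difference)
```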
Writing a Complete Resources Block
The scenario: You're a DevOps engineer deploying a Node.js API service for the first time. You've profiled the application in staging and know it typically uses around 150m CPU and 180Mi memory under normal load, with spikes to 300m CPU and 280Mi memory at peak. You need to set resources that guarantee the app has what it needs, cap runaway resource consumption, and let the scheduler make good placement decisions.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inventory-api
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: inventory-api
  template:
    metadata:
      labels:
        app: inventory-api
    spec:
      containers:
      - name: inventory-api
        image: company/inventory-api:5.1.0
        ports:
        - containerPort: 3000
        resources:
          requests:            # requests: what the SCHEDULER uses to place this Pod
            cpu: "150m"        # Guaranteed 0.15 CPU cores; scheduler finds a node with this free
            memory: "200Mi"    # Guaranteed 200 MiB RAM; scheduler finds a node with this free
          limits:              # limits: the KERNEL enforces these at runtime via cgroups
            cpu: "500m"        # CPU ceiling: if exceeded, the process is THROTTLED (not killed)
            memory: "350Mi"    # Memory ceiling: if exceeded, the process is KILLED (OOMKilled)
# Rule of thumb: limit.cpu = 2-4x request.cpu (bursting allowed)
# Rule of thumb: limit.memory = 1.5-2x request.memory
# Never set limit.memory too tight: OOMKill is brutal
```
```
$ kubectl apply -f inventory-deployment.yaml
deployment.apps/inventory-api created

$ kubectl get pods -n production
NAME                         READY   STATUS    RESTARTS   AGE
inventory-api-6d8b9f-2xkpj   1/1     Running   0          11s
inventory-api-6d8b9f-7rvqn   1/1     Running   0          11s
inventory-api-6d8b9f-m4czl   1/1     Running   0          11s

$ kubectl describe pod inventory-api-6d8b9f-2xkpj -n production | grep -A8 "Limits:"
    Limits:
      cpu:     500m
      memory:  350Mi
    Requests:
      cpu:        150m
      memory:     200Mi
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xxx
```
What just happened?
How the scheduler used requests — When you applied this Deployment, the scheduler looked at each available node and checked: does this node have at least 150m CPU and 200Mi memory unallocated (i.e. not already promised to other Pods via their requests)? Nodes that didn't meet the bar were rejected. The scheduler doesn't know or care what CPU or memory the running containers are actually using — only what they've claimed via requests.
CPU throttling vs memory OOM-kill — CPU and memory behave fundamentally differently when a limit is exceeded. CPU is compressible: if a container tries to use more than 500m, Linux throttles it — it runs slower, but it keeps running. Memory is incompressible: if a container tries to use more than 350Mi, the Linux OOM killer immediately terminates the process. The container restarts. This is why memory limits need more headroom than CPU limits.
kubectl describe pod — Limits/Requests section — The kubectl describe pod output always shows the resolved limits and requests for each container. This is your source of truth for what's actually configured — not the Deployment YAML, which may have been modified since the Pod started.
What Happens When Limits Are Exceeded
The scenario: You're on-call and a Pod is in CrashLoopBackOff. Looking at the events and logs, you see OOMKilled. The memory limit is too low for what the service is actually consuming. Here's how to diagnose it, confirm it, and fix it.
```shell
kubectl describe pod inventory-api-6d8b9f-2xkpj -n production
# Look for: Last State, Exit Code, and Reason in the container section
# OOMKilled shows Exit Code: 137 and Reason: OOMKilled

kubectl top pod -n production
# top: shows ACTUAL current CPU and memory usage (requires metrics-server installed)
# Compare actual usage against your limits; if memory is at 90%+ of the limit, expect OOMKills soon
# This is your real-time resource usage view

kubectl top pod -n production --sort-by=memory
# --sort-by: sort by memory or cpu to quickly find the most resource-hungry Pods
# Run this when a node reports memory pressure to identify the culprit fast

kubectl top node
# top node: shows CPU and memory usage across all nodes
# Helps identify nodes under pressure that might start evicting Pods
```
```
$ kubectl describe pod inventory-api-6d8b9f-2xkpj -n production
...
Containers:
  inventory-api:
    State:          Running
      Started:      Mon, 10 Mar 2025 10:22:44 +0000
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Mon, 10 Mar 2025 10:19:11 +0000
      Finished:     Mon, 10 Mar 2025 10:22:43 +0000
    Ready:          True
    Restart Count:  4
    Limits:
      cpu:     500m
      memory:  350Mi
    Requests:
      cpu:        150m
      memory:     200Mi

$ kubectl top pod -n production --sort-by=memory
NAME                         CPU(cores)   MEMORY(bytes)
inventory-api-6d8b9f-m4czl   163m         344Mi
inventory-api-6d8b9f-7rvqn   156m         341Mi
inventory-api-6d8b9f-2xkpj   142m         338Mi
```
What just happened?
Exit Code 137 — Exit code 137 means the process was killed by signal 9 (SIGKILL). When you see Reason: OOMKilled alongside exit code 137, the Linux OOM killer terminated the container because it exceeded its memory limit. The fix is to increase limits.memory — not to restart the Pod, which just gets OOM-killed again at the same usage level.
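The 137 is not arbitrary: shells and container runtimes report "terminated by signal N" as exit code 128 + N, and SIGKILL is signal 9. A quick sketch:

```shell
# Exit codes above 128 mean "terminated by signal (code - 128)"
echo $((128 + 9))   # prints 137: SIGKILL, what the OOM killer sends
kill -l 9           # prints the name of signal 9 (KILL)
```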
kubectl top — the actual vs requested gap — The top output shows real current usage. The Pods are consuming 338–344Mi of memory, but the limit is 350Mi. They're running at 97% of the memory limit — they will be OOM-killed on the next memory spike. The right fix here is to raise the memory limit to at least 500Mi and raise the request proportionally.
Restart Count: 4 — Combined with Last State: OOMKilled, a high restart count means the container has been getting killed and restarted repeatedly. Users are experiencing intermittent failures every time this happens. This is an active production incident, not just a warning.
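A hedged sketch of the remediation for the Deployment above. The exact values are judgment calls based on the observed ~340Mi usage, not prescriptions:

```yaml
# Sketch: raise memory headroom after confirming repeated OOMKills.
# Assumes observed usage of ~340Mi; tune to your own measurements.
resources:
  requests:
    cpu: "150m"
    memory: "400Mi"   # above observed usage so the scheduler reserves enough
  limits:
    cpu: "500m"
    memory: "600Mi"   # ~1.5x the request, per the rule of thumb earlier
```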
QoS Classes: How Kubernetes Prioritises Eviction
When a node runs out of memory, Kubernetes doesn't kill Pods randomly. It follows a priority order based on Quality of Service (QoS) classes — which are assigned automatically based on how you set requests and limits. Understanding QoS classes is the difference between your critical services surviving a memory pressure event and getting killed along with the junk.
QoS Classes — Eviction Order Under Memory Pressure
BestEffort — no requests or limits set on any container. Evicted FIRST.
Burstable — requests set, with limits higher or unset. Evicted SECOND.
Guaranteed — requests equal to limits for every container. Evicted LAST.
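For example, a Pod only lands in the Guaranteed class when every container sets requests equal to limits for both CPU and memory. Values here are illustrative:

```yaml
resources:
  requests:
    cpu: "250m"
    memory: "256Mi"
  limits:
    cpu: "250m"       # equal to the request
    memory: "256Mi"   # equal to the request -> Guaranteed QoS, evicted last
```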
Right-Sizing Resources in Practice
The scenario: You're deploying a new microservice and you have no idea what resources to set. You've done zero profiling. You need a starting point, a way to measure actual usage, and a path to tuning the values correctly over time. Here's the complete workflow.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: notification-service
  namespace: production
spec:
  replicas: 2
  selector:
    matchLabels:
      app: notification-service
  template:
    metadata:
      labels:
        app: notification-service
    spec:
      containers:
      - name: notification-service
        image: company/notification-service:1.0.0
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: "100m"        # Conservative starting point: 0.1 cores
            memory: "128Mi"    # Conservative starting point: 128 mebibytes
          limits:
            cpu: "500m"        # 5x the request: wide ceiling for an unknown workload
            memory: "512Mi"    # 4x the request: generous headroom to avoid OOMKills while profiling
# After 1-2 weeks of production traffic, run kubectl top to observe
# actual usage and tighten these values accordingly
```
```shell
kubectl top pod -n production -l app=notification-service
# After a week of real traffic, observe peak usage
# Typical output tells you: actual CPU usage, actual memory usage

kubectl top pod -n production -l app=notification-service --containers
# --containers: breaks down usage by container within the Pod
# Essential for multi-container Pods to know which container is consuming what

kubectl describe node node-eu-west-1a | grep -A20 "Allocated resources:"
# Check how much of the node's capacity is already promised via requests
# "Requests" column shows committed allocations, "Limits" shows ceiling commitments
# When requests approach 100%, new Pods can't schedule on that node even if actual usage is low
```
```
$ kubectl top pod -n production -l app=notification-service
NAME                                CPU(cores)   MEMORY(bytes)
notification-service-5c9d4b-p2rkx   38m          94Mi
notification-service-5c9d4b-x7qnl   41m          97Mi

$ kubectl describe node node-eu-west-1a | grep -A20 "Allocated resources:"
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource   Requests      Limits
  --------   --------      ------
  cpu        3200m (40%)   10400m (130%)
  memory     6Gi (38%)     18Gi (112%)
Non-terminated Pods: (14 in total)
  Namespace    Name                               CPU Req   Mem Req
  ---------    ----                               -------   -------
  production   notification-service-5c9d4b-...    100m      128Mi
  production   inventory-api-6d8b9f-...           150m      200Mi
```
What just happened?
Actual usage vs requests — The notification service is actually using 38–41m CPU and 94–97Mi memory. We set requests of 100m CPU and 128Mi memory, a reasonable buffer, so the requests are well-calibrated. The limits (500m CPU, 512Mi memory) have huge headroom; limits don't affect scheduling, but overly generous limits deepen node overcommitment and make memory-pressure events more likely. After a few weeks of observing peak load, tighten them to about 2x actual peak usage.
Over-committed limits (130% CPU, 112% memory) — The node's limits exceed 100% of physical capacity. This is normal and expected: Kubernetes allows over-commitment on limits because not all containers hit their limits simultaneously. But if they do, the kubelet starts evicting BestEffort and Burstable Pods (and the kernel OOM killer may step in) to recover memory. This is why QoS class matters.
Requests at 40% CPU, 38% memory — Requests reflect how much of the node is allocated, not how much is being used. When requests hit around 80–90% of capacity, the scheduler will struggle to place new Pods even if actual CPU usage is only 30%. This is stranded capacity: over-inflated requests lock up schedulable headroom that nothing is actually using. Right-sizing requests is just as important as right-sizing limits.
The Resource Sizing Decision Tree
A practical framework for deciding how to set resources for any new workload:
1. No profiling data yet? Start conservative: requests of 100m CPU / 128Mi memory, limits of 500m CPU / 512Mi memory. Monitor for 1 week, then tune.
2. Profiled the workload? Set requests just above steady-state usage and limits = 2x p99 peak usage.
3. Critical workload that must survive memory pressure? Set requests = limits → Guaranteed QoS, evicted only as a last resort.
4. Everything else: requests < limits → Burstable QoS. Good balance of density and protection.
5. Ongoing: run kubectl top pod to compare actual vs requested. Tighten requests to ~1.5x p95 actual and limits to ~2x p99 peak. Repeat quarterly.
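The tuning arithmetic in the last step can be sketched in shell. The p95/p99 figures here are assumed inputs from your monitoring, not measurements from this lesson:

```shell
# Hypothetical observed usage: CPU in millicores, memory in Mi
p95_cpu=40;  p99_cpu=60
p95_mem=100; p99_mem=150
echo "cpu request:    $((p95_cpu * 3 / 2))m"    # ~1.5x p95 -> 60m
echo "cpu limit:      $((p99_cpu * 2))m"        # ~2x p99   -> 120m
echo "memory request: $((p95_mem * 3 / 2))Mi"   # -> 150Mi
echo "memory limit:   $((p99_mem * 2))Mi"       # -> 300Mi
```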
Teacher's Note: The two failure modes and how to avoid both
There are two opposite ways to get resources wrong. Under-requesting: setting requests too low. The scheduler packs too many Pods onto nodes because it thinks they're lightweight. Actual usage exceeds node capacity, memory pressure builds, evictions cascade, and suddenly 40% of your Pods are being killed at once. This is the "noisy neighbour" that took down three services mentioned in Lesson 18 — except at the node level rather than the namespace level.
Over-requesting: setting requests too high. Every Pod claims far more than it uses. Nodes look "full" to the scheduler even when they're running at 20% actual utilisation. New Pods can't schedule. Cluster autoscaler spins up new nodes unnecessarily. You end up paying for 3x the cloud compute you actually need. I've seen clusters where actual CPU utilisation was 15% but every node was "full" because request values were copied from a legacy VM that had peak specs.
The right answer is continuous tuning: measure → observe peak usage → set requests to give a healthy buffer above p95 actual usage → set limits to give safety headroom above that → repeat. Tools like Vertical Pod Autoscaler (Lesson 50) can do this automatically.
Practice Questions
1. The Kubernetes scheduler uses which resource field — requests or limits — to decide which node to place a Pod on?
2. A container exceeds its memory limit. What does Kubernetes do, and what term appears in kubectl describe pod under Last State → Reason?
3. Which QoS class is assigned to a Pod when its requests and limits are set to the same value for all resources, making it the last to be evicted under memory pressure?
Quiz
1. A container's CPU usage exceeds its CPU limit. What happens?
2. A Pod manifest has no resources block at all. What QoS class is it assigned, and what does that mean for it during a memory pressure event?
3. A cluster's nodes show 85% CPU requests allocated, but kubectl top node shows actual CPU usage at only 20%. New Pods keep failing to schedule. What is the most likely explanation?
Up Next · Lesson 23
Health Checks
Liveness, readiness, and startup probes — how Kubernetes knows when your container is actually healthy and ready to serve traffic, not just running.