Kubernetes Lesson 22 – Resource Requests and Limits | Dataplexa
Core Kubernetes Concepts · Lesson 22

Resource Requests and Limits

Getting resource requests and limits wrong is one of the most common causes of Kubernetes production incidents. Either your cluster runs out of capacity and new Pods can't schedule, a runaway container eats all the CPU and starves its neighbours, or a Pod keeps getting OOM-killed because someone set the memory limit too low. This lesson fixes all of that.

Requests vs Limits — Two Different Things

Most people treat requests and limits as one setting and set them to the same value. That's often wrong. They serve completely different purposes and affect your cluster in completely different ways.

A resource request is a scheduling promise. When you set requests.cpu: 250m, you're telling the scheduler "I need a node with at least 250 millicores free." The scheduler uses requests — not limits, not actual usage — to decide where to place your Pod. Once placed, the container is guaranteed at least that much of the resource.

A resource limit is a runtime ceiling. It's enforced by the Linux kernel via cgroups after the Pod is running. If a container tries to use more CPU than its limit, it gets throttled. If it tries to use more memory than its limit, it gets killed. The scheduler doesn't care about limits at placement time — only requests matter for scheduling.

The one-sentence version

Requests = what the scheduler uses to place your Pod. Limits = what the kernel uses to constrain your container at runtime. They're used at different times by different parts of the system.
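In manifest form, the two settings sit side by side under a container's resources field. A minimal sketch (the values here are illustrative, not a recommendation):

```yaml
resources:
  requests:         # read once, by the scheduler, at placement time
    cpu: "250m"     # "find me a node with 250 millicores unallocated"
    memory: "256Mi"
  limits:           # enforced continuously, by the kernel, via cgroups
    cpu: "500m"     # throttled above this
    memory: "256Mi" # OOM-killed above this
```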

CPU and Memory Units

Before writing any manifests, you need to be clear on the units; they trip up almost everyone the first time.

Resource   Unit               Meaning                                                            Examples
CPU        m (millicores)     1/1000th of one CPU core. 1000m = 1 full core.                     100m, 250m, 500m, 2000m
CPU        decimal            Whole or fractional CPU cores. 0.5 = 500m.                         "0.25", "0.5", "1", "2"
Memory     Mi (mebibytes)     1 Mi = 1,048,576 bytes. Binary units — preferred in Kubernetes.    128Mi, 256Mi, 512Mi, 1Gi
Memory     Gi (gibibytes)     1 Gi = 1,073,741,824 bytes. 1Gi ≠ 1GB.                             1Gi, 2Gi, 4Gi, 8Gi
Memory     M / G (decimal)    1 M = 1,000,000 bytes. Decimal (SI) units — valid, but use Mi/Gi.  128M, 256M, 1G

⚠️ The Mi vs M confusion has caused real incidents

Setting memory: 1G gives 1,000,000,000 bytes. Setting memory: 1Gi gives 1,073,741,824 bytes — about 7% more. When engineers copy manifests and switch between M and Mi without thinking, the memory limit ends up slightly different from what was intended. Always use Mi and Gi — they match how the Linux kernel and JVM report memory.
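You can sanity-check the difference with plain shell arithmetic, no cluster required:

```shell
# Decimal (SI) vs binary (IEC) memory units:
echo $(( 1000 * 1000 * 1000 ))   # 1G  = 1000000000 bytes
echo $(( 1024 * 1024 * 1024 ))   # 1Gi = 1073741824 bytes

# The gap: 1Gi is roughly 7% larger than 1G (integer division truncates)
echo $(( (1073741824 - 1000000000) * 100 / 1000000000 ))   # 7
```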

Writing a Complete Resources Block

The scenario: You're a DevOps engineer deploying a Node.js API service for the first time. You've profiled the application in staging and know it typically uses around 150m CPU and 180Mi memory under normal load, with spikes to 300m CPU and 280Mi memory at peak. You need to set resources that guarantee the app has what it needs, cap runaway resource consumption, and let the scheduler make good placement decisions.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: inventory-api
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: inventory-api
  template:
    metadata:
      labels:
        app: inventory-api
    spec:
      containers:
        - name: inventory-api
          image: company/inventory-api:5.1.0
          ports:
            - containerPort: 3000
          resources:
            requests:                   # requests: what the SCHEDULER uses to place this Pod
              cpu: "150m"               # Guaranteed 0.15 CPU cores — scheduler finds a node with this free
              memory: "200Mi"           # Guaranteed 200 MiB RAM — scheduler finds a node with this free
            limits:                     # limits: the KERNEL enforces these at runtime via cgroups
              cpu: "500m"               # CPU ceiling — if exceeded, process is THROTTLED (not killed)
              memory: "350Mi"           # Memory ceiling — if exceeded, process is KILLED (OOMKilled)
                                        # Rule of thumb: limit.cpu = 2-4x request.cpu (bursting allowed)
                                        # Rule of thumb: limit.memory = 1.5-2x request.memory
                                        # Never set limit.memory too tight — OOMKill is brutal

$ kubectl apply -f inventory-deployment.yaml
deployment.apps/inventory-api created

$ kubectl get pods -n production
NAME                             READY   STATUS    RESTARTS   AGE
inventory-api-6d8b9f-2xkpj       1/1     Running   0          11s
inventory-api-6d8b9f-7rvqn       1/1     Running   0          11s
inventory-api-6d8b9f-m4czl       1/1     Running   0          11s

$ kubectl describe pod inventory-api-6d8b9f-2xkpj -n production | grep -A8 "Limits:"
    Limits:
      cpu:     500m
      memory:  350Mi
    Requests:
      cpu:      150m
      memory:   200Mi
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xxx

What just happened?

How the scheduler used requests — When you applied this Deployment, the scheduler looked at each available node and checked: does this node have at least 150m CPU and 200Mi memory unallocated (i.e. not already promised to other Pods via their requests)? Nodes that didn't meet the bar were rejected. The scheduler doesn't know or care what CPU or memory the running containers are actually using — only what they've claimed via requests.

CPU throttling vs memory OOM-kill — CPU and memory behave fundamentally differently when a limit is exceeded. CPU is compressible: if a container tries to use more than 500m, Linux throttles it — it runs slower, but it keeps running. Memory is incompressible: if a container tries to use more than 350Mi, the Linux OOM killer immediately terminates the process. The container restarts. This is why memory limits need more headroom than CPU limits.

kubectl describe pod — Limits/Requests section — The kubectl describe pod output always shows the resolved limits and requests for each container. This is your source of truth for what's actually configured — not the Deployment YAML, which may have been modified since the Pod started.

What Happens When Limits Are Exceeded

The scenario: You're on-call and a Pod is in CrashLoopBackOff. Looking at the events and logs, you see OOMKilled. The memory limit is too low for what the service is actually consuming. Here's how to diagnose it, confirm it, and fix it.

kubectl describe pod inventory-api-6d8b9f-2xkpj -n production
# Look for: Last State, Exit Code, and Reason in the container section
# OOMKilled shows Exit Code: 137 and Reason: OOMKilled

kubectl top pod -n production
# top: shows ACTUAL current CPU and memory usage (requires metrics-server installed)
# Compare actual usage against your limits — if memory is at 90%+ of limit, expect OOMKills soon
# This is your real-time resource usage view

kubectl top pod -n production --sort-by=memory
# --sort-by: sort by memory or cpu — quickly find the most resource-hungry Pods
# Run this when a node reports memory pressure to identify the culprit fast

kubectl top node
# top node: shows CPU and memory usage across all nodes
# Helps identify nodes under pressure that might start evicting Pods

$ kubectl describe pod inventory-api-6d8b9f-2xkpj -n production
...
Containers:
  inventory-api:
    State:          Running
      Started:      Mon, 10 Mar 2025 10:22:44 +0000
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Mon, 10 Mar 2025 10:19:11 +0000
      Finished:     Mon, 10 Mar 2025 10:22:43 +0000
    Ready:          True
    Restart Count:  4
    Limits:
      cpu:     500m
      memory:  350Mi
    Requests:
      cpu:      150m
      memory:   200Mi

$ kubectl top pod -n production --sort-by=memory
NAME                             CPU(cores)   MEMORY(bytes)
inventory-api-6d8b9f-2xkpj       142m         338Mi
inventory-api-6d8b9f-7rvqn       156m         341Mi
inventory-api-6d8b9f-m4czl       163m         344Mi

What just happened?

Exit Code 137 — Exit code 137 means the process was killed by signal 9 (SIGKILL). When you see Reason: OOMKilled alongside exit code 137, the Linux OOM killer terminated the container because it exceeded its memory limit. The fix is to increase limits.memory — not to restart the Pod, which just gets OOM-killed again at the same usage level.
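The 137 isn't arbitrary: shells report "killed by signal N" as exit code 128 + N, and SIGKILL is signal 9.

```shell
echo $(( 128 + 9 ))   # 137 — the exit code of a SIGKILL'd process
kill -l 9             # looks up the signal name; prints KILL on most shells
```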

kubectl top — the actual vs requested gap — The top output shows real current usage. The Pods are consuming 338–344Mi of memory, but the limit is 350Mi. They're running at 97% of the memory limit — they will be OOM-killed on the next memory spike. The right fix here is to raise the memory limit to at least 500Mi and raise the request proportionally.
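To put numbers on that gap, you can pipe the top output through a one-liner. This is a hypothetical helper, fed the captured sample from above via a heredoc so it runs without a cluster; in real use, replace the heredoc with `kubectl top pod -n production --no-headers`:

```shell
# Column 3 is MEMORY(bytes); strip the Mi suffix and divide by the 350Mi limit
awk -v limit_mi=350 '{ gsub(/Mi/, "", $3); printf "%s %.0f%%\n", $1, 100 * $3 / limit_mi }' <<'EOF'
inventory-api-6d8b9f-2xkpj       142m         338Mi
inventory-api-6d8b9f-7rvqn       156m         341Mi
inventory-api-6d8b9f-m4czl       163m         344Mi
EOF
# → all three Pods sit at 97-98% of the memory limit
```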

Restart Count: 4 — Combined with Last State: OOMKilled, a high restart count means the container has been getting killed and restarted repeatedly. Users are experiencing intermittent failures every time this happens. This is an active production incident, not just a warning.

QoS Classes: How Kubernetes Prioritises Eviction

When a node runs out of memory, Kubernetes doesn't kill Pods randomly. It follows a priority order based on Quality of Service (QoS) classes — which are assigned automatically based on how you set requests and limits. Understanding QoS classes is the difference between your critical services surviving a memory pressure event and getting killed along with the junk.

QoS Classes — Eviction Order Under Memory Pressure

BestEffort (evicted FIRST)
  Criteria: no requests and no limits set on any container.
  The Pod makes no resource claims at all. It gets whatever is left over and is the first to be evicted under pressure. Never use this for production workloads.
  Example: resources: {}

Burstable (evicted SECOND)
  Criteria: at least one request or limit set, without meeting the Guaranteed bar (typically limits differ from requests, or limits are not set).
  The most common class. The Pod has a guaranteed floor (requests) but can burst above it up to the limit. Evicted after BestEffort but before Guaranteed.
  Example: requests.cpu: 150m, limits.cpu: 500m

Guaranteed (evicted LAST)
  Criteria: requests = limits for ALL resources (CPU and memory) in ALL containers.
  The Pod gets exactly what it asks for — no more, no less. Evicted only as a last resort under memory pressure. Use for critical stateful services like databases. The trade-off: no bursting.
  Example: requests.memory: 512Mi = limits.memory: 512Mi (and likewise for CPU)
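For the Guaranteed class, every container in the Pod needs requests equal to limits for both CPU and memory. A sketch for a hypothetical database container (image name and values are illustrative):

```yaml
containers:
  - name: orders-db
    image: postgres:16          # hypothetical workload
    resources:
      requests:
        cpu: "1000m"            # must match limits exactly...
        memory: "2Gi"
      limits:
        cpu: "1000m"            # ...for BOTH resources, in EVERY container,
        memory: "2Gi"           # or the Pod drops to Burstable
```

You can confirm the class Kubernetes assigned with kubectl get pod <name> -o jsonpath='{.status.qosClass}'.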

Right-Sizing Resources in Practice

The scenario: You're deploying a new microservice and you have no idea what resources to set. You've done zero profiling. You need a starting point, a way to measure actual usage, and a path to tuning the values correctly over time. Here's the complete workflow.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: notification-service
  namespace: production
spec:
  replicas: 2
  selector:
    matchLabels:
      app: notification-service
  template:
    metadata:
      labels:
        app: notification-service
    spec:
      containers:
        - name: notification-service
          image: company/notification-service:1.0.0
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: "100m"     # Conservative starting point — 0.1 cores
              memory: "128Mi" # Conservative starting point — 128 mebibytes
            limits:
              cpu: "500m"     # 5x the request — wide ceiling for an unknown workload
              memory: "512Mi" # 4x the request — generous headroom to avoid OOMKills while profiling
                              # After 1-2 weeks of production traffic, run kubectl top to observe
                              # actual usage and tighten these values accordingly

kubectl top pod -n production -l app=notification-service
# After a week of real traffic, observe peak usage
# Typical output tells you: actual CPU usage, actual memory usage

kubectl top pod -n production -l app=notification-service --containers
# --containers: breaks down usage by container within the Pod
# Essential for multi-container Pods to know which container is consuming what

kubectl describe node node-eu-west-1a | grep -A20 "Allocated resources:"
# Check how much of the node's capacity is already promised via requests
# "Requests" column shows committed allocations, "Limits" column shows ceiling commitments
# When requests approach 100%, new Pods can't schedule on that node even if actual usage is low

$ kubectl top pod -n production -l app=notification-service
NAME                                  CPU(cores)   MEMORY(bytes)
notification-service-5c9d4b-p2rkx     38m          94Mi
notification-service-5c9d4b-x7qnl     41m          97Mi

$ kubectl describe node node-eu-west-1a | grep -A20 "Allocated resources:"
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests      Limits
  --------           --------      ------
  cpu                3200m (40%)   10400m (130%)
  memory             6Gi (38%)     18Gi (112%)

Non-terminated Pods:         (14 in total)
  Namespace   Name                             CPU Req  Mem Req
  ---------   ----                             -------  -------
  production  notification-service-5c9d4b-...  100m     128Mi
  production  inventory-api-6d8b9f-...         150m     200Mi

What just happened?

Actual usage vs requests — The notification service is actually using 38–41m CPU and 94–97Mi memory against requests of 100m CPU and 128Mi memory — a reasonable buffer, so the requests are well-calibrated. The limits (500m CPU, 512Mi memory) have huge headroom, which is fine while profiling; limits don't consume schedulable capacity the way requests do, but generous limits do increase the node's overcommitment. After a few weeks of observing peak load, tighten them to about 2x actual peak usage.

Over-committed limits (130% CPU, 112% memory) — The node's limits exceed 100% of physical capacity. This is normal and expected — Kubernetes allows over-commitment on limits because not all containers hit their limits simultaneously. But if too many do at once, memory pressure builds and the kubelet starts evicting Pods — BestEffort first, then Burstable — to recover memory, with the kernel OOM killer as the backstop. This is why QoS class matters.
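The percentages in the describe output imply an 8-core node (3200m is 40% of 8000m — the node size is inferred, not stated in the output). The overcommitment arithmetic, done locally:

```shell
node_cpu_m=8000        # node CPU capacity, inferred from the 40% figure
sum_requests_m=3200    # sum of all Pods' CPU requests on the node
sum_limits_m=10400     # sum of all Pods' CPU limits on the node

echo "requests: $(( 100 * sum_requests_m / node_cpu_m ))%"   # 40%  — what the scheduler sees
echo "limits:   $(( 100 * sum_limits_m  / node_cpu_m ))%"    # 130% — fine until everyone bursts at once
```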

Requests at 40% CPU, 38% memory — Requests reflect how much of the node is allocated — not how much is being used. When requests hit around 80–90% of capacity, the scheduler will start struggling to place new Pods even if actual CPU usage is only 30%. This is stranded capacity: over-inflated requests lock up schedulable room that nothing is actually using. Right-sizing requests is just as important as right-sizing limits.

The Resource Sizing Decision Tree

A practical framework for deciding how to set resources for any new workload:

Start: new service to deploy
│
├─ Have profiling data?
│    YES → set requests = p95 usage from staging; set limits = 2x p99 peak usage.
│    NO  → use conservative defaults (requests: 100m / 128Mi, limits: 500m / 512Mi),
│          monitor for 1 week, then tune.
│
├─ Critical stateful service (DB, cache, queue)?
│    → set requests = limits → Guaranteed QoS. Evicted last.
│
├─ Standard stateless API?
│    → requests < limits → Burstable QoS. Good balance of density and protection.
│
└─ After 1–2 weeks: run kubectl top pod to compare actual vs requested.
     Tighten requests to ~1.5x p95 actual; tighten limits to ~2x p99 peak. Repeat quarterly.
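The tuning arithmetic from the tree, as a hypothetical shell helper. Plug in your own p95/p99 figures from kubectl top or your metrics stack; the numbers below are the staging figures from the inventory-api scenario earlier in this lesson:

```shell
p95_cpu_m=150;  p99_cpu_m=300    # observed CPU, millicores
p95_mem_mi=180; p99_mem_mi=280   # observed memory, MiB

# requests ≈ 1.5x p95 actual; limits ≈ 2x p99 peak
echo "requests: cpu=$(( p95_cpu_m * 3 / 2 ))m memory=$(( p95_mem_mi * 3 / 2 ))Mi"
echo "limits:   cpu=$(( p99_cpu_m * 2 ))m memory=$(( p99_mem_mi * 2 ))Mi"
# requests: cpu=225m memory=270Mi
# limits:   cpu=600m memory=560Mi
```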

Teacher's Note: The two failure modes and how to avoid both

There are two opposite ways to get resources wrong. Under-requesting: setting requests too low. The scheduler packs too many Pods onto nodes because it thinks they're lightweight. Actual usage exceeds node capacity, memory pressure builds, evictions cascade, and suddenly 40% of your Pods are being killed at once. This is the "noisy neighbour" that took down three services mentioned in Lesson 18 — except at the node level rather than the namespace level.

Over-requesting: setting requests too high. Every Pod claims far more than it uses. Nodes look "full" to the scheduler even when they're running at 20% actual utilisation. New Pods can't schedule. Cluster autoscaler spins up new nodes unnecessarily. You end up paying for 3x the cloud compute you actually need. I've seen clusters where actual CPU utilisation was 15% but every node was "full" because request values were copied from a legacy VM that had peak specs.

The right answer is continuous tuning: measure → observe peak usage → set requests to give a healthy buffer above p95 actual usage → set limits to give safety headroom above that → repeat. Tools like Vertical Pod Autoscaler (Lesson 50) can do this automatically.

Practice Questions

1. The Kubernetes scheduler uses which resource field — requests or limits — to decide which node to place a Pod on?



2. A container exceeds its memory limit. What does Kubernetes do, and what term appears in kubectl describe pod under Last State → Reason?



3. Which QoS class is assigned to a Pod when its requests and limits are set to the same value for all resources, making it the last to be evicted under memory pressure?



Quiz

1. A container's CPU usage exceeds its CPU limit. What happens?


2. A Pod manifest has no resources block at all. What QoS class is it assigned, and what does that mean for it during a memory pressure event?


3. A cluster's nodes show 85% CPU requests allocated, but kubectl top node shows actual CPU usage at only 20%. New Pods keep failing to schedule. What is the most likely explanation?


Up Next · Lesson 23

Health Checks

Liveness, readiness, and startup probes — how Kubernetes knows when your container is actually healthy and ready to serve traffic, not just running.