Kubernetes Course
Pod Lifecycle
Every Pod you've ever created has lived through a specific sequence of phases — and knowing exactly what happens at each phase is the difference between someone who debugs fast and someone who stares at kubectl get pods wondering why it says "Pending" for ten minutes.
A Pod Is Not Just Running or Not Running
Most people learn Kubernetes and think of Pods in binary terms — either they're running or they're broken. The reality is more nuanced. A Pod travels through a clearly defined lifecycle from the moment you create it to the moment it terminates. Every phase has a name, a meaning, and a set of things that can go wrong inside it.
There's also a second layer underneath the phase: container states. A Pod has a phase. Each container inside a Pod has its own state. Both matter when debugging. And then there are conditions — boolean checkpoints that tell you exactly where in the lifecycle something stalled. Once you understand all three layers, you can diagnose almost any Pod problem in under two minutes.
🎯 The three layers of Pod health: Phase (where is the Pod in its overall lifecycle?), Conditions (which specific checkpoints have passed or failed?), Container State (what is each individual container actually doing right now?). You see all three in kubectl describe pod.
The Five Pod Phases
The STATUS column in kubectl get pods reflects the Pod's current phase. Here are all five, what they mean, and what's typically happening inside the cluster at each one.
| Phase | What it means | Common causes / what to check |
|---|---|---|
| Pending | Pod accepted by the API server but not yet scheduled or containers not yet started | No node has enough CPU/memory, image still pulling, PVC not bound |
| Running | Pod is bound to a node and at least one container is running (or starting/restarting) | Healthy state — check READY column too (e.g. 0/1 Running means container started but not ready) |
| Succeeded | All containers exited with status 0 and won't be restarted | Normal for batch jobs and one-off tasks — not normal for long-running services |
| Failed | All containers have terminated and at least one exited with a non-zero status | App crash, OOM kill, bad entrypoint — check kubectl logs --previous |
| Unknown | Pod state can't be determined — usually means the node it's on lost contact with the control plane | Node networking issue, kubelet crashed — check node status with kubectl get nodes |
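If you want to hunt for Pods stuck in a particular phase across the whole cluster, kubectl can filter on status.phase server-side. A quick sketch (run against whatever cluster and namespace you're pointed at):

```shell
# List Pods by phase — --field-selector filters on status.phase server-side
kubectl get pods --all-namespaces --field-selector=status.phase=Pending

# The same selector works for the other phases:
kubectl get pods --field-selector=status.phase=Failed
kubectl get pods --field-selector=status.phase=Succeeded
```

During an incident, the Pending query is a fast way to see whether one Pod is stuck or the whole cluster is out of capacity.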
The Complete Pod Lifecycle Flow
From API server acceptance to container running — here's every step Kubernetes takes and which component is responsible:
1. Pending: the API server has accepted the Pod, but no node is assigned yet.
2. The scheduler binds the Pod to a node, and the kubelet starts pulling the image. The container state is Waiting (reason: ContainerCreating, or ErrImagePull if the pull fails).
3. Running: the containers have started and the Pod phase moves to Running. But "Running" does NOT mean "ready to serve traffic" yet — readiness probes may still be evaluating.
4. Ready: the readiness probe passes, the Ready condition becomes True, and the Pod is added to the Service's endpoint list. Only now does live traffic reach it.
Reading kubectl describe pod Like a Pro
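You can watch these transitions happen live, or query the phase and the Ready condition directly with jsonpath (the Pod name my-pod is a placeholder):

```shell
# Watch Pods change phase in real time as they're created:
kubectl get pods -w

# Query just the phase of one Pod:
kubectl get pod my-pod -o jsonpath='{.status.phase}'

# Query the Ready condition specifically (True/False):
kubectl get pod my-pod -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
```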
The scenario: You're an SRE and an alert fires — a Pod in the order processing service is stuck. Your colleague says "it's Pending, I don't know why." You pull up your terminal. Here's how you read the output of kubectl describe pod to find the answer in under 60 seconds.
kubectl describe pod order-processor-6d8b9f-xk4pz
# describe pod: the single most important debugging command in Kubernetes
# Shows: phase, conditions, container states, resource requests, events
# Always scroll to the Events section at the bottom first during an incident
Name: order-processor-6d8b9f-xk4pz
Namespace: default
Node: <none>
Status: Pending
Conditions:
Type Status Reason
---- ------ ------
PodScheduled False Unschedulable
Containers:
order-processor:
Image: company/order-processor:3.2.1
State: Waiting
Reason: ContainerCreating
Requests:
cpu: 4000m
memory: 8Gi
Events:
Warning FailedScheduling 12s default-scheduler
0/3 nodes are available: 3 Insufficient cpu.
preemption: 0/3 nodes are available:
3 No preemption victims found for incoming pod.
What just happened?
Node: <none> — The Pod hasn't been scheduled yet. If you see this plus Status: Pending, the scheduler hasn't found a suitable node.
Conditions: PodScheduled = False / Unschedulable — The four Pod conditions are PodScheduled, ContainersReady, Initialized, and Ready. They act like checkboxes — each one must be True before the next becomes relevant. Here, PodScheduled is False, so nothing else has even started.
Requests: cpu: 4000m, memory: 8Gi — The manifest is requesting 4 full CPU cores and 8GB of RAM. The cluster only has 3 nodes and none of them have that much free. The fix here is either to reduce the resource request in the manifest, add a new node to the cluster, or use the Vertical Pod Autoscaler.
Events: 0/3 nodes are available: 3 Insufficient cpu — This is the smoking gun. The Events section at the bottom of kubectl describe is the first place every experienced Kubernetes engineer looks. It tells you exactly what the scheduler tried and why it failed, in plain English. Bookmark this habit: describe → scroll to Events.
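The conditions checklist can also be pulled out on its own with a jsonpath one-liner, which is handy when you just want the checkboxes without the full describe output (Pod name taken from the scenario above):

```shell
# Print each Pod condition as type=status, one per line:
kubectl get pod order-processor-6d8b9f-xk4pz \
  -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'
# For the stuck Pod above you'd expect PodScheduled=False — the first
# checkbox failed, so the later ones never become relevant
```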
Container States: The Layer Below Pod Phase
While a Pod has a phase, each container inside it has one of three states. These are especially important when a Pod is in the Running phase but something still isn't right.
| Container State | What it means | Common sub-reasons |
|---|---|---|
| Waiting | Container is not running yet — waiting for something | ContainerCreating, ErrImagePull, ImagePullBackOff, CrashLoopBackOff |
| Running | Container process is executing normally | Healthy. Check READY column — 1/1 means passing readiness probe |
| Terminated | Container process has exited | Exit code 0 = clean exit; non-zero = error. OOMKilled = memory limit exceeded |
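You can inspect a container's state directly from the Pod's status, without scrolling through describe output. A sketch (my-pod is a placeholder):

```shell
# Dump the raw state object of the first container —
# shows waiting/running/terminated plus the reason:
kubectl get pod my-pod -o jsonpath='{.status.containerStatuses[0].state}'

# lastState holds the previous container instance — this is where the
# exit code and reasons like OOMKilled from the last crash live:
kubectl get pod my-pod -o jsonpath='{.status.containerStatuses[0].lastState}'
```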
Diagnosing CrashLoopBackOff
CrashLoopBackOff is the most famous Kubernetes error status. It means the container started, crashed immediately (non-zero exit), Kubernetes restarted it, it crashed again, and now Kubernetes is adding an exponential backoff delay before each restart attempt to avoid hammering the system.
The scenario: You've just rolled out a new version of the auth service. Within 30 seconds, your Slack pings — the Pod is in CrashLoopBackOff. Here are the exact commands you run to find out why.
kubectl get pods -n production
# Check status — look for CrashLoopBackOff and the RESTARTS count (high number = been crashing a while)
kubectl describe pod auth-service-5c7d8f-p2rkx -n production
# First look: Events section — did image pull fail? Did a liveness probe kill it?
kubectl logs auth-service-5c7d8f-p2rkx -n production --previous
# --previous is critical here — the container has already crashed and restarted
# Without --previous you get logs from the NEW (healthy) container, not the one that died
# The crash reason is always in the dying container's last log lines
kubectl logs auth-service-5c7d8f-p2rkx -n production --previous --tail=50
# --tail=50: show only the last 50 lines — crash logs are almost always at the end
$ kubectl get pods -n production
NAME                        READY   STATUS             RESTARTS   AGE
auth-service-5c7d8f-p2rkx   0/1     CrashLoopBackOff   7          9m

$ kubectl logs auth-service-5c7d8f-p2rkx -n production --previous --tail=20
2025-03-10T09:44:11Z INFO  Starting auth-service v2.4.0
2025-03-10T09:44:11Z INFO  Loading configuration from environment
2025-03-10T09:44:11Z ERROR Missing required environment variable: JWT_SECRET
2025-03-10T09:44:11Z FATAL Configuration validation failed. Exiting.

$ kubectl describe pod auth-service-5c7d8f-p2rkx -n production | grep -A5 "Events:"
Events:
  Warning  BackOff  8m  kubelet  Back-off restarting failed container auth-service
  Normal   Pulling  9m  kubelet  Pulling image "company/auth-service:2.4.0"
  Normal   Pulled   9m  kubelet  Successfully pulled image
  Normal   Started  9m  kubelet  Started container auth-service
  Warning  Failed   9m  kubelet  Error: failed to start container: exit code 1
What just happened?
RESTARTS: 7 — The container has crashed and been restarted 7 times in 9 minutes. Kubernetes uses exponential backoff — 10s, 20s, 40s, 80s, up to a maximum of 5 minutes between retries. By restart 7, Kubernetes is waiting several minutes between each attempt. This is the "BackOff" part of CrashLoopBackOff.
The actual cause — The logs show it immediately: Missing required environment variable: JWT_SECRET. The new version of the app added a required env var that wasn't added to the Deployment manifest. The fix is to add the Secret and reference it in the env block — not to keep restarting the Pod.
READY: 0/1 — Even though the Pod is technically in the Running phase (the container started), READY shows 0/1 because it crashed before the readiness probe passed. This is the difference between Running and Ready — a Pod can be Running but not Ready. Traffic will never reach a Pod with READY 0/1.
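The exit code of the crashed instance can be pulled out directly, which is faster than scanning describe output when you're triaging many Pods (Pod name from the scenario above):

```shell
# Exit code and reason of the previous (crashed) container instance:
kubectl get pod auth-service-5c7d8f-p2rkx -n production \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode} {.status.containerStatuses[0].lastState.terminated.reason}'
# Exit code 1 = app-level error (like the missing JWT_SECRET here);
# 137 would mean the process was SIGKILLed — often an OOMKilled container
```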
Pod Termination: How Pods Shut Down Gracefully
When you delete a Pod — or when a Deployment rolls out a new version — Kubernetes doesn't just SIGKILL the container. It follows a graceful termination sequence. Understanding this sequence is critical when your app needs to finish in-flight requests before dying.
Pod Graceful Termination Sequence
1. You delete the Pod (or a rollout replaces it). The Pod enters Terminating and is removed from the Service's endpoints, so no new traffic is routed to it.
2. The preStop hook, if defined, runs to completion.
3. The kubelet sends SIGTERM to the container's main process.
4. The container has until terminationGracePeriodSeconds (default 30s, counted from the start of termination, including preStop time) to exit cleanly.
5. If it's still running when the grace period expires, the kubelet sends SIGKILL.
The scenario: Your team's API service handles long-running data export jobs — some take up to 45 seconds. The default grace period is 30 seconds, so exports are getting cut off mid-stream during deploys. Here's how you extend the grace period in your Deployment manifest.
apiVersion: apps/v1
kind: Deployment
metadata:
name: export-api
spec:
replicas: 2
selector:
matchLabels:
app: export-api
template:
metadata:
labels:
app: export-api
spec:
terminationGracePeriodSeconds: 60 # Override default 30s — give containers 60s to finish
# Set this to the max time your longest request can take
containers:
- name: export-api
image: company/export-api:1.5.2
ports:
- containerPort: 8080
lifecycle: # lifecycle: define hooks that run at container start/stop
preStop: # preStop: runs BEFORE SIGTERM is sent
exec:
command: ["/bin/sh", "-c", "sleep 5"]
# sleep 5: give the load balancer 5 seconds to stop routing traffic here
# before the app starts getting SIGTERM and rejecting new connections
# This gap prevents the race condition where new traffic arrives after
# the endpoint is removed but before the app knows it's shutting down
$ kubectl apply -f export-deployment.yaml
deployment.apps/export-api configured

$ kubectl delete pod export-api-8c4f7d-j9pkx
pod "export-api-8c4f7d-j9pkx" deleted

$ kubectl get pod export-api-8c4f7d-j9pkx
NAME                      READY   STATUS        RESTARTS   AGE
export-api-8c4f7d-j9pkx   0/1     Terminating   0          4m

(pod disappears after up to 60 seconds — not 30)
What just happened?
terminationGracePeriodSeconds: 60 — This field lives in the Pod spec (spec.template.spec in a Deployment, at the Pod level rather than per-container) and overrides the 30-second default. Now when you delete a Pod or roll out a new image, Kubernetes waits up to 60 seconds for the container to exit gracefully before force-killing it.
lifecycle.preStop — The preStop hook runs synchronously before SIGTERM. The sleep 5 is a well-known production trick — it adds a 5-second buffer between when the endpoint is removed from the Service and when the app starts shutting down. Without it, you can get a brief window of 502 errors during rolling deploys.
STATUS: Terminating — When you see Terminating in kubectl get pods, the Pod is in the grace period. It's still running code but Kubernetes has marked it for deletion. If it's stuck Terminating for longer than terminationGracePeriodSeconds, the kubelet will force-kill it. If it's stuck Terminating for much longer (minutes/hours), the node may be offline.
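The grace period only helps if the app actually reacts to SIGTERM. A minimal sketch of a container entrypoint that traps the signal and drains before exiting — the drain logic here is a placeholder, not a real implementation:

```shell
#!/bin/sh
# Hypothetical entrypoint sketch: trap SIGTERM so the grace period is
# spent draining instead of being ignored until the SIGKILL arrives.

drain() {
  echo "SIGTERM received: finishing in-flight work"
  # ... real app: stop accepting connections, flush work, then exit ...
  exit 0
}
trap drain TERM

echo "serving"
# Background sleep + wait keeps the shell responsive to signals as PID 1;
# a plain foreground sleep would delay the trap until it finished.
while true; do
  sleep 1 &
  wait $!
done
```

Most languages have an equivalent: register a SIGTERM handler, stop accepting new work, finish what's in flight, then exit 0.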
Teacher's Note: Running ≠ Ready, and that difference matters in production
The single biggest misconception about the Pod lifecycle is treating "Running" as a synonym for "healthy." A Pod in the Running phase just means at least one container process started. It says nothing about whether your app is actually ready to serve traffic. That's what the Ready condition is for — and it's gated by your readiness probe (Lesson 23). Until READY is 1/1, the Pod won't receive any traffic from Services, regardless of its phase.
When debugging, always check both STATUS and READY. A Pod that is Running 0/1 is a Pod in trouble — either it's still warming up, its readiness probe is failing, or it crashed right after starting.
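In scripts and CI pipelines you rarely want to poll STATUS and READY by eye; kubectl can block on the Ready condition for you (my-pod is a placeholder):

```shell
# Block until the Pod's Ready condition is True, or fail after the timeout:
kubectl wait --for=condition=Ready pod/my-pod --timeout=120s

# Works for a whole label selector too, e.g. after a rollout:
kubectl wait --for=condition=Ready pods -l app=export-api --timeout=120s
```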
Practice Questions
1. A container starts, crashes immediately with a non-zero exit code, Kubernetes restarts it, and it crashes again. Kubernetes begins adding exponential delay between restart attempts. What STATUS does kubectl get pods show for this Pod?
2. What Pod spec field controls how long Kubernetes waits for a container to exit after sending SIGTERM before force-killing it with SIGKILL?
3. A Pod has restarted 5 times. You want to see the logs from the container instance that crashed, not the current running one. What kubectl command flag do you use?
Quiz
1. A Pod has been in Pending for 10 minutes and kubectl describe pod shows 0/5 nodes are available: 5 Insufficient memory in Events. What is the most likely cause?
2. A newly started Pod shows STATUS: Running but READY: 0/1 in kubectl get pods. What does this mean?
3. In what order does Kubernetes signal a container during graceful termination?
Up Next · Lesson 17
Labels and Selectors
The glue that holds Kubernetes together — how objects find each other and why your entire routing model depends on getting labels right.