Kubernetes Lesson 16 – Pod Lifecycle | Dataplexa
Core Kubernetes Concepts · Lesson 16

Pod Lifecycle

Every Pod you've ever created has lived through a specific sequence of phases — and knowing exactly what happens at each phase is the difference between someone who debugs fast and someone who stares at kubectl get pods wondering why it says "Pending" for ten minutes.

A Pod Is Not Just Running or Not Running

Most people learn Kubernetes and think of Pods in binary terms — either they're running or they're broken. The reality is more nuanced. A Pod travels through a clearly defined lifecycle from the moment you create it to the moment it terminates. Every phase has a name, a meaning, and a set of things that can go wrong inside it.

There's also a second layer underneath the phase: container states. A Pod has a phase. Each container inside a Pod has its own state. Both matter when debugging. And then there are conditions — boolean checkpoints that tell you exactly where in the lifecycle something stalled. Once you understand all three layers, you can diagnose almost any Pod problem in under two minutes.

🎯 The three layers of Pod health: Phase (where is the Pod in its overall lifecycle?), Conditions (which specific checkpoints have passed or failed?), Container State (what is each individual container actually doing right now?). You see all three in kubectl describe pod.

The Five Pod Phases

The STATUS column in kubectl get pods reflects the Pod's current phase. Here are all five, what they mean, and what's typically happening inside the cluster at each one.

Pending: Pod accepted by the API server but not yet scheduled, or containers not yet started. Check: no node has enough CPU/memory, image still pulling, PVC not bound.

Running: Pod is bound to a node and at least one container is running (or starting/restarting). Healthy state, but check the READY column too (e.g. 0/1 Running means a container started but is not ready).

Succeeded: all containers exited with status 0 and won't be restarted. Normal for batch jobs and one-off tasks; not normal for long-running services.

Failed: all containers have terminated and at least one exited with a non-zero status. Check for app crashes, OOM kills, or a bad entrypoint with kubectl logs --previous.

Unknown: Pod state can't be determined, usually because the node it's on lost contact with the control plane. Check node status with kubectl get nodes (node networking issue, kubelet crashed).

The Complete Pod Lifecycle Flow

From API server acceptance to container running — here's every step Kubernetes takes and which component is responsible:

1. kubectl apply → API server accepts the Pod
The API server validates the manifest, persists the object to etcd, and sets the Pod phase to Pending. No node is assigned yet.

2. Scheduler watches for unscheduled Pods
The kube-scheduler sees that the Pod has no node assigned. It filters and scores the available nodes against the Pod's resource requests, affinity rules, and taints, picks the best one, and writes a Binding back through the API server.

3. kubelet on the target node picks it up
The kubelet watches the API server for Pods assigned to its node (it never reads etcd directly). It sees the new Pod and begins work. Container state moves to Waiting (ContainerCreating).

4. Container runtime pulls the image
The container runtime (typically containerd or CRI-O) pulls the image from the registry. If the image is already cached on the node, this step is near-instant. A large image on a cold node can take 30–60 seconds here.

5. Container starts → phase moves to Running
The container process starts. Container state moves to Running, and the Pod phase moves to Running. But "Running" does NOT mean "ready to serve traffic" yet: readiness probes may still be evaluating.

6. Readiness probe passes → Pod added to Service endpoints
Once the readiness probe passes (immediately, if none is defined), the Pod is marked Ready and added to the Service's endpoint list. Only now does live traffic reach it.
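Step 6 is gated by the readiness probe you define on the container. A minimal sketch of what that looks like in a Pod spec — the image name, health endpoint, and timings here are illustrative, not from this lesson:

```yaml
# Illustrative readiness probe fragment (image, path, and timings are assumptions)
containers:
  - name: web
    image: company/web:1.0.0
    readinessProbe:
      httpGet:
        path: /healthz          # assumed health-check endpoint
        port: 8080
      initialDelaySeconds: 5    # wait 5s after start before the first check
      periodSeconds: 10         # then re-check every 10s
```

Until this probe succeeds, the Pod stays Running but not Ready, and receives no Service traffic. Readiness probes are covered in depth in Lesson 23.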

Reading kubectl describe pod Like a Pro

The scenario: You're an SRE and an alert fires — a Pod in the order processing service is stuck. Your colleague says "it's Pending, I don't know why." You pull up your terminal. Here's how you read the output of kubectl describe pod to find the answer in under 60 seconds.

kubectl describe pod order-processor-6d8b9f-xk4pz
# describe pod: the single most important debugging command in Kubernetes
# Shows: phase, conditions, container states, resource requests, events
# Always scroll to the Events section at the bottom first during an incident
Name:             order-processor-6d8b9f-xk4pz
Namespace:        default
Node:             <none>
Status:           Pending
Conditions:
  Type           Status  Reason
  ----           ------  ------
  PodScheduled   False   Unschedulable
Containers:
  order-processor:
    Image:    company/order-processor:3.2.1
    State:    Waiting
      Reason: ContainerCreating
    Requests:
      cpu:     4000m
      memory:  8Gi
Events:
  Warning  FailedScheduling  12s  default-scheduler
    0/3 nodes are available:
    3 Insufficient cpu.
    preemption: 0/3 nodes are available:
    3 No preemption victims found for incoming pod.

What just happened?

Node: <none> — The Pod hasn't been scheduled yet. If you see this plus Status: Pending, the scheduler hasn't found a suitable node.

Conditions: PodScheduled = False / Unschedulable — The four Pod conditions are PodScheduled, Initialized, ContainersReady, and Ready, roughly in that order. They act like checkboxes: each one must be True before the next becomes relevant. Here, PodScheduled is False, so nothing else has even started.

Requests: cpu: 4000m, memory: 8Gi — The manifest is requesting 4 full CPU cores and 8 GiB of RAM. The cluster only has 3 nodes and none of them has that much free. The fix is to reduce the resource request in the manifest, add a larger node to the cluster, or let the Vertical Pod Autoscaler right-size the request.
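The manifest-side fix, sketched as a resources change — the new numbers below are illustrative; pick values that match what the app actually uses under load:

```yaml
# Illustrative fix: request only what the app needs so the scheduler can place it
resources:
  requests:
    cpu: "500m"       # was 4000m (assumed right-sized value)
    memory: "1Gi"     # was 8Gi
  limits:
    cpu: "1"
    memory: "2Gi"
```

After applying, the scheduler re-evaluates the Pending Pod against the smaller request and can bind it as soon as any node has the headroom.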

Events: 0/3 nodes are available: 3 Insufficient cpu — This is the smoking gun. The Events section at the bottom of kubectl describe is the first place every experienced Kubernetes engineer looks. It tells you exactly what the scheduler tried and why it failed, in plain English. Bookmark this habit: describe → scroll to Events.

Container States: The Layer Below Pod Phase

While a Pod has a phase, each container inside it has one of three states. These are especially important when a Pod is in the Running phase but something still isn't right.

Waiting: the container is not running yet — it's waiting for something. Common reasons: ContainerCreating, ErrImagePull, ImagePullBackOff, CrashLoopBackOff.

Running: the container process is executing normally. Healthy; check the READY column — 1/1 means the readiness probe is passing.

Terminated: the container process has exited. Exit code 0 means a clean exit; non-zero means an error. OOMKilled means the memory limit was exceeded.
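You can see these states raw in kubectl get pod <name> -o json, under .status.containerStatuses. A small Python sketch of how that structure maps onto the table above — the sample JSON is invented for illustration:

```python
import json

def container_state(status: dict) -> str:
    """Summarize one containerStatuses entry from a Pod's status JSON."""
    state = status["state"]
    if "waiting" in state:
        # e.g. ContainerCreating, ImagePullBackOff, CrashLoopBackOff
        return f"Waiting ({state['waiting'].get('reason', 'unknown')})"
    if "running" in state:
        return "Running"
    exit_code = state["terminated"]["exitCode"]
    return f"Terminated (exit {exit_code})"

# Invented sample mimicking `kubectl get pod -o json` output
sample = json.loads('{"state": {"waiting": {"reason": "ImagePullBackOff"}}}')
print(container_state(sample))  # Waiting (ImagePullBackOff)
```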

Diagnosing CrashLoopBackOff

CrashLoopBackOff is the most famous Kubernetes error status. It means the container started, crashed immediately (non-zero exit), Kubernetes restarted it, it crashed again, and now Kubernetes is adding an exponential backoff delay before each restart attempt to avoid hammering the system.

The scenario: You've just rolled out a new version of the auth service. Within 30 seconds, your Slack pings — the Pod is in CrashLoopBackOff. Here are the exact commands you run to find out why.

kubectl get pods -n production
# Check status — look for CrashLoopBackOff and the RESTARTS count (high number = been crashing a while)

kubectl describe pod auth-service-5c7d8f-p2rkx -n production
# First look: Events section — did image pull fail? Did a liveness probe kill it?

kubectl logs auth-service-5c7d8f-p2rkx -n production --previous
# --previous is critical here — the container has already crashed and restarted
# Without --previous you get logs from the NEW (healthy) container, not the one that died
# The crash reason is almost always in the dying container's last log lines

kubectl logs auth-service-5c7d8f-p2rkx -n production --previous --tail=50
# --tail=50: show only the last 50 lines — crash logs are almost always at the end
$ kubectl get pods -n production
NAME                           READY   STATUS             RESTARTS   AGE
auth-service-5c7d8f-p2rkx      0/1     CrashLoopBackOff   7          9m

$ kubectl logs auth-service-5c7d8f-p2rkx -n production --previous --tail=20
2025-03-10T09:44:11Z INFO  Starting auth-service v2.4.0
2025-03-10T09:44:11Z INFO  Loading configuration from environment
2025-03-10T09:44:11Z ERROR Missing required environment variable: JWT_SECRET
2025-03-10T09:44:11Z FATAL Configuration validation failed. Exiting.

$ kubectl describe pod auth-service-5c7d8f-p2rkx -n production | grep -A5 "Events:"
Events:
  Warning  BackOff  8m  kubelet  Back-off restarting failed container auth-service
  Normal   Pulling  9m  kubelet  Pulling image "company/auth-service:2.4.0"
  Normal   Pulled   9m  kubelet  Successfully pulled image
  Normal   Started  9m  kubelet  Started container auth-service
  Warning  Failed   9m  kubelet  Error: failed to start container: exit code 1

What just happened?

RESTARTS: 7 — The container has crashed and been restarted 7 times in 9 minutes. Kubernetes uses exponential backoff — 10s, 20s, 40s, 80s, up to a maximum of 5 minutes between retries. By restart 7, Kubernetes is waiting several minutes between each attempt. This is the "BackOff" part of CrashLoopBackOff.
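The backoff schedule above can be sketched as a one-liner — the 10-second base and 5-minute cap match the kubelet's documented behavior, though the exact doubling formula here is an approximation:

```python
def crashloop_delay(restart: int) -> int:
    """Approximate kubelet restart backoff: 10s base, doubling per crash, capped at 5 min."""
    return min(10 * 2 ** restart, 300)

for n in range(7):
    print(f"after crash {n + 1}: wait {crashloop_delay(n)}s")
# 10, 20, 40, 80, 160, then capped at 300s (5 min) from the 6th crash onward
```

One detail worth knowing: the kubelet resets this counter once a container has run cleanly for 10 minutes, so an app that crashes rarely starts each episode back at the 10-second delay.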

The actual cause — The logs show it immediately: Missing required environment variable: JWT_SECRET. The new version of the app added a required env var that wasn't added to the Deployment manifest. The fix is to add the Secret and reference it in the env block — not to keep restarting the Pod.
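That fix, sketched as a manifest addition — the Secret name and key below are assumptions; match them to however the Secret is actually created in your cluster:

```yaml
# Illustrative: inject JWT_SECRET from a Secret (Secret name and key are assumed)
containers:
  - name: auth-service
    image: company/auth-service:2.4.0
    env:
      - name: JWT_SECRET
        valueFrom:
          secretKeyRef:
            name: auth-service-secrets   # assumed Secret name
            key: jwt-secret              # assumed key within the Secret
```

Once the Deployment is updated, the rollout replaces the crashing Pods with ones that start with the variable present; no manual restarts needed.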

READY: 0/1 — Even though the Pod is technically in the Running phase (the container started), READY shows 0/1 because it crashed before the readiness probe passed. This is the difference between Running and Ready — a Pod can be Running but not Ready. Traffic will never reach a Pod with READY 0/1.

Pod Termination: How Pods Shut Down Gracefully

When you delete a Pod — or when a Deployment rolls out a new version — Kubernetes doesn't just SIGKILL the container. It follows a graceful termination sequence. Understanding this sequence is critical when your app needs to finish in-flight requests before dying.

Pod Graceful Termination Sequence

T+0s
kubectl delete pod issued → the API server marks the Pod as Terminating. Removal from Service endpoints begins, but it is asynchronous: a trickle of new traffic can still arrive for a moment (this is the race the preStop sleep trick absorbs).
T+0s
preStop hook runs (if defined): a script or HTTP call that runs before the SIGTERM. Use this to deregister from service discovery, flush caches, or drain connections.
T+0s (after preStop completes)
SIGTERM sent to the container process. Your app should catch this signal and start shutting down gracefully: stop accepting new requests, finish in-flight ones, close DB connections.
T+0s → T+30s
terminationGracePeriodSeconds countdown (default 30s). The clock starts at deletion, not at SIGTERM, so a slow preStop hook eats into the grace period. Kubernetes waits this long for the container to exit on its own.
T+30s
SIGKILL: if the container hasn't exited by the end of the grace period, Kubernetes force-kills it. No more waiting. The app must handle SIGTERM or requests die mid-flight.
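What "catch this signal" looks like in application code — a minimal Python sketch, with the actual drain logic left as a placeholder:

```python
import signal

shutting_down = False

def handle_sigterm(signum, frame):
    """Kubernetes sent SIGTERM: flip into drain mode instead of dying instantly."""
    global shutting_down
    shutting_down = True
    # Placeholder for real drain logic: stop accepting new requests,
    # finish in-flight work, close DB connections, then exit(0)
    # before the grace period runs out.

signal.signal(signal.SIGTERM, handle_sigterm)

# Simulate the kubelet's signal for demonstration
signal.raise_signal(signal.SIGTERM)
print(f"shutting_down={shutting_down}")  # shutting_down=True
```

The key design point: the handler only flips a flag; the request loop checks it and drains, so shutdown is cooperative rather than abrupt. Apps that ignore SIGTERM run until the SIGKILL at T+30s and drop whatever was in flight.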

The scenario: Your team's API service handles long-running data export jobs — some take up to 45 seconds. The default grace period is 30 seconds, so exports are getting cut off mid-stream during deploys. Here's how you extend the grace period in your Deployment manifest.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: export-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: export-api
  template:
    metadata:
      labels:
        app: export-api
    spec:
      terminationGracePeriodSeconds: 60   # Override default 30s — give containers 60s to finish
                                           # Set this to the max time your longest request can take
      containers:
        - name: export-api
          image: company/export-api:1.5.2
          ports:
            - containerPort: 8080
          lifecycle:                        # lifecycle: define hooks that run at container start/stop
            preStop:                        # preStop: runs BEFORE SIGTERM is sent
              exec:
                command: ["/bin/sh", "-c", "sleep 5"]
                # sleep 5: give the load balancer 5 seconds to stop routing traffic here
                # before the app starts getting SIGTERM and rejecting new connections
                # This gap prevents the race condition where new traffic arrives after
                # the endpoint is removed but before the app knows it's shutting down
$ kubectl apply -f export-deployment.yaml
deployment.apps/export-api configured

$ kubectl delete pod export-api-8c4f7d-j9pkx
pod "export-api-8c4f7d-j9pkx" deleted

$ kubectl get pod export-api-8c4f7d-j9pkx
NAME                        READY   STATUS        RESTARTS   AGE
export-api-8c4f7d-j9pkx     0/1     Terminating   0          4m

(pod disappears after 60 seconds — not 30)

What just happened?

terminationGracePeriodSeconds: 60 — This field lives under spec (at the Pod level, not container level) and overrides the 30-second default. Now when you delete a Pod or roll out a new image, Kubernetes waits up to 60 seconds for the container to exit gracefully before force-killing it.

lifecycle.preStop — The preStop hook runs synchronously before SIGTERM. The sleep 5 is a well-known production trick — it adds a 5-second buffer between when the endpoint is removed from the Service and when the app starts shutting down. Without it, you can get a brief window of 502 errors during rolling deploys.

STATUS: Terminating — When you see Terminating in kubectl get pods, the Pod is in the grace period. It's still running code but Kubernetes has marked it for deletion. If it's stuck Terminating for longer than terminationGracePeriodSeconds, the kubelet will force-kill it. If it's stuck Terminating for much longer (minutes/hours), the node may be offline.

Teacher's Note: Running ≠ Ready, and that difference matters in production

The single biggest misconception about the Pod lifecycle is treating "Running" as a synonym for "healthy." A Pod in the Running phase just means at least one container process started. It says nothing about whether your app is actually ready to serve traffic. That's what the Ready condition is for — and it's gated by your readiness probe (Lesson 23). Until READY is 1/1, the Pod won't receive any traffic from Services, regardless of its phase.

When debugging, always check both STATUS and READY. A Pod that is Running 0/1 is a Pod in trouble — either it's still warming up, its readiness probe is failing, or it crashed right after starting.

Practice Questions

1. A container starts, crashes immediately with a non-zero exit code, Kubernetes restarts it, and it crashes again. Kubernetes begins adding exponential delay between restart attempts. What STATUS does kubectl get pods show for this Pod?



2. What Pod spec field controls how long Kubernetes waits for a container to exit after sending SIGTERM before force-killing it with SIGKILL?



3. A Pod has restarted 5 times. You want to see the logs from the container instance that crashed, not the current running one. What kubectl command flag do you use?



Quiz

1. A Pod has been in Pending for 10 minutes and kubectl describe pod shows 0/5 nodes are available: 5 Insufficient memory in Events. What is the most likely cause?


2. A newly started Pod shows STATUS: Running but READY: 0/1 in kubectl get pods. What does this mean?


3. In what order does Kubernetes signal a container during graceful termination?


Up Next · Lesson 17

Labels and Selectors

The glue that holds Kubernetes together — how objects find each other and why your entire routing model depends on getting labels right.