Kubernetes Course
Kubernetes Upgrades
Kubernetes releases a new minor version every four months and supports only the three most recent minor versions. Running an unsupported version means no security patches and no bug fixes — staying current is not optional. This lesson covers the upgrade lifecycle, deprecated API handling, managed cluster upgrades on EKS, self-managed upgrades with kubeadm, and the zero-downtime runbook.
The Kubernetes Release and Support Model
Kubernetes follows a predictable release cadence. Understanding it is the starting point for any upgrade strategy.
| What | Details |
|---|---|
| Release cadence | A new minor version (1.29, 1.30, 1.31…) every ~4 months. Patch versions (1.29.1, 1.29.2) are released as needed for bug and security fixes. |
| Support window | The community supports the last three minor versions. With versions 1.29, 1.30, 1.31 supported, 1.28 is end-of-life — no more patches. In practice you have ~14 months before a version is unsupported. |
| Upgrade path | You must upgrade one minor version at a time. Skipping from 1.27 to 1.29 is not supported. Control plane must be upgraded before worker nodes. |
| API deprecations | APIs are deprecated before removal. Typically: deprecated in version N, removed in N+3. Check the changelog before each upgrade. |
| Managed vs self-managed | Managed services (EKS, GKE, AKS) handle control plane upgrades. Self-managed clusters using kubeadm require manual control plane and node upgrades. |
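The one-minor-version-at-a-time rule from the table can be sketched as a small helper that prints the hops between two versions. A minimal sketch — the `upgrade_path` function name is ours, and it assumes versions of the form MAJOR.MINOR:

```shell
# Print the minor versions you must step through, one at a time.
upgrade_path() {
  from_minor="${1#*.}"; to_minor="${2#*.}"; major="${1%%.*}"
  path=""
  m=$((from_minor + 1))
  while [ "$m" -le "$to_minor" ]; do
    path="${path}${path:+ }${major}.${m}"
    m=$((m + 1))
  done
  echo "$path"
}

upgrade_path 1.27 1.30   # prints: 1.28 1.29 1.30
```

Each hop in that output is a full upgrade cycle — plan, test in staging, execute — not a single combined jump.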
Pre-Upgrade Checklist
Skipping pre-upgrade checks is the most common cause of upgrade-related incidents. Run these before every minor version upgrade.
# 1. Check which APIs in your cluster are deprecated or removed in the target version
# Use Pluto -- scans live cluster resources and Helm charts for deprecated APIs
pluto detect-all-in-cluster --target-versions k8s=v1.30.0
# Output: lists every resource using a deprecated API, its namespace, and the replacement
# 2. Scan all deployed Helm charts for deprecated APIs
helm list -A -o json | jq -r '.[] | "\(.name) \(.namespace)"' | \
while read -r name ns; do
helm get manifest "$name" -n "$ns" | pluto detect - --target-versions k8s=v1.30.0
done
# (pluto also has a detect-helm subcommand that scans in-cluster releases directly)
# 3. Check the CHANGELOG for breaking changes in the target version
# https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.30.md
# 4. Verify your cluster add-ons support the target version
# (CoreDNS, kube-proxy, CNI plugin, metrics-server, Ingress controller)
kubectl get pods -A -o jsonpath='{range .items[*]}{.spec.containers[*].image}{"\n"}{end}' \
| sort -u | grep -v pause
# Cross-check each image version against its compatibility matrix
# 5. Test the upgrade in a non-production cluster first
# -- staging should be one minor version ahead of production at all times
Handling Deprecated APIs
The scenario: You are upgrading from Kubernetes 1.24 to 1.25. The API audit log shows your cluster is still using the policy/v1beta1 PodSecurityPolicy API, which was removed in 1.25. You need to migrate before upgrading.
# Find all resources using deprecated API versions
kubectl get podsecuritypolicies.policy -A 2>/dev/null && \
echo "WARNING: PodSecurityPolicy still in use -- removed in 1.25"
# For other deprecated APIs -- check what your manifests use
grep -r "apiVersion:" ./k8s-manifests/ | grep -E "extensions/v1beta1|networking.k8s.io/v1beta1"
# extensions/v1beta1 Ingress was removed in 1.22 -- must use networking.k8s.io/v1
# Common API removals by version:
# 1.22: Ingress extensions/v1beta1 and networking.k8s.io/v1beta1 -> networking.k8s.io/v1
# 1.25: PodSecurityPolicy policy/v1beta1 -- REMOVED entirely (replaced by Pod Security Admission)
#       HorizontalPodAutoscaler autoscaling/v2beta1 -> autoscaling/v2
# 1.26: HorizontalPodAutoscaler autoscaling/v2beta2 -> autoscaling/v2
# 1.29: FlowSchema/PriorityLevelConfiguration flowcontrol.apiserver.k8s.io/v1beta3 deprecated -> v1
# Use the kubectl convert plugin to migrate a manifest to the new API version
# (convert was removed from kubectl core -- install the kubectl-convert plugin first)
kubectl convert -f old-ingress.yaml --output-version networking.k8s.io/v1
# Then apply the converted manifest and remove the old one
$ pluto detect-all-in-cluster --target-versions k8s=v1.30.0
KIND      VERSION               NAMESPACE   NAME              DEPRECATED   REMOVED
Ingress   extensions/v1beta1    payments    payment-api       true         true     ← must fix before upgrade
HPA       autoscaling/v2beta2   payments    payment-api-hpa   true         false    ← deprecated, still works

$ kubectl convert -f old-ingress.yaml --output-version networking.k8s.io/v1
apiVersion: networking.k8s.io/v1
kind: Ingress
[... converted manifest ...]
# Apply the converted manifest and remove the old one
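To make the Ingress migration concrete, here is a hypothetical before-and-after manifest (the `payment-api` name is illustrative). Note the two structural changes the new API requires: `pathType` is mandatory, and the flat `serviceName`/`servicePort` backend becomes a nested `service` object:

```yaml
# BEFORE -- extensions/v1beta1, removed in 1.22; fails to apply on newer clusters
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: payment-api
spec:
  rules:
  - host: payments.example.com
    http:
      paths:
      - path: /
        backend:
          serviceName: payment-api
          servicePort: 8080
---
# AFTER -- networking.k8s.io/v1
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: payment-api
spec:
  rules:
  - host: payments.example.com
    http:
      paths:
      - path: /
        pathType: Prefix          # now required: Prefix, Exact, or ImplementationSpecific
        backend:
          service:
            name: payment-api
            port:
              number: 8080
```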
What just happened?
API removal is a hard break — A deprecated API still works until it is removed. Once removed, any manifest or Helm chart using the old API version will fail to apply with a "no matches for kind" error. This is why pre-upgrade API scanning is non-negotiable. Pluto and kubectl deprecations give you weeks of warning before the upgrade.
Helm charts may use deprecated APIs — Third-party charts are often slower to update. If helm get manifest shows the chart is using a removed API, upgrade the chart version before upgrading the cluster — or fork the chart and update the API version manually as a temporary fix.
Upgrading EKS (Managed Cluster)
The scenario: Your EKS cluster is running 1.29. You are upgrading to 1.30. EKS handles the control plane upgrade; you upgrade the managed node groups afterwards.
# Step 1: Upgrade the EKS control plane
aws eks update-cluster-version \
--name production \
--kubernetes-version 1.30 \
--region us-east-1
# EKS upgrades the API server, etcd, scheduler, controller-manager
# Control plane upgrade takes 10-20 minutes
# The cluster is available throughout -- EKS does a rolling control plane upgrade
# Watch the upgrade status
aws eks describe-cluster \
--name production \
--query "cluster.status"
# UPDATING -> ACTIVE (upgrade complete)
# Step 2: Update cluster add-ons to versions compatible with 1.30
aws eks update-addon --cluster-name production --addon-name kube-proxy \
--addon-version v1.30.0-eksbuild.3
aws eks update-addon --cluster-name production --addon-name coredns \
--addon-version v1.11.1-eksbuild.4
aws eks update-addon --cluster-name production --addon-name vpc-cni \
--addon-version v1.18.0-eksbuild.1
# Check recommended versions:
aws eks describe-addon-versions --kubernetes-version 1.30 \
--query 'addons[].{addon:addonName,version:addonVersions[0].addonVersion}'
# Step 3: Upgrade managed node groups (one at a time)
aws eks update-nodegroup-version \
--cluster-name production \
--nodegroup-name app-nodes \
--kubernetes-version 1.30
# EKS cordons old nodes, drains Pods, terminates, launches new nodes at 1.30
# With proper PodDisruptionBudgets set, this is zero-downtime
# Step 4: Verify all nodes are at the new version
kubectl get nodes -o wide
# All nodes should show VERSION: v1.30.x
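The "zero-downtime with proper PodDisruptionBudgets" claim in Step 3 depends on actually having a PDB per workload. A minimal sketch, assuming a `payment-api` Deployment with at least 3 replicas (names taken from the earlier examples):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: payment-api-pdb
  namespace: payments
spec:
  minAvailable: 2            # node drains wait rather than evict below 2 ready replicas
  selector:
    matchLabels:
      app: payment-api
```

During the node group rollover, EKS drains nodes via the eviction API, which respects this budget — evictions that would drop `payment-api` below 2 ready Pods are refused until replacements are ready elsewhere.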
$ aws eks update-cluster-version --name production --kubernetes-version 1.30
{
    "update": {
        "id": "abc-123-def",
        "status": "InProgress",
        "type": "VersionUpdate"
    }
}

# ~15 minutes later -- control plane upgraded, nodes not yet:
$ kubectl version
Server Version: v1.30.2-eks-036c24b ✓

$ kubectl get nodes -o wide
NAME                            STATUS   VERSION   INSTANCE-TYPE
ip-10-0-1-12.compute.internal   Ready    v1.29.8   m5.xlarge   ← not yet upgraded
ip-10-0-2-44.compute.internal   Ready    v1.29.8   m5.xlarge

$ aws eks update-addon --cluster-name production --addon-name coredns --addon-version v1.11.1-eksbuild.4
{ "update": { "status": "InProgress" } }

$ aws eks update-nodegroup-version --cluster-name production --nodegroup-name app-nodes --kubernetes-version 1.30
{
    "update": {
        "id": "xyz-789-ghi",
        "status": "InProgress",
        "type": "VersionUpdate"
    }
}

# After the node group upgrade:
$ kubectl get nodes -o wide
NAME                            STATUS   VERSION   INSTANCE-TYPE
ip-10-0-3-15.compute.internal   Ready    v1.30.2   m5.xlarge   ← new nodes ✓
ip-10-0-4-22.compute.internal   Ready    v1.30.2   m5.xlarge   ✓
ip-10-0-5-99.compute.internal   Ready    v1.30.2   m5.xlarge   ✓
Upgrading Self-Managed Clusters (kubeadm)
For clusters managed with kubeadm, the control plane and each node must be upgraded manually. The order is always: upgrade kubeadm → upgrade control plane → upgrade kubelets/kubectl on control plane node → upgrade worker nodes one at a time.
# On the CONTROL PLANE node:
# Step 1: Upgrade kubeadm
apt-get update && apt-get install -y kubeadm=1.30.0-1.1
# (if the package is version-pinned, run `apt-mark unhold kubeadm` first and re-hold after)
kubeadm version # verify: v1.30.0
# Step 2: Check what kubeadm will do
kubeadm upgrade plan
# Shows: current version, target version, add-on upgrades, API migrations
# Step 3: Apply the control plane upgrade
kubeadm upgrade apply v1.30.0
# Upgrades: kube-apiserver, kube-controller-manager, kube-scheduler, kube-proxy, CoreDNS, etcd
# Takes 3-5 minutes
# Step 4: Upgrade kubelet and kubectl on the control plane node
apt-get install -y kubelet=1.30.0-1.1 kubectl=1.30.0-1.1
systemctl daemon-reload && systemctl restart kubelet
# On each WORKER NODE (one at a time):
# Step 5: Cordon and drain the node
kubectl cordon worker-node-1 # Stop new Pods from scheduling here
kubectl drain worker-node-1 \
  --ignore-daemonsets \
  --delete-emptydir-data
# --ignore-daemonsets: DaemonSet Pods stay (they're managed by the DaemonSet)
# --delete-emptydir-data: allow draining Pods with emptyDir volumes
# Step 6: Upgrade kubeadm, kubelet, kubectl on the worker
ssh worker-node-1
apt-get install -y kubeadm=1.30.0-1.1
kubeadm upgrade node # Join config update only -- no control plane changes
apt-get install -y kubelet=1.30.0-1.1 kubectl=1.30.0-1.1
systemctl daemon-reload && systemctl restart kubelet
# Step 7: Uncordon -- return node to service
kubectl uncordon worker-node-1 # Pods can schedule here again
# Repeat Steps 5-7 for each remaining worker node
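The Step 5–7 loop can be sketched as a script. Node names, the package version, and the `upgrade_workers`/`run` helper names are placeholders — substitute your own. With DRY_RUN=1 it only prints each command, so you can review the sequence before running it for real:

```shell
# Print commands instead of executing them when DRY_RUN=1
run() { if [ "${DRY_RUN:-0}" = "1" ]; then echo "+ $*"; else "$@"; fi; }

upgrade_workers() {
  version="$1"; shift
  for node in "$@"; do
    run kubectl cordon "$node"
    run kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data
    run ssh "$node" "apt-get install -y kubeadm=${version} && kubeadm upgrade node && apt-get install -y kubelet=${version} kubectl=${version} && systemctl daemon-reload && systemctl restart kubelet"
    run kubectl uncordon "$node"
    # In a real run: wait for the node to report Ready at the new version
    # before moving on to the next one.
  done
}

# Review the sequence first:
DRY_RUN=1 upgrade_workers "1.30.0-1.1" worker-node-1 worker-node-2 worker-node-3
```

Running it without DRY_RUN executes one node at a time, preserving the capacity guarantees described above.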
What just happened?
Cordon before drain — Cordoning marks the node unschedulable so no new Pods land on it while you're draining. Draining evicts all existing Pods gracefully (respecting PodDisruptionBudgets and terminationGracePeriodSeconds). Together they ensure the node is empty before you restart the kubelet. Skipping the cordon means new Pods may be scheduled onto the node mid-drain and then immediately evicted.
One worker at a time — Upgrading all workers simultaneously would drain all Pods at once, potentially violating PodDisruptionBudgets and causing outages. Upgrading one at a time keeps the cluster at near-full capacity throughout the process. With a small cluster, start with the node running the least critical Pods.
$ kubeadm upgrade plan
[upgrade/config] Making sure the configuration is correct:
Components that must be upgraded manually:
COMPONENT            CURRENT   TARGET
kube-apiserver       v1.29.8   v1.30.2
kube-scheduler       v1.29.8   v1.30.2
controller-manager   v1.29.8   v1.30.2
CoreDNS              v1.11.1   v1.11.3

$ kubeadm upgrade apply v1.30.0
[upgrade/successful] SUCCESS! Your cluster was upgraded to "v1.30.0". Enjoy!

# On worker node -- after drain:
$ kubectl get node worker-node-1
NAME            STATUS                     VERSION
worker-node-1   Ready,SchedulingDisabled   v1.29.8   ← cordoned

# After upgrade and uncordon:
$ kubectl get node worker-node-1
NAME            STATUS   VERSION
worker-node-1   Ready    v1.30.2   ← upgraded and uncordoned ✓
Post-Upgrade Verification
# Verify all nodes are at the new version
kubectl get nodes -o wide
# All STATUS: Ready, VERSION: v1.30.x
# Verify all system Pods are running
kubectl get pods -n kube-system
# All should be Running or Completed (Jobs)
# Verify workloads survived the upgrade
kubectl get deployments -A
kubectl get statefulsets -A
# All AVAILABLE replicas match DESIRED
# Run your Helm tests to validate application behaviour
helm test payment-api -n payments --logs
helm test ingress-nginx -n ingress-nginx --logs
# Check for any newly deprecated API usage introduced by the upgrade
pluto detect-all-in-cluster --target-versions k8s=v1.31.0
# Start planning ahead for the NEXT upgrade
$ kubectl get nodes -o wide
NAME                            STATUS   VERSION   INSTANCE-TYPE
ip-10-0-3-15.compute.internal   Ready    v1.30.2   m5.xlarge   ✓
ip-10-0-4-22.compute.internal   Ready    v1.30.2   m5.xlarge   ✓
ip-10-0-5-99.compute.internal   Ready    v1.30.2   m5.xlarge   ✓

$ kubectl get pods -n kube-system | grep -v Running
# All system pods Running -- no restarts or errors ✓

$ helm test payment-api -n payments --logs
TEST SUITE: payment-api-test-connection
Phase: Succeeded ✓
TEST SUITE: payment-api-test-db
Phase: Succeeded ✓

$ pluto detect-all-in-cluster --target-versions k8s=v1.31.0
No deprecated or removed APIs found -- ready for next upgrade cycle ✓
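The "all nodes at the new version" check is easy to automate for a runbook. A minimal sketch — the `check_node_versions` function name is ours; it reads `kubectl get nodes -o wide` output on stdin (VERSION is the fifth column) so it can also be tested offline:

```shell
# Exit non-zero if any node is not at the expected minor version.
check_node_versions() {
  expected="$1"; bad=0
  while read -r name status roles age version rest; do
    [ -z "$name" ] && continue                # skip blank lines
    [ "$name" = "NAME" ] && continue          # skip the header row
    case "$version" in
      "$expected"*) ;;                        # e.g. v1.30.2 matches v1.30 -- OK
      *) echo "MISMATCH: $name is at $version (expected ${expected}.x)"; bad=1 ;;
    esac
  done
  return "$bad"
}

# Usage against a live cluster:
#   kubectl get nodes -o wide | check_node_versions v1.30
```

Wiring this into the upgrade pipeline turns "eyeball the node list" into a hard pass/fail gate.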
Teacher's Note: Making upgrades routine rather than stressful
Teams that skip upgrades for 18 months and then try to go from 1.24 to 1.30 in one go have a bad time. Three minor version jumps means three rounds of deprecated API removal, three rounds of add-on compatibility checks, and three rounds of node drains — in rapid succession with accumulated risk from each step.
The antidote is making upgrades boring: run them every 4-6 months, one minor version at a time, with a staging cluster that is always one version ahead of production. Every upgrade becomes a 2-hour scheduled window with a known runbook, not a 2-day emergency.
On EKS specifically: enable automatic control plane patch upgrades (patch versions, not minor) and set node group update config to automatically use the latest AMI for your minor version. This keeps you current on security patches without manual effort, saving the deliberate effort for minor version upgrades where API changes require attention.
Practice Questions
1. Kubernetes requires you to upgrade in what increments — you cannot skip from 1.27 directly to 1.30?
2. Before upgrading a kubeadm worker node, which command evicts all Pods from the node gracefully while respecting PodDisruptionBudgets?
3. Which open-source tool scans a live cluster and Helm chart manifests for deprecated or removed API versions in a target Kubernetes version?
Quiz
1. What is the required order for upgrading components in a self-managed kubeadm cluster?
2. Your Ingress manifests use apiVersion: extensions/v1beta1. Why does this matter before upgrading to Kubernetes 1.22?
3. When upgrading EKS from 1.29 to 1.30, what does the aws eks update-cluster-version command actually upgrade, and what do you need to do afterwards?
Up Next · Lesson 57
Backup and Restore
etcd is the brain of your cluster — if it is lost without a backup, so is every resource definition, Secret, and ConfigMap. This lesson covers etcd backups with etcdctl, Velero for application-level backup and restore, and disaster recovery runbooks for common failure scenarios.