Kubernetes Course
Vertical Pod Autoscaler
Setting CPU and memory requests is one of the hardest parts of deploying to Kubernetes — too low and your Pod gets OOMKilled or throttled, too high and you waste money and reduce cluster bin-packing efficiency. The Vertical Pod Autoscaler (VPA) observes real usage and recommends — or automatically applies — the right resource values.
HPA vs VPA: Different Dimensions
| | HPA | VPA |
|---|---|---|
| What it scales | Number of Pod replicas (out/in) | CPU and memory requests per Pod (up/down) |
| Metric | CPU/memory utilisation, custom metrics | Historical CPU/memory usage over time |
| Effect | More Pods handle more load | Each Pod gets the right-sized requests |
| Best for | Stateless, horizontally scalable workloads | StatefulSets, JVM apps, workloads hard to scale horizontally |
| Used together? | Yes — HPA scales replicas, VPA right-sizes each one | Caution: don't use both on CPU/memory simultaneously |
VPA Components and Update Modes
VPA has three components: the Recommender (monitors usage, computes recommendations), the Updater (evicts Pods that need updated requests), and the Admission Controller (patches new Pods with recommended requests at creation time). The updateMode controls how aggressively VPA acts:
Off
Recommend only — never modify Pods. Read recommendations with kubectl describe vpa. Safe starting point for any workload.
Initial
Apply recommendations only to newly created Pods — not to running ones. Good for gradual rollout without unexpected restarts.
Auto
Evict and recreate Pods when requests need updating. Full automation but causes restarts — use with PodDisruptionBudget.
Recreate
Currently behaves like Auto: Pods are evicted and recreated when requests change. The difference is forward-looking: Auto is defined to adopt less disruptive mechanisms (such as in-place updates) as Kubernetes gains them, while Recreate guarantees a full restart always happens. Prefer Auto unless your app must restart to pick up new resource values.
Creating a VPA
The scenario: Your Java payment processor's heap size is hard to predict — it varies with transaction volume and data size. You start VPA in Off mode to gather recommendations, then switch to Auto once you understand the pattern.
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: payment-processor-vpa
  namespace: payments
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payment-processor      # The Deployment to observe and resize
  updatePolicy:
    updateMode: "Off"            # Start with Off — read recommendations first
    # Change to "Auto" once you've validated the recommendations are sensible
  resourcePolicy:
    containerPolicies:
      - containerName: payment-processor
        minAllowed:
          cpu: 100m              # VPA won't recommend below these values
          memory: 256Mi
        maxAllowed:
          cpu: 4                 # VPA won't recommend above these values
          memory: 8Gi
        controlledResources:
          - cpu
          - memory
        controlledValues: RequestsAndLimits  # Update both requests and limits
                                             # (RequestsOnly: only update requests)
```
```shell
$ kubectl apply -f payment-processor-vpa.yaml
verticalpodautoscaler.autoscaling.k8s.io/payment-processor-vpa created

# After running for several days — read the recommendations
$ kubectl describe vpa payment-processor-vpa -n payments
Name:       payment-processor-vpa
Namespace:  payments
Status:
  Recommendation:
    Container Recommendations:
      Container Name:  payment-processor
      Lower Bound:
        Cpu:     150m
        Memory:  512Mi
      Target:                    # ← Apply these values
        Cpu:     320m
        Memory:  1200Mi
      Uncapped Target:
        Cpu:     320m
        Memory:  1200Mi
      Upper Bound:
        Cpu:     1200m
        Memory:  3Gi

# Current manifest has:  cpu: 100m, memory: 256Mi   ← significantly under-resourced!
# VPA recommends:        cpu: 320m, memory: 1200Mi  ← 3.2x CPU, 4.7x memory
```

What just happened?
VPA reveals right-sizing gaps — The recommendation shows the Java processor was running with 100m CPU and 256Mi memory — wildly under-resourced for a JVM application. The scheduler was placing it on nodes that technically had capacity, but the JVM was being throttled and GC pressure was causing latency spikes. VPA's target of 320m/1200Mi reflects actual steady-state usage plus headroom.
Lower, Target, and Upper bounds — Target is what VPA would set today based on observed usage (roughly the 90th percentile). Lower Bound is the minimum safe value (roughly the 50th percentile), and Upper Bound accounts for usage spikes (roughly the 95th percentile). In Auto mode, VPA sets the request to Target and only evicts the Pod if current requests fall outside the Lower–Upper range.
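While still in Off mode, you can read the Target programmatically and apply it by hand as a normal rolling update. A sketch, reusing the VPA and Deployment names from the manifest above; verify the jsonpath output against your own cluster before scripting around it:

```shell
# Read the Target recommendation for the first container from the VPA status
kubectl get vpa payment-processor-vpa -n payments \
  -o jsonpath='{.status.recommendation.containerRecommendations[0].target}'

# Apply the Target manually while keeping updateMode: "Off"
# (this triggers an ordinary Deployment rollout, not a VPA eviction)
kubectl set resources deployment/payment-processor -n payments \
  --containers=payment-processor \
  --requests=cpu=320m,memory=1200Mi
```

This manual loop is a common middle ground: VPA does the measuring, but you control exactly when the restart happens.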
Teacher's Note: VPA best practices
Start with Off mode for 1–2 weeks — VPA needs time to observe usage patterns. A recommendation based on 30 minutes of data is not reliable. Run Off mode for at least a week, including through your peak traffic periods, before trusting the target values.
Don't use VPA and HPA on the same metric — If VPA increases CPU requests, the HPA's utilisation percentage changes without any actual load change, which can cause unexpected scaling behaviour. The safe combination: HPA on CPU/memory + VPA in Off mode for observability only, or HPA on custom metrics + VPA on CPU/memory for sizing.
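One practical way to keep the two autoscalers out of each other's metrics is to scope VPA away from whatever HPA scales on. A hedged sketch, assuming HPA scales this Deployment on CPU utilisation, so VPA is restricted to memory via controlledResources (the VPA name here is chosen for illustration):

```yaml
# Sketch: HPA owns CPU-based replica scaling; VPA sizes memory requests only.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: payment-processor-memory-vpa   # illustrative name
  namespace: payments
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payment-processor
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: payment-processor
        controlledResources:
          - memory        # VPA never touches CPU, so HPA's utilisation maths stays stable
```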
VPA in Auto mode restarts Pods — When VPA needs to change a running Pod's requests, it evicts it. The new Pod starts with updated requests. This is a restart — ensure your Deployment has enough replicas and a PodDisruptionBudget so the eviction doesn't take your service offline. Never run VPA in Auto mode on a single-replica Deployment without careful consideration.
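The eviction safety net mentioned above can be sketched as a PodDisruptionBudget. A minimal example, assuming the payment-processor Deployment labels its Pods with app: payment-processor (a label chosen for illustration; match your Deployment's template labels):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: payment-processor-pdb
  namespace: payments
spec:
  minAvailable: 2             # VPA's Updater honours this: it won't evict below 2 ready Pods
  selector:
    matchLabels:
      app: payment-processor  # assumed Pod label, not from the manifest above
```

With the PDB in place, VPA spreads its evictions out so the recommendation is applied gradually instead of restarting every replica at once.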
Practice Questions
1. Which VPA update mode produces recommendations but never modifies running Pods — making it safe to apply to any workload as a first step?
2. Which VPA mode automatically evicts and recreates Pods when their resource requests need updating — giving full automation but requiring a PodDisruptionBudget?
3. What is the key constraint when using both HPA and VPA on the same Deployment?
Quiz
1. What is the fundamental difference between VPA and HPA?
2. A Java application keeps getting OOMKilled but you don't know the right memory request to set. How does VPA help?
3. You enable VPA in Auto mode. How does VPA apply a new memory recommendation to a running Pod?
Up Next · Lesson 51
Cluster Autoscaling
HPA scales Pods — but if all nodes are full, new Pods go Pending. Cluster Autoscaler and Karpenter solve the node dimension, automatically provisioning and removing nodes to match workload demand and cut costs.