Kubernetes Course
Vertical Pod Autoscaler
Setting CPU and memory requests is one of the hardest parts of deploying to Kubernetes — too low and your Pod gets OOMKilled or throttled, too high and you waste money and reduce cluster bin-packing efficiency. The Vertical Pod Autoscaler (VPA) observes real usage and recommends — or automatically applies — the right resource values.
HPA vs VPA: Different Dimensions
| | HPA | VPA |
|---|---|---|
| What it scales | Number of Pod replicas (out/in) | CPU and memory requests per Pod (up/down) |
| Metric | CPU/memory utilisation, custom metrics | Historical CPU/memory usage over time |
| Effect | More Pods handle more load | Each Pod gets the right-sized requests |
| Best for | Stateless, horizontally scalable workloads | StatefulSets, JVM apps, workloads hard to scale horizontally |
| Used together? | Yes — HPA scales replicas, VPA right-sizes each one | Caution: don't use both on CPU/memory simultaneously |
VPA Components and Update Modes
VPA has three components: the Recommender (monitors usage, computes recommendations), the Updater (evicts Pods that need updated requests), and the Admission Controller (patches new Pods with recommended requests at creation time). The updateMode controls how aggressively VPA acts:
Off
Recommend only — never modify Pods. Read recommendations with kubectl describe vpa. Safe starting point for any workload.
Initial
Apply recommendations only to newly created Pods — not to running ones. Good for gradual rollout without unexpected restarts.
Auto
Evict and recreate Pods when requests need updating. Full automation but causes restarts — use with PodDisruptionBudget.
Recreate
Currently behaves like Auto: Pods are evicted and recreated when requests change. The difference is forward-looking: Auto is defined to adopt less disruptive mechanisms (such as in-place updates) as Kubernetes gains them, while Recreate guarantees a full restart always happens. Prefer Auto unless your app must restart to pick up new resource values.
Creating a VPA
The scenario: Your Java payment processor's heap size is hard to predict — it varies with transaction volume and data size. You start VPA in Off mode to gather recommendations, then switch to Auto once you understand the pattern.
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: payment-processor-vpa
  namespace: payments
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payment-processor      # The Deployment to observe and resize
  updatePolicy:
    updateMode: "Off"            # Start with Off — read recommendations first
    # Change to "Auto" once you've validated the recommendations are sensible
  resourcePolicy:
    containerPolicies:
      - containerName: payment-processor
        minAllowed:
          cpu: 100m              # VPA won't recommend below these values
          memory: 256Mi
        maxAllowed:
          cpu: 4                 # VPA won't recommend above these values
          memory: 8Gi
        controlledResources:
          - cpu
          - memory
        controlledValues: RequestsAndLimits  # Update both requests and limits
                                             # (RequestsOnly: only update requests)
```
```shell
$ kubectl apply -f payment-processor-vpa.yaml
verticalpodautoscaler.autoscaling.k8s.io/payment-processor-vpa created

# After running for several days — read the recommendations
$ kubectl describe vpa payment-processor-vpa -n payments
Name:       payment-processor-vpa
Namespace:  payments
Status:
  Recommendation:
    Container Recommendations:
      Container Name:  payment-processor
      Lower Bound:
        Cpu:     150m
        Memory:  512Mi
      Target:                    # ← Apply these values
        Cpu:     320m
        Memory:  1200Mi
      Uncapped Target:
        Cpu:     320m
        Memory:  1200Mi
      Upper Bound:
        Cpu:     1200m
        Memory:  3Gi

# Current manifest has:  cpu: 100m, memory: 256Mi   ← significantly under-resourced!
# VPA recommends:        cpu: 320m, memory: 1200Mi  ← 3.2x CPU, 4.7x memory
```

What just happened?
VPA reveals right-sizing gaps — The recommendation shows the Java processor was running with 100m CPU and 256Mi memory — wildly under-resourced for a JVM application. The scheduler was placing it on nodes that technically had capacity, but the JVM was being throttled and GC pressure was causing latency spikes. VPA's target of 320m/1200Mi reflects actual steady-state usage plus headroom.
Lower, Target, and Upper bounds — Target is what VPA would set today based on observed usage (roughly the 90th percentile). Lower Bound is the minimum safe value (roughly the 50th percentile), and Upper Bound accounts for usage spikes (roughly the 95th percentile). In Auto mode, VPA sets the request to Target and only evicts the Pod if current requests fall outside the Lower–Upper range.
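While still in Off mode, you can read the Target programmatically and apply it by hand as a normal rolling update. A sketch, reusing the VPA and Deployment names from the manifest above; verify the jsonpath output against your own cluster before scripting around it:

```shell
# Read the Target recommendation for the first container from the VPA status
kubectl get vpa payment-processor-vpa -n payments \
  -o jsonpath='{.status.recommendation.containerRecommendations[0].target}'

# Apply the Target manually while keeping updateMode: "Off"
# (this triggers an ordinary Deployment rollout, not a VPA eviction)
kubectl set resources deployment/payment-processor -n payments \
  --containers=payment-processor \
  --requests=cpu=320m,memory=1200Mi
```

This manual loop is a common middle ground: VPA does the measuring, but you control exactly when the restart happens.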
Teacher's Note: VPA best practices
Start with Off mode for 1–2 weeks — VPA needs time to observe usage patterns. A recommendation based on 30 minutes of data is not reliable. Run Off mode for at least a week, including through your peak traffic periods, before trusting the target values.
Don't use VPA and HPA on the same metric — If VPA increases CPU requests, the HPA's utilisation percentage changes without any actual load change, which can cause unexpected scaling behaviour. The safe combination: HPA on CPU/memory + VPA in Off mode for observability only, or HPA on custom metrics + VPA on CPU/memory for sizing.
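One practical way to keep the two autoscalers out of each other's metrics is to scope VPA away from whatever HPA scales on. A hedged sketch, assuming HPA scales this Deployment on CPU utilisation, so VPA is restricted to memory via controlledResources (the VPA name here is chosen for illustration):

```yaml
# Sketch: HPA owns CPU-based replica scaling; VPA sizes memory requests only.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: payment-processor-memory-vpa   # illustrative name
  namespace: payments
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payment-processor
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: payment-processor
        controlledResources:
          - memory        # VPA never touches CPU, so HPA's utilisation maths stays stable
```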
VPA in Auto mode restarts Pods — When VPA needs to change a running Pod's requests, it evicts it. The new Pod starts with updated requests. This is a restart — ensure your Deployment has enough replicas and a PodDisruptionBudget so the eviction doesn't take your service offline. Never run VPA in Auto mode on a single-replica Deployment without careful consideration.
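The eviction safety net mentioned above can be sketched as a PodDisruptionBudget. A minimal example, assuming the payment-processor Deployment labels its Pods with app: payment-processor (a label chosen for illustration; match your Deployment's template labels):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: payment-processor-pdb
  namespace: payments
spec:
  minAvailable: 2             # VPA's Updater honours this: it won't evict below 2 ready Pods
  selector:
    matchLabels:
      app: payment-processor  # assumed Pod label, not from the manifest above
```

With the PDB in place, VPA spreads its evictions out so the recommendation is applied gradually instead of restarting every replica at once.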
Practice Questions
1. Which VPA update mode produces recommendations but never modifies running Pods — making it safe to apply to any workload as a first step?
2. Which VPA mode automatically evicts and recreates Pods when their resource requests need updating — giving full automation but requiring a PodDisruptionBudget?
3. What is the key constraint when using both HPA and VPA on the same Deployment?
Quiz
1. What is the fundamental difference between VPA and HPA?
2. A Java application keeps getting OOMKilled but you don't know the right memory request to set. How does VPA help?
3. You enable VPA in Auto mode. How does VPA apply a new memory recommendation to a running Pod?
Up Next · Lesson 51
Cluster Autoscaling
HPA scales Pods — but if all nodes are full, new Pods go Pending. Cluster Autoscaler and Karpenter solve the node dimension, automatically provisioning and removing nodes to match workload demand and cut costs.