Kubernetes Course
Persistent Volumes
Running a database in Kubernetes without understanding Persistent Volumes is how teams lose data. This lesson covers the PV/PVC model — the abstraction layer that lets developers claim durable storage without knowing whether it's backed by an AWS EBS disk, a GCP persistent disk, NFS, or bare metal SAN — and why that separation of concerns matters at scale.
The Storage Abstraction Problem
In a typical organisation, there's a hard boundary between people who understand storage infrastructure (storage admins, cloud architects) and people who write applications (developers, DevOps engineers). The storage team provisions EBS volumes, NFS shares, or Ceph clusters. The developer just needs "give me 50GB of disk that survives Pod restarts."
Kubernetes solves this with a two-object model. A PersistentVolume (PV) represents a piece of actual storage infrastructure — provisioned by an admin or automatically by a StorageClass. A PersistentVolumeClaim (PVC) is a developer's request for storage — "I need 50GB, ReadWriteOnce." Kubernetes binds PVCs to PVs that satisfy the request. The developer never needs to know what's underneath.
The analogy that makes this click
Think of PVs like hotel rooms and PVCs like room reservations. The hotel (Kubernetes) has rooms of different sizes and types (PVs). You make a reservation specifying what you need — "a non-smoking double for two nights" (PVC). You don't pick which physical room you get. The hotel assigns a room that matches your request. Once assigned, that room is yours alone until checkout (Pod deletion or PVC release).
Key PV Concepts Before the YAML
Three concepts govern how PVs work: access modes, reclaim policy, and binding. Understanding all three before writing YAML prevents the most common persistent storage mistakes.
| Access Mode | Meaning | Typical use case |
|---|---|---|
| ReadWriteOnce (RWO) | One node can mount the volume read-write. Multiple Pods on the same node can use it. | Databases, single-replica stateful apps. Most cloud block storage (EBS, PD) is RWO-only. |
| ReadOnlyMany (ROX) | Many nodes can mount the volume, but only read-only. | Distributing static assets, configuration, or reference data to many Pods. |
| ReadWriteMany (RWX) | Many nodes can mount the volume read-write simultaneously. | Shared file storage. Requires network storage (NFS, CephFS, Azure Files). Block storage cannot do RWX. |
| ReadWriteOncePod (RWOP) | Exactly one Pod (not node) can mount read-write. Introduced in Kubernetes 1.22. | Stricter single-writer guarantees than RWO — prevents two Pods on the same node from accidentally sharing a volume. |
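Access modes are requested on the claim side. As a preview of the next lesson, here is a minimal sketch of a PVC (the name is hypothetical, for illustration only):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim       # PVC: a namespaced request for storage
metadata:
  name: demo-claim                # Hypothetical name for illustration
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce               # Must be satisfiable by the bound PV's access modes
  resources:
    requests:
      storage: 50Gi               # Binds to a PV offering at least 50Gi that matches the other criteria
```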
| Reclaim Policy | What happens when the PVC is deleted | Use when |
|---|---|---|
| Delete | PV and the underlying storage resource (e.g. the EBS volume) are automatically deleted. | Dynamically provisioned volumes in dev/staging — you want automatic cleanup. Dangerous for production databases. |
| Retain | PV moves to Released state. Data is preserved. Admin must manually reclaim or delete. | Production databases. The most important data must never be deleted automatically. |
| Recycle | Performs a basic scrub (rm -rf /volume/*) and makes the PV available again. | Don't use — deprecated in Kubernetes 1.11, removed in 1.25. Use dynamic provisioning instead. |
Creating a PersistentVolume Manually
In production clusters, most PVs are created automatically by a StorageClass (covered at the end of this lesson). But understanding a manually created PV first builds the mental model for everything that follows. This is also how on-premises clusters work — a storage admin pre-provisions PVs from SAN/NFS, and developers claim them via PVCs.
The scenario: You're a storage administrator at a company running Kubernetes on bare metal. The platform team has provisioned a 100GB NFS share at nfs.storage.internal:/exports/postgres-data for the production PostgreSQL database. You need to register this as a PersistentVolume so the database team can claim it.
apiVersion: v1
kind: PersistentVolume            # PV: a piece of storage in the cluster
metadata:
  name: postgres-pv-prod          # Unique name for this PV — cluster-scoped (no namespace)
  labels:
    type: nfs                     # Labels on PVs can be used by PVC selectors to target specific PVs
    environment: production
    team: database
spec:
  storageClassName: manual        # "manual" means no dynamic provisioner manages this PV
                                  # PVCs must explicitly request storageClassName: manual to bind here
  capacity:
    storage: 100Gi                # Total storage capacity of this PV
  accessModes:
    - ReadWriteOnce               # Only one node can mount this read-write at a time
                                  # NFS actually supports RWX, but Postgres only needs one writer
  persistentVolumeReclaimPolicy: Retain  # Retain: when the PVC is deleted, data is preserved — admin must clean up
                                  # NEVER use Delete for production database volumes
  nfs:                            # nfs: volume type — an NFS network filesystem share
    server: nfs.storage.internal  # Hostname or IP of the NFS server
    path: /exports/postgres-data  # Export path on the NFS server
    readOnly: false               # Allow writes
  mountOptions:                   # NFS mount options passed to the kernel
    - hard                        # hard: retry NFS operations indefinitely (vs soft, which gives up)
    - nfsvers=4.1                 # NFS version — 4.1 has better performance and security than v3
    - timeo=600                   # Timeout: 600 tenths of a second (60s) before retrying
  volumeMode: Filesystem          # Filesystem (default): volume is formatted and mounted as a directory
                                  # Block: raw block device — for databases that manage their own I/O
$ kubectl apply -f postgres-pv.yaml
persistentvolume/postgres-pv-prod created
$ kubectl get pv postgres-pv-prod
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS AGE
postgres-pv-prod 100Gi RWO Retain Available manual 5s
$ kubectl describe pv postgres-pv-prod
Name: postgres-pv-prod
Labels: environment=production, team=database, type=nfs
Status: Available
Reclaim Policy: Retain
Access Modes: RWO
Capacity: 100Gi
Node Affinity: <none>
StorageClass: manual
Source:
Type: NFS (an NFS mount that lasts the lifetime of a pod)
Server: nfs.storage.internal
Path: /exports/postgres-data
    ReadOnly:  false
What just happened?
STATUS: Available — A newly created PV starts in Available state. It's ready to be claimed but not yet bound to any PVC. The PV lifecycle has four states: Available (ready to claim), Bound (claimed by a PVC), Released (PVC deleted, data preserved, not yet reclaimed), Failed (automatic reclamation failed).
PVs are cluster-scoped — Notice there's no namespace field in the PV metadata. PVs are cluster-wide objects — they don't belong to any namespace. A PVC in the payments namespace can bind to a PV that a storage admin created with no namespace. PVCs themselves are namespace-scoped though.
CLAIM column is empty — This confirms no PVC has claimed this PV yet. Once a PVC binds to it, this column shows namespace/pvc-name. A PV can only be bound to one PVC at a time.
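To complete the static path, the database team's claim might look like the sketch below. (The lesson doesn't show this manifest; the name and namespace here are chosen to match the `production/postgres-pvc` claim that appears in the audit output later in the lesson.)

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-pvc              # Assumed name for illustration
  namespace: production           # PVCs are namespace-scoped, unlike PVs
spec:
  storageClassName: manual        # Must match the PV's storageClassName to be eligible to bind
  accessModes:
    - ReadWriteOnce               # Must be offered by the PV
  resources:
    requests:
      storage: 100Gi              # Requesting the full capacity of postgres-pv-prod
  selector:
    matchLabels:
      team: database              # Optional: bind only to PVs carrying this label
```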
Dynamic Provisioning with StorageClasses
Manually creating PVs doesn't scale. In cloud environments and modern on-premises setups, a StorageClass automates PV creation. When a PVC requests a StorageClass, the provisioner (a controller specific to the storage backend) automatically creates a PV of the right size and binds it. The storage admin defines the StorageClass once; developers get self-service storage forever after.
The scenario: Your team runs on AWS EKS. Rather than manually creating EBS volumes and registering them as PVs, you define a StorageClass that uses the AWS EBS CSI driver. Any PVC that references this StorageClass gets an EBS volume provisioned automatically — the right size, the right type, right now.
apiVersion: storage.k8s.io/v1
kind: StorageClass                # StorageClass: defines a class of storage with a provisioner
metadata:
  name: aws-ebs-gp3               # Name used by PVCs to request this type of storage
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"  # Mark as default — PVCs without a
                                  # storageClassName get this class automatically
provisioner: ebs.csi.aws.com      # The CSI driver that handles volume creation
                                  # For GCP: pd.csi.storage.gke.io
                                  # For Azure: disk.csi.azure.com
                                  # For on-prem NFS: nfs.csi.k8s.io (community)
parameters:                       # Driver-specific options
  type: gp3                       # EBS volume type: gp3 (latest generation, best price/performance)
  iops: "3000"                    # Provisioned IOPS for gp3 (baseline is 3000)
  throughput: "125"               # MB/s throughput for gp3 (baseline is 125)
  encrypted: "true"               # Encrypt the EBS volume with AWS KMS
reclaimPolicy: Delete             # Default for dynamically provisioned volumes: delete the EBS volume when the PVC is deleted
                                  # For production databases, patch the resulting PV's reclaim policy to Retain
volumeBindingMode: WaitForFirstConsumer  # Don't provision until a Pod claims the PVC
                                  # Ensures the EBS volume is created in the SAME AZ as the Pod
                                  # With Immediate: volume created in an arbitrary AZ — the Pod may not schedule there
allowVolumeExpansion: true        # Allow PVCs to be resized after creation — add storage without downtime
$ kubectl apply -f storageclass-ebs-gp3.yaml
storageclass.storage.k8s.io/aws-ebs-gp3 created
$ kubectl get storageclass
NAME                    PROVISIONER                    RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
aws-ebs-gp3 (default)   ebs.csi.aws.com                Delete          WaitForFirstConsumer   true                   4s
standard                kubernetes.io/no-provisioner   Delete          Immediate              false                  14d
What just happened?
WaitForFirstConsumer is critical for cloud deployments — This setting delays EBS volume creation until a Pod actually claims the PVC and gets scheduled. Without it (Immediate mode), the EBS volume is created in a random availability zone as soon as the PVC is applied. The Pod might then be scheduled into a different AZ — where the EBS volume doesn't exist. The Pod gets stuck in Pending because the volume is in the wrong zone. This is one of the most common "why won't my Pod schedule?" mysteries in EKS environments.
allowVolumeExpansion: true — This is what lets you increase a PVC's storage request after creation. Without it, PVCs are immutable in size — you'd need to create a new PVC, migrate data, and update the application. With it, you edit the PVC's spec.resources.requests.storage, and the CSI driver expands the underlying volume online for supported backends (EBS, GCP PD, Azure Disk).
is-default-class annotation — Marking a StorageClass as default means any PVC that doesn't specify a storageClassName will automatically use this class. In a cluster with one obvious production storage class, this saves developers from having to specify it on every PVC. Only one StorageClass should be marked as default at a time.
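With the class in place, a developer's side of the dynamic path is just a PVC. A sketch (the claim name is hypothetical):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data                  # Hypothetical name for illustration
spec:
  storageClassName: aws-ebs-gp3   # Could be omitted — this class is marked as the default
  accessModes:
    - ReadWriteOnce               # EBS is block storage, so RWO is the only option
  resources:
    requests:
      storage: 50Gi               # The provisioner creates a 50Gi gp3 EBS volume on demand
                                  # To expand later: raise this value and re-apply —
                                  # allowVolumeExpansion: true lets the CSI driver grow the volume online
```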
The PV/PVC/StorageClass Architecture
Here's how all three objects interact — covering both the static (manually provisioned) and dynamic (StorageClass-provisioned) paths:
PV Provisioning Paths — Static vs Dynamic
Static path: a storage admin creates a PV by hand (100Gi, RWO, NFS, Retain); a developer's PVC requests 50Gi, RWO, and Kubernetes binds the claim to a matching PV.
Dynamic path: a StorageClass (aws-ebs-gp3, ebs.csi.aws.com) handles PVCs that reference it (50Gi, storageClass: aws-ebs-gp3); the provisioner creates a PV of exactly the requested size, and it is created and bound automatically.
PV with Node Affinity for Local Storage
For high-performance workloads (ML training, analytics, some databases), NVMe SSDs on the local node are far faster than network-attached storage. Kubernetes supports local PersistentVolumes that pin storage to a specific node. The trade-off is that Pods using local PVs must always schedule to that specific node — which is fine for StatefulSets but needs careful planning.
The scenario: Your ML training cluster has nodes with locally attached NVMe SSDs at /mnt/nvme0. You want a training job to use the local disk for its dataset cache — network storage would be the throughput bottleneck. You create a local PV with node affinity so Kubernetes always schedules the consuming Pod to the right node.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nvme-pv-node-ml-01        # Name tied to the specific node — include the node name for clarity
spec:
  capacity:
    storage: 500Gi                # Full NVMe disk capacity
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce               # Local storage is always RWO — one node, one writer
  persistentVolumeReclaimPolicy: Delete  # Training data is reproducible — safe to delete on PVC removal
  storageClassName: local-nvme    # Custom StorageClass name — no provisioner, manual management
  local:                          # local: a locally attached storage device on the node
    path: /mnt/nvme0              # The mount path of the NVMe device on the host
  nodeAffinity:                   # REQUIRED for local PVs — ties the PV to a specific node
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname  # Match on the node's hostname label
              operator: In
              values:
                - node-ml-01      # Only the node named node-ml-01 can use this PV
                                  # The scheduler MUST place the consuming Pod on this node
                                  # If node-ml-01 is unavailable, the Pod stays Pending
$ kubectl apply -f nvme-pv.yaml
persistentvolume/nvme-pv-node-ml-01 created
$ kubectl get pv nvme-pv-node-ml-01
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS AGE
nvme-pv-node-ml-01 500Gi RWO Delete Available local-nvme 6s
$ kubectl describe pv nvme-pv-node-ml-01 | grep -A8 "Node Affinity:"
Node Affinity:
Required Terms:
  Term 0:  kubernetes.io/hostname in [node-ml-01]
What just happened?
nodeAffinity is mandatory for local PVs — Without node affinity, Kubernetes doesn't know which node this PV's storage physically lives on. A Pod could be scheduled to any node, mount the "local" PV, and get pointed at a path that doesn't exist on that node. The nodeAffinity field tells the scheduler: "any Pod that claims this PV must run on node-ml-01." This is how the scheduler enforces data locality.
Local PVs + StatefulSets — Local PVs are most commonly used with StatefulSets (Lesson 30), where each StatefulSet replica gets its own PVC that binds to a specific node's local disk. The StatefulSet maintains stable Pod-to-storage associations even across restarts. This is how you run a distributed database like Cassandra or ClickHouse on Kubernetes with local NVMe storage.
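Consuming the local PV looks like the sketch below (the claim, Pod, and image names are assumptions for illustration). The PVC targets the local-nvme class, the Pod mounts the claim, and the scheduler follows the PV's node affinity:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dataset-cache             # Hypothetical name
spec:
  storageClassName: local-nvme    # No provisioner — binds to the pre-created local PV
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 500Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: trainer                   # Hypothetical name
spec:
  containers:
    - name: train
      image: training-image:latest  # Placeholder image
      volumeMounts:
        - name: cache
          mountPath: /data        # NVMe-backed path inside the container
  volumes:
    - name: cache
      persistentVolumeClaim:
        claimName: dataset-cache  # The scheduler places this Pod on node-ml-01 automatically
```

In practice the local-nvme StorageClass would be defined with provisioner kubernetes.io/no-provisioner and volumeBindingMode WaitForFirstConsumer, so binding is delayed until the Pod is scheduled.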
Inspecting and Managing PVs
The scenario: You're an SRE doing a quarterly storage audit. A PV is showing as Released — its PVC was deleted but the PV has the Retain policy so the data wasn't auto-deleted. You need to investigate, determine if the data is still needed, and either reclaim the PV for reuse or clean it up.
kubectl get pv
# List all PVs in the cluster — cluster-scoped, no -n flag needed
# STATUS column: Available / Bound / Released / Failed
# CLAIM column: shows namespace/pvc-name for Bound PVs
kubectl get pv -o wide
# -o wide: also shows the VOLUMEMODE column
# Useful for a full inventory of all storage in the cluster
kubectl describe pv postgres-pv-prod
# Full detail: source type, NFS server/path, node affinity, events
# Check the Message field if STATUS is Failed
kubectl get pv --sort-by=.spec.capacity.storage
# Sort PVs by size — useful for finding the largest volumes
# Good for cost audits: which PVs are consuming the most storage?
kubectl patch pv postgres-pv-prod \
-p '{"spec":{"claimRef": null}}'
# claimRef: null: manually release a Released PV so it becomes Available again
# WARNING: only do this after you've confirmed the data is backed up or no longer needed
# A Released PV still holds a reference to the old PVC's claimRef — clearing it
# makes the PV Available for a new PVC to claim
$ kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS AGE
postgres-pv-prod 100Gi RWO Retain Released production/postgres-pvc manual 45d
nvme-pv-node-ml-01 500Gi RWO Delete Available local-nvme 2h
pvc-a4f9b2d1-... 50Gi RWO Delete Bound payments/postgres-payments-pvc aws-ebs-gp3 14d
$ kubectl describe pv postgres-pv-prod | grep -E "Status:|Claim:|Message:"
Status: Released
Claim: production/postgres-pvc
Message:
$ kubectl patch pv postgres-pv-prod -p '{"spec":{"claimRef": null}}'
persistentvolume/postgres-pv-prod patched
$ kubectl get pv postgres-pv-prod
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS AGE
postgres-pv-prod     100Gi      RWO            Retain           Available                                    manual         45d
What just happened?
Released state explained — When a PVC is deleted and the PV's reclaimPolicy is Retain, the PV moves to Released. The data on the underlying storage is intact. But the PV carries a claimRef pointing to the now-deleted PVC — this prevents the PV from being claimed by a new PVC automatically. This is intentional: it forces a human to verify the data situation before the storage is reused.
Clearing claimRef — Patching claimRef: null removes the old PVC reference and moves the PV back to Available. Important caveat: the old data is still on the underlying NFS share. If a new PVC binds to this PV, the new Pod will see the previous tenant's data. Always wipe the data on the underlying storage before recycling a Retain PV for a different workload.
pvc-a4f9b2d1-... naming — The dynamically provisioned PV has a generated name starting with pvc- followed by the PVC's UID. This is the naming pattern for all dynamically provisioned PVs — you never set the PV name yourself when using a StorageClass, Kubernetes generates it.
Teacher's Note: The reclaim policy that will eventually cost you data
The most expensive lesson I've seen teams learn: a dynamically provisioned StorageClass with reclaimPolicy: Delete (the default) will delete your EBS/GCP-PD/Azure Disk the moment the PVC is deleted. Not when the Pod is deleted. Not when the Deployment is deleted. When the PVC is deleted. And PVCs can be deleted by accident — a careless kubectl delete namespace, a Helm uninstall, or a script that cleans up "unused" resources.
For production databases, always either: (1) use a StorageClass with reclaimPolicy: Retain, or (2) change the PV's reclaim policy to Retain after it's been created: kubectl patch pv [name] -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'. This is one of those "configure it once when you set up the cluster, never think about it again — until the day you're glad you did" settings.
One more: enable volume snapshots in your StorageClass policy if your CSI driver supports it. EBS, GCP PD, and Azure Disk all support CSI snapshots — scheduled snapshots to S3/GCS/Blob are your last line of defence when everything else fails.
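CSI snapshots are requested with a VolumeSnapshot object. A sketch, assuming the external-snapshotter CRDs are installed and a VolumeSnapshotClass wrapping the EBS CSI driver exists (the snapshot and class names here are assumptions):

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: postgres-backup           # Hypothetical name
  namespace: production           # Snapshots are namespaced, like the PVC they target
spec:
  volumeSnapshotClassName: ebs-snapclass     # Assumed VolumeSnapshotClass for the EBS CSI driver
  source:
    persistentVolumeClaimName: postgres-pvc  # Snapshot the volume bound to this live claim
```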
Practice Questions
1. Which PersistentVolume reclaim policy should you always use for production databases — preserving the data on the underlying storage when a PVC is deleted, requiring a human to manually reclaim it?
2. Which StorageClass volumeBindingMode delays EBS volume creation until a Pod is actually scheduled — preventing the volume from being created in the wrong availability zone?
3. A PVC is deleted. The bound PV has reclaimPolicy: Retain. What STATUS does the PV show, and can a new PVC immediately bind to it?
Quiz
1. You need multiple Pods on different nodes to simultaneously write to the same PersistentVolume. Which access mode is required, and why can't AWS EBS be used for it?
2. A storage admin creates a PersistentVolume. Which namespace does it belong to?
3. A dynamically provisioned PV uses a StorageClass with reclaimPolicy: Delete. The PVC bound to it is accidentally deleted. What happens?
Up Next · Lesson 29
Persistent Volume Claims
How developers actually request and use persistent storage — writing PVCs, mounting them into Pods, resizing them live, and the binding rules that determine which PV a PVC gets.