Kubernetes Course
Persistent Volumes
Running a database in Kubernetes without understanding Persistent Volumes is how teams lose data. This lesson covers the PV/PVC model — the abstraction layer that lets developers claim durable storage without knowing whether it's backed by an AWS EBS disk, a GCP persistent disk, NFS, or bare metal SAN — and why that separation of concerns matters at scale.
The Storage Abstraction Problem
In a typical organisation, there's a hard boundary between people who understand storage infrastructure (storage admins, cloud architects) and people who write applications (developers, DevOps engineers). The storage team provisions EBS volumes, NFS shares, or Ceph clusters. The developer just needs "give me 50GB of disk that survives Pod restarts."
Kubernetes solves this with a two-object model. A PersistentVolume (PV) represents a piece of actual storage infrastructure — provisioned by an admin or automatically by a StorageClass. A PersistentVolumeClaim (PVC) is a developer's request for storage — "I need 50GB, ReadWriteOnce." Kubernetes binds PVCs to PVs that satisfy the request. The developer never needs to know what's underneath.
The analogy that makes this click
Think of PVs like hotel rooms and PVCs like room reservations. The hotel (Kubernetes) has rooms of different sizes and types (PVs). You make a reservation specifying what you need — "a non-smoking double for two nights" (PVC). You don't pick which physical room you get. The hotel assigns a room that matches your request. Once assigned, that room is yours alone until checkout (Pod deletion or PVC release).
Key PV Concepts Before the YAML
Three concepts govern how PVs work: access modes, reclaim policy, and binding. Understanding all three before writing YAML prevents the most common persistent storage mistakes.
| Access Mode | Meaning | Typical use case |
|---|---|---|
| ReadWriteOnce (RWO) | One node can mount the volume read-write. Multiple Pods on the same node can use it. | Databases, single-replica stateful apps. Most cloud block storage (EBS, PD) is RWO-only. |
| ReadOnlyMany (ROX) | Many nodes can mount the volume, but only read-only. | Distributing static assets, configuration, or reference data to many Pods. |
| ReadWriteMany (RWX) | Many nodes can mount the volume read-write simultaneously. | Shared file storage. Requires network storage (NFS, CephFS, Azure Files). Block storage cannot do RWX. |
| ReadWriteOncePod (RWOP) | Exactly one Pod (not node) can mount read-write. Introduced in Kubernetes 1.22. | Stricter single-writer guarantees than RWO — prevents two Pods on the same node from accidentally sharing a volume. |
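Access modes are requested on the claim side. As a preview of the next lesson, here is a minimal sketch of a PVC (the name is hypothetical, for illustration only):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim       # PVC: a namespaced request for storage
metadata:
  name: demo-claim                # Hypothetical name for illustration
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce               # Must be satisfiable by the bound PV's access modes
  resources:
    requests:
      storage: 50Gi               # Binds to a PV offering at least 50Gi that matches the other criteria
```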
| Reclaim Policy | What happens when the PVC is deleted | Use when |
|---|---|---|
| Delete | PV and the underlying storage resource (e.g. the EBS volume) are automatically deleted. | Dynamically provisioned volumes in dev/staging — you want automatic cleanup. Dangerous for production databases. |
| Retain | PV moves to Released state. Data is preserved. Admin must manually reclaim or delete. | Production databases. The most important data must never be deleted automatically. |
| Recycle | Performs a basic scrub (rm -rf /volume/*) and makes the PV available again. | Don't use — deprecated in Kubernetes 1.11, removed in 1.25. Use dynamic provisioning instead. |
Creating a PersistentVolume Manually
In production clusters, most PVs are created automatically by a StorageClass (covered at the end of this lesson). But understanding a manually created PV first builds the mental model for everything that follows. This is also how on-premises clusters work — a storage admin pre-provisions PVs from SAN/NFS, and developers claim them via PVCs.
The scenario: You're a storage administrator at a company running Kubernetes on bare metal. The platform team has provisioned a 100GB NFS share at nfs.storage.internal:/exports/postgres-data for the production PostgreSQL database. You need to register this as a PersistentVolume so the database team can claim it.
apiVersion: v1
kind: PersistentVolume            # PV: a piece of storage in the cluster
metadata:
  name: postgres-pv-prod          # Unique name for this PV — cluster-scoped (no namespace)
  labels:
    type: nfs                     # Labels on PVs can be used by PVC selectors to target specific PVs
    environment: production
    team: database
spec:
  storageClassName: manual        # "manual" means no dynamic provisioner manages this PV
                                  # PVCs must explicitly request storageClassName: manual to bind here
  capacity:
    storage: 100Gi                # Total storage capacity of this PV
  accessModes:
    - ReadWriteOnce               # Only one node can mount this read-write at a time
                                  # NFS actually supports RWX, but Postgres only needs one writer
  persistentVolumeReclaimPolicy: Retain  # Retain: when the PVC is deleted, data is preserved — admin must clean up
                                  # NEVER use Delete for production database volumes
  nfs:                            # nfs: volume type — an NFS network filesystem share
    server: nfs.storage.internal  # Hostname or IP of the NFS server
    path: /exports/postgres-data  # Export path on the NFS server
    readOnly: false               # Allow writes
  mountOptions:                   # NFS mount options passed to the kernel
    - hard                        # hard: retry NFS operations indefinitely (vs soft, which gives up)
    - nfsvers=4.1                 # NFS version — 4.1 has better performance and security than v3
    - timeo=600                   # Timeout: 600 tenths of a second (60s) before retrying
  volumeMode: Filesystem          # Filesystem (default): volume is formatted and mounted as a directory
                                  # Block: raw block device — for databases that manage their own I/O
$ kubectl apply -f postgres-pv.yaml
persistentvolume/postgres-pv-prod created
$ kubectl get pv postgres-pv-prod
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS AGE
postgres-pv-prod 100Gi RWO Retain Available manual 5s
$ kubectl describe pv postgres-pv-prod
Name: postgres-pv-prod
Labels: environment=production, team=database, type=nfs
Status: Available
Reclaim Policy: Retain
Access Modes: RWO
Capacity: 100Gi
Node Affinity: <none>
StorageClass: manual
Source:
Type: NFS (an NFS mount that lasts the lifetime of a pod)
Server: nfs.storage.internal
Path: /exports/postgres-data
    ReadOnly:  false
What just happened?
STATUS: Available — A newly created PV starts in Available state. It's ready to be claimed but not yet bound to any PVC. The PV lifecycle has four states: Available (ready to claim), Bound (claimed by a PVC), Released (PVC deleted, data preserved, not yet reclaimed), Failed (automatic reclamation failed).
PVs are cluster-scoped — Notice there's no namespace field in the PV metadata. PVs are cluster-wide objects — they don't belong to any namespace. A PVC in the payments namespace can bind to a PV that a storage admin created with no namespace. PVCs themselves are namespace-scoped though.
CLAIM column is empty — This confirms no PVC has claimed this PV yet. Once a PVC binds to it, this column shows namespace/pvc-name. A PV can only be bound to one PVC at a time.
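To complete the static path, the database team's claim might look like the sketch below. (The lesson doesn't show this manifest; the name and namespace here are chosen to match the `production/postgres-pvc` claim that appears in the audit output later in the lesson.)

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-pvc              # Assumed name for illustration
  namespace: production           # PVCs are namespace-scoped, unlike PVs
spec:
  storageClassName: manual        # Must match the PV's storageClassName to be eligible to bind
  accessModes:
    - ReadWriteOnce               # Must be offered by the PV
  resources:
    requests:
      storage: 100Gi              # Requesting the full capacity of postgres-pv-prod
  selector:
    matchLabels:
      team: database              # Optional: bind only to PVs carrying this label
```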
Dynamic Provisioning with StorageClasses
Manually creating PVs doesn't scale. In cloud environments and modern on-premises setups, a StorageClass automates PV creation. When a PVC requests a StorageClass, the provisioner (a controller specific to the storage backend) automatically creates a PV of the right size and binds it. The storage admin defines the StorageClass once; developers get self-service storage forever after.
The scenario: Your team runs on AWS EKS. Rather than manually creating EBS volumes and registering them as PVs, you define a StorageClass that uses the AWS EBS CSI driver. Any PVC that references this StorageClass gets an EBS volume provisioned automatically — the right size, the right type, right now.
apiVersion: storage.k8s.io/v1
kind: StorageClass                # StorageClass: defines a class of storage with a provisioner
metadata:
  name: aws-ebs-gp3               # Name used by PVCs to request this type of storage
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"  # Mark as default — PVCs without a
                                  # storageClassName get this class automatically
provisioner: ebs.csi.aws.com      # The CSI driver that handles volume creation
                                  # For GCP: pd.csi.storage.gke.io
                                  # For Azure: disk.csi.azure.com
                                  # For on-prem NFS: nfs.csi.k8s.io (community)
parameters:                       # Driver-specific options
  type: gp3                       # EBS volume type: gp3 (latest generation, best price/performance)
  iops: "3000"                    # Provisioned IOPS for gp3 (baseline is 3000)
  throughput: "125"               # MB/s throughput for gp3 (baseline is 125)
  encrypted: "true"               # Encrypt the EBS volume with AWS KMS
reclaimPolicy: Delete             # Default for dynamically provisioned volumes: delete the EBS volume when the PVC is deleted
                                  # For production databases, patch the resulting PV's reclaim policy to Retain
volumeBindingMode: WaitForFirstConsumer  # Don't provision until a Pod claims the PVC
                                  # Ensures the EBS volume is created in the SAME AZ as the Pod
                                  # With Immediate: volume created in an arbitrary AZ — the Pod may not schedule there
allowVolumeExpansion: true        # Allow PVCs to be resized after creation — add storage without downtime
$ kubectl apply -f storageclass-ebs-gp3.yaml
storageclass.storage.k8s.io/aws-ebs-gp3 created
$ kubectl get storageclass
NAME                    PROVISIONER                    RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
aws-ebs-gp3 (default)   ebs.csi.aws.com                Delete          WaitForFirstConsumer   true                   4s
standard                kubernetes.io/no-provisioner   Delete          Immediate              false                  14d
What just happened?
WaitForFirstConsumer is critical for cloud deployments — This setting delays EBS volume creation until a Pod actually claims the PVC and gets scheduled. Without it (Immediate mode), the EBS volume is created in a random availability zone as soon as the PVC is applied. The Pod might then be scheduled into a different AZ — where the EBS volume doesn't exist. The Pod gets stuck in Pending because the volume is in the wrong zone. This is one of the most common "why won't my Pod schedule?" mysteries in EKS environments.
allowVolumeExpansion: true — This is what lets you increase a PVC's storage request after creation. Without it, PVCs are immutable in size — you'd need to create a new PVC, migrate data, and update the application. With it, you edit the PVC's spec.resources.requests.storage, and the CSI driver expands the underlying volume online for supported backends (EBS, GCP PD, Azure Disk).
is-default-class annotation — Marking a StorageClass as default means any PVC that doesn't specify a storageClassName will automatically use this class. In a cluster with one obvious production storage class, this saves developers from having to specify it on every PVC. Only one StorageClass should be marked as default at a time.
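With the class in place, a developer's side of the dynamic path is just a PVC. A sketch (the claim name is hypothetical):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data                  # Hypothetical name for illustration
spec:
  storageClassName: aws-ebs-gp3   # Could be omitted — this class is marked as the default
  accessModes:
    - ReadWriteOnce               # EBS is block storage, so RWO is the only option
  resources:
    requests:
      storage: 50Gi               # The provisioner creates a 50Gi gp3 EBS volume on demand
                                  # To expand later: raise this value and re-apply —
                                  # allowVolumeExpansion: true lets the CSI driver grow the volume online
```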
The PV/PVC/StorageClass Architecture
Here's how all three objects interact — covering both the static (manually provisioned) and dynamic (StorageClass-provisioned) paths:
PV Provisioning Paths — Static vs Dynamic
Static path: a storage admin creates a PV by hand (100Gi, RWO, NFS, Retain); a developer's PVC requests 50Gi, RWO, and Kubernetes binds the claim to a matching PV.
Dynamic path: a StorageClass (aws-ebs-gp3, ebs.csi.aws.com) handles PVCs that reference it (50Gi, storageClass: aws-ebs-gp3); the provisioner creates a PV of exactly the requested size, and it is created and bound automatically.
PV with Node Affinity for Local Storage
For high-performance workloads (ML training, analytics, some databases), NVMe SSDs on the local node are far faster than network-attached storage. Kubernetes supports local PersistentVolumes that pin storage to a specific node. The trade-off is that Pods using local PVs must always schedule to that specific node — which is fine for StatefulSets but needs careful planning.
The scenario: Your ML training cluster has nodes with locally attached NVMe SSDs at /mnt/nvme0. You want a training job to use the local disk for its dataset cache — network storage would be the throughput bottleneck. You create a local PV with node affinity so Kubernetes always schedules the consuming Pod to the right node.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nvme-pv-node-ml-01        # Name tied to the specific node — include the node name for clarity
spec:
  capacity:
    storage: 500Gi                # Full NVMe disk capacity
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce               # Local storage is always RWO — one node, one writer
  persistentVolumeReclaimPolicy: Delete  # Training data is reproducible — safe to delete on PVC removal
  storageClassName: local-nvme    # Custom StorageClass name — no provisioner, manual management
  local:                          # local: a locally attached storage device on the node
    path: /mnt/nvme0              # The mount path of the NVMe device on the host
  nodeAffinity:                   # REQUIRED for local PVs — ties the PV to a specific node
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname  # Match on the node's hostname label
              operator: In
              values:
                - node-ml-01      # Only the node named node-ml-01 can use this PV
                                  # The scheduler MUST place the consuming Pod on this node
                                  # If node-ml-01 is unavailable, the Pod stays Pending
$ kubectl apply -f nvme-pv.yaml
persistentvolume/nvme-pv-node-ml-01 created
$ kubectl get pv nvme-pv-node-ml-01
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS AGE
nvme-pv-node-ml-01 500Gi RWO Delete Available local-nvme 6s
$ kubectl describe pv nvme-pv-node-ml-01 | grep -A8 "Node Affinity:"
Node Affinity:
Required Terms:
  Term 0:  kubernetes.io/hostname in [node-ml-01]
What just happened?
nodeAffinity is mandatory for local PVs — Without node affinity, Kubernetes doesn't know which node this PV's storage physically lives on. A Pod could be scheduled to any node, mount the "local" PV, and get pointed at a path that doesn't exist on that node. The nodeAffinity field tells the scheduler: "any Pod that claims this PV must run on node-ml-01." This is how the scheduler enforces data locality.
Local PVs + StatefulSets — Local PVs are most commonly used with StatefulSets (Lesson 30), where each StatefulSet replica gets its own PVC that binds to a specific node's local disk. The StatefulSet maintains stable Pod-to-storage associations even across restarts. This is how you run a distributed database like Cassandra or ClickHouse on Kubernetes with local NVMe storage.
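Consuming the local PV looks like the sketch below (the claim, Pod, and image names are assumptions for illustration). The PVC targets the local-nvme class, the Pod mounts the claim, and the scheduler follows the PV's node affinity:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dataset-cache             # Hypothetical name
spec:
  storageClassName: local-nvme    # No provisioner — binds to the pre-created local PV
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 500Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: trainer                   # Hypothetical name
spec:
  containers:
    - name: train
      image: training-image:latest  # Placeholder image
      volumeMounts:
        - name: cache
          mountPath: /data        # NVMe-backed path inside the container
  volumes:
    - name: cache
      persistentVolumeClaim:
        claimName: dataset-cache  # The scheduler places this Pod on node-ml-01 automatically
```

In practice the local-nvme StorageClass would be defined with provisioner kubernetes.io/no-provisioner and volumeBindingMode WaitForFirstConsumer, so binding is delayed until the Pod is scheduled.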
Inspecting and Managing PVs
The scenario: You're an SRE doing a quarterly storage audit. A PV is showing as Released — its PVC was deleted but the PV has the Retain policy so the data wasn't auto-deleted. You need to investigate, determine if the data is still needed, and either reclaim the PV for reuse or clean it up.
kubectl get pv
# List all PVs in the cluster — cluster-scoped, no -n flag needed
# STATUS column: Available / Bound / Released / Failed
# CLAIM column: shows namespace/pvc-name for Bound PVs
kubectl get pv -o wide
# -o wide: also shows the VOLUMEMODE column
# Useful for a full inventory of all storage in the cluster
kubectl describe pv postgres-pv-prod
# Full detail: source type, NFS server/path, node affinity, events
# Check the Message field if STATUS is Failed
kubectl get pv --sort-by=.spec.capacity.storage
# Sort PVs by size — useful for finding the largest volumes
# Good for cost audits: which PVs are consuming the most storage?
kubectl patch pv postgres-pv-prod \
-p '{"spec":{"claimRef": null}}'
# claimRef: null: manually release a Released PV so it becomes Available again
# WARNING: only do this after you've confirmed the data is backed up or no longer needed
# A Released PV still holds a reference to the old PVC's claimRef — clearing it
# makes the PV Available for a new PVC to claim
$ kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS AGE
postgres-pv-prod 100Gi RWO Retain Released production/postgres-pvc manual 45d
nvme-pv-node-ml-01 500Gi RWO Delete Available local-nvme 2h
pvc-a4f9b2d1-... 50Gi RWO Delete Bound payments/postgres-payments-pvc aws-ebs-gp3 14d
$ kubectl describe pv postgres-pv-prod | grep -E "Status:|Claim:|Message:"
Status: Released
Claim: production/postgres-pvc
Message:
$ kubectl patch pv postgres-pv-prod -p '{"spec":{"claimRef": null}}'
persistentvolume/postgres-pv-prod patched
$ kubectl get pv postgres-pv-prod
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS AGE
postgres-pv-prod     100Gi      RWO            Retain           Available                                    manual         45d
What just happened?
Released state explained — When a PVC is deleted and the PV's reclaimPolicy is Retain, the PV moves to Released. The data on the underlying storage is intact. But the PV carries a claimRef pointing to the now-deleted PVC — this prevents the PV from being claimed by a new PVC automatically. This is intentional: it forces a human to verify the data situation before the storage is reused.
Clearing claimRef — Patching claimRef: null removes the old PVC reference and moves the PV back to Available. Important caveat: the old data is still on the underlying NFS share. If a new PVC binds to this PV, the new Pod will see the previous tenant's data. Always wipe the data on the underlying storage before recycling a Retain PV for a different workload.
pvc-a4f9b2d1-... naming — The dynamically provisioned PV has a generated name starting with pvc- followed by the PVC's UID. This is the naming pattern for all dynamically provisioned PVs — you never set the PV name yourself when using a StorageClass, Kubernetes generates it.
Teacher's Note: The reclaim policy that will eventually cost you data
The most expensive lesson I've seen teams learn: a dynamically provisioned StorageClass with reclaimPolicy: Delete (the default) will delete your EBS/GCP-PD/Azure Disk the moment the PVC is deleted. Not when the Pod is deleted. Not when the Deployment is deleted. When the PVC is deleted. And PVCs can be deleted by accident — a careless kubectl delete namespace, a Helm uninstall, or a script that cleans up "unused" resources.
For production databases, always either: (1) use a StorageClass with reclaimPolicy: Retain, or (2) change the PV's reclaim policy to Retain after it's been created: kubectl patch pv [name] -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'. This is one of those "configure it once when you set up the cluster, never think about it again — until the day you're glad you did" settings.
One more: enable volume snapshots in your StorageClass policy if your CSI driver supports it. EBS, GCP PD, and Azure Disk all support CSI snapshots — scheduled snapshots to S3/GCS/Blob are your last line of defence when everything else fails.
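CSI snapshots are requested with a VolumeSnapshot object. A sketch, assuming the external-snapshotter CRDs are installed and a VolumeSnapshotClass wrapping the EBS CSI driver exists (the snapshot and class names here are assumptions):

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: postgres-backup           # Hypothetical name
  namespace: production           # Snapshots are namespaced, like the PVC they target
spec:
  volumeSnapshotClassName: ebs-snapclass     # Assumed VolumeSnapshotClass for the EBS CSI driver
  source:
    persistentVolumeClaimName: postgres-pvc  # Snapshot the volume bound to this live claim
```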
Practice Questions
1. Which PersistentVolume reclaim policy should you always use for production databases — preserving the data on the underlying storage when a PVC is deleted, requiring a human to manually reclaim it?
2. Which StorageClass volumeBindingMode delays EBS volume creation until a Pod is actually scheduled — preventing the volume from being created in the wrong availability zone?
3. A PVC is deleted. The bound PV has reclaimPolicy: Retain. What STATUS does the PV show, and can a new PVC immediately bind to it?
Quiz
1. You need multiple Pods on different nodes to simultaneously write to the same PersistentVolume. Which access mode is required, and why can't AWS EBS be used for it?
2. A storage admin creates a PersistentVolume. Which namespace does it belong to?
3. A dynamically provisioned PV uses a StorageClass with reclaimPolicy: Delete. The PVC bound to it is accidentally deleted. What happens?
Up Next · Lesson 29
Persistent Volume Claims
How developers actually request and use persistent storage — writing PVCs, mounting them into Pods, resizing them live, and the binding rules that determine which PV a PVC gets.