Kubernetes Lesson 29 – Persistent Volume Claims | Dataplexa
Core Kubernetes Concepts · Lesson 29

Persistent Volume Claims

A PersistentVolumeClaim is the developer's half of the storage contract. You declare what you need — how much space, which access mode, which storage class — and Kubernetes finds a PV that satisfies it. This lesson covers writing PVCs, mounting them into Pods, resizing live, and the binding rules that determine which PV your claim gets.

How PVC Binding Works

When you create a PVC, Kubernetes searches for an available PV that satisfies every matching criterion: sufficient capacity, a compatible access mode, the same storage class, the same volumeMode, and a matching label selector (if specified). The first PV found that meets all criteria gets bound to the PVC — and that binding is exclusive. No other PVC can claim the same PV.

Two important binding rules catch engineers off guard. First: a PVC requesting 20Gi can bind to a PV with 100Gi — it gets the full 100Gi, not just 20Gi. You can request less than a PV offers, but binding always hands you the entire PV. Second: if no available PV matches, the PVC stays in Pending state and any Pod trying to use it also stays Pending — with a clear event message explaining why.

⚠️ You get the whole PV — not just what you asked for

If you claim a 20Gi PVC and it binds to a 100Gi PV, the PVC occupies the entire 100Gi PV. The remaining 80Gi is wasted — it can't be given to another PVC. For static provisioning this is a sizing trap. For dynamic provisioning (StorageClass), the provisioner creates a PV sized exactly to the PVC request, so the waste doesn't happen.
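To make the trap concrete, here is a sketch of a statically provisioned PV an admin might create (the name, NFS server, and path are hypothetical placeholders). A 20Gi PVC with a matching class and access mode would bind to it and reserve all 100Gi:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: reports-pv                      # hypothetical admin-created PV
spec:
  capacity:
    storage: 100Gi                      # a matching 20Gi PVC would receive all 100Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: manual              # static provisioning: no dynamic provisioner involved
  nfs:
    server: nfs.example.internal        # placeholder NFS backing
    path: /exports/reports
```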

Writing Your First PVC

The scenario: You're a backend engineer deploying PostgreSQL for the payments service on a cloud-managed Kubernetes cluster. You need 50Gi of reliable block storage that survives Pod restarts and rescheduling. The cluster uses the aws-ebs-gp3 StorageClass from the previous lesson.

apiVersion: v1
kind: PersistentVolumeClaim         # PVC: a request for storage — developer-facing object
metadata:
  name: postgres-payments-pvc       # PVC name — referenced by Pod spec volumeClaimTemplates or volumes
  namespace: payments               # PVCs ARE namespace-scoped — unlike PVs which are cluster-scoped
  labels:
    app: postgres
    team: payments
  annotations:
    description: "PostgreSQL data volume for payments service"
spec:
  storageClassName: aws-ebs-gp3     # Which StorageClass to use — must match an existing StorageClass
                                    # Omitting this uses the default StorageClass (if one is set)
                                    # Set to "" (empty string) to request a pre-existing PV with no class
  accessModes:
    - ReadWriteOnce                 # Must be a subset of the access modes the PV supports
                                    # Database: RWO is correct — one writer, on one node at a time
  resources:
    requests:
      storage: 50Gi                 # How much storage you need — provisioner creates EBS volume of this size
                                    # With dynamic provisioning: EBS volume is exactly 50Gi
                                    # With static provisioning: Kubernetes finds a PV >= 50Gi
  volumeMode: Filesystem            # Filesystem (default): volume is mounted as a directory
                                    # Block: raw block device — for databases managing their own I/O buffers

$ kubectl apply -f postgres-pvc.yaml
persistentvolumeclaim/postgres-payments-pvc created

$ kubectl get pvc -n payments
NAME                     STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS    AGE
postgres-payments-pvc    Pending                                                                        aws-ebs-gp3     3s

(WaitForFirstConsumer — PVC stays Pending until a Pod schedules and claims it)

$ kubectl describe pvc postgres-payments-pvc -n payments | grep -A5 "Events:"
Events:
  Normal  WaitForFirstConsumer  3s  persistentvolume-controller
    waiting for first consumer to be created before binding

What just happened?

STATUS: Pending with WaitForFirstConsumer — The PVC is pending because the StorageClass uses volumeBindingMode: WaitForFirstConsumer. This is correct behaviour — the EBS volume won't be created (and charged) until a Pod actually needs it, and it will be created in the same Availability Zone as the Pod. The moment you deploy a Pod that references this PVC, binding happens automatically.
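That WaitForFirstConsumer behaviour comes from the StorageClass itself, not the PVC. As a reminder, a definition along these lines is assumed from the previous lesson (the gp3 parameter value is illustrative):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: aws-ebs-gp3
provisioner: ebs.csi.aws.com              # AWS EBS CSI driver
parameters:
  type: gp3                               # assumed: gp3 EBS volume type
volumeBindingMode: WaitForFirstConsumer   # provision only when a Pod schedules
allowVolumeExpansion: true                # needed for live PVC resizing
reclaimPolicy: Delete                     # delete the EBS volume when the PVC goes away
```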

VOLUME column is empty — While the PVC is Pending, no PV has been bound yet. Once binding happens (dynamically: after the first Pod schedules; statically: immediately), this column shows the PV name. A non-empty VOLUME column in kubectl get pvc means you have live, provisioned storage backing the claim.

storageClassName must exactly match — The StorageClass name in the PVC must exactly match an existing StorageClass (case-sensitive). A typo here leaves the PVC in Pending forever — and the error message in events is sometimes cryptic. Always kubectl get storageclass first to confirm the exact name before writing the PVC.

Mounting a PVC into a Pod

A PVC by itself does nothing — it's just a reservation. To actually use the storage, you mount it into a container via the Pod's volumes and volumeMounts blocks. This is identical to any other volume type — the PVC simply acts as the source.

The scenario: The PVC is provisioned. Now you need to mount it into a PostgreSQL container so the database can write its data files to the persistent EBS volume at /var/lib/postgresql/data — the directory PostgreSQL uses by default.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres-payments
  namespace: payments
spec:
  replicas: 1                         # Databases: always 1 replica with RWO storage
                                      # Multiple replicas + RWO = only the Pod on the node that
                                      # attaches the volume works; the rest get stuck in
                                      # ContainerCreating with a FailedAttachVolume event
  selector:
    matchLabels:
      app: postgres-payments
  template:
    metadata:
      labels:
        app: postgres-payments
    spec:
      containers:
        - name: postgres
          image: postgres:15.4         # Pin the exact version — never use postgres:latest
          ports:
            - containerPort: 5432
          env:
            - name: POSTGRES_DB
              value: "payments"
            - name: POSTGRES_USER
              valueFrom:
                secretKeyRef:
                  name: postgres-credentials
                  key: POSTGRES_USER
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-credentials
                  key: POSTGRES_PASSWORD
            - name: PGDATA
              value: /var/lib/postgresql/data/pgdata
              # PGDATA: PostgreSQL data directory — must be a subdirectory of the mount
              # Freshly formatted ext4 volumes contain a lost+found directory at the root
              # that makes postgres refuse to initialise — the pgdata subdir avoids this
          volumeMounts:
            - name: postgres-data      # Must match the volume name declared below
              mountPath: /var/lib/postgresql/data   # Standard PostgreSQL data directory
          resources:
            requests:
              cpu: "250m"
              memory: "512Mi"
            limits:
              cpu: "1000m"
              memory: "1Gi"
          readinessProbe:
            exec:
              command: ["pg_isready", "-U", "postgres"]
            initialDelaySeconds: 10
            periodSeconds: 5

      volumes:
        - name: postgres-data          # Volume name — referenced by volumeMounts above
          persistentVolumeClaim:       # persistentVolumeClaim: use a PVC as a volume source
            claimName: postgres-payments-pvc  # The PVC name in the same namespace
            readOnly: false            # readOnly: false — database needs write access

$ kubectl apply -f postgres-deployment.yaml
deployment.apps/postgres-payments created

$ kubectl get pvc -n payments
NAME                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
postgres-payments-pvc    Bound    pvc-a4f9b2d1-8c3e-4b7a-9f2d-1e5c6a8b0d4f   50Gi       RWO            aws-ebs-gp3    2m

$ kubectl get pods -n payments
NAME                                 READY   STATUS    RESTARTS   AGE
postgres-payments-6f8b9d-2xkpj       1/1     Running   0          45s

$ kubectl exec -it postgres-payments-6f8b9d-2xkpj -n payments -- df -h /var/lib/postgresql/data
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme1n1    49G   156M   49G   1% /var/lib/postgresql/data

What just happened?

PVC moved to Bound on Pod scheduling — The moment the Deployment created a Pod and the scheduler picked a node, the EBS CSI driver provisioned a 50Gi EBS volume in the same AZ, created a PV, and bound the PVC to it. The whole process took a few seconds. The VOLUME column now shows the auto-generated PV name starting with pvc-.

The PGDATA subdirectory trick — Setting PGDATA=/var/lib/postgresql/data/pgdata (a subdirectory of the mount) prevents PostgreSQL from complaining about a non-empty data directory. When a new volume is formatted with ext4, the filesystem creates a lost+found directory at the root. PostgreSQL sees that directory and thinks the data directory is already populated — but not by PostgreSQL. It refuses to initialise. The subdirectory sidesteps this entirely.

replicas: 1 for RWO databases — PostgreSQL with a RWO PVC cannot be safely scaled to multiple replicas. If you set replicas: 2, the second Pod will either: (a) get stuck in ContainerCreating with a Multi-Attach error because the EBS volume is already attached to the first Pod's node, or (b) land on the same node — RWO is enforced per node, not per Pod — so both Pods mount the volume and can corrupt the database by writing to the same files simultaneously. For multi-replica databases, use StatefulSets (Lesson 30) with each replica getting its own PVC.

Resizing a PVC Live

When your database starts filling up, you need to expand the PVC without taking it offline. If the StorageClass has allowVolumeExpansion: true, you can edit the PVC's storage request and the CSI driver will expand the underlying volume.

The scenario: It's a Tuesday afternoon. Your monitoring shows the payments PostgreSQL instance is at 85% disk capacity. At the current write rate it'll hit 100% in 48 hours. You need to expand it from 50Gi to 100Gi before it becomes critical — and you need to do it without any downtime.

kubectl get pvc postgres-payments-pvc -n payments
# Confirm current size and Bound status before attempting resize

kubectl patch pvc postgres-payments-pvc -n payments \
  -p '{"spec":{"resources":{"requests":{"storage":"100Gi"}}}}'
# patch: update the storage request from 50Gi to 100Gi
# The CSI driver detects the change and expands the underlying EBS volume
# EBS resize is online — no Pod restart required for the block device expansion
# However, the FILESYSTEM resize may require a Pod restart on some older CSI driver versions

kubectl get pvc postgres-payments-pvc -n payments -w
# Watch for the CAPACITY column to update to 100Gi and the
# FileSystemResizePending condition to clear; the kubelet emits a
# FileSystemResizeSuccessful event once the filesystem has been expanded

kubectl exec -it postgres-payments-6f8b9d-2xkpj -n payments -- df -h /var/lib/postgresql/data
# Confirm the new size is visible inside the container

$ kubectl patch pvc postgres-payments-pvc -n payments \
  -p '{"spec":{"resources":{"requests":{"storage":"100Gi"}}}}'
persistentvolumeclaim/postgres-payments-pvc patched

$ kubectl get pvc postgres-payments-pvc -n payments
NAME                     STATUS   VOLUME                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
postgres-payments-pvc    Bound    pvc-a4f9b2d1-...           50Gi       RWO            aws-ebs-gp3    3d

(CAPACITY still shows 50Gi — EBS resize triggered, filesystem resize in progress)

$ kubectl describe pvc postgres-payments-pvc -n payments | grep -A4 "Conditions:"
Conditions:
  Type                      Status  LastProbeTime   Message
  ----                      ------  -------------   -------
  FileSystemResizePending   True    <unknown>        Waiting for user to (re-)start a pod to finish
                                                     file system resize of volume

$ kubectl rollout restart deployment/postgres-payments -n payments
deployment.apps/postgres-payments restarted

$ kubectl exec -it postgres-payments-6f8b9d-9nkrp -n payments -- df -h /var/lib/postgresql/data
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme1n1    99G   42G   57G  43% /var/lib/postgresql/data

What just happened?

Two-phase resize — PVC expansion has two phases. Phase 1: the block device (EBS volume) is expanded. This happens online without any interruption. Phase 2: the filesystem on the block device is expanded to use the new space. Filesystem resize requires the volume to be unmounted and remounted — which means a Pod restart for most CSI drivers. The FileSystemResizePending: True condition tells you the block device grew but the filesystem hasn't caught up yet.

CAPACITY column lag — The CAPACITY column in kubectl get pvc updates to show the new size only after the filesystem resize completes. During the window between block device expansion and filesystem expansion, it still shows the old size — which is confusing but expected. Check kubectl describe pvc for the Conditions to see the real status.

You can only expand — never shrink — PVC size is a one-way door. You can go from 50Gi to 100Gi but never from 100Gi back to 50Gi. Kubernetes enforces this at the API level — any patch that reduces storage is rejected with a validation error. Plan your initial sizing conservatively and resize up when needed.

PVC Selectors: Targeting a Specific PV

For static provisioning scenarios — where the storage admin has pre-created specific PVs — you can use a label selector in the PVC to target exactly which PV you want, rather than letting Kubernetes pick the first available match.

The scenario: Your company has a strict data residency requirement — the EU payments database must only be backed by storage physically located in the EU. The storage team has labelled EU-resident NFS PVs with region: eu-west. You need to ensure the PVC only binds to an EU PV, not a US one.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-eu-pvc
  namespace: payments-eu
spec:
  storageClassName: manual          # manual storageClass = static provisioning only
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
  selector:                         # selector: only bind to PVs matching these labels
    matchLabels:
      region: eu-west               # Only bind to PVs labelled region=eu-west
      environment: production       # AND environment=production
                                    # Both conditions must match — this is an AND not OR
    matchExpressions:               # matchExpressions: set-based selector for more flexibility
      - key: tier
        operator: In
        values: ["ssd", "nvme"]     # Only bind to PVs on ssd or nvme storage tiers

$ kubectl apply -f postgres-eu-pvc.yaml
persistentvolumeclaim/postgres-eu-pvc created

$ kubectl get pvc postgres-eu-pvc -n payments-eu
NAME               STATUS   VOLUME             CAPACITY   ACCESS MODES   STORAGECLASS   AGE
postgres-eu-pvc    Bound    postgres-pv-eu01   200Gi      RWO            manual         2s

$ kubectl describe pvc postgres-eu-pvc -n payments-eu | grep -A3 "Selector:"
Selector:  environment=production,region=eu-west,tier in (nvme,ssd)

What just happened?

PVC bound to a 200Gi PV despite requesting 100Gi — The label selector matched one available PV in the EU. That PV happened to be 200Gi. Kubernetes bound the PVC to it — the PVC gets 200Gi even though it only asked for 100Gi. This is the static provisioning sizing trap from earlier in the lesson. The extra 100Gi is reserved by this PVC and unavailable to anyone else.

selector disables dynamic provisioning — When a PVC has a selector, Kubernetes only searches pre-existing PVs — it never triggers dynamic provisioning. This is intentional: selectors are a static provisioning feature. If you use a selector on a PVC with a StorageClass that has a provisioner, the selector wins and dynamic provisioning is disabled.
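For the claim above to bind, the storage team's PV must carry the matching labels. A sketch of such a PV, with the NFS server and path as placeholders:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: postgres-pv-eu01
  labels:
    region: eu-west                     # matched by the PVC's matchLabels
    environment: production
    tier: ssd                           # satisfies the In ["ssd", "nvme"] expression
spec:
  capacity:
    storage: 200Gi                      # larger than the 100Gi request: the PVC gets it all
  accessModes:
    - ReadWriteOnce
  storageClassName: manual              # must match the PVC's storageClassName
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: nfs-eu.example.internal     # placeholder EU-resident NFS server
    path: /exports/postgres-eu
```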

The Full PVC Lifecycle

Here's every state a PVC passes through from creation to cleanup, including what happens to the underlying data at each transition:

PVC Lifecycle — from creation to data fate

Pending
PVC created, searching for a matching PV. With WaitForFirstConsumer: stays here until a Pod schedules. If no matching PV exists and no dynamic provisioner: stuck here until an admin creates a matching PV.
Bound
PVC is matched and bound to a PV. The EBS/NFS/disk is attached. Pods can now mount this PVC. The PV is exclusively reserved for this PVC — no other PVC can use it. This is the healthy operational state.
Terminating
kubectl delete pvc issued. If Pods are still using the PVC, deletion is blocked by the kubernetes.io/pvc-protection finalizer — the PVC shows Terminating until every Pod releases the mount. Kubernetes protects you from pulling storage out from under a running Pod.
Deleted
PVC object gone. Now it depends on the reclaim policy: Delete → PV and EBS volume deleted, data lost. Retain → PV moves to Released, data preserved, admin must manually clean up.
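One defensive habit before a planned PVC deletion: if the data must survive, flip the bound PV's reclaim policy to Retain first. A sketch using the PV name from this lesson's transcript:

```shell
# Find the PV bound to the PVC (the VOLUME column)
kubectl get pvc postgres-payments-pvc -n payments

# Change the PV's reclaim policy from Delete to Retain
kubectl patch pv pvc-a4f9b2d1-8c3e-4b7a-9f2d-1e5c6a8b0d4f \
  -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'

# Deleting the PVC now moves the PV to Released with the data intact
kubectl delete pvc postgres-payments-pvc -n payments
```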

Debugging a Stuck PVC

The scenario: A developer submits a PVC. It's been Pending for 10 minutes. Pods using it are also Pending. Production is waiting. Here's the systematic diagnosis workflow.

kubectl get pvc -n payments
# First look: STATUS column — Pending means no PV bound yet

kubectl describe pvc postgres-payments-pvc -n payments
# KEY: look at Events section — it tells you exactly why it's pending:
# "no persistent volumes available for this claim and no storage class is set" → missing storageClassName
# "waiting for a volume to be created" → dynamic provisioner working (or stuck)
# "storageclass.storage.k8s.io not found" → StorageClass name is wrong or doesn't exist

kubectl get storageclass
# Verify the StorageClass the PVC references actually exists
# Check if it has (default) marker — a PVC with no storageClassName needs a default SC

kubectl get pv
# For static provisioning: are there any Available PVs that could match?
# Check that access modes, capacity, and storageClassName all match the PVC request

kubectl get events -n payments --sort-by='.lastTimestamp' | grep -i pvc
# Events stream: provisioner errors, binding failures, quota exceeded
# The provisioner logs errors here when it fails to create the backing volume

kubectl logs -n kube-system -l app=ebs-csi-controller -c csi-provisioner
# CSI provisioner logs — when dynamic provisioning fails, the error is here
# Common errors: IAM permission denied, AZ capacity exhausted, quota exceeded

$ kubectl describe pvc postgres-payments-pvc -n payments
...
Events:
  Warning  ProvisioningFailed  2m  ebs.csi.aws.com_ebs-csi-controller-...
    failed to provision volume with StorageClass "aws-ebs-gp3":
    rpc error: code = Internal desc = Could not create volume "pvc-xxx":
    could not create EBS volume: operation error EC2:
    CreateVolume, https response error StatusCode: 400,
    InvalidParameterValue: The availability zone 'us-east-1e' is no longer
    supported. Please use a different AZ.

(root cause found: AZ us-east-1e is deprecated — Pod was scheduled there
 because a node selector forced it. Fix: remove the node selector or add
 a node that exists in a supported AZ)

What just happened?

The Events section is the first stop — For stuck PVCs, kubectl describe pvc Events gives you the exact error from the provisioner. The error here is from AWS: the Availability Zone us-east-1e is deprecated and no longer accepts new EBS volumes. The Pod was forced into that AZ by a node selector on an old node, and WaitForFirstConsumer tried to create the EBS volume in the same AZ — which failed.

The CSI controller logs for deep debugging — When the Events section doesn't have enough detail, the CSI provisioner sidecar container's logs have the full stack trace. The provisioner runs as a sidecar named csi-provisioner inside the CSI controller Pod in kube-system. Specifying -c csi-provisioner targets exactly that sidecar container.

Teacher's Note: PVCs are the developer interface — treat them like an API contract

The entire PV/PVC design is deliberately asymmetric. PVs are infrastructure — created by admins or provisioners, cluster-scoped, backed by real hardware or cloud resources. PVCs are application contracts — created by developers, namespace-scoped, expressing what the application needs without caring about the underlying infrastructure. This is exactly how it should be. A developer writing a PostgreSQL deployment shouldn't need to know whether the cluster runs on AWS, GCP, bare metal, or a Raspberry Pi cluster.

The most important operational habit around PVCs: never delete a PVC without confirming what happens to the data. Check the StorageClass reclaimPolicy. Check whether the data is backed up. In teams running with Delete policy (the default for most cloud StorageClasses), a kubectl delete namespace deletes every PVC in the namespace, which deletes every PV, which deletes every EBS volume. All at once. Silently. This has caused irreversible data loss in production at multiple companies I know of.

One protection mechanism worth knowing: set a ResourceQuota on the namespace limiting the number of PVCs and the total storage request. This prevents accidental over-provisioning and gives you an explicit limit to review before any large deletion.
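A minimal sketch of such a quota (the limits are illustrative, not recommendations):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: storage-quota
  namespace: payments
spec:
  hard:
    persistentvolumeclaims: "5"         # at most 5 PVCs in this namespace
    requests.storage: 500Gi             # total storage requested across all PVCs
```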

Practice Questions

1. You apply a PVC with storageClassName: aws-ebs-gp3 which uses volumeBindingMode: WaitForFirstConsumer. No Pod references the PVC yet. What STATUS does the PVC show?



2. You try to patch a PVC to reduce its storage from 100Gi to 50Gi. What happens?



3. A PVC has been Pending for 10 minutes. What is the first kubectl command you run to find out why it hasn't bound to a PV?



Quiz

1. A PVC requests 20Gi and binds to the only available PV which has 100Gi. What happens to the capacity?


2. You run kubectl delete pvc postgres-payments-pvc while a Pod is actively using it. What happens immediately?


3. You're deploying PostgreSQL backed by an EBS PVC with access mode ReadWriteOnce. Which deployment approach is correct?


Up Next · Lesson 30

StatefulSets

Running databases, queues, and distributed systems in Kubernetes — stable network identities, ordered deployment, and per-replica persistent storage that makes stateful workloads actually work.