Kubernetes Lesson 17 – Labels and Selectors | Dataplexa
Core Kubernetes Concepts · Lesson 17

Labels and Selectors

Labels are the connective tissue of Kubernetes — they're how every object finds every other object, how traffic gets routed, and how you query your cluster at scale. Get them right and your cluster is navigable; get them wrong and you'll spend hours wondering why your Service isn't routing to the Pods you think it is.

What Labels Actually Are

A label is a key-value pair attached to any Kubernetes object — Pods, Deployments, Services, Nodes, you name it. They look deceptively simple. app: payment-api. env: production. tier: backend. But underneath that simplicity is the entire routing and ownership model of the cluster.

Think of labels like tags on luggage at an airport. The bag (object) carries tags (labels). The conveyor belt system (Kubernetes) uses those tags to route the bag to the right destination. The routing rules (selectors) say "give me everything tagged with destination: London." No tags, no routing. Wrong tags, wrong destination.

Labels serve two completely different purposes, and it helps to keep them distinct in your head:

🔗 Functional Labels

Used by Kubernetes itself to wire objects together. Services use them to find Pods. Deployments use them to own Pods. Network Policies use them to apply rules. These labels have real consequences — if they're wrong, traffic breaks.

🏷️ Organisational Labels

Used by humans to query and filter objects. Team names, cost centres, versions, environment names. These don't affect routing — they make kubectl get pods -l team=payments work so you can find your stuff fast.

Label Syntax Rules

Labels have a few constraints worth knowing. Keys can optionally have a prefix (like app.kubernetes.io/name) separated from the name by a forward slash. The prefix must be a valid DNS subdomain of at most 253 characters; without a prefix, the key is considered private to the user. The key's name segment and the value must each be 63 characters or fewer, start and end with an alphanumeric character, and may contain dashes, underscores, and dots in the middle (values may also be empty).
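As a quick sanity check, the value rules can be approximated with a small shell function. This is a sketch — the regex mirrors the rules above, not the actual upstream validation code:

```shell
# Rough check of a label VALUE against the rules above:
# at most 63 chars, empty allowed, alphanumeric at both ends,
# dashes/underscores/dots permitted in the middle.
valid_label_value() {
  printf '%s\n' "$1" | grep -Eq '^([A-Za-z0-9]([A-Za-z0-9._-]{0,61}[A-Za-z0-9])?)?$'
}

valid_label_value "2.3.0"  && echo "2.3.0 is valid"
valid_label_value "-oops-" || echo "-oops- is invalid"
```

The API server rejects invalid labels at admission time anyway, but a check like this is handy in CI scripts that generate manifests.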

⚠️ The most common label mistake

Mismatching the label on a Pod's template with the selector on the Deployment or Service. The Pod gets created but the Service has zero endpoints — traffic silently drops. Always double-check that spec.selector.matchLabels on a Deployment matches spec.template.metadata.labels exactly, and that the Service's selector matches the Pod's labels exactly.
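A minimal Service that correctly selects the checkout Pods might look like this — a sketch using the names from this lesson's examples:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: checkout-api
  namespace: production
spec:
  selector:                  # Must match the POD labels, not the Deployment's own labels
    app: checkout-api        # Same key/value as spec.template.metadata.labels in the Deployment
    env: production
  ports:
    - port: 80               # Port the Service exposes inside the cluster
      targetPort: 3000       # Must match the containerPort of the checkout Pods
```

Note that a Service selector only supports the flat equality form shown here — there is no matchExpressions on a Service.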

Applying Labels in a Real Manifest

The scenario: You're a platform engineer at a company running a dozen microservices across three environments — development, staging, and production. The ops team is struggling to figure out which Pods belong to which team, which version is deployed where, and which environment an object is in. You're going to introduce a consistent labelling standard across the platform — starting with the checkout service.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout-api
  namespace: production
  labels:                                   # Labels on the Deployment object itself
    app.kubernetes.io/name: checkout-api    # Recommended standard label: app name
    app.kubernetes.io/version: "2.3.0"      # Recommended standard label: version
    app.kubernetes.io/component: api        # What role this component plays
    app.kubernetes.io/part-of: checkout     # The larger system this belongs to
    app.kubernetes.io/managed-by: helm      # What tool manages this object
spec:
  replicas: 3
  selector:
    matchLabels:
      app: checkout-api                     # Selector: own Pods with label app=checkout-api
      env: production                       # AND label env=production
                                            # Both labels must match — this is an AND condition
  template:
    metadata:
      labels:
        app: checkout-api                   # MUST match selector above — Kubernetes enforces this
        env: production                     # MUST match selector above
        version: "2.3.0"                    # Extra label for canary/traffic splitting later
        team: payments                      # Organisational label — which team owns this Pod
    spec:
      containers:
        - name: checkout-api
          image: company/checkout-api:2.3.0
          ports:
            - containerPort: 3000
$ kubectl apply -f checkout-deployment.yaml
deployment.apps/checkout-api created

$ kubectl get pods --show-labels
NAME                            READY   STATUS    RESTARTS   AGE   LABELS
checkout-api-6f8b9d-2xpkj       1/1     Running   0          12s   app=checkout-api,env=production,team=payments,version=2.3.0
checkout-api-6f8b9d-7rvqn       1/1     Running   0          12s   app=checkout-api,env=production,team=payments,version=2.3.0
checkout-api-6f8b9d-m3czl       1/1     Running   0          12s   app=checkout-api,env=production,team=payments,version=2.3.0

What just happened?

app.kubernetes.io/ prefix — Kubernetes has a set of recommended labels that use the app.kubernetes.io/ prefix. These are standardised across the ecosystem — Helm uses them, dashboards like Lens and k9s understand them, and monitoring tools like Prometheus pick them up automatically. You don't have to use them, but adopting them on day one saves pain later.

Two labels in the selector — The selector uses both app: checkout-api AND env: production. Both must match. This means if you accidentally deploy a staging Pod with label env: staging into the same namespace, this Deployment won't accidentally adopt it.

--show-labels — Adding --show-labels to any kubectl get command appends a LABELS column. Useful for quickly verifying that your labels were applied correctly and match what your selectors expect.

Selectors: Two Types That Work Very Differently

A selector is a query against labels. Kubernetes has two types, and knowing when to use each is important.

Equality-based
  Syntax: app=checkout-api, env!=staging
  Used in: Services, ReplicationControllers, kubectl -l
  What it does: matches an exact label value (= or !=)

Set-based
  Syntax: env in (prod, staging), tier notin (frontend), !deprecated
  Used in: Deployments, ReplicaSets, Jobs (matchExpressions)
  What it does: matches against a set of values — much more expressive

Querying with kubectl Using Label Selectors

The scenario: Your on-call shift just started and an alert fires. The alert says something is wrong in the payment domain in production. You need to quickly filter down to just the relevant Pods across several deployments — fast, without scrolling through hundreds of Pods from other teams.

kubectl get pods -l app=checkout-api
# -l: label selector flag — filters results to objects matching the label expression
# Returns pods with label app=checkout-api in the current namespace
# (add -A / --all-namespaces to search across every namespace)

kubectl get pods -l app=checkout-api,env=production
# Multiple labels: comma = AND — both labels must match
# This is equality-based selection: both must be exactly equal

kubectl get pods -l 'env in (production,staging)'
# Set-based: match pods in EITHER production OR staging environment
# Must quote the expression in single quotes in bash to avoid shell interpretation

kubectl get pods -l 'env in (production),team=payments'
# Mixing: set-based AND equality-based in the same selector — fully supported
# Returns pods that are in production AND belong to the payments team

kubectl get pods -l 'version notin (1.0,1.1)'
# notin: match pods whose version label is NOT in this set
# Useful for finding all non-legacy pods, or all pods running newer versions

kubectl get pods -l '!deprecated'
# !key: match objects that do NOT have this label at all
# Finds pods that haven't been tagged as deprecated — note the single quotes

kubectl get all -l app=checkout-api
# get all: returns the common workload types — Pods, Deployments, ReplicaSets, Services — matching the label
# The quickest way to see every object associated with a component

kubectl label pod checkout-api-6f8b9d-2xpkj hotfix=true
# label: imperatively add a label to an existing object without editing YAML
# Useful for temporary tagging during debugging — e.g. mark a Pod for special treatment
$ kubectl get pods -l app=checkout-api,env=production
NAME                            READY   STATUS    RESTARTS   AGE
checkout-api-6f8b9d-2xpkj       1/1     Running   0          8m
checkout-api-6f8b9d-7rvqn       1/1     Running   0          8m
checkout-api-6f8b9d-m3czl       1/1     Running   0          8m

$ kubectl get pods -l 'env in (production,staging)'
NAME                            READY   STATUS    RESTARTS   AGE
checkout-api-6f8b9d-2xpkj       1/1     Running   0          8m
checkout-api-6f8b9d-7rvqn       1/1     Running   0          8m
payment-api-7d6b9c-9kvpm        1/1     Running   0          2d
auth-service-4b2c8d-xr7nq       1/1     Running   0          1d

$ kubectl get all -l app=checkout-api
NAME                                READY   STATUS    RESTARTS   AGE
pod/checkout-api-6f8b9d-2xpkj       1/1     Running   0          9m
pod/checkout-api-6f8b9d-7rvqn       1/1     Running   0          9m
pod/checkout-api-6f8b9d-m3czl       1/1     Running   0          9m

NAME                           READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/checkout-api   3/3     3            3           9m

NAME                                      DESIRED   CURRENT   READY   AGE
replicaset.apps/checkout-api-6f8b9d       3         3         3       9m

What just happened?

-l with comma = AND — Every comma-separated label expression is an AND condition. -l app=checkout-api,env=production means "give me objects that have BOTH of these labels." There is no OR operator in equality-based selectors — use set-based in () for OR across values of a single key.

kubectl get all — Using get all with a label selector is one of the fastest ways to get the full picture of a service during an incident. In one command you see the Pods, Deployment health, and ReplicaSet state. Be aware that "all" doesn't literally mean every resource type — it covers the most common ones but misses ConfigMaps, Secrets, PVCs, etc.

kubectl label (imperative) — Adding labels imperatively is handy during debugging. A common pattern: label a suspect Pod with debug=true, then use that label to attach a debug sidecar or isolate it from load balancer traffic. Just remember to clean it up — and never rely on imperatively-added labels as part of your routing configuration.

matchExpressions: The Full Power of Set-Based Selection

matchLabels (which you've seen in Deployments) is shorthand for the simplest case. matchExpressions is the full syntax and gives you much more control.
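For instance, the checkout Deployment's selector could be written in the full syntax instead — a sketch equivalent to the matchLabels version earlier (note that a live Deployment's selector is immutable, so this applies to new objects):

```yaml
spec:
  selector:
    matchExpressions:            # Full set-based syntax — matchLabels is shorthand for this
      - key: app
        operator: In             # In: the label's value must be one of the listed values
        values:
          - checkout-api
      - key: env
        operator: In
        values:
          - production
```

All expressions (and any matchLabels entries alongside them) are ANDed together — there is no OR between expressions.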

The scenario: Your platform team is writing a Job that runs database migrations. It should only run on nodes in the EU region that are flagged as general-purpose workload nodes — not on the spot/preemptible nodes reserved for batch processing, and not on nodes reserved for the GPU pool. You need a node selector that expresses all of that.

apiVersion: batch/v1            # Jobs live in the batch API group
kind: Job
metadata:
  name: db-migration-v3
spec:
  template:
    metadata:
      labels:
        app: db-migration
        run: v3
    spec:
      restartPolicy: Never      # Jobs use Never or OnFailure — not Always (which is for Deployments)
      containers:               # A Job's Pod template must define at least one container
        - name: db-migration
          image: company/db-migrate:3.0   # Example image — substitute your actual migration tool
      nodeSelector:             # nodeSelector: simplest way to constrain which nodes a Pod runs on
        region: eu-west         # Only schedule on nodes labelled region=eu-west
        node-type: general      # AND node-type=general (not spot, not gpu)

      affinity:                 # affinity: more expressive node/pod placement rules (Lesson 48)
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:   # Hard requirement — must match
            nodeSelectorTerms:
              - matchExpressions:                           # matchExpressions: set-based node selection
                  - key: node-type                         # The label key to evaluate
                    operator: NotIn                        # Operator: In, NotIn, Exists, DoesNotExist
                    values:                                # The set of values to check against
                      - spot                               # Do NOT schedule on spot nodes
                      - preemptible                        # Do NOT schedule on preemptible nodes
                      - gpu                                # Do NOT schedule on gpu nodes
$ kubectl apply -f db-migration.yaml
job.batch/db-migration-v3 created

$ kubectl get pods -l app=db-migration
NAME                    READY   STATUS      RESTARTS   AGE
db-migration-v3-4xkpz   0/1     Completed   0          42s

$ kubectl get nodes --show-labels | grep region=eu-west
node-eu-west-1   Ready   <none>   14d   region=eu-west,node-type=general
node-eu-west-2   Ready   <none>   14d   region=eu-west,node-type=general
node-eu-west-3   Ready   <none>   14d   region=eu-west,node-type=spot

What just happened?

nodeSelector — The simplest form of node constraint. The Pod will only be scheduled onto nodes that have ALL the specified labels. Every node in your cluster that you want to be eligible must carry these labels — you set node labels with kubectl label node node-eu-west-1 region=eu-west.

matchExpressions operators — Label selectors support four operators: In (label value is in the set), NotIn (label value is NOT in the set), Exists (label key exists, any value), DoesNotExist (label key is not present at all). Node affinity additionally accepts Gt and Lt for numeric label values. These give you expressive placement logic without hardcoding node names.
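Exists and DoesNotExist take no values block at all. A sketch, where maintenance is a hypothetical label used for illustration:

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: region
              operator: Exists         # Node must HAVE a region label — any value is fine
            - key: maintenance
              operator: DoesNotExist   # Node must NOT carry a maintenance label at all
```

This pattern is useful when you care about the presence of a label rather than its value — for example, excluding nodes that an ops process has flagged, regardless of how they were flagged.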

STATUS: Completed — Jobs run to completion rather than staying alive. A pod in Completed status exited with code 0 — the migration ran successfully. If it showed Failed, you'd check logs with kubectl logs db-migration-v3-4xkpz and the exit code in kubectl describe pod.

How Labels Wire the Whole System Together

Here's a concrete picture of how labels connect all the moving parts of a single application component. This is the ownership and routing graph for the checkout service:

Labels as the connective tissue — checkout-api example

Deployment
  spec.selector:
    app: checkout-api
    env: production
      │
      │ owns Pods matching these labels
      ▼
Pod × 3
  metadata.labels:
    app: checkout-api      ← carries all labels —
    env: production          both functional & organisational
    team: payments
    version: "2.3.0"
      ▲
      │ routes traffic to Pods matching these labels
      │
Service
  spec.selector:
    app: checkout-api
    env: production
Key insight: The Deployment and the Service both independently select the same Pods using the same labels. They have no direct knowledge of each other — labels are the only connection. If you change a Pod's labels so they no longer match the Service selector, traffic stops reaching that Pod instantly, even if the Pod is healthy. This is how you can deliberately quarantine a Pod for debugging without killing it.
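The quarantine trick amounts to a single label change on the Pod (applied with kubectl edit or kubectl label --overwrite). A sketch, using the names from the checkout example:

```yaml
# Before: the Pod matches the Service selector and receives traffic
metadata:
  labels:
    app: checkout-api          # matches the Service selector → listed in Endpoints
    env: production

# After: one overwritten label and the Service drops the Pod immediately
metadata:
  labels:
    app: checkout-api-debug    # no longer matches → removed from Endpoints; Pod keeps running
    env: production
```

Because the Deployment's selector also stops matching, its ReplicaSet treats the Pod as gone and spins up a replacement — so the service stays at full capacity while you investigate the quarantined Pod at your leisure.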

Recommended Label Schema for Production

The scenario: You're joining a company as their first dedicated platform engineer. The cluster has 40 microservices and nobody knows who owns what. You're going to define a labelling standard. Here's the manifest snippet that implements it — using the official Kubernetes recommended labels plus team-specific ones.

metadata:
  name: inventory-service
  labels:
    # --- Kubernetes Recommended Labels (understood by dashboards and tooling) ---
    app.kubernetes.io/name: inventory-service     # The name of the application
    app.kubernetes.io/instance: inventory-prod    # Unique instance name (useful for multiple envs)
    app.kubernetes.io/version: "4.1.2"            # Current version/tag of the application
    app.kubernetes.io/component: api              # Component role: api, worker, database, cache
    app.kubernetes.io/part-of: inventory-system   # The system this is a part of
    app.kubernetes.io/managed-by: helm            # Tool managing the lifecycle: helm, kubectl, argo

    # --- Organisational Labels (human-facing, not used by K8s for routing) ---
    team: inventory                               # Owning team — critical for multi-team clusters
    env: production                               # Environment: production, staging, development
    cost-centre: "CC-4412"                        # For chargeback/showback in FinOps pipelines
    on-call: inventory-oncall                     # PagerDuty rotation or Slack handle for alerts
$ kubectl get pods -l team=inventory,env=production
NAME                              READY   STATUS    RESTARTS   AGE
inventory-service-9b4d7f-2xkpz    1/1     Running   0          3d
inventory-service-9b4d7f-8rvnq    1/1     Running   0          3d
inventory-worker-3c8a2b-m4czl     1/1     Running   0          3d

$ kubectl get pods -l 'app.kubernetes.io/part-of=inventory-system'
NAME                              READY   STATUS    RESTARTS   AGE
inventory-service-9b4d7f-2xkpz    1/1     Running   0          3d
inventory-service-9b4d7f-8rvnq    1/1     Running   0          3d
inventory-worker-3c8a2b-m4czl     1/1     Running   0          3d
inventory-db-5f9c3d-p9wxt         1/1     Running   0          3d

What just happened?

app.kubernetes.io/part-of — This label enables you to query all Pods belonging to an entire system — the API, the worker, the database — with a single selector. When an incident hits the inventory system, one command gives you every affected Pod regardless of which specific component is the culprit.

cost-centre and on-call labels — These don't do anything in Kubernetes itself, but they're invaluable in larger organisations. Cost-centre labels feed into FinOps tools for chargeback. On-call labels are parsed by alert routing tools to page the right team when a Pod's metrics breach thresholds. Labels are cheap to add and expensive to retrofit later — add them from day one.

Teacher's Note: Label drift is a silent killer

Labels sound boring. They are not. I've seen production outages caused by a one-character typo in a label — app: payment-api on the Pod vs app: payments-api on the Service selector. Traffic dropped to zero. The Pods were running. The Deployment showed healthy. But the Service had no endpoints. It took 20 minutes to find because nobody checked labels first.

When traffic isn't reaching a Pod and everything looks healthy, the first thing you check is: does the Service selector exactly match the Pod labels? Run kubectl describe service [name] and look at the Endpoints line. If it says <none>, your labels don't match. Full stop.

Practice Questions

1. Write the kubectl command to list all Pods that have both the label app=checkout-api AND the label env=production.



2. When a Service is not routing traffic to any Pods, what field in kubectl describe service output shows <none> to confirm the selector is not matching any Pods?



3. In a Deployment or Job spec, what is the field name for set-based label selection that supports operators like In, NotIn, Exists, and DoesNotExist?



Quiz

1. Your Service selector is app: payment-api but your Pods have the label app: payments-api. What happens?


2. Which label selector expression correctly matches Pods where the env label is either production or staging?


3. In a Deployment manifest, what is the difference between labels under metadata.labels and labels under spec.template.metadata.labels?


Up Next · Lesson 18

Namespaces

How to divide a single cluster into isolated environments for different teams, and why the default namespace is a trap in production.