Node & Pod Affinity

nodeSelector matches on simple label equality. Affinity rules go further: label expressions with operators (In, NotIn, Exists, and others), hard requirements versus soft preferences, and Pod-to-Pod co-location or separation, for example keeping replicas spread across zones for high availability.

Required vs Preferred

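Both node and Pod affinity support two modes. In both names, IgnoredDuringExecution means the rule is evaluated only at scheduling time: a Pod that is already running is not evicted if labels change later.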

requiredDuringSchedulingIgnoredDuringExecution

Hard requirement. Pod stays Pending if no matching node exists. Like nodeSelector but with richer expressions.

preferredDuringSchedulingIgnoredDuringExecution

Soft preference. The scheduler tries to satisfy it but falls back to any other feasible node. Each preference has a weight (1–100); the weights of all matching preferences are added to a node's score.
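
To make the nodeSelector comparison concrete, here is a minimal sketch of one hard constraint written both ways. The disktype: ssd label is a hypothetical example and not part of the scenario used in this lesson. The two fragments are alternatives; if a Pod sets both nodeSelector and node affinity, both must be satisfied.

```yaml
# Option A: nodeSelector, plain label equality
spec:
  nodeSelector:
    disktype: ssd              # hypothetical node label
---
# Option B: the same hard constraint expressed as node affinity
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: disktype
                operator: In   # In with one value behaves like equality
                values:
                  - ssd
```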

Node Affinity

The scenario: Your payment API must run in the us-east-1a or us-east-1b AZs (data residency requirement) and should prefer large instance types for better performance.

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In                # In: zone must be one of these values
                values:
                  - us-east-1a
                  - us-east-1b
                # NotIn: must NOT be one of the values
                # Exists: key must exist (any value)
                # DoesNotExist: key must not exist
                # Gt / Lt: integer comparison (the label value and the single listed value are parsed as integers)

      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 80                        # Higher weight = stronger preference (1–100)
          preference:
            matchExpressions:
              - key: node.kubernetes.io/instance-type
                operator: In
                values:
                  - m5.2xlarge
                  - m5.4xlarge              # Prefer large instances but don't require them
        - weight: 20
          preference:
            matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                  - us-east-1a             # Slightly prefer 1a over 1b within the required set
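
The other operators follow the same shape. The sketch below is illustrative only: example.com/gpu, example.com/cpu-cores and example.com/spot are hypothetical custom node labels. Expressions inside one matchExpressions list are ANDed, while separate entries under nodeSelectorTerms are ORed.

```yaml
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          # Term 1: GPU label present AND more than 8 cores
          - matchExpressions:
              - key: example.com/gpu          # hypothetical label
                operator: Exists              # Exists / DoesNotExist take no values
              - key: example.com/cpu-cores    # hypothetical label holding an integer
                operator: Gt
                values:
                  - "8"                       # exactly one value, written as a string
          # Term 2 (ORed with term 1): any node without the spot label
          - matchExpressions:
              - key: example.com/spot         # hypothetical label
                operator: DoesNotExist
```

A node is eligible if it satisfies every expression in at least one term.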

Pod Anti-Affinity: Spreading Across Zones

The scenario: Your payment API has 3 replicas. If all 3 land in the same AZ and that AZ fails, the service goes down. Pod anti-affinity ensures replicas are spread across availability zones — a hard requirement for high availability.

spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: payment-api            # Match other Pods with this label
          topologyKey: topology.kubernetes.io/zone
          # topologyKey defines the "scope" of the anti-affinity
          # zone: no two payment-api Pods in the same AZ (hard HA requirement)
          # kubernetes.io/hostname: no two payment-api Pods on the same node

      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app: payment-api
            topologyKey: kubernetes.io/hostname
            # Also prefer different nodes within each zone (belt and suspenders)

$ kubectl get pods -o wide -n payments
NAME                  READY   STATUS    NODE           ZONE
payment-api-abc-1     1/1     Running   node-1a-large  us-east-1a   ← one per zone ✓
payment-api-abc-2     1/1     Running   node-1b-large  us-east-1b   ✓
payment-api-abc-3     0/1     Pending   <none>         <none>       ← 3rd Pod, only 2 zones available
# Output abridged; the ZONE column is added for illustration and is not printed by kubectl
# Hard anti-affinity per zone is satisfied for the first two Pods: 1a and 1b used
# 3rd Pod needs a 3rd zone but only 2 exist, so it stays Pending (a required rule has no fallback)
# For strict 1-per-zone, set replicas ≤ number of zones

What just happened?

topologyKey is the spreading domain. Setting topology.kubernetes.io/zone means "no two of these Pods in the same zone." Setting kubernetes.io/hostname means "no two on the same node." Zone spreading matters most for availability: a node failure takes out one Pod, whereas a zone failure takes out every Pod in that zone.

Pod Affinity (co-location) is the inverse of anti-affinity: schedule this Pod close to Pods with a given label. Use cases: a cache Pod that must run on the same node as the service that uses it (topologyKey: kubernetes.io/hostname), or an ML feature server that benefits from low latency to the model server in the same AZ.
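
A minimal sketch of the co-location cases just described, written for the Pod that wants to be placed near another. It assumes the target Pods carry the labels app: payment-api and app: model-server; the second label is hypothetical and chosen only for illustration.

```yaml
spec:
  affinity:
    podAffinity:
      # Hard: only schedule onto a node that already runs a payment-api Pod
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: payment-api
          topologyKey: kubernetes.io/hostname        # same node
      # Soft: prefer the zone where a model-server Pod runs
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 50
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app: model-server                    # hypothetical label
            topologyKey: topology.kubernetes.io/zone # same AZ
```

As with required anti-affinity, a required podAffinity rule leaves the Pod Pending when no matching Pod is running in any eligible domain, so the soft form is often the safer starting point.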

topologySpreadConstraints: The Modern Spread Primitive

Pod anti-affinity for spreading has a limitation: with required, any replica beyond the number of topology domains goes Pending. topologySpreadConstraints is the modern, more flexible alternative — it spreads Pods as evenly as possible across domains while allowing overflow.

spec:
  topologySpreadConstraints:
    - maxSkew: 1                          # Max allowed difference in Pod count between zones
                                          # maxSkew: 1 → if zone-a has 3, zone-b can have at most 4
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule   # Hard: Pod stays Pending if skew would exceed maxSkew
                                          # ScheduleAnyway: Soft — schedule even if skew exceeded
      labelSelector:
        matchLabels:
          app: payment-api               # Only count Pods with this label toward skew

    - maxSkew: 1
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: ScheduleAnyway  # Soft: spread across nodes best-effort
      labelSelector:
        matchLabels:
          app: payment-api
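
The fragments in this lesson start at the Pod spec. In a Deployment, that spec lives under spec.template.spec. The sketch below shows the zone constraint in that context. The replica count, container name and image are assumptions made for illustration; the namespace and label follow the earlier examples.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-api
  namespace: payments
spec:
  replicas: 5                                  # assumed; more replicas than zones
  selector:
    matchLabels:
      app: payment-api
  template:
    metadata:
      labels:
        app: payment-api                       # must match the labelSelector below
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: payment-api
      containers:
        - name: payment-api
          image: registry.example.com/payment-api:1.0.0   # hypothetical image
```

With two zones that both have room, one would expect a 3/2 split here: the fifth replica is allowed because the difference between zones stays within maxSkew, whereas required zone anti-affinity would leave three of the five Pods Pending.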

Teacher's Note: Which placement mechanism to use

nodeSelector: Use for simple, mandatory node label constraints. GPU pools, OS type, bare metal vs virtualised.

Node affinity: Use when you need expressions (In, NotIn, Gt, Lt) or soft preferences with weights. Data residency (must be in these AZs), prefer large instance types.

Pod anti-affinity: Use when placement depends on where other Pods are running rather than on node labels alone. "No two replicas on the same node."

topologySpreadConstraints: Use for even distribution across zones or nodes without the hard-limit problem of required anti-affinity. The best default for multi-replica Deployments in multi-AZ clusters.

Practice Questions

1. Which affinity mode causes a Pod to stay Pending indefinitely if no node satisfies the constraint — a hard requirement?

2. In a podAntiAffinity rule, which field defines whether Pods are spread across availability zones vs individual nodes?

3. Which Pod spec field provides flexible, even spreading across topology domains without causing excess replicas to go Pending when the number of replicas exceeds the number of domains?

Up Next · Lesson 49

Horizontal Pod Autoscaler

HPA automatically scales the number of Pod replicas based on CPU, memory, or custom metrics. This lesson covers the metrics pipeline, scaling behaviour, stabilisation windows, and multi-metric scaling strategies.
