Kubernetes Lesson 47 – Taints And Tolerations | Dataplexa
Advanced Workloads & Operations · Lesson 47

Taints and Tolerations

Taints let node operators repel Pods from a node — marking it as restricted to specific workloads. Tolerations are the matching declaration on a Pod that says "I accept this taint." Together they implement dedicated node pools, GPU isolation, and spot instance handling without needing complex scheduling rules.

How Taints and Tolerations Work

A taint has three parts: a key, a value, and an effect. The effect determines what happens to Pods that don't tolerate the taint:

NoSchedule
  New Pods without a matching toleration will not be scheduled here. Existing Pods are unaffected.
  Use for: dedicated node pools — GPU nodes, high-memory nodes.

PreferNoSchedule
  The scheduler tries to avoid this node but will use it if no other option exists.
  Use for: soft preference — "use these nodes last."

NoExecute
  New Pods won't be scheduled AND existing Pods without a matching toleration are evicted.
  Use for: node failure handling (not-ready/unreachable), spot instance reclaim, node maintenance.

Creating Taints and Tolerations

The scenario: You have a pool of GPU nodes reserved exclusively for ML workloads. No other Pods should land on these expensive nodes. You taint the GPU nodes and add tolerations only to ML Pods.

# Add a taint to a node: key=value:Effect
kubectl taint node gpu-node-1 dedicated=gpu:NoSchedule
kubectl taint node gpu-node-2 dedicated=gpu:NoSchedule
# All new Pods without a matching toleration will be rejected from these nodes

# Remove a taint (append '-' to remove)
kubectl taint node gpu-node-1 dedicated=gpu:NoSchedule-

# View node taints
kubectl describe node gpu-node-1 | grep Taints
# Taints: dedicated=gpu:NoSchedule
The ML Deployment then declares a matching toleration:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-training
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ml-training
  template:
    metadata:
      labels:
        app: ml-training
    spec:
      tolerations:
        - key: "dedicated"
          operator: "Equal"         # Equal: value must match exactly
          value: "gpu"
          effect: "NoSchedule"      # Must match the taint's effect
          # This Pod can be scheduled on nodes with taint dedicated=gpu:NoSchedule

      nodeSelector:
        accelerator: nvidia-tesla-t4   # Also constrain to GPU-labelled nodes
        # Toleration alone doesn't ATTRACT a Pod to a node — it just allows it
        # Combine with nodeSelector or affinity to both allow AND attract
      containers:
        - name: trainer
          image: registry.company.com/ml-trainer:1.0.0
          resources:
            limits:
              nvidia.com/gpu: 2

$ kubectl get pods -o wide
NAME                  READY   NODE
ml-training-xyz-1     1/1     gpu-node-1   ← tolerated the taint ✓
ml-training-xyz-2     1/1     gpu-node-2   ✓

# Other Pods without the toleration stay off GPU nodes:
$ kubectl get pods -o wide -n default
NAME              READY   NODE
web-app-abc-1     1/1     worker-node-3   ← never lands on gpu-node-* ✓
web-app-abc-2     1/1     worker-node-4   ✓

What just happened?

Tolerations allow, they don't attract — A toleration says "I can run on nodes with this taint." It does not say "schedule me there preferentially." Without a nodeSelector or affinity rule, a Pod with a GPU toleration might still land on a CPU node if the scheduler scores it higher. Always pair a toleration with a nodeSelector or node affinity to both allow and attract.
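
The "attract" half can also be written as node affinity instead of nodeSelector. A minimal sketch of the same constraint, placed under template.spec, assuming the accelerator label from the example above:

```yaml
# Node affinity form of the same "attract" rule (illustrative sketch;
# the accelerator label and value mirror the example Deployment)
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:   # hard requirement
      nodeSelectorTerms:
        - matchExpressions:
            - key: accelerator
              operator: In
              values:
                - nvidia-tesla-t4
```

Affinity is more verbose than nodeSelector but supports set-based operators (In, NotIn, Exists) and soft preferences, covered in the next lesson.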

The operator field — Equal (the default) requires key, value, and effect to all match the taint. Exists requires only the key and effect to match; the value is ignored. This is useful for tolerating all taints with a given key regardless of value (e.g., tolerate any node.kubernetes.io/not-ready taint).
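
For instance, the ML toleration above rewritten with Exists would accept any value of the dedicated key (a sketch):

```yaml
tolerations:
  - key: "dedicated"
    operator: "Exists"    # matches dedicated=gpu, dedicated=ssd, any value
    effect: "NoSchedule"  # no value field: it is ignored with Exists
```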

Built-in Taints: Automatic Node Conditions

Kubernetes automatically taints nodes when they enter certain conditions. Understanding these is essential — they explain why Pods sometimes get evicted unexpectedly.

Taint                                 When applied                       Effect
node.kubernetes.io/not-ready          Node fails readiness check         NoExecute
node.kubernetes.io/unreachable        Node controller loses contact      NoExecute
node.kubernetes.io/memory-pressure    Node is low on memory              NoSchedule
node.kubernetes.io/disk-pressure      Node is low on disk space          NoSchedule
node.kubernetes.io/unschedulable      kubectl cordon was run             NoSchedule

# Tolerate temporary node issues — keep Pods running longer before eviction
tolerations:
  - key: "node.kubernetes.io/not-ready"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 300    # Stay on the node for 5 minutes before being evicted
                              # Default without this: 300s (added automatically by the DefaultTolerationSeconds admission plugin)
                              # Lower this for latency-sensitive services that need fast failover

  - key: "node.kubernetes.io/unreachable"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 300    # Same — give the node time to recover before evicting
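
A related pattern: a toleration with an empty key and operator Exists matches every taint, which is how node-level agents (typically DaemonSets such as log collectors or monitoring agents) keep running on otherwise-restricted nodes. A sketch:

```yaml
# Tolerate all taints (use sparingly; typical for node monitoring agents)
tolerations:
  - operator: "Exists"    # empty key + Exists matches every taint
```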

Spot Instance Pattern

Spot/preemptible instances are cheap but can be reclaimed by the cloud provider at any time. A common pattern: taint spot nodes so only workloads that can tolerate interruption run there, while critical services stay on on-demand nodes.

# Node group configuration (in eksctl or cloud-init):
# Spot nodes are automatically tainted by the cloud provider or node group config:
# kubectl taint node spot-node-1 spot=true:NoSchedule

# Batch job — tolerates spot interruption
tolerations:
  - key: "spot"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
# If the spot node is reclaimed, the Job creates a replacement Pod on another available node

# Critical API — no spot toleration, stays on on-demand
# (no tolerations entry needed — it simply won't be scheduled on tainted spot nodes)
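
Putting the pattern together, a batch Job can pair the spot toleration with a nodeSelector so it is both allowed onto and attracted to spot capacity. A sketch — the Job name, the lifecycle: spot label, and the image are hypothetical:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: nightly-report            # hypothetical name
spec:
  backoffLimit: 4                 # retry if a spot reclaim kills the Pod
  template:
    spec:
      restartPolicy: OnFailure
      tolerations:
        - key: "spot"
          operator: "Equal"
          value: "true"
          effect: "NoSchedule"
      nodeSelector:
        lifecycle: spot           # hypothetical label applied to spot nodes
      containers:
        - name: report
          image: registry.company.com/nightly-report:1.0.0   # hypothetical
```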

Teacher's Note: Taints vs Network Policies — different isolation

Taints control where a Pod runs (node placement). Network Policies control what a Pod can communicate with (network access). They are complementary, not alternatives. A GPU node taint keeps CPU workloads off GPU nodes. A Network Policy keeps those GPU workloads from making outbound calls to the internet. Both are needed in a production multi-tenant cluster.

Practice Questions

1. Which taint effect evicts existing Pods from a node (in addition to blocking new ones) if they don't have a matching toleration?



2. A toleration needs to match any taint with key node.kubernetes.io/not-ready regardless of its value. Which operator should be used?



3. In a NoExecute toleration, which field controls how many seconds a Pod stays on the node before being evicted when the taint is applied?



Quiz

1. You add a toleration for dedicated=gpu:NoSchedule to your ML Pod but it keeps landing on CPU nodes. Why, and how do you fix it?


2. You run kubectl cordon node-1 before maintenance. Which built-in taint and effect does this apply, and why do you still need kubectl drain to move existing Pods off the node?


3. You want batch jobs to run on cheap spot instances but your payment API must never land on a spot node. How do taints enable this?


Up Next · Lesson 48

Node & Pod Affinity

Affinity rules give you expressive, flexible control over Pod placement — scheduling onto nodes with specific properties, keeping related Pods together, or spreading Pods across availability zones. Both hard requirements and soft preferences are supported.