Kubernetes Course
Master Node Components
In Lesson 4 we saw the full architecture from above. Now we zoom right in on the control plane — the brain of your cluster. We're going deep on how each component actually works, what happens when something goes wrong, and what you need to know to keep it healthy in production.
Quick Recap From Lesson 4
The control plane is the decision-making layer. It runs on dedicated machines separate from your application workloads. It has four components: the API Server, etcd, the Scheduler, and the Controller Manager. In managed Kubernetes (EKS, GKE, AKS) the cloud provider runs these for you. In self-managed clusters, you own them entirely.
Why the Control Plane Deserves Your Full Attention
Here is the hard truth about the control plane: if it goes down completely, your running applications keep working — but you lose the ability to do anything about them. No deployments. No scaling. No recovering from a crashed node. Your cluster becomes read-only from an operations standpoint.
That's why in production you always run the control plane in a highly available configuration — usually three control plane nodes, sometimes five. Odd numbers matter here because of how etcd elects a leader, which we'll cover shortly.
API Server — What Actually Happens Inside
We know the API Server is the front door. But every single request — from your kubectl command to the Scheduler writing an assignment — goes through four distinct stages before anything touches etcd: authentication (who are you?), authorization (are you allowed to do this?), admission control (should the request be modified or rejected?), and finally validation of the object before it is persisted.
One thing worth burning into your memory: the API Server is stateless. It holds no data of its own — all state lives in etcd. This is why you can run multiple API Servers in a highly available setup. They're all just proxies to etcd. If one dies, another picks up immediately with zero loss.
etcd — The Heart of Your Cluster
etcd is a distributed key-value store that uses an algorithm called Raft to stay consistent across multiple nodes. You don't need to understand Raft deeply right now — here's the essential version: in a cluster of three etcd nodes, one is elected leader. All writes go to the leader. The leader replicates those writes to at least one other node before confirming success.
This is why you run an odd number of etcd members — the cluster needs a majority (called a quorum) to function. With 3 nodes, the quorum is 2 — you can lose one and keep going. With 5 nodes, the quorum is 3 — you can lose two. An even number buys you nothing: 4 nodes still need a quorum of 3, so they tolerate only one failure, exactly like 3 nodes.
| etcd cluster size | Quorum needed | Nodes that can fail | Recommended for |
|---|---|---|---|
| 1 node | 1 | 0 | Dev / learning only |
| 3 nodes | 2 | 1 | ✓ Standard production |
| 5 nodes | 3 | 2 | Large critical clusters |
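The quorum arithmetic behind the table reduces to one formula — a strict majority of members. Here's a minimal sketch in Python (illustrative only, not part of etcd):

```python
# Quorum is a strict majority of members; fault tolerance is how
# many members can fail while a majority is still reachable.

def quorum(members: int) -> int:
    return members // 2 + 1

def fault_tolerance(members: int) -> int:
    return members - quorum(members)

for size in (1, 3, 4, 5):
    print(f"{size} members: quorum={quorum(size)}, "
          f"can lose {fault_tolerance(size)}")
```

Note what the output shows for 4 members: quorum is 3, so a 4-node cluster tolerates only one failure — the same as 3 nodes, with more hardware and more replication traffic.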
Losing etcd without a backup means losing your entire cluster's state. Every Deployment, every ConfigMap, every Secret, every Service — all gone. The worker nodes will keep their running containers alive for a short while, but you will have no way to manage them and no record of what should be running.
Back up etcd regularly. In Lesson 57 we cover exactly how to do this with a single command using etcdctl snapshot save. If you remember nothing else from this lesson, remember this.
Scheduler — How It Actually Picks a Node
The Scheduler doesn't pick nodes randomly. Every unscheduled Pod goes through a deliberate two-phase decision process — first eliminate the nodes that can't work, then rank the ones that can. The phases are called Filtering and Scoring.
Filtering — eliminate every node that simply cannot run this Pod: not enough free CPU or memory, a taint the Pod doesn't tolerate, a node selector that doesn't match.
Scoring — from the surviving nodes, score each one: favor nodes with the most free resources left over, nodes that already have the container image cached, and nodes that spread the workload evenly.
Say you deploy a Pod requesting 500m CPU and 256Mi memory, and your cluster has 3 nodes (illustrative numbers): Node A has 200m CPU free, Node B has 1 CPU and 512Mi free, and Node C has 3 CPU and 8Gi free.
Node A is filtered out — not enough CPU. Node B survives filtering but scores low. Node C wins because it has the most free resources, giving the Pod room to grow without crowding other workloads.
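The filter-then-score process can be sketched in a few lines. This is an illustrative toy, not the real kube-scheduler: the node numbers are made up to match the example above, and the scoring function (sum of leftover CPU and memory) is a simplification of the real scoring plugins:

```python
# Toy model of the Scheduler's two phases. All node data is invented.
pod = {"cpu_m": 500, "mem_mi": 256}

nodes = [
    {"name": "node-a", "free_cpu_m": 200,  "free_mem_mi": 4096},
    {"name": "node-b", "free_cpu_m": 1000, "free_mem_mi": 512},
    {"name": "node-c", "free_cpu_m": 3000, "free_mem_mi": 8192},
]

# Phase 1 — Filtering: drop every node that cannot satisfy the request.
feasible = [
    n for n in nodes
    if n["free_cpu_m"] >= pod["cpu_m"] and n["free_mem_mi"] >= pod["mem_mi"]
]

# Phase 2 — Scoring: rank survivors by the resources left over after
# placing the Pod (a stand-in for the real "least allocated" plugin).
def score(n: dict) -> int:
    return (n["free_cpu_m"] - pod["cpu_m"]) + (n["free_mem_mi"] - pod["mem_mi"])

winner = max(feasible, key=score)
print(winner["name"])  # node-c
```

node-a never reaches scoring — it fails the CPU filter — and node-c beats node-b on leftover resources, exactly the outcome described above.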
After scoring, the Scheduler writes the node assignment into the Pod's spec in etcd — and its job is done. It doesn't watch whether the Pod actually started. That's the kubelet's job.
Controller Manager — The Cluster's Immune System
The Controller Manager runs many individual controllers at once, each in its own background loop. Every controller does the same three things over and over: observe the actual state of the cluster, compare it against the desired state stored in etcd, and act to close any gap between the two.
The controllers you'll deal with most often as a Kubernetes engineer are the Deployment controller, the ReplicaSet controller, the Node controller, and the Job controller.
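The observe-compare-act loop can be sketched as a toy replica controller. This is illustrative only — real controllers watch the API Server and use far more machinery — but the shape of the logic is the same:

```python
# Toy reconciliation loop: observe actual state, compare it with the
# desired state, and act until the two match. All names are invented.

def reconcile(desired: int, running: list[str]) -> list[str]:
    """One pass of the loop for a ReplicaSet-like controller."""
    running = list(running)                 # observe (copy of actual state)
    while len(running) < desired:           # too few Pods: create replacements
        running.append(f"pod-{len(running) + 1}")
    while len(running) > desired:           # too many Pods: delete extras
        running.pop()
    return running

# A Pod just crashed: actual state (2 Pods) drifts from desired state (3).
pods = reconcile(3, ["pod-1", "pod-2"])
print(pods)  # ['pod-1', 'pod-2', 'pod-3']
```

The key property is that the loop is idempotent: run it again when nothing has drifted and it changes nothing, which is why controllers can safely repeat it forever.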
What Breaks When Each Component Fails
This is the practical knowledge that separates engineers who genuinely understand Kubernetes from those who just know the names. If you know exactly what breaks when something fails, you know exactly what to fix first.
| Component fails | What stops working | What keeps working |
|---|---|---|
| API Server | kubectl stops working. No new deployments, scaling, or config changes possible. | All running containers keep running. kubelet keeps health-checking them. |
| etcd | API Server can't read or write. Effectively brings the whole control plane down. | Already-running containers keep running temporarily. |
| Scheduler | New Pods stay in Pending — nobody assigns them to nodes. | All currently running Pods are completely unaffected. |
| Controller Manager | Self-healing stops. Crashed Pods won't be replaced. Dead nodes' workloads stay where they are. | Healthy running Pods continue normally. |
Checking the Control Plane With kubectl
The scenario: You're a DevOps engineer and something feels off — new Pods are getting stuck in Pending. Before you panic, you check the health of every control plane component. These are the exact commands you'd run first.
```bash
# Check all nodes — control plane nodes show the role "control-plane"
kubectl get nodes

# Check the health of the control plane components
# Queries the health endpoint of each one
# (deprecated since v1.19, but still handy on older clusters)
kubectl get componentstatuses

# Control plane components run as Pods in the kube-system namespace
# This shows you the API Server, etcd, Scheduler, and Controller Manager
kubectl get pods -n kube-system

# Read the logs of a specific control plane component
# Replace the Pod name with what you see from the command above
kubectl logs kube-scheduler-control-plane -n kube-system
```
```
NAME             STATUS   ROLES           AGE   VERSION
control-plane    Ready    control-plane   22d   v1.28.0
worker-node-01   Ready    <none>          22d   v1.28.0
worker-node-02   Ready    <none>          22d   v1.28.0

NAME                 STATUS    MESSAGE   ERROR
scheduler            Healthy   ok
controller-manager   Healthy   ok
etcd-0               Healthy   ok

NAMESPACE     NAME                                    READY   STATUS    RESTARTS   AGE
kube-system   etcd-control-plane                      1/1     Running   0          22d
kube-system   kube-apiserver-control-plane            1/1     Running   0          22d
kube-system   kube-controller-manager-control-plane   1/1     Running   0          22d
kube-system   kube-scheduler-control-plane            1/1     Running   0          22d
kube-system   coredns-5d78c9869d-4xvzk                1/1     Running   0          22d
kube-system   coredns-5d78c9869d-7kzpt                1/1     Running   0          22d
```
kubectl get nodes — The ROLES column tells you what each machine is. A node with control-plane is a master node. Nodes with no role listed are worker nodes ready to run your application Pods.
kubectl get componentstatuses — This directly checks whether the Scheduler, Controller Manager, and etcd are alive and responding. A Healthy: ok for all three means your control plane is fine — the Pending Pod problem lives somewhere else. An Unknown or Error here is your first lead. (The command is deprecated in recent Kubernetes versions; checking the kube-system Pods, as below, is the modern equivalent.)
kube-system namespace — On kubeadm clusters, all control plane components run as static Pods here. You can read their logs and describe them just like any other Pod. RESTARTS: 0 across all of them is healthy. A high restart count on etcd or the API Server means something is seriously wrong and needs immediate investigation.
On a kubeadm cluster, the control plane components themselves are just containers — they run as static Pods in the kube-system namespace. The kubelet on the control plane node starts them from YAML files sitting in /etc/kubernetes/manifests/. You can inspect their logs, describe them, and even restart one by temporarily moving its YAML file out of that folder.
In managed clusters like EKS, GKE or AKS you never see any of this — the cloud provider hides the control plane completely and just gives you the API Server endpoint. The trade-off is real: less operational overhead, but less visibility and control. Neither is universally better. It depends on your team's size and needs.
Practice Questions
Write from memory — don't scroll back.
1. What is the minimum number of etcd nodes recommended for a production Kubernetes cluster that needs to survive the loss of one node?
2. The Scheduler uses a two-phase process to pick a node for a Pod. What are the two phases called?
3. On a kubeadm-provisioned cluster, control plane components run as static Pods in which namespace?
Knowledge Check
Pick the best answer.
1. The Scheduler crashes on a single-control-plane cluster. You deploy a new app. What happens?
2. Every request passes through three security stages in the API Server. Which answer correctly names and describes all three?
3. The Controller Manager goes down. A Pod crashes 10 minutes later. What happens?
Up Next · Lesson 6
Worker Node Components
We flip to the other side of the cluster — the kubelet, kube-proxy, and container runtime in depth. These are the components that actually run your containers and keep them alive every second of the day.