Kubernetes Lesson 31 – Kubernetes Networking Deep Dive | Dataplexa
Networking, Ingress & Security · Lesson 31

Kubernetes Networking Deep Dive

Every request that flows through your cluster — from a browser to a Pod, from one microservice to another, from a health check to your database — follows rules laid down by Kubernetes networking. Understanding those rules is the foundation for everything in this section: Services, Ingress, DNS, and Network Policies.

The Kubernetes Networking Model — Four Rules

Kubernetes imposes a specific networking model that every CNI plugin must implement. There are four rules, and they're deliberately simpler than what you'd find in traditional VM networking:

Rule 1

Every Pod gets its own unique IP address. No two Pods share an IP — even across different nodes.

Rule 2

Pods on the same node can communicate with each other directly using their Pod IPs — no NAT.

Rule 3

Pods on different nodes can communicate with each other using their Pod IPs — no NAT required. The underlying network must make this possible.

Rule 4

The IP a Pod sees for itself (its own IP) is the same IP other Pods use to reach it. There is no hidden translation layer between peers.

These rules create a flat network — conceptually, every Pod can reach every other Pod directly. No NAT, no port remapping, no IP masquerading between Pods. This flatness makes service discovery simple and debugging straightforward. The complexity of how this is actually achieved on physical hardware is pushed down to the CNI plugin.

The Three Network Layers in Every Cluster

Real Kubernetes clusters have three distinct network layers, each with its own IP range, purpose, and rules. Keeping them separate in your mental model prevents enormous confusion when debugging.

Node Network (typical CIDR 192.168.0.0/16)
  What lives here: the physical or virtual machines (the nodes) — the real IP addresses you SSH into.
  Managed by: cloud VPC / data centre networking.

Pod Network (typical CIDR 10.244.0.0/16)
  What lives here: Pod IPs. Every Pod gets an IP from this range. Ephemeral — changes with every Pod restart.
  Managed by: the CNI plugin (Calico, Flannel, Cilium, WeaveNet).

Service Network (typical CIDR 10.96.0.0/12)
  What lives here: Service ClusterIPs. Virtual IPs — no actual network interface holds them. Stable, survive Pod churn.
  Managed by: kube-proxy / eBPF rules on every node.

⚠️ Pod IPs are ephemeral — never hardcode them

Every time a Pod is deleted and recreated — during a rolling update, a node drain, a crash — it gets a new IP from the Pod network range. Any service that hardcoded the old IP breaks immediately. This is why Services exist: they provide a stable virtual IP and DNS name that stays constant regardless of how often the backing Pods change IPs. Always communicate with Services, never with Pod IPs directly.
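What that looks like in practice is a Service manifest. This is a minimal sketch — the name checkout-api-svc, the app: checkout-api label, and the ports are illustrative, chosen to match the examples used later in this lesson:

```yaml
# Hypothetical Service: a stable virtual IP and DNS name in front of
# whichever Pods currently match the selector, whatever their IPs are.
apiVersion: v1
kind: Service
metadata:
  name: checkout-api-svc
  namespace: production
spec:
  selector:
    app: checkout-api      # matches Pod labels, never Pod IPs
  ports:
  - port: 80               # stable port clients connect to
    targetPort: 3000       # port the container actually listens on
```

Clients then call http://checkout-api-svc (or the fully qualified checkout-api-svc.production.svc.cluster.local) and never need to know a single Pod IP.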

What the CNI Plugin Does

The Container Network Interface (CNI) plugin is the component that actually implements the Kubernetes networking model on physical hardware. When a new Pod is created, the CNI plugin is called to: assign the Pod an IP from the Pod CIDR, create a virtual network interface inside the Pod's network namespace, connect it to the node's network, and set up routing so other Pods can reach the new IP.
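On each node, the kubelet discovers which CNI plugin to call from a config file under /etc/cni/net.d/. As an illustrative sketch (actual contents vary by plugin and version), a Flannel-style conflist looks roughly like this:

```json
{
  "name": "cbr0",
  "cniVersion": "0.3.1",
  "plugins": [
    {
      "type": "flannel",
      "delegate": { "hairpinMode": true, "isDefaultGateway": true }
    },
    {
      "type": "portmap",
      "capabilities": { "portMappings": true }
    }
  ]
}
```

The plugins array is a chain: the flannel plugin wires up the Pod's interface and IP, then portmap handles any hostPort mappings.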

Different CNI plugins implement cross-node Pod communication in different ways:

Flannel (VXLAN overlay: encapsulates Pod traffic in UDP packets between nodes)
  Notable features: simple, minimal. No NetworkPolicy support.
  Common in: dev clusters, kubeadm defaults.

Calico (BGP routing: advertises Pod CIDRs as routes between nodes; no overlay needed in pure BGP mode)
  Notable features: full NetworkPolicy support. High performance. Widely used in production.
  Common in: production clusters, on-prem, EKS.

Cilium (eBPF: bypasses iptables entirely, implements networking in the Linux kernel using eBPF programs)
  Notable features: best performance, L7 network policies, Hubble observability. Increasingly the default.
  Common in: GKE, EKS (Cilium mode), modern clusters.

AWS VPC CNI (native: assigns real VPC ENIs/IPs to Pods, no overlay needed)
  Notable features: Pods get routable VPC IPs. Security Groups on Pods. Native AWS integration.
  Common in: EKS (default).

How a Request Travels Across Nodes

The scenario: Pod A on Node 1 (IP 10.244.1.5) sends an HTTP request to Pod B on Node 2 (IP 10.244.2.8). Here's exactly what happens at the network level — showing the VXLAN overlay path used by Flannel, which is the easiest to reason about.

kubectl get pods -o wide -n production
# -o wide: adds NODE and IP columns — shows which physical node each Pod is on
# and what Pod IP it has — critical for understanding cross-node traffic paths

kubectl exec -it checkout-api-6f8b9d-2xkpj -n production -- ip addr show eth0
# ip addr: show the network interface inside the container
# eth0 is typically the Pod's main interface — assigned by the CNI plugin
# Shows: inet 10.244.2.14/32 — the Pod's IP from the Pod CIDR

kubectl exec -it checkout-api-6f8b9d-2xkpj -n production -- ip route show
# ip route: show the routing table inside the Pod
# Typically shows: default via 169.254.1.1 — the gateway used for all traffic leaving the Pod
# The CNI plugin handles ARP for this gateway address

kubectl exec -it checkout-api-6f8b9d-2xkpj -n production -- \
  curl -v http://10.244.1.5:8080/health
# Direct Pod-to-Pod call using Pod IP — works but not recommended in production
# Use Service ClusterIP or DNS name instead for stability
$ kubectl get pods -o wide -n production
NAME                             READY   STATUS    NODE              IP
checkout-api-6f8b9d-2xkpj        1/1     Running   node-eu-west-1a   10.244.2.14
checkout-api-6f8b9d-7rvqn        1/1     Running   node-eu-west-1b   10.244.3.7
payment-api-7d9c4b-xr7nq         1/1     Running   node-eu-west-1a   10.244.2.15
auth-service-5c7d8f-p2rkx        1/1     Running   node-eu-west-1c   10.244.4.2

$ kubectl exec -it checkout-api-6f8b9d-2xkpj -n production -- ip addr show eth0
3: eth0@if8: <BROADCAST,MULTICAST,UP,LOWER_UP>
    inet 10.244.2.14/32 brd 10.244.2.14 scope global eth0
    link/ether 6e:2a:4c:8d:1b:3f brd ff:ff:ff:ff:ff:ff

$ kubectl exec -it checkout-api-6f8b9d-2xkpj -n production -- ip route show
default via 169.254.1.1 dev eth0
169.254.1.1 dev eth0 scope link

What just happened?

Each Pod has its own /32 — The Pod IP 10.244.2.14/32 is a host route — no subnet, just one address. This is the CNI plugin's mechanism: it gives each Pod a point-to-point address and handles routing at the node level, not the Pod level. The Pod doesn't need to know about other Pods' IPs.

169.254.1.1 as the gateway — This link-local address is the CNI plugin's virtual gateway. It doesn't correspond to a real interface on the host — the CNI answers ARP requests for it and routes packets accordingly. When Pod A sends traffic to 10.244.3.7 (a Pod on another node), it sends the packet to 169.254.1.1, and the CNI on Node 1 encapsulates it in VXLAN and sends it to Node 2.

Pods on the same node — Two Pods on the same node (checkout-api and payment-api both on node-eu-west-1a) communicate through a virtual bridge on the node — much faster than cross-node traffic. No overlay encapsulation is needed.

How kube-proxy Implements Services

When you create a Service, something has to make the Service's ClusterIP work — no network interface anywhere actually holds that IP. That something is kube-proxy, which runs as a DaemonSet on every node.

kube-proxy watches the Kubernetes API for Service and Endpoints changes. When you create a Service with ClusterIP 10.96.214.88 on port 80, kube-proxy programs the Linux kernel's iptables (or eBPF, with Cilium) on every node to intercept packets destined for that IP:port and DNAT (Destination Network Address Translation) them to one of the backing Pod IPs.
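For a Service with three healthy endpoints, the programmed rules look roughly like this in iptables-save format. The chain names here (KUBE-SVC-CHECKOUT, KUBE-SEP-EP1, ...) are invented for readability — real kube-proxy chains end in random hash strings:

```
-A KUBE-SERVICES -d 10.96.214.88/32 -p tcp --dport 80 -j KUBE-SVC-CHECKOUT
-A KUBE-SVC-CHECKOUT -m statistic --mode random --probability 0.33333 -j KUBE-SEP-EP1
-A KUBE-SVC-CHECKOUT -m statistic --mode random --probability 0.50000 -j KUBE-SEP-EP2
-A KUBE-SVC-CHECKOUT -j KUBE-SEP-EP3
-A KUBE-SEP-EP1 -p tcp -j DNAT --to-destination 10.244.2.14:3000
```

Each KUBE-SEP-* (Service EndPoint) chain DNATs to one backing Pod; the statistic-module probabilities spread new connections across them.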

kubectl get endpoints checkout-api-svc -n production
# endpoints: shows the actual Pod IPs and ports behind a Service
# This is the list kube-proxy programs into iptables
# If the ENDPOINTS column shows <none>, the Service selector doesn't match any Pod labels

kubectl get endpoints checkout-api-svc -n production -o yaml
# -o yaml: full endpoint object — shows the IP, port, and readiness state of each backing Pod
# NotReadyAddresses: Pods that exist but haven't passed readiness probe yet — not in rotation

kubectl describe endpoints checkout-api-svc -n production
# Detailed view — shows which Pods are in the active Addresses list vs NotReadyAddresses

iptables -t nat -L KUBE-SERVICES -n | grep checkout
# On a node: show the iptables NAT rules for the checkout service
# (requires node access — for debugging, not routine operations)
# Shows: DNAT rule directing ClusterIP:80 → one of the Pod IPs via load balancing chain
$ kubectl get endpoints checkout-api-svc -n production
NAME               ENDPOINTS                                         AGE
checkout-api-svc   10.244.2.14:3000,10.244.3.7:3000                  8d

$ kubectl get endpoints checkout-api-svc -n production -o yaml
apiVersion: v1
kind: Endpoints
subsets:
- addresses:
  - ip: 10.244.2.14
    nodeName: node-eu-west-1a
    targetRef:
      kind: Pod
      name: checkout-api-6f8b9d-2xkpj
      namespace: production
  - ip: 10.244.3.7
    nodeName: node-eu-west-1b
    targetRef:
      kind: Pod
      name: checkout-api-6f8b9d-7rvqn
      namespace: production
  notReadyAddresses: []       ← empty = all Pods are currently healthy and in rotation
  ports:
  - port: 3000
    protocol: TCP

What just happened?

Endpoints are the live truth — The Endpoints object is what kube-proxy actually uses to program iptables. When you're debugging "why is traffic not reaching my Pod," checking kubectl get endpoints is often the fastest answer. If the Endpoints list is empty (<none>), the Service selector doesn't match any Pod labels — traffic would be silently dropped.

notReadyAddresses — Pods that are Running but haven't passed their readiness probe appear here instead of in the addresses list. kube-proxy never routes traffic to notReadyAddresses. This is the mechanism that prevents traffic from reaching Pods that are still warming up during a rolling update.
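The gate that moves a Pod between addresses and notReadyAddresses is the readiness probe on the Pod spec. A minimal sketch — the /health path and port 3000 are assumptions consistent with this lesson's examples:

```yaml
# Hypothetical container spec fragment. kube-proxy only routes traffic
# to this Pod once the probe succeeds; a failing probe pulls the Pod
# back out of the Service's rotation.
containers:
- name: checkout-api
  image: checkout-api:1.4      # illustrative image tag
  ports:
  - containerPort: 3000
  readinessProbe:
    httpGet:
      path: /health
      port: 3000
    initialDelaySeconds: 5     # wait before the first check
    periodSeconds: 10          # keep re-checking every 10s, even after ready
```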

kube-proxy load balancing — The default iptables mode distributes traffic using probability-based iptables rules. With three endpoints, the first rule matches with probability 1/3, the second with 1/2 of the remaining traffic (= 1/3 overall), and the third gets everything left (= 1/3). It's not true round-robin (selection is random per connection, not sequential), but it is statistically uniform over many connections.
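You can sanity-check that those sequential probabilities come out uniform with a short simulation in plain bash — nothing Kubernetes-specific here, the loop just mimics the rule order (try endpoint 1 with p=1/3; otherwise try endpoint 2 with p=1/2 of what remains; endpoint 3 takes the rest):

```shell
#!/usr/bin/env bash
# Simulate kube-proxy's sequential-probability endpoint selection
# for 3 endpoints over 30000 "connections", counting where each lands.
a=0; b=0; c=0
for _ in $(seq 30000); do
  if [ $((RANDOM % 3)) -eq 0 ]; then      # rule 1: matches with p = 1/3
    a=$((a + 1))
  elif [ $((RANDOM % 2)) -eq 0 ]; then    # rule 2: p = 1/2 of the remaining 2/3
    b=$((b + 1))
  else                                    # rule 3: everything left over
    c=$((c + 1))
  fi
done
echo "ep1=$a ep2=$b ep3=$c"               # all three counts cluster near 10000
```

Run it a few times: the counts jitter but stay close to 10000 each, which is exactly the "random, not sequential, but statistically uniform" behaviour described above.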

The Full Cluster Networking Architecture

Here's how all three network layers and all the major components fit together in a two-node cluster handling a real request:

Cross-Node Request Flow: Pod A → Service → Pod B

NODE 1 (192.168.0.10)

1. Pod A (checkout-api), IP 10.244.1.5 on eth0, sends a packet to 10.96.214.88:80 — the Service ClusterIP.
2. kube-proxy (iptables) applies DNAT: 10.96.214.88:80 → 10.244.2.8:3000, picking Pod B's IP from the Endpoints list.
3. The CNI's VXLAN layer encapsulates the packet: outer header src=192.168.0.10, dst=192.168.0.11, UDP port 8472.

NODE 2 (192.168.0.11)

4. The CNI's VXLAN layer decapsulates the UDP packet, restoring the inner packet with dst=10.244.2.8.
5. Node routing delivers it to Pod B's veth pair via the node bridge (cbr0 / cni0).
6. Pod B (payment-api), IP 10.244.2.8 on eth0, receives the request on port 3000 and sees the source IP as Pod A's IP.

Key insight: Pod A addressed the request to the Service ClusterIP. kube-proxy transparently rewrote the destination to a Pod IP. The VXLAN tunnel carried it to the right node. Pod B sees the source IP as Pod A's real IP — not the Service IP. All of this happens in under 1 ms on a healthy cluster.

Practical Network Debugging

The scenario: Service A is getting connection refused errors when trying to reach Service B. Both services are running, both have healthy Pods. You need to systematically diagnose whether the problem is in DNS, the Service selector, the endpoints list, or the actual application. Here's the runbook.

kubectl run debug-pod --image=nicolaka/netshoot --rm -it --restart=Never -n production
# netshoot: a debug container pre-loaded with curl, dig, nslookup, nmap, tcpdump, netstat
# --rm: delete the Pod when you exit
# --restart=Never: don't recreate it if it exits
# Use this to debug networking from INSIDE the cluster network namespace

# Once inside the debug pod:
curl -v http://checkout-api-svc:80/health
# Test Service connectivity by DNS name — if this works, Service + DNS are fine

nslookup checkout-api-svc.production.svc.cluster.local
# Resolve the fully-qualified Service DNS name
# If this fails: CoreDNS is down or misconfigured

curl -v http://10.96.214.88:80/health
# Test by ClusterIP directly — if DNS works but ClusterIP doesn't, kube-proxy issue

curl -v http://10.244.2.14:3000/health
# Test by Pod IP directly — bypasses Service and kube-proxy entirely
# If this works but Service doesn't: selector/endpoints issue

netstat -tlnp
# Inside a Pod: check which ports are actually listening
# If the app isn't listening on the right port, no amount of Service config helps
$ kubectl run debug-pod --image=nicolaka/netshoot --rm -it --restart=Never -n production
If you don't see a command prompt, try pressing enter.
bash-5.1# nslookup checkout-api-svc.production.svc.cluster.local
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name:      checkout-api-svc.production.svc.cluster.local
Address 1: 10.96.214.88 checkout-api-svc.production.svc.cluster.local

bash-5.1# curl -v http://10.96.214.88:80/health
* Trying 10.96.214.88:80...
* Connected to 10.96.214.88 (10.96.214.88) port 80 (#0)
> GET /health HTTP/1.1
< HTTP/1.1 200 OK
{"status":"healthy"}

bash-5.1# curl -v http://10.244.2.14:3000/health
* Trying 10.244.2.14:3000...
* connect to 10.244.2.14 port 3000 failed: Connection refused

(Direct Pod IP fails — this Pod's app isn't listening on port 3000. Run kubectl describe pod
 to compare the declared containerPort with the port the app actually binds.)

What just happened?

The debugging ladder — DNS name → ClusterIP → Pod IP directly is the standard debugging progression. Each step isolates a different layer. DNS fails = CoreDNS issue. DNS works but ClusterIP fails = kube-proxy or no endpoints. ClusterIP works but Pod IP fails = application-level issue (wrong port, firewall rule inside container, app not bound to 0.0.0.0).

nicolaka/netshoot — This is the Swiss Army knife debug container for Kubernetes networking. It contains curl, wget, dig, nslookup, ping, traceroute, nmap, tcpdump, iperf, netstat, ss, ip, and more. Keep it in your muscle memory: kubectl run debug-pod --image=nicolaka/netshoot --rm -it --restart=Never. It runs as an ordinary Pod on the cluster's Pod network, so you're debugging from inside the cluster rather than from your workstation.

App not listening on 0.0.0.0 — A common application bug: the app is listening on 127.0.0.1:3000 (localhost only) instead of 0.0.0.0:3000 (all interfaces). The Pod IP direct test catches this — the packet arrives on the Pod's eth0 interface but the app refuses it because it's only accepting connections from loopback. Check with netstat -tlnp inside the container.
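You can reproduce this failure mode locally, with no cluster involved. Python's built-in http.server stands in for the app, and port 39471 is just an arbitrary free port — both are assumptions for the demo:

```shell
# Start a server bound to loopback ONLY — the same mistake as an app
# listening on 127.0.0.1:3000 instead of 0.0.0.0:3000 inside a Pod.
python3 -m http.server 39471 --bind 127.0.0.1 >/dev/null 2>&1 &
srv=$!
sleep 1
# ss -tln lists listening sockets; column 4 is the bound local address.
# 127.0.0.1:39471 means only loopback clients are accepted — a packet
# arriving on the Pod's eth0 interface would get "connection refused".
bound=$(ss -tln | awk '$4 == "127.0.0.1:39471" {print $4}')
kill "$srv"
echo "$bound"
```

Rerun with --bind 0.0.0.0 and the same ss check shows 0.0.0.0:39471 — the fix is always in the application's bind address, never in the Service config.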

Teacher's Note: Why the flat network model matters so much

Traditional VM networking has NAT everywhere — NAT at the cloud VPC border, NAT between subnets, port forwarding to expose services. Debugging is nightmarish because every hop potentially changes the IP and port. Kubernetes's flat Pod network eliminates all of that. Every Pod sees every other Pod's real IP. Wireshark on a node captures traffic as-is. Application logs show real source IPs. This simplicity isn't an accident — it's a deliberate design choice that makes the networking layer debuggable by ordinary humans.

The one complexity that remains is at the Service layer — kube-proxy does do DNAT for Service traffic. But this is managed, well-documented, and predictable. And Cilium's eBPF mode eliminates even that complexity for clusters that use it, replacing iptables rules with kernel-level eBPF programs that are faster and easier to inspect with tools like hubble observe.

Practice Questions

1. Which Kubernetes component runs as a DaemonSet on every node and programs iptables (or eBPF) rules to make Service ClusterIPs work — intercepting traffic destined for a virtual IP and redirecting it to real Pod IPs?



2. Traffic is not reaching the Pods behind a Service. You suspect the Service selector doesn't match the Pod labels. What kubectl command shows you the actual Pod IPs that a Service is currently routing to?



3. When a new Pod is created, which component assigns it an IP address from the Pod CIDR, creates its virtual network interface, and sets up routing so other Pods can reach it?



Quiz

1. Service A calls Service B directly by Pod IP instead of using the Service ClusterIP. The next rolling update for Service B breaks Service A's calls. Why?


2. A Service's selector is app: checkout-api but all matching Pods have label app: checkout (missing the "-api"). What does kubectl get endpoints show and what happens to traffic?


3. Which statement best describes the Kubernetes networking model for Pod-to-Pod communication?


Up Next · Lesson 32

ClusterIP, NodePort, LoadBalancer

The three Service types that take traffic from inside the cluster, from external nodes, and from the internet — and when to use each one.