Kubernetes Course
Kubernetes Security Best Practices
Security in Kubernetes is not a single setting; it is a layered posture built from many interlocking controls. This lesson consolidates everything covered in Section III into an actionable nine-item checklist that a security team can use to audit a cluster and a platform team can use to harden one from scratch.
The Security Threat Model
Before hardening anything, it helps to know what you're defending against. In a Kubernetes environment, the most common security incidents fall into four categories:
Container Escape
A compromised container breaks out to the host. Mitigated by: non-root, read-only filesystem, dropped capabilities, seccomp, no hostPID/hostNetwork.
Lateral Movement
Attacker pivots from one compromised service to others. Mitigated by: Network Policies (default-deny), minimal Service Account permissions, no overly broad RBAC.
Credential Theft
API tokens, Secrets, or cloud credentials exfiltrated. Mitigated by: external secrets manager, etcd encryption, RBAC on Secrets, short-lived tokens.
Supply Chain
Malicious or vulnerable images deployed. Mitigated by: private registry policy, image scanning in CI, pinned digests, admission policy blocking unapproved registries.
The Production Security Checklist
Each item below links back to the lesson that covers it in depth. Use this as an audit checklist — work through each control and mark it done once it is implemented and verified.
1. RBAC — Minimum Necessary Access Lessons 37–39
No wildcard verbs or resources in Roles. No cluster-admin bindings for application workloads. ServiceAccounts dedicated per workload with automountServiceAccountToken: false where no API access is needed. Audit cluster-admin bindings quarterly.
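As a rough sketch of the per-workload pattern described above, the manifests below define a dedicated ServiceAccount with token automounting turned off, plus a narrowly scoped Role and RoleBinding for a second workload that does need API access. All names (`payments`, `payments-api`, `payments-worker`, `payments-config`, `config-reader`) are hypothetical placeholders, not taken from the course.

```yaml
# Hypothetical names throughout; adjust to your own workloads.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: payments-api
  namespace: payments
# This workload never calls the Kubernetes API, so no token is mounted.
automountServiceAccountToken: false
---
# Role for a different workload that does need API access:
# explicit verbs and resources, no wildcards.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: config-reader
  namespace: payments
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    resourceNames: ["payments-config"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: config-reader-binding
  namespace: payments
subjects:
  - kind: ServiceAccount
    name: payments-worker
    namespace: payments
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: config-reader
```

Using a namespaced Role rather than a ClusterRole keeps the grant confined to one namespace, which is usually enough for application workloads.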
kubectl get clusterrolebindings -o json | \
jq '.items[] | select(.roleRef.name=="cluster-admin") | .metadata.name'
2. Pod Security — Hardened Security Contexts Lesson 40
Every container sets runAsNonRoot: true, allowPrivilegeEscalation: false, readOnlyRootFilesystem: true, capabilities.drop: [ALL], and seccompProfile: RuntimeDefault.
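The settings listed above might look like this in a Pod spec. This is a minimal sketch: the Pod name, namespace, and image reference are hypothetical, and it assumes the application only needs to write to `/tmp`.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hardened-example            # hypothetical name
  namespace: payments               # hypothetical namespace
spec:
  # Pod-level settings act as defaults for every container in the Pod.
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: app
      image: registry.example.com/payments/app:1.4.2   # hypothetical image
      # These three fields exist only at the container level.
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]
      volumeMounts:
        - name: tmp
          mountPath: /tmp
  volumes:
    # A writable scratch volume, since the root filesystem is read-only.
    - name: tmp
      emptyDir: {}
```

With `readOnlyRootFilesystem: true`, any path the application writes to needs its own volume; an `emptyDir` is one common choice for temporary files.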
3. Pod Security Admission — Cluster-Wide Enforcement Lesson 41
All application namespaces carry at least the label pod-security.kubernetes.io/enforce=baseline. Production namespaces with sensitive data use enforce=restricted. System namespaces (kube-system) use enforce=privileged.
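For illustration, the Pod Security Admission labels described above could be applied declaratively as follows. The namespace names are hypothetical, and the `warn` and `audit` labels are an optional addition beyond what the checklist requires.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: payments                    # hypothetical production namespace
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest
    # Optional: surface violations as warnings and audit annotations too.
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
---
apiVersion: v1
kind: Namespace
metadata:
  name: staging                     # hypothetical application namespace
  labels:
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/warn: restricted   # preview the stricter level
```

Setting `warn` to a stricter level than `enforce` is one way to see what would break before tightening enforcement.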
kubectl get namespaces -o json | \
jq '.items[] | select(.metadata.labels["pod-security.kubernetes.io/enforce"] == null) | .metadata.name'
# Lists namespaces with no PSA enforcement — these need attention
4. Network Policies — Default Deny Lesson 36
Every application namespace has a default-deny-all NetworkPolicy. Explicit ingress/egress rules open only the ports and peers each workload needs. DNS egress (port 53 UDP+TCP) explicitly allowed.
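A minimal sketch of the two policies described above: a default-deny-all policy and an explicit DNS egress allowance. The `payments` namespace is hypothetical, and the DNS selector assumes cluster DNS runs in `kube-system` with the label `k8s-app: kube-dns`, which is common but not universal.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: payments               # hypothetical namespace
spec:
  podSelector: {}                   # selects every Pod in the namespace
  policyTypes: ["Ingress", "Egress"]
  # No ingress or egress rules are listed, so all traffic is denied.
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: payments
spec:
  podSelector: {}
  policyTypes: ["Egress"]
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns     # assumption: verify the label in your cluster
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```

NetworkPolicies are additive, so the DNS policy opens port 53 without weakening the default deny for any other traffic. Enforcement also depends on the cluster's CNI plugin supporting NetworkPolicy.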
5. Secrets Management — No Plaintext in etcd Lesson 42
etcd encryption at rest enabled (KMS preferred). External Secrets Operator syncing from AWS SM / Vault. No secret values committed to Git. Pre-commit hooks scanning for credentials.
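On self-managed control planes, etcd encryption at rest is typically configured through a file passed to the API server with `--encryption-provider-config`. The sketch below assumes a KMS v2 plugin is already running on the control plane node; the plugin name and socket path are hypothetical. Managed services usually expose this as a cluster setting instead.

```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      # The first provider is used to encrypt new writes.
      - kms:
          apiVersion: v2
          name: example-kms-plugin                   # hypothetical plugin name
          endpoint: unix:///var/run/kms/plugin.sock  # hypothetical socket path
          timeout: 3s
      # Fallback so Secrets written before encryption was enabled stay readable.
      - identity: {}
```

Existing Secrets are only encrypted once they are rewritten, so a one-off rewrite of all Secrets is normally part of the rollout.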
6. TLS Everywhere Lesson 44
All external endpoints behind HTTPS Ingress with valid TLS certificates. cert-manager handling automated renewal. No HTTP-only public endpoints. TLS 1.2 minimum enforced at Ingress controller.
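As an illustration of the HTTPS Ingress setup described above, the manifest below asks cert-manager to issue and renew the certificate. It assumes the ingress-nginx controller and a ClusterIssuer named `letsencrypt-prod` already exist; the host, Service, and Secret names are hypothetical.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: payments-api                # hypothetical
  namespace: payments
  annotations:
    # Tells cert-manager which issuer to use for the certificate.
    cert-manager.io/cluster-issuer: letsencrypt-prod   # hypothetical issuer name
spec:
  ingressClassName: nginx           # assumption: ingress-nginx is installed
  tls:
    - hosts:
        - payments.example.com
      # cert-manager creates and renews this Secret.
      secretName: payments-example-com-tls
  rules:
    - host: payments.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: payments-api
                port:
                  number: 8080
```

The minimum TLS version is set in the Ingress controller's own configuration rather than on each Ingress, so it is not shown here.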
7. Image Security — Trusted Supply Chain Lessons 41, 43
All images pulled from an internal registry. Images scanned for CVEs in CI before promotion. No :latest tags in production — use immutable digests or pinned semantic version tags. Kyverno/OPA enforcing registry policy.
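One possible shape for the registry policy mentioned above, sketched as a Kyverno ClusterPolicy. The registry host `registry.example.com` is a placeholder, and field names have shifted between Kyverno releases, so check this against the version you run.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-image-registries   # hypothetical policy name
spec:
  # Reject non-compliant Pods instead of only reporting them.
  # Newer Kyverno releases prefer a per-rule setting for this.
  validationFailureAction: Enforce
  background: true
  rules:
    - name: allow-internal-registry-only
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Images must come from registry.example.com."
        pattern:
          spec:
            containers:
              - image: "registry.example.com/*"
```

This sketch only checks regular containers; a production policy would normally also cover init containers and ephemeral containers.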
kubectl get pods -A -o json | \
jq '.items[].spec.containers[].image | select(test(":latest$") or test("^[^:]+$"))'
# Lists Pods using :latest or untagged images
8. API Server Access — No Broad Public Exposure
API server not accessible from the public internet (use a VPN or bastion). kubeconfig files not shared — individual user certificates or OIDC-based authentication. API server audit logging enabled and shipped off-cluster.
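Audit logging is driven by a policy file passed to the API server with `--audit-policy-file`. The following is a small illustrative policy, not a recommended baseline: the choice of what to log at each level is an assumption made for this example.

```yaml
apiVersion: audit.k8s.io/v1
kind: Policy
# Skip the initial stage to avoid logging every request twice.
omitStages:
  - RequestReceived
rules:
  # Record who touched Secrets and ConfigMaps, but never their contents.
  - level: Metadata
    resources:
      - group: ""
        resources: ["secrets", "configmaps"]
  # Record request bodies for RBAC changes.
  - level: Request
    resources:
      - group: "rbac.authorization.k8s.io"
  # Everything else: metadata only.
  - level: Metadata
```

Rules are evaluated in order and the first match wins, which is why the Secrets rule comes before the catch-all. Shipping the resulting log off-cluster is handled separately, for example by a log forwarder on the control plane nodes.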
9. Node Security — Minimal Attack Surface
Nodes run only the kubelet and container runtime — no unnecessary software. Node OS patched on a defined schedule. No SSH keys on nodes (use SSM Session Manager or equivalent). Nodes not accessible from Pod network (host ports avoided unless necessary).
Scoring Your Cluster: The Quick Audit
Run these four commands to get an immediate picture of the most impactful security gaps in any cluster:
# 1. Find Pods that may run as root (runAsNonRoot not enforced)
kubectl get pods -A -o json | jq -r '
  .items[] |
  select(
    (.spec.securityContext.runAsNonRoot != true) and
    # any() reports each Pod once, even when several containers match
    any(.spec.containers[]; .securityContext.runAsNonRoot != true)
  ) |
  "\(.metadata.namespace)/\(.metadata.name)"'
# 2. Find Pods with at least one container that has no resource limits
kubectl get pods -A -o json | jq -r '
  .items[] |
  select(any(.spec.containers[]; .resources.limits == null)) |
  "\(.metadata.namespace)/\(.metadata.name)"'
# 3. Find subjects (users, groups, ServiceAccounts) bound to cluster-admin
kubectl get clusterrolebindings -o json | jq -r '
.items[] |
select(.roleRef.name == "cluster-admin") |
"CRB: \(.metadata.name) → \([.subjects[]? | "\(.kind)/\(.name)"] | join(", "))"'
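# 4. Find namespaces with no default-deny NetworkPolicy
# Starts from the full namespace list so that namespaces with zero policies
# are reported too. Requires bash or zsh for the <(...) process substitution.
kubectl get namespaces -o json | jq -r \
  --slurpfile np <(kubectl get networkpolicy -A -o json) '
  [$np[0].items[] | select(
      .spec.podSelector == {} and
      (.spec.policyTypes // [] | sort) == ["Egress","Ingress"] and
      (.spec.ingress // []) == [] and
      (.spec.egress // []) == []
    ) | .metadata.namespace] as $secured |
  .items[].metadata.name |
  select(. as $ns | $secured | index($ns) | not)'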
# Command 1 — Pods running as root:
kube-system/coredns
monitoring/prometheus-server   ← review: port 9090 is unprivileged, so root should not be required; document any exception
payments/legacy-importer       ← needs fixing

# Command 2 — Pods with no resource limits:
payments/legacy-importer
staging/test-runner

# Command 3 — cluster-admin bindings:
CRB: cluster-admin → Group/system:masters                      ← built-in, expected
CRB: ci-pipeline-admin → ServiceAccount/ci-pipeline-sa         ← INVESTIGATE: CI should not be cluster-admin

# Command 4 — namespaces without default-deny NetworkPolicy:
staging
analytics
# These namespaces have no default-deny policy, so lateral movement is possible
Teacher's Note: Security is a journey, not a destination
No cluster starts fully hardened. The practical approach is to prioritise by impact: start with items 1 and 4 (RBAC and default-deny network policies) — these have the highest impact on blast radius reduction. Then add PSA enforcement (item 3) and secrets management (item 5). TLS (item 6) and image security (item 7) follow once the core controls are stable.
Run the quick audit every quarter and after any major infrastructure change. Consider tools like kube-bench (CIS Kubernetes Benchmark automated checks), kubescape (NSA/CISA framework compliance), and trivy operator (continuous image vulnerability scanning in-cluster) to automate posture monitoring over time.
Practice Questions
1. Which Kubernetes resource, when set to default-deny-all in a namespace, prevents lateral movement between compromised workloads even if RBAC controls are bypassed?
2. A Pod running as root with privileged: true is particularly dangerous because it enables which category of attack from the threat model above?
3. Which open-source tool runs automated CIS Kubernetes Benchmark checks against a cluster's configuration?
Quiz
1. A container in the payments namespace is compromised. The namespace has a default-deny-all NetworkPolicy. Which threat does this primarily mitigate?
2. The quick audit shows your CI pipeline ServiceAccount has a cluster-admin ClusterRoleBinding. Why is this a security risk, and what should you do?
3. You're hardening a new cluster. The checklist has 9 items. What is the recommended priority order to start with?