Kubernetes Course
Kubernetes on AWS (EKS)
Amazon Elastic Kubernetes Service manages the Kubernetes control plane for you — API server, etcd, scheduler, and controller-manager are all AWS-managed and highly available. You own the worker nodes, the networking, the IAM configuration, and the add-ons. This lesson covers everything specific to EKS: cluster creation, IAM integration, VPC networking, load balancers, storage, and the AWS-specific patterns you will use every day on EKS.
EKS Architecture Overview
Understanding which parts are AWS-managed and which are your responsibility is the foundation of operating EKS correctly.
AWS Manages (Control Plane)
API server, etcd, scheduler, controller-manager. Multi-AZ HA by default. Automatic patch version upgrades available. etcd backed up automatically by AWS.
You Manage (Data Plane)
Worker nodes (EC2 instances or Fargate), node groups, VPC/subnets, IAM roles, add-ons (CoreDNS, kube-proxy, VPC CNI), storage classes, and your workloads.
Creating an EKS Cluster
The scenario: You are creating a production EKS cluster for the dataplexa payment platform — three availability zones, managed node groups for application workloads, and all the security and networking settings correct from day one.
# cluster.yaml -- eksctl cluster definition
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
name: production
region: us-east-1
version: "1.30"
# VPC configuration -- eksctl can create a new VPC or use existing subnets
vpc:
cidr: 10.0.0.0/16
clusterEndpoints:
publicAccess: true # kubectl access from the internet (restrict to your IP in prod)
privateAccess: true # Nodes communicate with API server over private network
# CloudWatch logging for the control plane
cloudWatch:
clusterLogging:
enableTypes:
- api # API server request logs
- audit # Audit log (who did what)
- authenticator
- controllerManager
- scheduler
# Managed node groups
managedNodeGroups:
- name: app-nodes
instanceTypes:
- m5.xlarge
- m5a.xlarge # Multiple types: Karpenter-style flexibility for spot
minSize: 3
maxSize: 20
desiredCapacity: 3
availabilityZones:
- us-east-1a
- us-east-1b
- us-east-1c # Spread across all three AZs
volumeSize: 100
volumeType: gp3
volumeEncrypted: true
amiFamily: AmazonLinux2
iam:
attachPolicyARNs:
- arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
- arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
- arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
# ECR read-only: nodes can pull images without explicit pull secrets
labels:
role: application
tags:
environment: production
team: platform
addons:
- name: vpc-cni
version: latest
attachPolicyARNs:
- arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
- name: coredns
version: latest
- name: kube-proxy
version: latest
- name: aws-ebs-csi-driver # Required for EBS PersistentVolumes
version: latest
wellKnownPolicies:
ebsCSIController: true
# cluster.yaml is a config file passed to eksctl -- not applied with kubectl

# After eksctl create cluster -f cluster.yaml completes:
$ kubectl get pods -n kube-system | grep -E "coredns|kube-proxy|vpc"
coredns-abc-def      1/1   Running ✓
coredns-xyz-uvw      1/1   Running ✓
kube-proxy-a1b2      1/1   Running ✓
kube-proxy-c3d4      1/1   Running ✓
kube-proxy-e5f6      1/1   Running ✓
aws-node-g7h8        1/1   Running ✓   ← VPC CNI DaemonSet
aws-node-i9j0        1/1   Running ✓
aws-node-k1l2        1/1   Running ✓
$ aws eks describe-cluster --name production --query cluster.status
"ACTIVE"
# Create the cluster (takes 15-20 minutes)
eksctl create cluster -f cluster.yaml
# Update kubeconfig to connect kubectl to the new cluster
aws eks update-kubeconfig --name production --region us-east-1
# Verify connection
kubectl get nodes -o wide
# NAME STATUS VERSION INSTANCE-TYPE ZONE
# ip-10-0-1-12.compute.internal Ready v1.30.2 m5.xlarge us-east-1a
# ip-10-0-2-44.compute.internal Ready v1.30.2 m5.xlarge us-east-1b
# ip-10-0-3-78.compute.internal Ready v1.30.2 m5.xlarge us-east-1c
$ eksctl create cluster -f cluster.yaml
[ℹ] eksctl version 0.175.0
[ℹ] building cluster stack "eksctl-production-cluster"
[ℹ] deploying stack "eksctl-production-cluster"
[ℹ] building managed nodegroup stack "eksctl-production-nodegroup-app-nodes"
[ℹ] deploying stack "eksctl-production-nodegroup-app-nodes"
[✔] all EKS cluster resources for "production" have been created
$ aws eks update-kubeconfig --name production --region us-east-1
Updated context arn:aws:eks:us-east-1:123456789012:cluster/production in ~/.kube/config
$ kubectl get nodes -o wide
NAME                            STATUS   VERSION   INSTANCE-TYPE   ZONE
ip-10-0-1-12.compute.internal   Ready    v1.30.2   m5.xlarge       us-east-1a
ip-10-0-2-44.compute.internal   Ready    v1.30.2   m5.xlarge       us-east-1b
ip-10-0-3-78.compute.internal   Ready    v1.30.2   m5.xlarge       us-east-1c
# 3 nodes spread across 3 AZs ✓
IAM Roles for Service Accounts (IRSA)
IRSA is the EKS mechanism for giving Pods access to AWS services (S3, DynamoDB, Secrets Manager, SQS) without static credentials. A ServiceAccount is annotated with an IAM role ARN. When a Pod using that ServiceAccount calls an AWS service, it automatically receives short-lived credentials via the projected token — no access keys anywhere in the cluster.
The scenario: Your payment API needs to read from SQS and write processed results to S3. You create a scoped IAM role and attach it to the payment-api ServiceAccount.
# Step 1: Create an IAM OIDC provider for the cluster (one-time per cluster)
eksctl utils associate-iam-oidc-provider \
--cluster production \
--approve
# Step 2: Create an IAM role with a trust policy for the ServiceAccount
eksctl create iamserviceaccount \
--cluster production \
--namespace payments \
--name payment-api-sa \
--attach-policy-arn arn:aws:iam::123456789012:policy/PaymentAPIPolicy \
--approve
# Creates: IAM role with trust policy for the OIDC provider
# Creates: Kubernetes ServiceAccount annotated with the role ARN
# The trust policy allows the ServiceAccount to assume the role via OIDC
# Verify the annotation was added
kubectl describe sa payment-api-sa -n payments
# Annotations: eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/...
$ eksctl utils associate-iam-oidc-provider --cluster production --approve
[✔] associated IAM OIDC provider https://oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE
$ eksctl create iamserviceaccount \
--cluster production --namespace payments \
--name payment-api-sa \
--attach-policy-arn arn:aws:iam::123456789012:policy/PaymentAPIPolicy \
--approve
[ℹ] created serviceaccount "payments/payment-api-sa"
[✔] created IAM role "eksctl-production-addon-iamserviceaccount-payments-payment-api-sa-Role"
$ kubectl describe sa payment-api-sa -n payments
Name: payment-api-sa
Namespace: payments
Annotations:
eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/eksctl-production-...-Role ✓
$ kubectl get sa payment-api-sa -n payments -o jsonpath='{.metadata.annotations}'
{"eks.amazonaws.com/role-arn":"arn:aws:iam::123456789012:role/eksctl-production-...-Role"}

# The ServiceAccount eksctl created:
apiVersion: v1
kind: ServiceAccount
metadata:
name: payment-api-sa
namespace: payments
annotations:
eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/payment-api-role
# This annotation is what tells the EKS Pod Identity webhook to inject
# temporary AWS credentials into Pods using this SA
---
# Pod spec -- just reference the ServiceAccount:
spec:
serviceAccountName: payment-api-sa # Credentials auto-injected -- no AWS keys needed
containers:
- name: payment-api
image: registry.company.com/payment-api:3.1.0
# The AWS SDK automatically finds credentials in the projected token volume
# os.environ['AWS_ROLE_ARN'] and AWS_WEB_IDENTITY_TOKEN_FILE are set automatically
$ kubectl describe pod payment-api-7d9f4-xkp2m -n payments | grep -A6 "AWS_"
AWS_ROLE_ARN: arn:aws:iam::123456789012:role/payment-api-role
AWS_WEB_IDENTITY_TOKEN_FILE: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
# These env vars are injected automatically by the EKS Pod Identity webhook
$ kubectl exec -it payment-api-7d9f4-xkp2m -n payments -- \
aws sts get-caller-identity
{
"UserId": "AROA...:boto3-session",
"Account": "123456789012",
"Arn": "arn:aws:iam::123456789012:role/payment-api-role"
}
# Pod is authenticated as the IAM role -- no access keys needed ✓

What just happened?
IRSA uses OIDC federation — EKS issues OIDC tokens for ServiceAccounts. AWS IAM trusts these tokens via an OIDC identity provider. When the Pod calls an AWS service, the SDK exchanges the OIDC token for short-lived IAM credentials via AWS STS. The credentials expire in 1 hour and are automatically rotated. No secret ever touches your cluster configuration or Git history.
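Under the hood, it is the IAM role's trust policy that ties the role to one specific ServiceAccount. A sketch of roughly what eksctl generates is shown below; the OIDC provider ID and account ID are placeholders, and the exact layout may differ slightly by eksctl version:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::123456789012:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE:sub": "system:serviceaccount:payments:payment-api-sa",
          "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE:aud": "sts.amazonaws.com"
        }
      }
    }
  ]
}
```

The `sub` condition is the key line: it pins the role to exactly one namespace/ServiceAccount pair, so no other Pod in the cluster can assume it.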
Scope the IAM policy tightly — The PaymentAPIPolicy should grant only the specific actions and resources the payment API needs: sqs:ReceiveMessage on the specific SQS queue ARN, s3:PutObject on the specific S3 bucket and key prefix. Never use Resource: "*" in production IAM policies.
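An illustrative PaymentAPIPolicy scoped this way might look like the following; the queue name, bucket name, and key prefix are assumptions made up for this example:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadPaymentQueue",
      "Effect": "Allow",
      "Action": ["sqs:ReceiveMessage", "sqs:DeleteMessage", "sqs:GetQueueAttributes"],
      "Resource": "arn:aws:sqs:us-east-1:123456789012:payment-events"
    },
    {
      "Sid": "WriteProcessedResults",
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::dataplexa-payment-results/processed/*"
    }
  ]
}
```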
AWS Load Balancer Controller
The AWS Load Balancer Controller translates Kubernetes Service and Ingress objects into AWS load balancers — NLB for Layer 4 (TCP/UDP) Services, ALB for Layer 7 (HTTP/HTTPS) Ingress rules. It replaces the in-tree cloud provider load balancer with a more capable, annotation-driven implementation.
# Install AWS Load Balancer Controller via Helm
helm repo add eks https://aws.github.io/eks-charts
helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
--namespace kube-system \
--set clusterName=production \
--set serviceAccount.create=false \
--set serviceAccount.name=aws-load-balancer-controller
# serviceAccount already created via eksctl with IRSA annotation
$ helm repo add eks https://aws.github.io/eks-charts && helm repo update
"eks" has been added to your repositories
$ helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
  --namespace kube-system \
  --set clusterName=production \
  --set serviceAccount.create=false \
  --set serviceAccount.name=aws-load-balancer-controller
NAME: aws-load-balancer-controller
STATUS: deployed
REVISION: 1
$ kubectl get pods -n kube-system | grep aws-load-balancer
aws-load-balancer-controller-6d9f4-abc12   1/1   Running ✓
aws-load-balancer-controller-6d9f4-def34   1/1   Running ✓   ← 2 replicas for HA
# NLB via Service (Layer 4 -- TCP/UDP, preserves client IPs)
apiVersion: v1
kind: Service
metadata:
name: payment-api
namespace: payments
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: external
service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
spec:
type: LoadBalancer
selector:
app: payment-api
ports:
- port: 443
targetPort: 8443
---
# ALB via Ingress (Layer 7 -- HTTP/HTTPS, path-based routing, WAF integration)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: payment-api
namespace: payments
annotations:
kubernetes.io/ingress.class: alb # Deprecated annotation; newer controller versions prefer spec.ingressClassName: alb
alb.ingress.kubernetes.io/scheme: internet-facing
alb.ingress.kubernetes.io/target-type: ip # Route directly to Pod IPs
alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}]'
alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-1:123456789012:certificate/abc-def
alb.ingress.kubernetes.io/ssl-policy: ELBSecurityPolicy-TLS13-1-2-2021-06
alb.ingress.kubernetes.io/wafv2-acl-arn: arn:aws:wafv2:us-east-1:123456789012:regional/webacl/...
alb.ingress.kubernetes.io/group.name: production # Share one ALB across multiple Ingress objects
spec:
rules:
- host: api.company.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: payment-api
port:
number: 80
$ kubectl apply -f nlb-service.yaml -f alb-ingress.yaml
service/payment-api created
ingress.networking.k8s.io/payment-api created
# NLB provisioning (takes ~60s):
$ kubectl get service payment-api -n payments
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S)
payment-api LoadBalancer 10.96.2.15 k8s-payments-payment-abc.elb.us-east-1.amazonaws.com 443:31443/TCP ✓
# ALB provisioning (takes ~90s):
$ kubectl get ingress payment-api -n payments
NAME CLASS HOSTS ADDRESS PORTS
payment-api alb api.company.com k8s-production-xyz.us-east-1.elb.amazonaws.com 80, 443 ✓
# Verify WAF is attached:
$ aws wafv2 list-web-acls --scope REGIONAL --query "WebACLs[?Name=='production-waf']"
[{"Name": "production-waf", "Id": "abc-123", "ARN": "arn:aws:wafv2:..."}]

NLB vs ALB — when to use each
NLB (Network Load Balancer) — Use for TCP/UDP traffic, very low latency requirements, or when you need to preserve the client's source IP at Layer 4. Common for: gRPC services, database proxies, gaming, anything that is not HTTP/HTTPS.
ALB (Application Load Balancer) — Use for HTTP/HTTPS traffic. Gives you: host-based and path-based routing, SSL termination with ACM certificates, WAF integration, authentication via Cognito, access logs, and the group.name annotation to share a single ALB across dozens of Ingress objects (saving ALB costs vs one-ALB-per-Ingress).
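To see group.name sharing in practice: a second Ingress in another namespace reuses the same ALB simply by carrying the same group annotation. The fraud-check service below is a hypothetical example invented for illustration:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: fraud-check
  namespace: fraud
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/group.name: production # Same group -- merges into the existing ALB
    alb.ingress.kubernetes.io/target-type: ip
spec:
  rules:
  - host: fraud.company.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: fraud-check
            port:
              number: 80
```

The controller merges every Ingress with the same group.name into one ALB, combining their listener rules, so you pay for one load balancer instead of one per service.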
EBS and EFS Storage
EKS uses CSI (Container Storage Interface) drivers to provision AWS storage. The EBS CSI driver (installed as an add-on above) creates StorageClasses backed by EBS volumes. EFS is used for ReadWriteMany shared storage across multiple Pods.
# StorageClass: EBS gp3 with encryption (use as default)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: gp3-encrypted
annotations:
storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer # Provision in the same AZ as the Pod
reclaimPolicy: Retain # Protect data on PVC deletion
allowVolumeExpansion: true
parameters:
type: gp3
encrypted: "true"
iops: "3000"
throughput: "125"
---
# StorageClass: EFS for ReadWriteMany (shared across multiple Pods)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: efs-shared
provisioner: efs.csi.aws.com
parameters:
provisioningMode: efs-ap # Access point per PVC (isolation)
fileSystemId: fs-0abc123def456789 # Your EFS filesystem ID
directoryPerms: "700"
uid: "1000"
gid: "1000"
# Use case: shared media uploads, ML model weights, shared config files
# accessMode: ReadWriteMany -- multiple Pods can mount simultaneously
# Create a PVC using the gp3-encrypted StorageClass
$ kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: postgres-data
namespace: payments
spec:
storageClassName: gp3-encrypted
accessModes: [ReadWriteOnce]
resources:
requests:
storage: 50Gi
EOF
$ kubectl get pvc postgres-data -n payments
NAME STATUS VOLUME CAPACITY STORAGECLASS
postgres-data Bound pvc-abc123-def456-ghi789 50Gi gp3-encrypted ← provisioned ✓
# EBS volume created in same AZ as the Pod (WaitForFirstConsumer)
$ aws ec2 describe-volumes --filters Name=tag:kubernetes.io/created-for/pvc/name,Values=postgres-data --query 'Volumes[].{ID:VolumeId,AZ:AvailabilityZone,Encrypted:Encrypted}'
[{ "ID": "vol-0abc123", "AZ": "us-east-1a", "Encrypted": true }] ✓

Essential EKS Operational Patterns
# Switch between multiple clusters / environments
aws eks update-kubeconfig --name staging --region us-east-1 --alias staging
aws eks update-kubeconfig --name production --region us-east-1 --alias production
kubectl config use-context production
kubectl config get-contexts # List all configured clusters
# Check which IAM identity kubectl is using
aws sts get-caller-identity # Shows which IAM user/role is making API calls
kubectl auth whoami # Shows the Kubernetes user derived from the IAM identity
# Restrict API server access to your company IP range (in cluster.yaml)
# vpc.clusterEndpoints.publicAccessCIDRs: ["203.0.113.0/24"]
# View EKS add-on status
aws eks describe-addon --cluster-name production --addon-name vpc-cni \
--query addon.status
# Force node group to use latest AMI (security patches)
aws eks update-nodegroup-version \
--cluster-name production \
--nodegroup-name app-nodes \
--force # Drain and replace nodes even with PDB violations (use carefully)
# Get all resources in a namespace with their resource consumption
kubectl get pods -n payments -o custom-columns=\
"NAME:.metadata.name,CPU_REQ:.spec.containers[*].resources.requests.cpu,MEM_REQ:.spec.containers[*].resources.requests.memory"
$ aws eks update-kubeconfig --name staging --region us-east-1 --alias staging
Updated context staging in ~/.kube/config
$ kubectl config use-context production
Switched to context "production".
$ kubectl config get-contexts
CURRENT NAME CLUSTER
staging arn:aws:eks:us-east-1:123456789012:cluster/staging
* production arn:aws:eks:us-east-1:123456789012:cluster/production
$ kubectl auth whoami
ATTRIBUTE VALUE
Username arn:aws:iam::123456789012:assumed-role/PlatformEngineerRole/alice
Groups [system:authenticated]
$ aws sts get-caller-identity
{
"UserId": "AROA...:alice",
"Account": "123456789012",
"Arn": "arn:aws:iam::123456789012:assumed-role/PlatformEngineerRole/alice"
}

Teacher's Note: EKS-specific gotchas every operator encounters
aws-auth ConfigMap — EKS maps IAM identities to Kubernetes RBAC users via a ConfigMap called aws-auth in kube-system. If you accidentally delete this ConfigMap, you lose all IAM-based access to the cluster — including your own. Always edit it with eksctl create/delete iamidentitymapping rather than directly with kubectl edit. The IAM user who created the cluster always retains access via a separate mechanism as a safety net.
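For reference, a minimal sketch of what aws-auth typically contains follows; the role ARNs are placeholders, and again, prefer eksctl iamidentitymapping over editing this by hand:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    # Node role mapping -- created automatically; deleting it breaks node registration
    - rolearn: arn:aws:iam::123456789012:role/eksctl-production-nodegroup-NodeInstanceRole
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes
    # Human operator mapping -- grants the IAM role cluster-admin via RBAC
    - rolearn: arn:aws:iam::123456789012:role/PlatformEngineerRole
      username: platform-engineer
      groups:
        - system:masters
```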
VPC CNI and IP exhaustion — The AWS VPC CNI assigns real VPC IPs to every Pod. A /24 subnet gives you 251 usable IPs (AWS reserves 5 addresses per subnet), and each node's Pod capacity is capped by its ENI limits — for example, 58 Pods on an m5.xlarge. On a busy cluster you can run out of subnet IPs entirely. Plan your VPC CIDR and subnet sizing before creating the cluster — changing subnets later is extremely painful. Use /22 or larger subnets for production node groups.
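These numbers can be sanity-checked with the standard VPC CNI formula, maxPods = ENIs * (IPv4 addresses per ENI - 1) + 2. A quick sketch using the published m5.xlarge limits (4 ENIs, 15 IPv4 addresses each):

```shell
# VPC CNI max-Pods formula: maxPods = ENIs * (IPs per ENI - 1) + 2
# (one IP per ENI is the ENI's own primary address; the +2 accounts for
# host-network Pods like aws-node and kube-proxy that use the node's IP)
enis=4          # m5.xlarge ENI limit
ips_per_eni=15  # m5.xlarge IPv4 addresses per ENI
max_pods=$(( enis * (ips_per_eni - 1) + 2 ))
echo "m5.xlarge max Pods: $max_pods"   # 58

# Usable IPs in a subnet: AWS reserves 5 addresses per subnet
prefix=24
usable=$(( (1 << (32 - prefix)) - 5 ))
echo "/$prefix usable IPs: $usable"    # 251
```

With 58 Pods per node, just five busy m5.xlarge nodes can consume more than a /24 subnet's worth of addresses, which is why /22 or larger is the safer default.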
EKS vs self-managed cost — EKS charges $0.10/hour per cluster (~$73/month) regardless of node count, plus EC2 costs for worker nodes. For small clusters this is the dominant cost — if you are running more than 3 small clusters (e.g. one per microservice team), consider consolidating into fewer larger clusters with namespace isolation. The per-cluster control plane cost adds up quickly.
Practice Questions
1. What is the name of the EKS mechanism that gives Pods access to AWS services using an IAM role annotation on a ServiceAccount — with no static credentials stored in the cluster?
2. Which StorageClass volumeBindingMode ensures an EBS volume is provisioned in the same Availability Zone as the Pod that claims it?
3. Which annotation allows multiple Kubernetes Ingress objects to share a single AWS Application Load Balancer — reducing ALB costs in a cluster with many services?
Quiz
1. How does IRSA give a Pod AWS credentials without storing any access keys in Kubernetes Secrets?
2. Your payment API serves HTTPS traffic and needs WAF protection and SSL certificate management via ACM. Your internal gRPC service needs to preserve client IPs. Which load balancer type should each use?
3. You are scaling a busy EKS cluster and new Pods are failing to start with "failed to allocate for range 0: no IP addresses available". What is the root cause and how should you have avoided it?
Up Next · Lesson 60
Mini Project: Production-Ready Cluster
The final lesson brings everything together — you will deploy a complete, production-ready application on EKS incorporating RBAC, Network Policies, TLS, HPA, Cluster Autoscaling, logging, monitoring, and GitOps. The capstone of the entire course.