Docker Course
Docker on AWS
A team running Docker on a single EC2 instance outgrew it. The instance was at 90% CPU during peak hours, deploys caused thirty-second outages while the old container stopped and the new one started, and a single host failure took down the entire product. They knew they needed to scale — but the AWS console presented four different services that all seemed to do something with containers: ECS, EKS, Fargate, and App Runner. Each had its own console, its own pricing model, and its own documentation written as if the others didn't exist. The question wasn't how to use Docker. They already knew that. The question was which AWS service maps to what they already know — and why.
This lesson maps every Docker concept from this course to its AWS equivalent. The image is the same. The Dockerfile is the same. The Compose mental model carries over directly. What changes is where the containers run, who manages the host, and how traffic is routed to them. By the end of this lesson, each AWS service will have a clear role and a clear reason to choose it over the others.
The Airport Analogy
Running Docker on a single EC2 instance is like owning a private airstrip — you control everything, you maintain everything, and when the runway needs resurfacing, nothing flies. AWS container services are the commercial airport system. ECR is the baggage handling facility — it stores and retrieves your containers (images) reliably. ECS is the air traffic control tower — it decides which runway (EC2 instance) each flight (container) uses and reroutes when a runway is closed. Fargate is the airport that doesn't show you the runways at all — you tell it what the flight needs (CPU, memory) and it handles the rest. App Runner is the charter flight service — hand it a container image, answer three questions, and it handles ticketing, boarding, and landing automatically. Same planes. Same destinations. Completely different levels of involvement required from you.
Docker Concepts → AWS Services
Everything you already know — mapped to AWS
ECR — Elastic Container Registry
ECR is Docker Hub for AWS — a private container registry that lives in your AWS account, in your region, next to your workloads. Images pulled from ECR never leave AWS's network — no egress charges, no public internet hop, and authentication is handled by IAM rather than a separate username and password. Every ECS task, Fargate workload, and App Runner service pulls from ECR automatically using the task's IAM role.
# Create a private ECR repository:
aws ecr create-repository \
--repository-name acmecorp/payment-api \
--region ap-south-1 \
--image-scanning-configuration scanOnPush=true \
--encryption-configuration encryptionType=AES256
# scanOnPush=true → ECR runs a vulnerability scan automatically on every push
# findings visible in the ECR console — same CVE data as Trivy
# encryptionType → images encrypted at rest with AES-256
# Authenticate Docker to ECR — uses your AWS credentials, no separate password:
aws ecr get-login-password --region ap-south-1 | \
docker login \
--username AWS \
--password-stdin \
123456789012.dkr.ecr.ap-south-1.amazonaws.com
# The token is valid for 12 hours.
# In CI: use the aws-actions/amazon-ecr-login GitHub Action instead.
# Tag and push to ECR — same docker push, different registry URL:
GIT_SHA=$(git rev-parse --short HEAD)
ECR_URI=123456789012.dkr.ecr.ap-south-1.amazonaws.com/acmecorp/payment-api
docker tag payment-api:${GIT_SHA} ${ECR_URI}:${GIT_SHA}
docker tag payment-api:${GIT_SHA} ${ECR_URI}:latest
docker push ${ECR_URI}:${GIT_SHA}
docker push ${ECR_URI}:latest
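Every commit adds another SHA tag, and old images quietly accumulate storage cost. An ECR lifecycle policy can expire them automatically. A minimal sketch, assuming the repository created above; the retention count of 20 is illustrative, tune it to your rollback window:

```shell
# Lifecycle policy: keep only the 20 most recent images, expire the rest.
aws ecr put-lifecycle-policy \
  --repository-name acmecorp/payment-api \
  --region ap-south-1 \
  --lifecycle-policy-text '{
    "rules": [
      {
        "rulePriority": 1,
        "description": "Keep only the 20 most recent images",
        "selection": {
          "tagStatus": "any",
          "countType": "imageCountMoreThan",
          "countNumber": 20
        },
        "action": {"type": "expire"}
      }
    ]
  }'
```

Keep the count comfortably larger than the number of revisions you might roll back to, since an expired image cannot be redeployed.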
# In your CI/CD pipeline — GitHub Actions:
- uses: aws-actions/configure-aws-credentials@v4
  with:
    role-to-assume: arn:aws:iam::123456789012:role/github-actions-role
    aws-region: ap-south-1
  # OIDC-based auth — no long-lived AWS keys stored in GitHub secrets.
- uses: aws-actions/amazon-ecr-login@v2
  id: login-ecr
- name: Build and push to ECR
  run: |
    docker build --target production \
      -t ${{ steps.login-ecr.outputs.registry }}/acmecorp/payment-api:${GITHUB_SHA::7} .
    docker push ${{ steps.login-ecr.outputs.registry }}/acmecorp/payment-api:${GITHUB_SHA::7}
# Successful push to ECR:
The push refers to repository [123456789012.dkr.ecr.ap-south-1.amazonaws.com/acmecorp/payment-api]
3a7f2c9e1b4d: Pushed
8b1c4e7a9d2f: Layer already exists
a3f2c8d91e44: Layer already exists
a3f2c8d: digest: sha256:9c1e3a5b7d...
# ECR scan results (triggered automatically by scanOnPush):
aws ecr describe-image-scan-findings \
--repository-name acmecorp/payment-api \
--image-id imageTag=a3f2c8d \
--region ap-south-1
imageScanFindings:
  findings: []
  findingSeverityCounts: {}
imageScanStatus:
  status: COMPLETE
# No findings. Image is clean. Safe to deploy.
What just happened?
The image was pushed to a private ECR repository using IAM-based authentication — no separate registry credentials to manage, rotate, or accidentally expose. ECR triggered an automatic vulnerability scan on push and reported zero findings. The image is now available to any ECS task, Fargate workload, or App Runner service in the same AWS account, pulled over the internal network with no egress costs and no public internet exposure. Every layer that already existed in ECR was skipped — same layer caching behaviour as Docker Hub, but within your own account.
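Those scan results can also gate a deploy instead of just informing one. A sketch of a pipeline step, assuming the repository and tag from above: `aws ecr wait image-scan-complete` polls until the push-triggered scan finishes, then a `--query` expression extracts the CRITICAL count.

```shell
# Block until the scanOnPush scan for this tag completes:
aws ecr wait image-scan-complete \
  --repository-name acmecorp/payment-api \
  --image-id imageTag=a3f2c8d \
  --region ap-south-1

# Extract the CRITICAL finding count ("None" means no findings of that severity):
CRITICAL=$(aws ecr describe-image-scan-findings \
  --repository-name acmecorp/payment-api \
  --image-id imageTag=a3f2c8d \
  --region ap-south-1 \
  --query 'imageScanFindings.findingSeverityCounts.CRITICAL' \
  --output text)

if [ "$CRITICAL" != "None" ]; then
  echo "Blocking deploy: $CRITICAL critical vulnerabilities found"
  exit 1
fi
```

The same pattern extends to HIGH findings by querying `findingSeverityCounts.HIGH` as well.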
ECS with Fargate — Running Containers Without Managing Servers
Fargate is the serverless compute layer for ECS — you define what the container needs (CPU and memory), and AWS provisions, manages, and patches the underlying host. No EC2 instances to size, no AMIs to maintain, no SSH access to a server that runs your containers. A Fargate task definition is the AWS equivalent of a docker run command — image, CPU, memory, environment variables, ports, and logging configuration.
# ECS Task Definition — the AWS equivalent of docker run:
aws ecs register-task-definition \
--family payment-api \
--requires-compatibilities FARGATE \
--network-mode awsvpc \
--cpu 512 \
--memory 1024 \
--execution-role-arn arn:aws:iam::123456789012:role/ecsTaskExecutionRole \
--task-role-arn arn:aws:iam::123456789012:role/payment-api-task-role \
--container-definitions '[
  {
    "name": "payment-api",
    "image": "123456789012.dkr.ecr.ap-south-1.amazonaws.com/acmecorp/payment-api:a3f2c8d",
    "portMappings": [{"containerPort": 3000, "protocol": "tcp"}],
    "environment": [
      {"name": "NODE_ENV", "value": "production"},
      {"name": "DB_HOST", "value": "payment-db.cluster-xyz.ap-south-1.rds.amazonaws.com"}
    ],
    "secrets": [
      {
        "name": "DB_PASSWORD",
        "valueFrom": "arn:aws:secretsmanager:ap-south-1:123456789012:secret:payment-db-password"
      }
    ],
    "logConfiguration": {
      "logDriver": "awslogs",
      "options": {
        "awslogs-group": "/ecs/payment-api",
        "awslogs-region": "ap-south-1",
        "awslogs-stream-prefix": "ecs"
      }
    },
    "healthCheck": {
      "command": ["CMD-SHELL", "wget -qO- http://localhost:3000/health || exit 1"],
      "interval": 30,
      "timeout": 5,
      "retries": 3,
      "startPeriod": 10
    }
  }
]'
# cpu 512 → 0.5 vCPU — scales independently per task
# memory 1024 → 1 GB RAM
# secrets → DB_PASSWORD injected from Secrets Manager at runtime
# the task role grants permission to read the secret
# the value never touches the task definition itself
# logConfiguration → stdout/stderr shipped to CloudWatch Logs automatically
# Create an ECS Service — the AWS equivalent of docker-compose service:
aws ecs create-service \
--cluster production \
--service-name payment-api \
--task-definition payment-api:3 \
--desired-count 2 \
--launch-type FARGATE \
--network-configuration "awsvpcConfiguration={
subnets=[subnet-abc123,subnet-def456],
securityGroups=[sg-payment-api],
assignPublicIp=DISABLED
}" \
--load-balancers "targetGroupArn=arn:aws:elasticloadbalancing:...,
containerName=payment-api,containerPort=3000" \
--deployment-configuration "minimumHealthyPercent=100,maximumPercent=200"
# desired-count 2 → always keep 2 tasks running — one per availability zone
# minimumHealthyPercent=100 → during deploy, keep all current tasks running
# maximumPercent=200 → allow up to 4 tasks during the deploy window
# new tasks start, pass health check, then old tasks stop
# this is a rolling deploy — zero downtime
# Monitor the rolling deploy:
aws ecs describe-services \
--cluster production \
--services payment-api \
--query 'services[0].deployments'
[
  {
    "status": "PRIMARY",
    "taskDefinition": "payment-api:4",   ← new version
    "runningCount": 1,
    "pendingCount": 1,
    "desiredCount": 2
  },
  {
    "status": "ACTIVE",
    "taskDefinition": "payment-api:3",   ← old version — still running
    "runningCount": 1,
    "pendingCount": 0,
    "desiredCount": 0
  }
]
# Two deployments active simultaneously.
# New task is starting (pendingCount: 1) while old task serves traffic.
# Once new task passes health check, old task is stopped.
# At no point are zero tasks running — zero-downtime rolling deploy.
What just happened?
ECS launched a rolling deploy — the new task definition started alongside the old one, waited for the health check to pass, then drained and stopped the old task. At no point did the service drop below one running task. The load balancer continued routing traffic to the old task until the new one was confirmed healthy. The DB password was injected from Secrets Manager at task startup — it never appeared in the task definition, never in environment variable lists visible in the console, and never in any log output. This is the production-grade version of the runtime secret injection pattern from Lesson 33.
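So far the service runs a fixed two tasks. Scaling on load is handled by a separate API, Application Auto Scaling, which adjusts the service's desired count for you. A sketch that tracks 60% average CPU for the service created above; the min, max, and cooldown values are illustrative:

```shell
# Register the service's DesiredCount as a scalable target:
aws application-autoscaling register-scalable-target \
  --service-namespace ecs \
  --scalable-dimension ecs:service:DesiredCount \
  --resource-id service/production/payment-api \
  --min-capacity 2 \
  --max-capacity 10

# Target-tracking policy: add tasks when average CPU exceeds 60%,
# remove them when it falls back. No CloudWatch alarms to manage yourself.
aws application-autoscaling put-scaling-policy \
  --service-namespace ecs \
  --scalable-dimension ecs:service:DesiredCount \
  --resource-id service/production/payment-api \
  --policy-name payment-api-cpu-60 \
  --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration '{
    "TargetValue": 60.0,
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
    },
    "ScaleOutCooldown": 60,
    "ScaleInCooldown": 120
  }'
```

Target tracking is usually the right default: it behaves like a thermostat, and scale-in is deliberately slower than scale-out to avoid flapping.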
App Runner — From Image to URL in Minutes
App Runner is the simplest AWS container service — point it at an ECR image, set the port and environment variables, and AWS handles load balancing, TLS termination, auto-scaling, and health checks automatically. There is no task definition, no cluster, no service configuration, no VPC setup required to get started. It is the right choice when the goal is a running HTTPS endpoint from a container image with the minimum possible configuration.
# Create an App Runner service — the simplest path from image to HTTPS URL:
aws apprunner create-service \
--service-name payment-api \
--source-configuration '{
  "ImageRepository": {
    "ImageIdentifier": "123456789012.dkr.ecr.ap-south-1.amazonaws.com/acmecorp/payment-api:a3f2c8d",
    "ImageRepositoryType": "ECR",
    "ImageConfiguration": {
      "Port": "3000",
      "RuntimeEnvironmentVariables": {
        "NODE_ENV": "production"
      },
      "RuntimeEnvironmentSecrets": {
        "DB_PASSWORD": "arn:aws:secretsmanager:ap-south-1:123456789012:secret:payment-db-password"
      }
    }
  },
  "AuthenticationConfiguration": {
    "AccessRoleArn": "arn:aws:iam::123456789012:role/apprunner-ecr-access"
  },
  "AutoDeploymentsEnabled": true
}' \
--instance-configuration '{
  "Cpu": "1 vCPU",
  "Memory": "2 GB"
}' \
--health-check-configuration '{
  "Protocol": "HTTP",
  "Path": "/health",
  "Interval": 10,
  "Timeout": 5,
  "HealthyThreshold": 2,
  "UnhealthyThreshold": 3
}'
# AutoDeploymentsEnabled → App Runner watches the ECR tag and redeploys
# automatically when a new image is pushed. Zero pipeline deploy step needed.
# AccessRoleArn → IAM role App Runner assumes to pull from private ECR.
# It is required for private repositories; the role name here is illustrative.
# App Runner provisions and starts the service:
{
  "Service": {
    "ServiceName": "payment-api",
    "Status": "OPERATION_IN_PROGRESS",
    "ServiceUrl": "abc123xyz.ap-south-1.awsapprunner.com"
  }
}
# 90 seconds later:
{
  "Service": {
    "ServiceName": "payment-api",
    "Status": "RUNNING",
    "ServiceUrl": "abc123xyz.ap-south-1.awsapprunner.com"
  }
}
# Test the service:
curl https://abc123xyz.ap-south-1.awsapprunner.com/health
{"status":"healthy","version":"a3f2c8d"}
# App Runner handles:
# ✓ Load balancer — automatic
# ✓ TLS certificate — automatic (HTTPS by default)
# ✓ Auto-scaling — instances added and removed with request volume
#   (minimum 1 provisioned instance; idle instances are paused, not billed for CPU)
# ✓ Health checks — restarts unhealthy instances automatically
# ✓ Rolling deploys — when a new image is pushed to ECR
# Configuration required from you: 4 fields. Setup time: 90 seconds.
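The generated awsapprunner.com URL works immediately; a custom domain is one extra call once the service is running. A sketch, where the service ARN suffix and the domain name are placeholders:

```shell
# Associate a custom domain. App Runner responds with the DNS records to
# create: a CNAME for routing plus certificate-validation records.
aws apprunner associate-custom-domain \
  --service-arn arn:aws:apprunner:ap-south-1:123456789012:service/payment-api/abc123xyz \
  --domain-name api.example.com \
  --enable-www-subdomain
```

TLS for the custom domain is still automatic; App Runner provisions and renews the certificate once the validation records resolve.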
Choosing the Right AWS Container Service
Which service — and when
App Runner — choose when simplicity is the priority
You have a stateless web service or API. You want HTTPS running in under five minutes. You don't need VPC integration, custom networking, or fine-grained IAM per task. Typical uses: internal tools, prototypes, simple APIs, background workers.
ECS + Fargate — choose when you need control without servers
You need VPC placement, fine-grained security groups, sidecar containers, custom networking, or service-to-service communication within a private network. You want rolling deploys, auto-scaling, and load balancer integration — but no EC2 instances to maintain. Typical uses: production microservices, APIs with database access, multi-container workloads.
ECS + EC2 — choose when you need the host
You need GPU instances, specific instance families, Spot instances for cost reduction, or tasks that require more than 16 vCPU or 120 GB memory (Fargate limits). You're comfortable managing EC2 instances and want the cost savings of packing many containers onto shared hosts. Typical uses: ML inference, batch processing, cost-sensitive high-throughput workloads.
EKS — choose when you need Kubernetes
Your team already runs Kubernetes, you need Helm charts, Custom Resource Definitions, or the Kubernetes ecosystem of operators and tooling. EKS is managed Kubernetes — AWS runs the control plane, you run the worker nodes. Typical uses: organisations already on Kubernetes, workloads with complex scheduling requirements, multi-cloud portability requirements.
Deploying to ECS from the CI Pipeline
The scenario: The pipeline from Lesson 43 pushes a verified image to ECR. The deploy stage now updates the ECS service to use the new task definition — triggering a rolling deploy with zero downtime, without touching any server directly.
# .github/workflows/ci.yml — deploy stage updated for ECS:
deploy:
  name: Deploy to ECS
  runs-on: ubuntu-latest
  needs: push
  if: github.ref == 'refs/heads/main'
  environment: production
  steps:
    - name: Configure AWS credentials
      uses: aws-actions/configure-aws-credentials@v4
      with:
        role-to-assume: arn:aws:iam::123456789012:role/github-actions-role
        aws-region: ap-south-1
    - name: Download task definition
      run: |
        aws ecs describe-task-definition \
          --task-definition payment-api \
          --query taskDefinition \
          > task-definition.json
    - name: Update image in task definition
      id: task-def
      uses: aws-actions/amazon-ecs-render-task-definition@v1
      with:
        task-definition: task-definition.json
        container-name: payment-api
        image: 123456789012.dkr.ecr.ap-south-1.amazonaws.com/acmecorp/payment-api:${{ github.sha }}
      # Replaces the image field in the downloaded task definition
      # with the new SHA — creates a new task definition revision.
    - name: Deploy to ECS service
      uses: aws-actions/amazon-ecs-deploy-task-definition@v1
      with:
        task-definition: ${{ steps.task-def.outputs.task-definition }}
        service: payment-api
        cluster: production
        wait-for-service-stability: true
# wait-for-service-stability: true → the pipeline step blocks until
# ECS confirms the new tasks are healthy and the old tasks are stopped.
# If the new task fails its health check, the deploy stalls and the step
# times out; with the ECS deployment circuit breaker enabled, ECS rolls
# back automatically and the pipeline step fails, alerting the team immediately.
# GitHub Actions output for the deploy step:
Deploying task definition revision 4 to service payment-api in cluster production.
Waiting for service payment-api to reach a stable state...
ECS task arn:aws:ecs:ap-south-1:123456789012:task/production/abc123 is PENDING
ECS task arn:aws:ecs:ap-south-1:123456789012:task/production/abc123 is RUNNING
ECS task arn:aws:ecs:ap-south-1:123456789012:task/production/def456 is DEACTIVATING
ECS task arn:aws:ecs:ap-south-1:123456789012:task/production/def456 is STOPPED
Service reached a stable state. Deployment complete.
# Total pipeline: git push → deployed to production ECS in 3m 12s.
# Zero EC2 SSH access. Zero server management. Zero manual steps.
# Rollback: update the ECS service to the previous task definition revision.
# aws ecs update-service --cluster production --service payment-api \
#   --task-definition payment-api:3   ← previous revision
Never Store AWS Credentials as GitHub Secrets
Long-lived AWS access keys stored as GitHub secrets are a significant security risk — they do not expire, they appear in plaintext if accidentally logged, and rotating them requires updating every repository that uses them. Use OIDC federation instead: configure a trust relationship between GitHub Actions and an IAM role. The pipeline assumes the role using a short-lived token that GitHub generates per run — no long-lived credentials anywhere. The aws-actions/configure-aws-credentials action handles this with role-to-assume.
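The trust relationship described above is two pieces: an OIDC identity provider registered once per AWS account, and a role whose trust policy names the repository allowed to assume it. A sketch, assuming the repository acmecorp/payment-api and the role name used in the pipeline; the thumbprint shown is GitHub's published value at the time of writing:

```shell
# One-time per account: register GitHub's OIDC provider.
aws iam create-open-id-connect-provider \
  --url https://token.actions.githubusercontent.com \
  --client-id-list sts.amazonaws.com \
  --thumbprint-list 6938fd4d98bab03faadb97b34396831e3780aea1

# Trust policy: only workflows in this repo, on the main branch, may assume the role.
cat > trust-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "Federated": "arn:aws:iam::123456789012:oidc-provider/token.actions.githubusercontent.com"
    },
    "Action": "sts:AssumeRoleWithWebIdentity",
    "Condition": {
      "StringEquals": {
        "token.actions.githubusercontent.com:aud": "sts.amazonaws.com",
        "token.actions.githubusercontent.com:sub": "repo:acmecorp/payment-api:ref:refs/heads/main"
      }
    }
  }]
}
EOF

aws iam create-role \
  --role-name github-actions-role \
  --assume-role-policy-document file://trust-policy.json
```

The `sub` condition is the critical line: without it, any GitHub repository could assume the role. Permissions policies for ECR push and ECS deploy are then attached to the role as usual.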
Teacher's Note
Start with App Runner if you've never deployed containers to AWS before — it removes every variable except your container image and lets you focus on getting the application working in the cloud. Once it's running, you'll understand which ECS features you actually need. Move to ECS + Fargate when you need VPC placement, private networking between services, or sidecar containers. Move to ECS + EC2 when cost at scale becomes the constraint. The Dockerfile is identical across all three — the investment in building a good image pays dividends at every layer of the stack.
Practice Questions
1. The AWS service that stores private container images — equivalent to Docker Hub — within your AWS account and region, authenticated via IAM rather than a separate username and password, is called what?
2. The ECS launch type that runs containers without requiring you to provision or manage EC2 instances — you specify CPU and memory, and AWS handles the underlying host — is called what?
3. In the aws-actions/amazon-ecs-deploy-task-definition GitHub Action, which option causes the pipeline step to block until ECS confirms the new tasks are healthy — and fail the step if ECS rolls back due to a failed health check?
Quiz
1. An ECS service is configured with minimumHealthyPercent=100 and maximumPercent=200. A deploy is triggered. How does ECS handle the transition from the old task definition to the new one?
2. A team needs to deploy a stateless REST API to AWS. They want HTTPS running as quickly as possible with no VPC configuration, no EC2 management, and minimal AWS setup. Which service is the right choice?
3. An ECS task definition references a database password via "secrets": [{"name": "DB_PASSWORD", "valueFrom": "arn:aws:secretsmanager:..."}]. How and when does the container receive this value?
Up Next · Lesson 45
Mini Project
AWS covered — now you build it. The mini project brings together every concept from the course: a multi-service application with a Node.js API, a Postgres database, and a Redis cache — Dockerized, secured, optimized, and deployed through a complete CI/CD pipeline to production.