Docker Lesson 28 – Docker Registry Concepts | Dataplexa
Section III · Lesson 28

Docker Registry Concepts

Every image you've ever pulled came from a registry. Every image you build eventually needs to go to one. A registry is the bridge between your build machine and every server that runs your software — and understanding how it works is what makes sense of CI/CD pipelines, multi-team deployments, and production releases.

You've been using Docker Hub as a registry since Lesson 1 without thinking about it. This lesson unpacks what's actually happening — how images are stored, how they're addressed, how layers are deduplicated across pulls, and when Docker Hub isn't the right choice.

A Registry Is a Specialised Content Store

A Docker registry is a server that stores and serves Docker images. It accepts docker push to receive images and serves docker pull to distribute them. Under the hood, it stores images as a collection of layers — each layer is a compressed blob identified by its SHA256 hash. Because layers are content-addressed, the same layer is stored exactly once regardless of how many images reference it.
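Content addressing is easy to demonstrate outside Docker: a blob's identity is nothing more than the SHA-256 hash of its bytes, so two identical blobs always collapse to a single stored object. A minimal sketch in plain shell (no Docker required; the strings stand in for layer tarballs):

```shell
# Content addressing in miniature: a blob's identity is the SHA-256
# hash of its bytes, so identical content always gets the same ID.
blob_a=$(printf 'FROM alpine:3.19' | sha256sum | cut -d' ' -f1)
blob_b=$(printf 'FROM alpine:3.19' | sha256sum | cut -d' ' -f1)
blob_c=$(printf 'FROM alpine:3.20' | sha256sum | cut -d' ' -f1)

[ "$blob_a" = "$blob_b" ] && echo "same content -> same digest: stored once"
[ "$blob_a" != "$blob_c" ] && echo "different content -> different digest"
```

A registry applies the same rule to layer tarballs: before accepting an upload it checks whether a blob with that digest already exists, which is why repeated pushes of a shared base layer transfer nothing.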

The Public Library Analogy

A Docker registry is like a public library — except instead of books, it stores image layers. Every layer has a unique catalogue number (its SHA256 hash). When a patron (the Docker daemon) requests a book, the library checks if the patron already has a copy. If they do, the library says "you already have that one" and skips it. If they don't, the library sends only the chapters (layers) they're missing. A checkout from a familiar author (related images) is always faster than discovering a new one, because the shared chapters are already on the shelf at home.

The Full Image Reference — Decoded

Every image has a fully qualified address. Most of it gets omitted because Docker fills in defaults, but understanding the full format is essential when you start working with private registries, multiple registries, or images in CI/CD pipelines.

Anatomy of a full image reference

registry.acmecorp.com/backend/payment-api:v2.3.1

registry.acmecorp.com   → registry host
backend                 → namespace
payment-api             → image name
:v2.3.1                 → tag

Docker Hub shorthand — what Docker fills in automatically

nginx                →  docker.io/library/nginx:latest
acmecorp/api         →  docker.io/acmecorp/api:latest
acmecorp/api:v1.0    →  docker.io/acmecorp/api:v1.0
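The defaulting rules above can be sketched as a small shell function — a hypothetical helper, not part of Docker, that mimics how the CLI expands a short reference (common cases only; digest references are not handled):

```shell
# qualify: expand a short image reference into its fully qualified form,
# mimicking Docker's defaulting rules. Hypothetical helper for illustration.
qualify() {
  ref="$1"
  # If the last path component carries no tag, Docker assumes :latest
  case "${ref##*/}" in
    *:*) : ;;                              # tag already present
    *)   ref="$ref:latest" ;;
  esac
  first="${ref%%/*}"
  case "$ref" in
    */*) # has a slash: is the first component a registry host?
      case "$first" in
        *.*|localhost*) : ;;               # host already present
        *) ref="docker.io/$ref" ;;         # bare namespace -> Docker Hub
      esac ;;
    *)   ref="docker.io/library/$ref" ;;   # official image shorthand
  esac
  printf '%s\n' "$ref"
}

qualify nginx               # -> docker.io/library/nginx:latest
qualify acmecorp/api        # -> docker.io/acmecorp/api:latest
qualify acmecorp/api:v1.0   # -> docker.io/acmecorp/api:v1.0
```

A reference that already names a host, like `registry.acmecorp.com/backend/payment-api:v2.3.1`, passes through unchanged — Docker only fills in the pieces you omit.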

Public vs Private Registries

Not every image should be publicly accessible. Your internal microservices contain proprietary business logic. Your images might include compiled code that reveals architecture decisions you don't want competitors to see. Or you simply need tighter access control over which images can be deployed to production. That's when you move to a private registry.

Registry options — public and private

Docker Hub
The default public registry. Free for public images. Rate-limited pulls for unauthenticated users (100 pulls/6h). Paid plans for private repos and higher limits. Best for open-source projects and public official images.
AWS ECR
Amazon Elastic Container Registry. Private by default. Integrates natively with ECS, EKS, and IAM. No rate limits. You pay per GB stored and per GB transferred. The standard choice for AWS-hosted workloads.
GCP Artifact Registry
Google's registry — supports Docker images and other artifact types. Tight GKE integration. Regional storage for low-latency pulls in GCP regions.
GitHub Container Registry
Hosted at ghcr.io. Integrates with GitHub Actions workflows. Free for public images. Images live alongside your source code. Ideal when your CI/CD already runs on GitHub.
Self-hosted
Run your own registry using the open-source registry:2 image or Harbor. Full control, no vendor dependency, no per-GB cost. Requires you to handle storage, security, TLS, and backups yourself.
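To make "self-hosted" concrete, here is a minimal sketch of running the open-source registry locally and pushing to it. This assumes a running Docker daemon and an existing local image; `localhost:5000` is treated as insecure by default, so no TLS setup is needed for a local experiment:

```shell
# Run the open-source registry on localhost:5000
# (local experiment only — a real deployment needs TLS, auth,
# and persistent storage behind it).
docker run -d -p 5000:5000 --name registry registry:2

# Retag an existing local image so its name carries the new registry host,
# then push. payment-api:v2.3.1 is an assumed local image.
docker tag payment-api:v2.3.1 localhost:5000/payment-api:v2.3.1
docker push localhost:5000/payment-api:v2.3.1

# The registry's HTTP API confirms what it stores
curl http://localhost:5000/v2/_catalog
```

Harbor layers a UI, access control, and vulnerability scanning on top of this same storage model; the `registry:2` image is the bare engine.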

How a Push and a Pull Actually Work

The scenario: Your CI/CD pipeline just finished building a new version of the payment API. It needs to push the image to ECR so production servers can pull and deploy it. This is the exact sequence of events behind every deployment at every company using Docker.

# Step 1 — Authenticate with the registry
# Docker Hub
docker login
# Enter Docker Hub username and password/access token

# AWS ECR (uses AWS CLI to generate a temporary token)
aws ecr get-login-password --region us-east-1 | \
  docker login --username AWS --password-stdin 123456789.dkr.ecr.us-east-1.amazonaws.com
# The token is valid for 12 hours — CI/CD pipelines re-authenticate before each push
# Step 2 — Tag the image with the full registry path
docker tag payment-api:v2.3.1 \
  123456789.dkr.ecr.us-east-1.amazonaws.com/backend/payment-api:v2.3.1
# docker tag SOURCE_IMAGE TARGET_IMAGE
# This does not copy the image — it adds an additional name that points to the same layers
# The registry address is the first part of the target name — Docker knows where to push it

# Step 3 — Push to the registry
docker push 123456789.dkr.ecr.us-east-1.amazonaws.com/backend/payment-api:v2.3.1
The push refers to repository [123456789.dkr.ecr.us-east-1.amazonaws.com/backend/payment-api]
3a7f2c9e1b4d: Pushing  2.048kB/4.096kB
8b1c4e7a9d2f: Layer already exists
5d3a1b9c7e4f: Layer already exists
a3b7c9d1e5f2: Pushing  14.34MB/28.67MB
6f8a2c4e7b1d: Layer already exists
v2.3.1: digest: sha256:7b9c3e1f5a8d2b6e4c7f9a1d3b5e7c9f1a3d5b7e9c1f3a5d7b9e1c3f5a7d9b1 size: 1573

What just happened?

Docker pushed the image layer by layer. Three of the five layers show Layer already exists — the registry already had them from a previous push of an earlier version of this image. Only two layers actually transferred: a small configuration layer and the changed application code layer. This is the registry's deduplication working: the Alpine base, the Node runtime, and the npm dependencies are shared with every other version of this image already in the registry. Only the changed layers (roughly 28 MB here) actually moved over the wire. The final line is the image digest — an immutable content hash that identifies this exact image build forever, regardless of what tags point to it.

Pull — The Server Side of the Equation

# Pull from a private registry (must be authenticated first)
docker pull 123456789.dkr.ecr.us-east-1.amazonaws.com/backend/payment-api:v2.3.1
v2.3.1: Pulling from backend/payment-api
3a7f2c9e1b4d: Already exists
8b1c4e7a9d2f: Already exists
5d3a1b9c7e4f: Already exists
a3b7c9d1e5f2: Pull complete
6f8a2c4e7b1d: Already exists
Digest: sha256:7b9c3e1f5a8d2b6e4c7f9a1d3b5e7c9f1a3d5b7e9c1f3a5d7b9e1c3f5a7d9b1
Status: Downloaded newer image for 123456789.dkr.ecr.us-east-1.amazonaws.com/backend/payment-api:v2.3.1

# Pull by digest instead of tag — immutable, guaranteed exact version
docker pull 123456789.dkr.ecr.us-east-1.amazonaws.com/backend/payment-api@sha256:7b9c3e1f5a8d...
# A tag like :v2.3.1 can be reassigned to a different image
# A digest reference is permanently tied to one specific image build — it never changes

# Check what's cached locally
docker images 123456789.dkr.ecr.us-east-1.amazonaws.com/backend/payment-api

REPOSITORY                                                      TAG      IMAGE ID       SIZE
123456789.dkr.ecr.us-east-1.amazonaws.com/backend/payment-api   v2.3.1   b7e1a4c52f88   167MB

What just happened?

The production server already had most of the layers cached from the previous deployment. Only one layer — the changed application code — actually downloaded. This is why Docker deployments are fast even for large images: the base OS, runtime, and dependencies almost never change between versions. Only the delta transfers. For a 167 MB total image, you might transfer only 5–10 MB per deployment because everything else is already cached on the production server from the previous pull.

The full push → pull deployment flow

Developer / CI                  Registry                       Production server
docker build + tag      ──►     ECR / Hub / ghcr.io     ──►    docker pull
docker push                     stores layers                  docker run
                                deduplicates blobs             only new layers downloaded

Build once. Push to registry. Pull anywhere. Only changed layers transfer. This is the universal Docker deployment model.

Docker Hub Rate Limits Will Break Your CI

Unauthenticated Docker Hub pulls are limited to 100 per 6 hours per IP. In a shared CI environment — a GitHub Actions runner, a Jenkins server — dozens of pipelines share the same IP. You'll hit the limit fast. Fix: always authenticate your CI runners to Docker Hub with a service account, or use a private registry or a pull-through cache for base images. A pipeline that fails because Docker Hub rate-limited you at 2am is a completely avoidable emergency.
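In practice the fix is a single authentication step before any image is pulled. A hypothetical CI snippet — `DOCKERHUB_USER` and `DOCKERHUB_TOKEN` are assumed CI secrets, not real values:

```shell
# Authenticate the runner so pulls count against the service account's
# quota instead of the shared unauthenticated per-IP limit.
# DOCKERHUB_USER / DOCKERHUB_TOKEN are assumed CI secrets.
echo "$DOCKERHUB_TOKEN" | docker login --username "$DOCKERHUB_USER" --password-stdin
```

Using `--password-stdin` keeps the token out of the process list and shell history, which is why it's the recommended form for scripts.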

Teacher's Note

Use digest references (@sha256:...) in production deployments, not tags. A tag is a mutable pointer — someone can reassign :latest or even :v2.3.1 to a different image. A digest is immutable. Pin your production deployments to digests and you eliminate an entire category of "it was working yesterday" mysteries.
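One way to capture the digest for pinning: after a pull, ask the local daemon for the image's RepoDigests entry. A sketch, assuming the image from this lesson is present locally:

```shell
# Resolve a tag to its immutable digest reference
# (the image must have been pulled or pushed, so RepoDigests is populated).
digest_ref=$(docker image inspect \
  --format '{{index .RepoDigests 0}}' \
  123456789.dkr.ecr.us-east-1.amazonaws.com/backend/payment-api:v2.3.1)

# Deploy by digest, not by tag — this reference can never silently change.
docker run -d "$digest_ref"
```

Bake the resolved `@sha256:...` reference into your deployment manifest at release time, and every environment runs byte-for-byte the same image.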

Practice Questions

1. Before pushing an image to a registry, you must assign it the registry's full address as its name. The command used to add this additional name to an existing image is what?



2. A tag like :v2.3.1 can be reassigned to a different image by the maintainer. To pin a production deployment to an exact, immutable image build that can never change, you reference the image by its what?



3. Docker Hub limits unauthenticated pulls to how many per IP address per 6-hour window?



Quiz

1. A docker push output shows Layer already exists for 4 out of 5 layers. What does this mean?


2. A developer types docker pull nginx. What is the full image reference Docker actually uses?


3. A shared CI server has 20 pipelines all pulling from Docker Hub without authentication. Builds start failing with rate limit errors. The correct fix is:


Up Next · Lesson 29

Docker Hub

You understand how registries work — now let's go hands-on with Docker Hub specifically: creating repositories, pushing your first image publicly, setting up access tokens, and navigating the UI like a professional.