Kubernetes Lesson 14 – Kubernetes YAML Basics | Dataplexa
Kubernetes Fundamentals · Lesson 14

Kubernetes YAML Basics

Nearly every Kubernetes object you create starts as a YAML file. It's the everyday language of the cluster, and once you understand its structure, everything else clicks into place.

Why Kubernetes Uses YAML

Before we write a single manifest, let's talk about why Kubernetes settled on YAML. You could have a dashboard, a GUI, a drag-and-drop interface — and Kubernetes does have dashboards. But YAML won out because it has something those tools don't: it's a file. You can commit it to Git. You can code-review it. You can diff it, roll it back, replicate it across environments, and version it like any other piece of software.

This is the idea behind Infrastructure as Code — your infrastructure isn't a set of manual steps someone performed last Tuesday. It's a file in a repo that anyone on the team can read, run, and reproduce. YAML is just the format Kubernetes chose to express that idea.

🌱 The Git commit is the source of truth. A healthy Kubernetes team doesn't SSH into clusters and run manual commands in production. They update YAML files, open a pull request, get a review, and merge. The cluster reflects the repo. If the cluster ever drifts from the repo, the repo wins.

YAML: The Basics You Actually Need

You don't need to be a YAML expert — you need to understand about five concepts and you're set. Let's go through each one fast.

1. Key-value pairs. YAML is mostly key: value. Simple as that. The key and value are separated by a colon and a space.

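2. Indentation = structure. YAML uses spaces (never tabs) to show nesting. Indenting a line further than the one above it means "this is a child of that item"; Kubernetes manifests conventionally use two spaces per level. Get this wrong and you'll either hit a parse error before Kubernetes even sees your manifest or, worse, get valid YAML that quietly nests a field under the wrong parent.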

3. Lists start with a dash. A - means "this is an item in a list." You'll see this constantly for containers, ports, volumes, and environment variables.

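4. Strings don't need quotes (usually). name: payment-service is fine unquoted. But quote any value YAML could read as something other than a string: numbers (1.0, 8080), booleans (true, yes, on), or null. Also quote strings that contain ": " or " #", or that start with a special character such as *, &, [, {, or #.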

5. Comments start with #. Kubernetes ignores them completely. Use them liberally — your future self at 2am will thank you.
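Here is a small fragment that exercises all five rules at once. It's a sketch of a container's env list, and the variable names are hypothetical, made up for illustration. Env values must be strings, which makes this a good place to see the quoting rule bite.

```yaml
# Comments start with "#" and are discarded by the YAML parser
env:                          # key: value pair whose value is the nested list below
  - name: LOG_LEVEL           # "-" starts a list item; indented two spaces = child of env
    value: debug              # plain string, no quotes needed
  - name: FEATURE_ENABLED
    value: "true"             # unquoted true would parse as a boolean
  - name: SCHEMA_VERSION
    value: "1.0"              # unquoted 1.0 would parse as a number
  - name: UPSTREAM
    value: "host: 8080"       # contains ": ", so it must be quoted
```

Leaving the quotes off true or 1.0 here would typically make kubectl apply fail, because the API expects a string for value.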

The Four Fields Every Kubernetes Manifest Has

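No matter what Kubernetes object you're creating (a Pod, a Deployment, a Service), the manifest starts with the same top-level fields: apiVersion, kind, metadata, and, for most kinds, spec. A few kinds hold their payload in a different field instead of spec; ConfigMaps and Secrets, for example, use data. Learn these four and you'll be able to read almost any manifest you encounter.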

Field        What it means                                                Example
apiVersion   Which Kubernetes API group and version handles this object   apps/v1, v1, networking.k8s.io/v1
kind         The type of Kubernetes object you're creating                Pod, Deployment, Service, ConfigMap
metadata     Data that identifies the object: name, namespace, labels     name: payment-service, labels: ...
spec         The desired state: what you want this object to do or be     containers, replicas, ports, volumes...
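Not every kind follows the four-field pattern to the letter. As a sketch (the name and keys below are hypothetical), here is a ConfigMap: it keeps apiVersion, kind, and metadata, but its payload lives under data rather than spec.

```yaml
apiVersion: v1                 # ConfigMap lives in the core group
kind: ConfigMap
metadata:
  name: catalog-config         # hypothetical name
  labels:
    app: catalog
data:                          # ConfigMaps carry key/value data here, not under spec
  LOG_LEVEL: info
  CACHE_TTL_SECONDS: "300"     # data values must be strings, so quote numbers
```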

The Anatomy of a Kubernetes Manifest

Let's look at the most fundamental object in Kubernetes — a Pod — and break down every line of its manifest together.

The scenario: You're a junior DevOps engineer who just joined an e-commerce company. Your tech lead asks you to write your first Pod manifest for a new product catalog service. You need to run a single container, expose port 8080, and label it so the team can find it later. This is that manifest — and we're going to read every line together.

apiVersion: v1            # v1 is the core API group — used for Pods, Services, ConfigMaps
kind: Pod                  # We're creating a Pod — the smallest deployable unit in Kubernetes
metadata:                  # Metadata block: describes the object itself (not what it does)
  name: catalog-pod        # Every object needs a unique name within its namespace
  namespace: default       # Namespace for this Pod; if omitted, kubectl uses your context's namespace (usually default)
  labels:                  # Labels: key-value tags used to identify and group objects
    app: catalog           # app=catalog — used by Services and Deployments to find this Pod
    tier: backend          # tier=backend — useful for filtering with kubectl get pods -l tier=backend
    version: "1.0"         # Quoted: unquoted 1.0 parses as a number, but label values must be strings
spec:                      # Spec block: the desired state — what we want this Pod to contain
  containers:              # containers is a list (note the dash below) — a Pod can have multiple
    - name: catalog-app    # Name of this specific container within the Pod
      image: nginx:1.25    # Container image (nginx is a stand-in); always pin a version tag, never use :latest in prod
      ports:               # List of ports this container exposes
        - containerPort: 8080  # Port the app listens on; informational (stock nginx listens on 80, so use your real image)
      resources:           # Resource block: tells Kubernetes how much CPU/memory this container needs
        requests:          # requests: the minimum guaranteed allocation
          cpu: "100m"      # 100 millicores = 0.1 CPU — Kubernetes schedules based on this
          memory: "128Mi"  # 128 mebibytes of memory minimum
        limits:            # limits: the maximum the container can consume before being throttled/OOM-killed
          cpu: "250m"      # 250 millicores = 0.25 CPU ceiling
          memory: "256Mi"  # 256Mi memory ceiling — if exceeded, container is killed and restarted
$ kubectl apply -f catalog-pod.yaml
pod/catalog-pod created

$ kubectl get pods
NAME           READY   STATUS    RESTARTS   AGE
catalog-pod    1/1     Running   0          12s

What just happened?

apiVersion: v1 — Kubernetes has dozens of API groups. v1 is the original "core" group containing the most fundamental objects: Pods, Services, ConfigMaps, Secrets, Namespaces. When you see apps/v1 (which you will in the next lesson on Deployments), that means the object lives in the apps API group version 1.

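metadata.name: This is the unique identifier within the namespace. If you kubectl apply a manifest whose name already exists, Kubernetes updates the existing object rather than creating a new one (kubectl create, by contrast, fails with an AlreadyExists error). That's the "declarative" model: you declare state, Kubernetes reconciles. One caveat for Pods specifically: most fields of a running Pod's spec are immutable, so many changes require deleting and recreating the Pod, which is one reason real workloads are managed through Deployments.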

metadata.labels — Labels are not just decorative. They're how Kubernetes objects find each other. Services use label selectors to route traffic to Pods. Deployments use them to manage which Pods they own. Without labels, your objects are islands.

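spec.containers: The containers field is a list, which is why the first item starts with -. A Pod can run multiple containers that share its network and can share its volumes (the best-known example is the sidecar pattern, where a helper such as a log shipper runs next to the main app), but the most common case is one container per Pod.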

image: nginx:1.25 — Always pin your image tag. :latest means Kubernetes will pull whatever the registry considers "latest" at pull time — this has caused production outages when a surprise breaking change was tagged latest by the upstream maintainer.

The output: READY 1/1 means 1 out of 1 containers in this Pod is ready. STATUS: Running means the Pod has been scheduled to a node and its container has started. RESTARTS: 0 means nothing has crashed yet, a clean start.
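To see the containers list with more than one entry, here is a hedged sketch of the sidecar pattern. The Pod name and the second container are illustrative assumptions: busybox simply follows the nginx access log through a shared emptyDir volume, standing in for a real log-shipping agent.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: catalog-pod-sidecar        # hypothetical name for this example
  labels:
    app: catalog
spec:
  volumes:
    - name: app-logs               # scratch volume that lives as long as the Pod
      emptyDir: {}
  containers:
    - name: catalog-app            # first list item: the main container
      image: nginx:1.25
      volumeMounts:
        - name: app-logs
          mountPath: /var/log/nginx   # nginx writes access.log here
    - name: log-tailer             # second list item: the sidecar
      image: busybox:1.36          # stand-in for a real log-shipping agent
      command: ["sh", "-c", "tail -F /logs/access.log"]
      volumeMounts:
        - name: app-logs
          mountPath: /logs         # same volume, different path
```

Mounting the volume over /var/log/nginx hides whatever the image ships in that directory, so nginx writes ordinary log files that the sidecar can follow.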

The apiVersion Cheat Sheet

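One of the most common rookie mistakes is using the wrong apiVersion for a kind. Here's the map you'll use constantly (and if you're ever unsure, kubectl api-resources lists every kind your cluster serves along with its API group and version):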

Object                    apiVersion             Why this group?
Pod                       v1                     Core Kubernetes object, original API
Deployment                apps/v1                Workload management lives in the apps group
ReplicaSet                apps/v1                Workload management, same group as Deployment
StatefulSet               apps/v1                Stateful workloads, same group
Service                   v1                     Core networking object
ConfigMap                 v1                     Core configuration object
Secret                    v1                     Core secrets object
Ingress                   networking.k8s.io/v1   Networking extension group
HorizontalPodAutoscaler   autoscaling/v2         Autoscaling extension group

Multi-Resource YAML: The --- Separator

Production YAML files rarely define just one object. You can put multiple Kubernetes resources in a single file using --- as a separator. This is extremely common — you'll see teams keep a Deployment and its Service in one file so they travel together.

The scenario: You're a DevOps engineer at a SaaS company. Your team just built a new auth microservice and needs to deploy it to the cluster. The backend developer hands you a container image and says "make it accessible internally on port 3000." You write a single YAML file that creates both the Pod and the Service to expose it — because they're related and should be deployed together.

apiVersion: v1              # First document: a Pod for the auth service
kind: Pod
metadata:
  name: auth-pod            # Name this Pod auth-pod
  labels:
    app: auth               # Label app=auth — the Service below will use this to find the Pod
spec:
  containers:
    - name: auth-container
      image: company/auth-service:2.1.0   # Always use a versioned tag from your internal registry
      ports:
        - containerPort: 3000             # App listens on 3000 inside the container

---                         # Three dashes = document separator — starts the next Kubernetes object

apiVersion: v1              # Second document: a Service to expose the Pod
kind: Service
metadata:
  name: auth-service        # Name this Service auth-service
spec:
  selector:
    app: auth               # Match Pods with label app=auth — this is how Service finds our Pod above
  ports:
    - port: 3000            # Port the Service exposes to the rest of the cluster
      targetPort: 3000      # Port on the Pod to forward traffic to (matches containerPort above)
  type: ClusterIP           # ClusterIP = internal-only, not reachable from outside the cluster
$ kubectl apply -f auth-service.yaml
pod/auth-pod created
service/auth-service created

$ kubectl get pods,services
NAME           READY   STATUS    RESTARTS   AGE
pod/auth-pod   1/1     Running   0          8s

NAME                   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/auth-service   ClusterIP   10.96.137.201   <none>        3000/TCP   8s

What just happened?

The --- separator: YAML supports multiple "documents" in one file, separated by ---. When you run kubectl apply on this file, kubectl splits it on the separators and sends each object to the API server in the order it appears. One command, two objects.

Service selector: The selector app: auth on the Service is the link between the Service and the Pod. Kubernetes continuously watches for Pods whose labels match and routes traffic only to the ones that are ready. This is the entire routing model: labels are the glue.

kubectl get pods,services — You can query multiple resource types in one command by comma-separating them. Useful for getting a full picture of a component's health fast.

CLUSTER-IP 10.96.137.201: Kubernetes assigned the Service a stable virtual IP address. Any other Pod in the cluster can reach the auth service at this IP on port 3000, or by DNS name: plain auth-service from the same namespace, or auth-service.default.svc.cluster.local from anywhere (assuming the default cluster.local domain). We'll cover DNS in detail in Lesson 35.
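A common refinement of the file above, shown here as a sketch: name the container port once and have the Service's targetPort refer to that name, so the two numbers can't drift apart. The port name http and the Service port 80 are illustrative choices, not values from the original scenario.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: auth-pod
  labels:
    app: auth
spec:
  containers:
    - name: auth-container
      image: company/auth-service:2.1.0
      ports:
        - name: http              # name the port once...
          containerPort: 3000
---
apiVersion: v1
kind: Service
metadata:
  name: auth-service
spec:
  selector:
    app: auth
  ports:
    - port: 80                    # port clients inside the cluster connect to
      targetPort: http            # ...and refer to it by name here
  type: ClusterIP
```

With this version, other Pods would reach the service at auth-service:80 while the app itself keeps listening on 3000.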

Validating and Dry-Running Your YAML

Before you apply a manifest to a production cluster, two kubectl commands will save you from embarrassing yourself in front of your team.

The scenario: You're an SRE preparing a deploy for a critical payment processing service. Your company's change management policy requires you to validate all manifests before they hit production. You've written the YAML, but you want to catch any errors before the change window tonight — without actually touching the cluster.

kubectl apply -f payment-pod.yaml --dry-run=client
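# --dry-run=client: simulates the apply on your machine; nothing is sent to the API server to be created or changed
# (kubectl may still read the live object to decide between "created" and "configured")
# Useful for catching YAML syntax errors and obvious misconfigurations before deploy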

kubectl apply -f payment-pod.yaml --dry-run=server
# --dry-run=server: sends the manifest to the API server but tells it "don't actually create this"
# The server validates against full admission webhooks and schema — catches more than client mode

kubectl apply -f payment-pod.yaml --dry-run=server -o yaml
# -o yaml: prints the full manifest as Kubernetes would see it, including default values injected
# This is how you see what fields Kubernetes adds automatically (like imagePullPolicy: IfNotPresent)

kubectl diff -f payment-pod.yaml
# diff: shows what WOULD change if you applied this file to what's currently running
# Indispensable before updates — lets you see the diff before committing to a change
$ kubectl apply -f payment-pod.yaml --dry-run=client
pod/payment-pod configured (dry run)

$ kubectl apply -f payment-pod.yaml --dry-run=server
pod/payment-pod configured (server dry run)

$ kubectl diff -f payment-pod.yaml
diff -u -N /tmp/LIVE-123456/v1.Pod.default.payment-pod /tmp/MERGED-123456/v1.Pod.default.payment-pod
--- /tmp/LIVE-123456/v1.Pod.default.payment-pod
+++ /tmp/MERGED-123456/v1.Pod.default.payment-pod
@@ -7,7 +7,7 @@
   containers:
   - image: company/payment:1.4.1
     resources:
       limits:
-        cpu: 250m
+        cpu: 500m

What just happened?

--dry-run=client vs --dry-run=server: client mode never submits the object, so it mainly catches problems kubectl can see on its own: YAML syntax errors, bad indentation, malformed objects. server mode sends the manifest to the real API server, which runs it through the full validation pipeline, including admission controllers, without persisting anything. That catches things client mode misses, like a target namespace that doesn't exist or a policy enforced by an admission webhook.

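kubectl diff output: Lines starting with - are what's live in the cluster right now. Lines starting with + are what your YAML file would change them to. In this case, the CPU limit is being raised from 250m to 500m. Reviewing this before every production apply is just good practice. One caveat: a running Pod's container resources can't be changed through a normal apply in most Kubernetes versions, so on a bare Pod the API server would typically reject this particular change. In practice you'd make it on the Deployment that owns the Pod, and the diff workflow is identical.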
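To see the difference between the two dry-run modes yourself, try a manifest like this sketch, which targets a namespace that (by assumption) doesn't exist in your cluster. Client mode typically reports pod/payment-pod created (dry run) because it never submits the object, while server mode should fail with an error saying the namespace was not found.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: payment-pod
  namespace: payments-stagng        # hypothetical typo'd namespace that does not exist
spec:
  containers:
    - name: payment
      image: company/payment:1.4.1  # same image tag as in the diff example
```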

The YAML Structure at a Glance

Here's how all the pieces of a Kubernetes manifest relate to each other:

KUBERNETES YAML STRUCTURE
apiVersion: apps/v1        # which API group handles this object type
kind: Deployment           # the type of Kubernetes object
metadata:
  name:                    # unique identifier
  namespace:               # which namespace
  labels:                  # key/value tags for selection
  annotations:             # non-identifying metadata
spec:
  replicas:                # how many Pods
  selector:                # which Pods this owns
  template:                # Pod blueprint
    metadata:              # Pod labels
    spec:                  # containers, volumes...
status:                    # read-only, set by Kubernetes: current observed state; never write this manually
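Filled in with concrete values, the skeleton above becomes a working manifest. This is a minimal sketch (the catalog names are carried over from the Pod example earlier; Lesson 15 covers Deployments properly). Note apiVersion: apps/v1: with apiVersion: v1 instead, kubectl rejects the file with an error along the lines of no matches for kind "Deployment" in version "v1".

```yaml
apiVersion: apps/v1            # Deployments live in the apps group, not core v1
kind: Deployment
metadata:
  name: catalog
  labels:
    app: catalog
spec:
  replicas: 2                  # how many Pods to keep running
  selector:
    matchLabels:
      app: catalog             # which Pods this Deployment owns...
  template:                    # the Pod blueprint
    metadata:
      labels:
        app: catalog           # ...must match the selector above
    spec:
      containers:
        - name: catalog-app
          image: nginx:1.25
```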

Teacher's Note: spec vs status — the most important thing in this lesson

The entire Kubernetes reconciliation model comes down to two fields: spec (what you want) and status (what currently exists). You write spec. Kubernetes writes status. Controllers in the control plane loop continuously, comparing desired and actual state, and when they don't match, they do work to close the gap. This is called the reconciliation loop, and it's why Kubernetes is self-healing. You never put status in your YAML: Kubernetes populates it automatically, and you read it with kubectl describe or kubectl get -o yaml.
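For a concrete picture, here is an abbreviated sketch of the status section that kubectl get pod catalog-pod -o yaml returns for the catalog Pod from earlier. Only a few real PodStatus fields are shown; the live output contains many more, including timestamps and IP addresses.

```yaml
status:                        # written by Kubernetes; you never author this
  phase: Running               # surfaces as STATUS in kubectl get pods
  conditions:
    - type: Ready
      status: "True"           # condition status is a string: "True", "False", or "Unknown"
  containerStatuses:
    - name: catalog-app
      ready: true              # counts toward READY 1/1
      restartCount: 0          # the RESTARTS column
```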

Practice Questions

1. In a Kubernetes manifest, which top-level field contains the desired state of the object — things like how many replicas, which containers to run, and which ports to expose?



2. What three-character separator do you use in a YAML file to define multiple Kubernetes objects in a single file?



3. What is the correct apiVersion for a Kubernetes Deployment manifest?



Quiz

1. What does kubectl apply -f manifest.yaml --dry-run=client do?


2. A Kubernetes Service uses which metadata field on a Pod to determine which Pods to route traffic to?


3. Which top-level field in a Kubernetes manifest is automatically written by Kubernetes to reflect the current observed state, and should never be manually specified in your YAML files?


Up Next · Lesson 15

First Kubernetes Deployment

Everything clicks — we deploy a real multi-replica application end to end, from manifest to live traffic.