Terraform Course
Terraform with GitOps
GitOps is a philosophy: Git is the single source of truth, and the system continuously reconciles actual state to desired state automatically. Terraform is naturally aligned with this philosophy — configurations live in Git and describe desired state. This lesson covers where Terraform fits in a GitOps workflow, how tools like Atlantis and Spacelift implement GitOps for infrastructure, and where the boundary between Terraform-managed infrastructure and GitOps-managed applications belongs.
This lesson covers
- GitOps principles and how Terraform fits
- Atlantis: GitOps for Terraform PRs
- The atlantis.yaml configuration
- Spacelift and HCP Terraform
- GitOps for Kubernetes vs GitOps for infrastructure
- The reconciliation loop
- When GitOps breaks down
GitOps Principles and How Terraform Fits
GitOps has four core principles. Understanding them reveals both where Terraform is a natural fit and where friction appears.
| GitOps Principle | Terraform alignment | Friction points |
|---|---|---|
| Declarative | HCL describes desired state — naturally declarative | Some resources require imperative steps (bootstrap) |
| Versioned and immutable | Config lives in Git — full history | State file is mutable and external to Git |
| Pulled automatically | Tools like Atlantis react to Git changes via webhooks | Terraform does not self-apply; it needs a trigger |
| Continuously reconciled | Drift detection can trigger reconciliation | Full reconciliation on every run risks unintended changes |
The Analogy
GitOps is like a thermostat for your infrastructure. You set the desired temperature (commit to Git). The thermostat (the GitOps operator) continuously checks the actual temperature (real infrastructure state) and applies heating or cooling (terraform apply) to bring reality in line with your setting. The thermostat never asks you to manually push a button — it acts autonomously. Atlantis is a more conservative thermostat: it shows you the plan and asks permission before turning on the heating.
Atlantis — GitOps for Terraform PRs
Atlantis is the most widely adopted GitOps tool specifically for Terraform. It runs as a server that listens to GitHub, GitLab, or Bitbucket webhook events. When a PR is opened, Atlantis automatically runs terraform plan and posts the output as a PR comment. A reviewer types atlantis apply in the comment to trigger the apply — directly from the PR, with no separate pipeline to navigate.
New terms:
- atlantis.yaml: the configuration file in the repository root that tells Atlantis which directories are Terraform projects, which workspace to use, and which workflows to apply. Without it, Atlantis auto-discovers *.tf files.
- Atlantis project: a directory containing Terraform configuration that Atlantis manages as a unit. Each project gets its own plan and apply. A repository can contain many projects, one per root module.
- atlantis plan / atlantis apply: PR comments that trigger operations. Anyone with PR access can comment atlantis plan to re-run the plan. Reviewers comment atlantis apply to approve and apply.
- apply requirements: conditions that must be met before atlantis apply is allowed. The most important are approved (the PR must have at least one GitHub approval) and mergeable (the PR must pass all required status checks).
# atlantis.yaml: configuration file in repository root
# Tells Atlantis which Terraform projects to manage and how
# Note: per-repo apply_requirements, workflow selection, and custom workflows only
# take effect if the server-side repo config permits them
# (allowed_overrides, allow_custom_workflows)
version: 3
automerge: false            # Do not auto-merge PRs after apply; humans control merges
autodiscover:
  mode: enabled             # Automatically discover projects by finding *.tf files

# Explicit project definitions: override autodiscovery when needed
projects:
  - name: networking
    dir: infrastructure/foundation/networking
    workspace: default
    terraform_version: v1.6.3   # Pin Terraform version per project
    autoplan:
      when_modified:
        - "**/*.tf"             # Re-plan when any .tf file in the directory changes
        - "**/*.tfvars"
      enabled: true

  - name: payments-dev
    dir: infrastructure/services/payments
    workspace: dev
    terraform_version: v1.6.3
    apply_requirements:         # Must be met before atlantis apply is allowed
      - approved                # PR must have at least 1 GitHub approval
    autoplan:
      when_modified: ["**/*.tf"]
      enabled: true

  - name: payments-prod
    dir: infrastructure/services/payments
    workspace: prod
    terraform_version: v1.6.3
    apply_requirements:
      - approved                # At least 1 GitHub approval
      - mergeable               # All required status checks must pass
    workflow: production        # Use the custom workflow defined below
    autoplan:
      enabled: false            # Production: never auto-plan on PR open

# Custom workflows: override the default init/plan/apply commands
workflows:
  production:
    plan:
      steps:
        - env:
            name: AWS_ROLE_ARN
            command: 'echo "arn:aws:iam::PROD_ACCOUNT:role/terraform-deployment-role"'
        # Caution: assume-role only prints temporary credentials to stdout;
        # on its own it does not export them to the init/plan steps that follow
        - run: aws sts assume-role --role-arn $AWS_ROLE_ARN --role-session-name atlantis-prod
        - init
        - plan:
            extra_args: ["-var-file=environments/prod.tfvars"]
    apply:
      steps:
        - apply
# How Atlantis works in practice — PR lifecycle
# 1. Developer opens PR modifying infrastructure/services/payments/*.tf
# 2. Atlantis webhook fires automatically
# 3. Atlantis runs:
#      terraform init
#      terraform workspace select dev
#      terraform plan -out=atlantis-plan
# 4. Atlantis posts plan output as PR comment:
#    ---
#    atlantis plan for dir: infrastructure/services/payments workspace: dev
#
#    Terraform will perform the following actions:
#      ~ aws_ecs_service.payments
#          ~ desired_count: 2 -> 3
#    Plan: 0 to add, 1 to change, 0 to destroy.
#    ---
#    To apply this plan, comment: atlantis apply -d infrastructure/services/payments -w dev
#    To re-plan: atlantis plan -d infrastructure/services/payments -w dev
# 5. Reviewer reads the plan in the PR and leaves a GitHub approval
# 6. Reviewer (or the developer) comments: atlantis apply
# 7. Atlantis applies the saved plan
# 8. Atlantis posts apply output as PR comment
# Key insight: the entire workflow happens in the PR — not in a separate CI/CD UI
# The developer and reviewer never leave GitHub to see what is being deployed
# PR #187: "Increase payments service capacity"
# Modified: infrastructure/services/payments/main.tf
# [Atlantis bot] — automatic comment after PR open
atlantis plan for dir: infrastructure/services/payments workspace: dev
Terraform will perform the following actions:
  ~ aws_ecs_service.payments
      ~ desired_count: 2 -> 3
Plan: 0 to add, 1 to change, 0 to destroy.
To apply this plan, comment:
atlantis apply -d infrastructure/services/payments -w dev
──────────────────────────────────────────────────
# [alice] approved the PR
# [alice] comments: atlantis apply
# [Atlantis bot] — apply result
Applying plan for dir: infrastructure/services/payments workspace: dev
aws_ecs_service.payments: Modifying...
aws_ecs_service.payments: Modifications complete after 8s
Apply complete! Resources: 0 added, 1 changed, 0 destroyed.
PR can now be merged.
What just happened?
- The entire workflow happened inside the PR. The developer opened the PR, Atlantis planned automatically, the reviewer read the plan as a PR comment, approved the PR, and commented to apply — all without leaving GitHub. No separate Jenkins URL, no GitHub Actions tab, no additional login. The infrastructure change and the code change are reviewed in one place.
- apply_requirements enforce the review gate in the tool itself. Atlantis refuses to apply until the PR has a GitHub approval, so the gate does not depend on convention. Because GitHub does not let authors approve their own PRs, a developer cannot comment atlantis apply and apply their own changes without another human approving first.
HCP Terraform and Spacelift
HCP Terraform (formerly Terraform Cloud) and Spacelift are managed platforms that implement GitOps for Terraform without self-hosting an Atlantis server. They provide PR-triggered plans like Atlantis, plus remote state management, variable storage, and policy enforcement, all as a hosted service.
# HCP Terraform backend — state and runs are managed remotely
terraform {
  cloud {
    organization = "acme-corp"        # HCP Terraform organisation
    workspaces {
      name = "payments-production"    # Each workspace maps to a root module + environment
    }
  }
}
# In HCP Terraform UI or via API:
# - Connect workspace to GitHub repository and directory
# - Configure VCS trigger: run plan on PR, apply on merge
# - Store sensitive variables (AWS credentials, secrets) in workspace variables
# - Set apply method: auto-apply (GitOps) or manual approval (required for prod)
# - Configure run triggers: one workspace triggers another after apply
# HCP Terraform run triggers — chain workspaces in dependency order
# When networking workspace applies → automatically triggers platform workspace plan
# When platform workspace applies → automatically triggers services workspace plan
# This implements the foundation → platform → services dependency order from Lesson 36
# without any custom CI/CD pipeline code
# Spacelift is a similar concept with additional features:
# - Policy as code for run approval (OPA-based)
# - Drift detection with scheduled reconciliation runs
# - Stack dependencies (equivalent to HCP Terraform run triggers)
# - Module registry with private module hosting
# The key difference: Spacelift supports multiple IaC tools — Terraform, Pulumi,
# CloudFormation, Ansible — in a single platform
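The run-trigger chain described above can also be declared in Terraform instead of being configured in the UI. The following is a minimal sketch using the hashicorp/tfe provider, assuming the provider is already authenticated. The workspace names networking-production and platform-production are hypothetical; only the acme-corp organization comes from the example above.

```hcl
terraform {
  required_providers {
    tfe = {
      source = "hashicorp/tfe"
    }
  }
}

# Look up two existing workspaces (names are hypothetical).
data "tfe_workspace" "networking" {
  name         = "networking-production"
  organization = "acme-corp"
}

data "tfe_workspace" "platform" {
  name         = "platform-production"
  organization = "acme-corp"
}

# After a successful apply in the networking workspace,
# queue a run in the platform workspace.
resource "tfe_run_trigger" "platform_after_networking" {
  workspace_id  = data.tfe_workspace.platform.id   # downstream workspace
  sourceable_id = data.tfe_workspace.networking.id # upstream workspace
}
```

A second tfe_run_trigger from platform to services would complete the foundation, platform, services order. Whether the downstream run applies automatically still depends on that workspace's own apply method.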
GitOps for Kubernetes vs GitOps for Infrastructure
ArgoCD and Flux are GitOps tools specifically for Kubernetes — they watch a Git repository and continuously synchronise Kubernetes cluster state to what the repository describes. They are not designed for Terraform. Understanding where each tool belongs prevents the common mistake of trying to use ArgoCD to apply Terraform or trying to use Atlantis to manage Kubernetes manifests.
| Tool | Purpose | What it manages |
|---|---|---|
| ArgoCD / Flux | Kubernetes GitOps — continuously syncs cluster state | K8s Deployments, Services, ConfigMaps, Helm releases |
| Atlantis | Terraform GitOps — PR-triggered plan and apply | Cloud infrastructure — VPCs, databases, IAM, compute |
| HCP Terraform / Spacelift | Managed Terraform platform — remote runs + state | Cloud infrastructure with policy enforcement |
| The Terraform K8s provider | Manage K8s cluster-level resources from Terraform | Namespaces, RBAC, StorageClasses — not Deployments |
# The clean boundary — a complete GitOps + Terraform architecture
# Layer 1: Terraform (via Atlantis) manages cloud infrastructure
# Repository: github.com/acme/infrastructure
# Trigger: PR open → auto-plan, PR approval + "atlantis apply" → apply
# Manages: EKS cluster, VPCs, RDS, S3, IAM, ACM certificates
# Layer 2: Terraform (via Atlantis) manages cluster-level Kubernetes config
# Repository: github.com/acme/infrastructure (same repo, different directory)
# Manages: Namespaces, RBAC, StorageClasses, cluster add-ons via Helm
# Trigger: same Atlantis workflow
# Layer 3: ArgoCD manages application workloads
# Repository: github.com/acme/app-manifests
# ArgoCD watches this repo continuously
# Manages: Deployments, Services, Ingresses, HPA for application workloads
# Trigger: Push to main → ArgoCD detects diff → syncs to cluster automatically
# The boundary is clear:
# "Does this change when application code changes?"
# YES → ArgoCD (image tags, replica counts, environment variables)
# NO → Terraform (VPCs, databases, cluster size, IAM roles)
# A common mistake: using ArgoCD to apply Terraform via a Job or runner pod
# This breaks the boundary — Terraform state becomes unmanageable,
# drift detection stops working, and ArgoCD cannot meaningfully diff Terraform state
# Keep ArgoCD for Kubernetes manifests. Use Atlantis or HCP Terraform for Terraform.
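As an illustration of Layer 2, cluster-level objects such as a namespace and an RBAC binding might be declared with the Terraform Kubernetes provider roughly as follows. This is a sketch, not the course's actual configuration: it assumes the kubernetes provider is already configured against the cluster, and the names payments, payments-deployers, and payments-team are hypothetical.

```hcl
# Cluster-level resource: changes rarely, and not when application code changes.
resource "kubernetes_namespace" "payments" {
  metadata {
    name = "payments"
    labels = {
      team = "payments"
    }
  }
}

# Grant a (hypothetical) group the built-in "edit" ClusterRole inside the namespace.
resource "kubernetes_role_binding" "payments_deployers" {
  metadata {
    name      = "payments-deployers"
    namespace = kubernetes_namespace.payments.metadata[0].name
  }

  role_ref {
    api_group = "rbac.authorization.k8s.io"
    kind      = "ClusterRole"
    name      = "edit"
  }

  subject {
    kind      = "Group"
    name      = "payments-team"
    api_group = "rbac.authorization.k8s.io"
  }
}
```

The Deployments, Services, and Ingresses that run inside this namespace stay out of Terraform and live in the ArgoCD-watched repository, which keeps the boundary question above easy to answer.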
The Reconciliation Loop and When It Breaks Down
True GitOps requires continuous reconciliation — the system detects drift and self-heals automatically. For Kubernetes this works well because Kubernetes has a native reconciliation controller. For Terraform it is more nuanced: fully automatic reconciliation can apply destructive changes without human review.
# Spacelift drift detection — scheduled reconciliation for Terraform
# This implements the "continuously reconciled" GitOps principle for infrastructure
# Spacelift stack configuration (in Spacelift UI or API):
# - drift_detection: enabled
# - drift_detection_schedule: "0 */6 * * *" (every 6 hours)
# - drift_detection_reconcile: false (detect only — do NOT auto-apply)
# When drift is detected:
# 1. Spacelift runs terraform plan against the current state
# 2. If the plan shows changes (drift), Spacelift creates a drift notification
# 3. A human reviews the drift and decides: apply (fix the drift) or ignore
# 4. The notification includes what drifted — a security group rule changed manually,
# an instance type was modified in the console, a tag was removed
# Why "detect but do not auto-reconcile" for infrastructure:
#   - Auto-reconciling a Kubernetes Deployment is comparatively safe: it replaces
#     pods, which are designed to be replaced
#   - Auto-reconciling a database parameter change might cause a restart
#   - Auto-reconciling away a security group rule that was added during an incident
#     could remove the engineer's emergency access mid-incident
# The safer GitOps model for Terraform:
# DETECT drift automatically
# NOTIFY the team
# HUMAN decides to apply or document the deviation
# Full auto-reconcile is appropriate only for:
# - Dev and sandbox environments where drift does not matter
# - Resources where changes are truly zero-risk (tags, descriptions, labels)
# - After thorough testing proves the reconciliation does not cause disruption
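If the Spacelift stack itself is managed with Terraform, the detect-only settings above could be expressed with the Spacelift provider's drift detection resource. This is a minimal sketch, assuming the spacelift-io/spacelift provider is configured and using a hypothetical stack ID of payments-production. Check the attribute names against the provider documentation for your version.

```hcl
terraform {
  required_providers {
    spacelift = {
      source = "spacelift-io/spacelift"
    }
  }
}

# Scheduled drift detection for one stack (stack ID is hypothetical).
resource "spacelift_drift_detection" "payments_production" {
  stack_id  = "payments-production"
  schedule  = ["0 */6 * * *"] # every 6 hours, cron syntax
  reconcile = false           # detect and report only; do not auto-apply
}
```

Setting reconcile to true would turn this into full auto-reconciliation, which the guidance above reserves for dev and sandbox environments or changes that are known to be zero-risk.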
Common GitOps + Terraform Mistakes
Enabling auto-apply in Atlantis on production environments
Any Atlantis setup that applies automatically after a plan, without a human comment or approval, is a reasonable time-saver for development environments. (This lesson's shorthand for that setting is autoapply: true; it is not among the atlantis.yaml keys shown earlier, so verify how your Atlantis version supports it.) For production, it removes the human review gate entirely. A developer who mistypes a resource count or deletes the wrong block will have their change applied to production the moment they push. Always require explicit atlantis apply comments and apply_requirements: [approved] on production projects.
Locking the Atlantis server to a single set of AWS credentials
Atlantis itself needs AWS credentials to run Terraform. If all projects share a single IAM role that has access to every AWS account, a compromised Atlantis server or a malicious PR could target any account. Use separate credentials per project in atlantis.yaml — each project's workflow assumes a different IAM role scoped to the appropriate account and environment.
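One way to scope credentials per project is to let each root module assume its own role through the AWS provider, with the role ARN supplied per environment. The sketch below assumes a hypothetical variable deploy_role_arn set in each environment's tfvars file (for example the environments/prod.tfvars file passed in the production workflow) and a placeholder region.

```hcl
# Hypothetical variable: each environment's tfvars file supplies its own role.
variable "deploy_role_arn" {
  type        = string
  description = "IAM role Terraform assumes for this project and environment"
}

provider "aws" {
  region = "us-east-1" # placeholder region

  # Terraform assumes this role for every AWS API call in the project,
  # so the Atlantis server's own identity only needs sts:AssumeRole on it.
  assume_role {
    role_arn     = var.deploy_role_arn
    session_name = "atlantis"
  }
}
```

With this layout, the server's base credentials can be limited to assuming the specific deployment roles, and each role's trust policy decides which projects may use it.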
Treating full auto-reconciliation as equivalent to manual apply on all resource types
A terraform plan summary showing 1 to change looks the same whether it is changing a tag or forcing an RDS instance to reboot by modifying a parameter group. Automated reconciliation treats all changes as equal, whereas a carefully reviewed manual apply catches the reboot risk before it happens. Implement drift detection with notification rather than auto-apply for any change that is not purely cosmetic.
Choosing the right GitOps tool for Terraform
- Atlantis: best for teams that want full control, self-host their tools, and already have GitHub/GitLab. Free, open source, integrates naturally with existing Git workflows. Requires a server to run.
- HCP Terraform: best for teams that want managed state + runs without maintaining infrastructure. Free tier covers small teams. Paid tiers add policy enforcement and audit logs.
- Spacelift: best for teams that need advanced policy-as-code, multi-tool support, or drift detection with reconciliation. More opinionated but more capable.

The choice between them is operational: all implement the same GitOps principles.
Practice Questions
1. After reviewing a Terraform plan posted by Atlantis as a PR comment, what do you type in the PR to trigger the apply?
2. Which atlantis.yaml apply_requirement ensures the PR must have at least one GitHub approval before atlantis apply is allowed?
3. What is the correct boundary between ArgoCD and Atlantis in a Kubernetes + Terraform architecture?
Quiz
1. What file configures Atlantis projects, apply requirements, and custom workflows in a repository?
2. What is the recommended reconciliation strategy for production Terraform infrastructure?
3. How do HCP Terraform run triggers implement the foundation → platform → services dependency order from the scale architecture?
Up Next · Lesson 40
Drift Detection
GitOps patterns established. Lesson 40 goes deep on drift — what it is, how it happens, how to detect it systematically, and how to respond. We cover terraform plan as a drift detector, scheduled drift detection pipelines, and the decision framework for whether to fix drift or accept it as a documented deviation.