Terraform Lesson 41 – Upgrade Strategies | Dataplexa

Section IV · Lesson 41

Upgrade Strategies

Terraform and its providers release updates regularly. Some are patch fixes that apply cleanly. Others are major versions that rename resources, change defaults, and require configuration changes before a single plan will succeed. Upgrading without a strategy causes production incidents. This lesson gives you a systematic approach to upgrading Terraform core, providers, and modules safely — one step at a time.

This lesson covers

The lock file and why it matters → Upgrading Terraform core → Upgrading providers — minor vs major → The AWS provider v4 → v5 migration as a case study → Upgrading modules → The upgrade runbook → required_version constraints best practices

The Lock File — Your Safety Net

The .terraform.lock.hcl file records the exact provider versions and checksums used the last time terraform init ran successfully. It is the mechanism that ensures every engineer and every CI/CD pipeline uses the same provider versions — regardless of what new releases appear on the registry.

New terms:

.terraform.lock.hcl — the dependency lock file. Always commit this to Git. It pins exact provider versions and their checksums. Without it, different engineers may silently use different provider versions, producing different behaviour from the same code.
terraform init -upgrade — re-resolves all provider versions against current version constraints and updates the lock file. Required to adopt a new provider version. Never run without reading the changelog first.
h1: hash — the SHA-256 checksum stored in the lock file for each provider package. When init runs, Terraform verifies the downloaded provider binary matches this hash — preventing silent substitution attacks where a different binary is served with the same version number.

# .terraform.lock.hcl — what it looks like and what each section means

provider "registry.terraform.io/hashicorp/aws" {
  version     = "5.31.0"  # Exact version locked — not a range
  constraints = "~> 5.0"  # The constraint from versions.tf that was satisfied

  hashes = [
    # h1: hash is the primary checksum for the installed provider binary
    "h1:abc123...",
    # zh: hashes are checksums of the provider's release .zip archives, one per
    # platform, so the same lock file can be used on macOS (darwin), Linux (linux),
    # and Windows (windows)
    "zh:0a1b2c...",
    "zh:3d4e5f...",
  ]
}

# Rules for the lock file:
# 1. ALWAYS commit to Git — it must be in source control
# 2. NEVER edit manually — let terraform init manage it
# 3. Review lock file changes in PR just like code changes
#    A lock file update that bumps AWS provider 4.x -> 5.x is a major change
#    It should be reviewed with the same care as a code change

# Check what is currently locked
cat .terraform.lock.hcl

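# terraform init already verifies downloaded providers against the lock file.
# To record checksums for other platforms (this updates the lock file, it does not verify it):
terraform providers lock -platform=linux_amd64 -platform=darwin_arm64 -platform=windows_amd64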
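
To review what a lock file pins without reading the whole file, a small shell helper can list each provider with its locked version. This is a minimal sketch: `locked_versions` is a hypothetical function name, and it assumes the lock file layout shown above (one `provider` block per provider, with a `version` line inside it).

```shell
# locked_versions is a hypothetical helper, not a Terraform command.
# Prints "<provider source> <locked version>" for each provider block.
locked_versions() {
  lock_file="${1:-.terraform.lock.hcl}"
  awk '
    # Remember the provider source address from the block header.
    /^provider "/ { gsub(/"/, "", $2); src = $2 }
    # Print the exact version pinned inside that block.
    /^[[:space:]]*version[[:space:]]*=/ { gsub(/"/, "", $3); print src, $3 }
  ' "$lock_file"
}
```

Running `locked_versions` in the repository root before and after an upgrade gives a short summary to paste into the pull request alongside the lock file diff.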

Upgrading Terraform Core

Terraform core releases follow semantic versioning. Patch releases (1.6.2 → 1.6.3) are bug fixes and apply cleanly. Minor releases (1.5 → 1.6) add features and are backward compatible. Major releases (1.x → 2.x) have not happened since 1.0 was released in 2021 — HashiCorp committed to a long-term stability guarantee for the 1.x series.
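
The three release types above can be told apart mechanically from two version strings. A minimal sketch, assuming plain MAJOR.MINOR.PATCH versions with no pre-release suffix; `upgrade_kind` is a hypothetical helper name, not a Terraform command.

```shell
# upgrade_kind is a hypothetical helper: classify an upgrade as
# none, patch, minor, or major from two MAJOR.MINOR.PATCH strings.
upgrade_kind() {
  old_major=${1%%.*}; old_rest=${1#*.}; old_minor=${old_rest%%.*}
  new_major=${2%%.*}; new_rest=${2#*.}; new_minor=${new_rest%%.*}
  if [ "$1" = "$2" ]; then
    echo none
  elif [ "$old_major" != "$new_major" ]; then
    echo major
  elif [ "$old_minor" != "$new_minor" ]; then
    echo minor
  else
    echo patch
  fi
}
```

For example, `upgrade_kind 1.6.2 1.6.3` prints `patch` and `upgrade_kind 4.67.0 5.31.0` prints `major`. A CI job could use the result to decide how much review an upgrade pull request needs.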

# Step 1: Check your current required_version constraint
# In versions.tf:
terraform {
  required_version = ">= 1.5.0"  # Allows any 1.x at or above 1.5.0
  # Alternatively: "~> 1.5", which allows any 1.x release from 1.5 onward but not 2.0
}

# Step 2: Read the Terraform release notes
# https://github.com/hashicorp/terraform/releases
# Look for: BREAKING CHANGES section — usually empty for minor/patch releases
# Look for: Provider compatibility notes — some providers require minimum Terraform versions

# Step 3: Install the new Terraform version
# Using tfenv (Terraform version manager — recommended)
tfenv install 1.6.3       # Install specific version
tfenv use 1.6.3           # Switch to it
terraform version         # Verify: Terraform v1.6.3

# Step 4: Run init and validate — no infrastructure changes
terraform init            # Re-download providers if needed for the new Terraform version
terraform validate        # Check for syntax or compatibility issues

# Step 5: Run plan — look for unexpected diffs
terraform plan            # Should match what you expect — no surprise changes from the upgrade

# Step 6: Update required_version to document the minimum
# If upgrading to 1.6.x:
terraform {
  required_version = ">= 1.6.0"  # Update to reflect the new minimum
}

# Step 7: Update lock file and commit
git add .terraform.lock.hcl versions.tf
git commit -m "upgrade: Terraform core 1.5.x -> 1.6.3"

# Tip: Use tfenv in CI/CD to match the version pinned in versions.tf
# .terraform-version file in the repo root — tfenv reads it automatically
echo "1.6.3" > .terraform-version

Upgrading Providers — Minor vs Major

Provider minor version upgrades add support for new resources and fix bugs — generally safe. Provider major version upgrades are infrastructure events. They frequently rename arguments, split resources into sub-resources, change defaults, and trigger resource replacements on existing infrastructure. Read the changelog. Always.

# Minor version upgrade — safe, apply in a single step

# versions.tf: no change needed, the existing constraint already allows the new minor version
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"  # Unchanged: already allows 5.31 -> 5.40
    }
  }
}

# Update the lock file to the latest allowed version
terraform init -upgrade  # Upgrades within the constraint (~> 5.0 → latest 5.x)
terraform plan           # Verify: no unexpected changes

git add .terraform.lock.hcl
git commit -m "upgrade: AWS provider 5.31 -> 5.40"

# ─────────────────────────────────────────────────────────────────────────────

# Major version upgrade — must be done carefully

# Step 1: Read the CHANGELOG — find all BREAKING CHANGES
# https://github.com/hashicorp/terraform-provider-aws/releases
# Search for: "BREAKING CHANGE", "removed", "renamed", "deprecated"

# Step 2: Check your code against the migration guide
# HashiCorp publishes upgrade guides for major provider versions:
# https://registry.terraform.io/providers/hashicorp/aws/latest/docs/guides/version-4-upgrade
# https://registry.terraform.io/providers/hashicorp/aws/latest/docs/guides/version-5-upgrade

# Step 3: Update the constraint to allow the major version
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"  # Was "~> 4.0" — change the major version bound
    }
  }
}

# Step 4: Run init -upgrade to download the new major version
terraform init -upgrade  # Downloads 5.x — different code from 4.x

# Step 5: Run validate — many breaking changes are caught here
terraform validate       # Likely shows errors for removed/renamed arguments

# Step 6: Fix all validation errors BEFORE running plan
# Removed arguments: delete or replace with the new resource
# Renamed arguments: update to new names
# Changed defaults: explicitly set the old default to maintain current behaviour

# Step 7: Run plan and read EVERY line
terraform plan           # Some resources may show -/+ (destroy and recreate!)
# A -/+ on a database means downtime — address it before applying
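
Reading every line of a long plan is error-prone, so it can help to pull out the replacements first. The sketch below assumes the plan was saved as plain text (for example with `terraform plan -no-color | tee plan.txt`) and that replacements are marked with the phrase "must be replaced", as in Terraform's human-readable output. That text is not a stable interface; `terraform show -json` on a saved plan file is the machine-readable route. `replacements_in_plan` is a hypothetical helper name.

```shell
# replacements_in_plan is a hypothetical helper: print the address of each
# resource that a saved plain-text plan marks for destroy-and-recreate.
replacements_in_plan() {
  plan_text="$1"
  sed -n 's/^[[:space:]]*# \(.*\) must be replaced$/\1/p' "$plan_text"
}
```

An empty result is not proof of safety. It is only a quick first filter before the full read-through in Step 7.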

AWS Provider v4 → v5 Migration Case Study

The AWS provider v4 to v5 upgrade was one of the largest breaking changes in Terraform history. It is an excellent case study of how to handle a major provider migration — and the exact mistakes teams made when they did not read the changelog first.

# AWS Provider v4 -> v5: The major breaking changes

# BREAKING CHANGE 1: S3 bucket attributes split into sub-resources (v4 change, still relevant)
# Before (provider v3):
resource "aws_s3_bucket" "app" {
  bucket = "my-app-bucket"
  acl    = "private"                    # Deprecated in v4 (an error in 4.0 to 4.8)
  versioning { enabled = true }         # Deprecated in v4 (an error in 4.0 to 4.8)
  server_side_encryption_configuration { # Deprecated in v4 (an error in 4.0 to 4.8)
    rule {
      apply_server_side_encryption_by_default {
        sse_algorithm = "AES256"
      }
    }
  }
}

# After (provider v4+):
resource "aws_s3_bucket" "app" {
  bucket = "my-app-bucket"
  # No acl, no versioning, no encryption — all moved to sub-resources
}

resource "aws_s3_bucket_acl" "app" {
  bucket = aws_s3_bucket.app.id
  acl    = "private"
}

resource "aws_s3_bucket_versioning" "app" {
  bucket = aws_s3_bucket.app.id
  versioning_configuration { status = "Enabled" }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "app" {
  bucket = aws_s3_bucket.app.id
  rule {
    apply_server_side_encryption_by_default { sse_algorithm = "AES256" }
  }
}

# BREAKING CHANGE 2: Default tags behaviour changed in v5
# v4: resource tags overrode default_tags on key conflicts, but a tag with an
#     identical key and value in both places caused an error
# v5: identical tags are allowed, and resource tags still take precedence
# Action: review any tag diffs in the first v5 plan; duplicated tags no longer need removing

# BREAKING CHANGE 3: EC2 instance metadata options changed defaults
# v5: metadata_options.http_tokens defaults to "required" (IMDSv2 only)
# This means existing EC2 instances that use IMDSv1 will show a drift diff
# Fix: Explicitly set http_tokens = "optional" to maintain old behaviour
# OR migrate your applications to use IMDSv2

resource "aws_instance" "app" {
  # ... other config ...

  metadata_options {
    http_tokens                 = "required"   # Recommended: require IMDSv2
    http_put_response_hop_limit = 1
    instance_metadata_tags      = "disabled"
  }
}

# MIGRATION APPROACH: The two-phase upgrade
# Phase 1: Update code to be compatible with BOTH v4 and v5 simultaneously
#   - Add sub-resources for S3 (they work in v4 too)
#   - Explicitly set all previously-defaulted options
#   - Apply this code change against v4 — verify nothing breaks
# Phase 2: Upgrade the provider constraint to v5
#   - terraform init -upgrade
#   - terraform validate (should pass after Phase 1 fixes)
#   - terraform plan (should show minimal or no changes)
#   - Apply

What just happened?

The two-phase approach is the key to zero-downtime major upgrades. Phase 1 makes the code compatible with both old and new provider versions, so if Phase 2 reveals a problem, you can revert the constraint and everything still works on the old provider. Teams that skip Phase 1 and go straight to the constraint change get stuck: the old code fails validation against the new provider, and they cannot easily roll back.
Changed defaults are the most insidious breaking changes. A breaking change that adds a new required argument is caught by terraform validate. A default that changes from optional to required — like http_tokens — passes validation but shows a diff on every existing EC2 instance. Read the default changes section of every upgrade guide as carefully as the removed arguments section.
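
Phase 1 starts with finding the code that needs to change. As a rough sketch, the helper below lists inline `acl`, `versioning`, and `server_side_encryption_configuration` arguments that still sit inside `aws_s3_bucket` blocks. It is a heuristic, not a parser: `find_inline_s3_args` is a hypothetical name, and it assumes files formatted by `terraform fmt`, with `resource` and the closing `}` at column 0.

```shell
# find_inline_s3_args is a hypothetical helper: report inline S3 arguments
# that should move to sub-resources. Pass one or more .tf files.
find_inline_s3_args() {
  awk '
    /^resource "aws_s3_bucket" / { in_bucket = 1 }   # exact type only
    /^}/                          { in_bucket = 0 }   # top-level block ends
    in_bucket && /^[[:space:]]+(acl[[:space:]]*=|versioning[[:space:]]*[{]|server_side_encryption_configuration[[:space:]]*[{])/ {
      print FILENAME ":" FNR ": " $0
    }
  ' "$@"
}
```

Calling `find_inline_s3_args *.tf` prints one `file:line: content` entry per hit. Arguments inside `aws_s3_bucket_acl` and the other sub-resources are ignored because the block type does not match.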

required_version Constraints Best Practices

# required_version in the root configuration
terraform {
  # Best practice: lower bound only — allows any version at or above the minimum
  # This lets the team upgrade Terraform without changing the constraint
  required_version = ">= 1.5.0"

  # Alternative: tilde operator for tighter control
  # ~> 1.5 allows any 1.x release from 1.5 onward but NOT 2.0
  # required_version = "~> 1.5"

  # Avoid: exact pin in the root configuration
  # required_version = "= 1.5.4"
  # This forces every engineer and every CI pipeline to use exactly 1.5.4
  # Every patch release requires a configuration change, which is unnecessary overhead

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      # In root modules: use ~> for provider constraints
      # ~> 5.0 allows 5.x but not 6.0 — protects from unintended major upgrades
      version = "~> 5.0"
    }
  }
}

# required_version in reusable modules
# Modules should be MORE permissive — not less — to avoid forcing callers to upgrade
terraform {
  # Lower bound only — module works with any sufficiently recent Terraform
  required_version = ">= 1.3.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      # In modules: a wide range rather than ~>, so the caller's root module picks the version
      version = ">= 4.0, < 6.0"  # Works with v4 and v5 — caller chooses
      # Avoid: version = "~> 5.0" in a module — forces all callers to use 5.x
    }
  }
}

# The tfenv .terraform-version file — pin the exact version for CI/CD
# Place in repository root — tfenv reads it automatically
# All engineers and pipelines use the same Terraform version
# To upgrade: update this file, update required_version, run tests, commit
cat .terraform-version
# 1.6.3
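
The `~>` operator is easier to reason about once it is written out as a comparison. The sketch below covers only two-part constraints such as `~> 5.0` or `~> 1.5`, where the major version must match and the minor version must be at least the stated one. `satisfies_tilde` is a hypothetical helper, and pre-release suffixes are not handled.

```shell
# satisfies_tilde is a hypothetical helper, not part of Terraform.
# Usage: satisfies_tilde <constraint MAJOR.MINOR> <version MAJOR.MINOR.PATCH>
# Succeeds (exit 0) when the version is allowed by "~> MAJOR.MINOR".
satisfies_tilde() {
  c_major=${1%%.*}; c_minor=${1#*.}
  v_major=${2%%.*}; v_rest=${2#*.}; v_minor=${v_rest%%.*}
  [ "$v_major" -eq "$c_major" ] && [ "$v_minor" -ge "$c_minor" ]
}
```

So `satisfies_tilde 5.0 5.31.0` succeeds while `satisfies_tilde 5.0 6.0.0` fails, which is exactly the protection against unintended major upgrades described above.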

The Upgrade Runbook

Every upgrade — whether Terraform core or a provider — follows the same runbook. Deviating from it is how production incidents happen.

# THE TERRAFORM UPGRADE RUNBOOK
# Follow in this exact order — no skipping steps

# ── Step 1: Read the changelog BEFORE touching any code ──────────────────────
# Terraform core:   github.com/hashicorp/terraform/releases
# AWS provider:     github.com/hashicorp/terraform-provider-aws/releases
# Azure provider:   github.com/hashicorp/terraform-provider-azurerm/releases
# GCP provider:     github.com/hashicorp/terraform-provider-google/releases
# Look for: BREAKING CHANGES, deprecated arguments, changed defaults, removed resources

# ── Step 2: Create a dedicated upgrade branch ─────────────────────────────────
git checkout -b upgrade/aws-provider-5x

# ── Step 3: Update the version constraint ────────────────────────────────────
# versions.tf: change "~> 4.0" to "~> 5.0"

# ── Step 4: Download the new version ─────────────────────────────────────────
terraform init -upgrade  # -upgrade re-resolves against new constraints

# ── Step 5: Fix all validation errors ────────────────────────────────────────
terraform validate
# Fix every error before proceeding — do not move to plan with validation errors

# ── Step 6: Run plan against DEV environment ──────────────────────────────────
terraform plan -var="environment=dev" | tee upgrade-plan-dev.txt
# Critically examine:
# -/+ entries: resources being destroyed and recreated (potential downtime)
# ~ entries:   in-place modifications (usually safe, but read them)
# + entries:   new resources being created (fine)
# Any -/+ on databases, load balancers, or critical resources = STOP, investigate

# ── Step 7: Apply to DEV, verify, monitor ─────────────────────────────────────
terraform apply -var="environment=dev"
# Wait 30 minutes — watch for alarms, logs, errors
# Smoke test the application — key user journeys still work

# ── Step 8: Apply to STAGING, verify, monitor ─────────────────────────────────
terraform apply -var="environment=staging"
# Wait 24 hours minimum for staging — let any delayed failures surface

# ── Step 9: Open PR for production ────────────────────────────────────────────
# PR includes: changelog link, plan output for prod, test results from dev/staging
# PR requires review from a second engineer — peer review of infrastructure change

# ── Step 10: Apply to PRODUCTION with approval ────────────────────────────────
# Only after PR approval — apply during a low-traffic maintenance window
# Have a rollback plan: lock file from before the upgrade is in Git history
# Monitor for 30 minutes after apply completes

# ── Step 11: Rollback procedure ───────────────────────────────────────────────
# If something goes wrong after the provider upgrade:
git show HEAD~1:.terraform.lock.hcl > .terraform.lock.hcl  # Restore old lock file (assumes the upgrade is the latest commit)
git show HEAD~1:versions.tf > versions.tf                  # Revert the constraint change the same way
terraform init    # Re-downloads old provider version matching the restored lock file
terraform plan    # Verify the old provider works: expect no changes, or the reverse of the upgrade diff
# Note: rollback is not always possible. If the new provider has already rewritten
# resources in state using a newer schema, the old provider may refuse to read them.

Common Upgrade Mistakes

Not committing .terraform.lock.hcl to Git

Without the lock file in Git, different engineers run different provider versions. Engineer A on AWS provider 5.10 creates a resource that uses a new argument. Engineer B on 5.5 plans the same code and gets an "unsupported argument" error — not because the code is wrong, but because their provider is older. CI/CD may silently use a different version than local development. The lock file is not a generated artifact to ignore — it is a first-class configuration file that must be in source control.

Running terraform init -upgrade without reading the changelog

terraform init -upgrade downloads the newest version within your constraint. If your constraint is ~> 5.0 and AWS provider 5.40 was just released with a deprecation that shows a warning on every plan — or a changed default that causes every EC2 instance to show a diff — you would have no warning. Read the changelog for every version between your current and your target before running the upgrade command.
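
A keyword scan is a quick first pass over a downloaded changelog, though it does not replace reading it. This sketch assumes the changelog has been saved as a local text file. `changelog_flags` is a hypothetical helper, and the keyword list is the one used earlier in this lesson plus "default", to catch changed defaults.

```shell
# changelog_flags is a hypothetical helper: print numbered changelog lines
# that mention breaking changes, removals, renames, deprecations, or defaults.
changelog_flags() {
  grep -n -i -E 'BREAKING CHANGE|removed|renamed|deprecated|default' "$1"
}
```

Note that `grep` exits with status 1 when nothing matches, so in a script treat "no hits" as a result to review, not as an error.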

Skipping staging and applying a major upgrade directly to production

Provider major version upgrades frequently trigger resource replacements on existing infrastructure that did not appear during initial testing on empty environments. Staging must have the same resource types as production, at similar scale, for the upgrade to surface these replacement plans. A team that only tests on dev — which has a single small EC2 instance — misses the RDS parameter group change that forces a database reboot in production.

Keep upgrades small and frequent

The most dangerous upgrade is the one that jumps multiple major versions at once because the team neglected upgrades for two years. AWS provider 3.x → 5.x in one step has caused multi-day migrations with dozens of resource replacements to manage. The discipline of upgrading one version at a time, on a quarterly schedule, keeps each individual upgrade small and low-risk. Treat provider upgrades the same way you treat library dependency upgrades in application code — routinely, not desperately.
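
When a team is already several major versions behind, the same discipline applies: step through one major version at a time. As a small illustration, the hypothetical helper `upgrade_path` below prints the sequence of constraints to move through, given the current and target major versions as integers.

```shell
# upgrade_path is a hypothetical helper: list the "~> N.0" constraints to
# adopt, one major version at a time, from the current major to the target.
upgrade_path() {
  current="$1"; target="$2"
  next=$((current + 1))
  while [ "$next" -le "$target" ]; do
    echo "~> ${next}.0"
    next=$((next + 1))
  done
}
```

`upgrade_path 3 5` prints `~> 4.0` and then `~> 5.0`. Each line is a separate pass through the full runbook, not a single combined change.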

Practice Questions

1. Which command re-resolves provider versions against current constraints and updates the lock file to adopt a newer provider version?

2. What version constraint operator should modules use for required_providers, and why does it differ from root module best practice?

3. What is the two-phase approach to major provider upgrades and why does it enable safe rollback?

Up Next · Lesson 42

Terraform Testing

With upgrades under control, Lesson 42 covers testing Terraform configurations: the native terraform test framework introduced in 1.6, Terratest for Go-based integration tests, unit testing with mocks, and the testing pyramid for infrastructure code. Learn how to build confidence in Terraform changes before they reach production.