Terraform Lesson 25 – Core Best Practices | Dataplexa
Section II · Lesson 25

Core Best Practices

Knowing the mechanics is not enough. The engineers who manage Terraform at scale — hundreds of resources, multiple teams, production systems that cannot go down — follow a set of habits that prevent the incidents everyone else has learned from the hard way. This lesson distils those habits into a reference you can apply immediately.

This lesson covers

Project file structure → Naming conventions → Tagging strategy → Version pinning → The locals pattern → Pre-apply checklist → What belongs in Git and what does not

1 — Project File Structure

Every Terraform project should follow a consistent file structure. Consistency is what lets a new engineer open any project and immediately understand where to look for what. The structure below is the standard that the community has converged on — not an arbitrary choice.

# Standard project structure — every file has one job
my-project/
├── versions.tf        # terraform block + required_providers + backend — always first thing you look at
├── variables.tf       # All input variable declarations — types, descriptions, validation, defaults
├── locals.tf          # Derived values, name prefixes, common tags — computed from variables
├── data.tf            # All data sources — what this config reads but does not own
├── main.tf            # The actual resources — what this config creates
├── outputs.tf         # What this config exposes to the outside world
├── .gitignore         # Excludes state files, plan files, and .terraform/ directory
└── terraform.tfvars   # Non-secret variable overrides for local development — commit only if it stays secret-free

# For larger projects — split main.tf into purpose-named files
my-large-project/
├── versions.tf
├── variables.tf
├── locals.tf
├── data.tf
├── networking.tf      # VPC, subnets, route tables, gateways
├── security.tf        # Security groups, IAM roles, policies
├── compute.tf         # EC2, ECS, Lambda, EKS
├── storage.tf         # S3, RDS, DynamoDB
├── outputs.tf
└── .gitignore

Every project needs this .gitignore:

# .gitignore — every Terraform project needs exactly this

# Terraform state — contains secrets, never commit
terraform.tfstate
terraform.tfstate.*
terraform.tfstate.d/

# Terraform plan files — binary files containing resource data
*.tfplan
tfplan

# Terraform working directory — downloaded providers and modules
.terraform/

# NOTE: .terraform.lock.hcl sits in the project root, NOT inside .terraform/
# Commit it — do NOT gitignore it; it pins provider versions for reproducible builds

# Override files — for local experimentation only
override.tf
override.tf.json
*_override.tf
*_override.tf.json

# Variable files that may contain secrets
*.auto.tfvars          # Auto-loaded — may contain credentials
# terraform.tfvars     # Decide per project — often contains non-secret dev overrides

# macOS / editor artifacts
.DS_Store
*.swp
.idea/

Commit .terraform.lock.hcl — do not gitignore it

The lock file records the exact provider version hashes that were used on the last terraform init. Committing it ensures every engineer and every CI/CD pipeline uses the exact same provider version. Without it, a provider minor version release between your init and a teammate's init can produce different plan outputs for the same configuration — a subtle source of drift and confusion.

2 — Naming Conventions

Consistent names make grep, AWS Console searches, cost allocation reports, and debugging dramatically faster. Adopt one convention for the entire organisation and enforce it through code review.

# Naming convention rules — apply these everywhere

# Rule 1: All Terraform identifiers use lower_snake_case
resource "aws_s3_bucket" "app_data" { }        # correct
resource "aws_s3_bucket" "AppData" { }         # wrong — mixed case
resource "aws_s3_bucket" "app-data" { }        # wrong — hyphens are legal in HCL but break the convention

# Rule 2: All AWS resource Names follow the pattern: project-component-environment
# Format: {project}-{component}-{environment}
locals {
  name_prefix = "${var.project}-${var.component}-${var.environment}"
  # Results in: "acme-payments-prod", "acme-web-dev", "acme-db-staging"
}

resource "aws_instance" "web" {
  # ...
  tags = {
    Name = "${local.name_prefix}-web"  # acme-payments-prod-web
  }
}

# Rule 3: Use hyphens in AWS resource names, underscores in Terraform identifiers
# Terraform:  resource "aws_security_group" "web_server"   (underscores)
# AWS Name:   "acme-payments-prod-web-server"              (hyphens)

# Rule 4: Resource type is implied by context — do not repeat it
resource "aws_iam_role" "web" {         # NOT "web_role" — it is already a role
  name = "${local.name_prefix}-web"    # NOT "acme-payments-prod-web-role" — redundant
}

# Rule 5: Module sources use consistent versioning
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"  # Always pin with ~> — never use unversioned sources
}
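Naming rules can be enforced mechanically, not just in code review. A minimal sketch using a variable validation block — the allowed environment list here is an assumption, not part of the convention above:

```hcl
# Sketch — reject bad environment values at plan time, before any
# resource name is ever built from them. Allowed list is hypothetical.
variable "environment" {
  type        = string
  description = "Deployment environment — feeds the name prefix"

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "environment must be one of: dev, staging, prod."
  }
}
```

With this in place, a typo like `enviroment = "prd"` fails at plan time with a clear message instead of producing a misnamed resource.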

3 — Tagging Strategy

Tags are the metadata layer that makes a cloud account navigable — for cost allocation, security audits, compliance, and incident response. A tag strategy set up correctly from day one pays dividends for years. One applied inconsistently costs double because you have to fix it later.

# The mandatory tag set — every resource in the organisation carries these
# Define once in locals, apply everywhere with merge()

locals {
  # Mandatory tags — non-negotiable on every resource
  mandatory_tags = {
    Project     = var.project      # Which project owns this resource
    Environment = var.environment  # dev / staging / prod
    ManagedBy   = "Terraform"     # Signals this resource must not be edited manually
    Team        = var.team         # Which team is responsible for this resource
    CostCenter  = var.cost_center  # For billing allocation — finance team requirement
  }

  # Optional tags that specific resource types should include
  compute_tags = merge(local.mandatory_tags, {
    PatchGroup  = "${var.environment}-linux"  # For AWS Systems Manager Patch Manager
    Backup      = var.environment == "prod" ? "daily" : "none"  # Backup policy
  })

  data_tags = merge(local.mandatory_tags, {
    DataClass   = "confidential"  # For security scanning and compliance
    Encryption  = "required"      # Signals encryption policy
  })
}

# Apply mandatory_tags to every resource using merge()
resource "aws_instance" "web" {
  # ...
  tags = merge(local.mandatory_tags, {
    Name      = "${local.name_prefix}-web"  # Resource-specific tag added here
    Component = "web-server"                # Additional context for this resource type
  })
}

resource "aws_s3_bucket" "data" {
  # ...
  tags = merge(local.data_tags, {
    Name      = "${local.name_prefix}-data"
    Component = "object-storage"
  })
}

# Use default_tags in the provider block to apply tags to every resource automatically
# This is more reliable than per-resource merge() — it cannot be forgotten
provider "aws" {
  region = var.region

  default_tags {
    tags = {
      Project     = var.project
      Environment = var.environment
      ManagedBy   = "Terraform"
      Team        = var.team
    }
    # Tags set here appear on every resource — no merge() needed for these
    # Resource-specific tags like Name still need to be set on each resource
  }
}

default_tags vs merge(local.mandatory_tags, ...)

Both patterns work. default_tags in the provider block is more reliable — it applies to every resource that the AWS provider manages, even ones you forget to tag manually. The downside: it does not appear in the plan output per resource, making it less visible. merge() in each resource block is more explicit and shows in plan output. Use default_tags for the truly universal tags (ManagedBy, Environment) and merge() for resource-specific additions like Name and Component.
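Putting that split into practice, a resource under a provider with default_tags carries only its specific tags. A sketch — the bucket and local names are illustrative:

```hcl
# Sketch — universal tags come from the provider's default_tags block,
# so the resource declares only what is specific to it
resource "aws_s3_bucket" "logs" {
  bucket = "${local.name_prefix}-logs"   # hypothetical bucket name

  tags = {
    Name      = "${local.name_prefix}-logs"  # merged with default_tags by the provider
    Component = "log-storage"
  }
}
```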

4 — Version Pinning

Unpinned versions are a time bomb. A provider minor version release can change plan output, introduce new required arguments, or deprecate existing ones. An unpinned Terraform version means the same configuration produces different plans on different machines. Always pin everything.

# versions.tf — the complete version pinning reference

terraform {
  # Pin the Terraform CLI version with a bounded range
  # >= 1.5.0 sets the floor; < 2.0.0 blocks an untested major upgrade
  required_version = ">= 1.5.0, < 2.0.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      # ~> 5.0 means: >= 5.0, < 6.0 — allows patch and minor updates within major 5
      # This is the recommended constraint for providers
      version = "~> 5.0"
    }

    # For providers where you want patch updates only, pin all three components
    # ~> 5.31.0 means >= 5.31.0, < 5.32.0 — note ~> 5.31 would still allow minor updates
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.23.0"  # Kubernetes API changes frequently — patch updates only
    }

    random = {
      source  = "hashicorp/random"
      version = "~> 3.5"
    }
  }
}

# The ~> operator is the pessimistic constraint operator — it permits only
# the rightmost specified component to increment
# ~> X.Y   allows >= X.Y, < (X+1).0 — minor and patch updates
# ~> X.Y.Z allows >= X.Y.Z, < X.(Y+1).0 — patch updates only, most restrictive
# In practice: use ~> MAJOR.MINOR for providers (allows minor and patch updates)
# Never use: version = "*" or no version constraint at all

# Upgrading providers safely — the correct upgrade workflow

# 1. Review the provider changelog for breaking changes
#    https://registry.terraform.io/providers/hashicorp/aws/latest

# 2. Update the version constraint in versions.tf
#    ~> 5.0  →  ~> 5.31  (to get a specific version or later patch)

# 3. Run upgrade and review the lock file diff
terraform init -upgrade

# 4. Run plan and review for unexpected changes from the provider update
terraform plan

# 5. Test in dev before applying to staging or prod
# 6. Commit the updated .terraform.lock.hcl with the new provider hashes
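One step the workflow above often needs on mixed-OS teams: terraform init records lock-file hashes only for the platform it ran on. The terraform providers lock command can record hashes for additional platforms before you commit — the platform list below is an example:

```shell
# Record provider hashes for every platform the team and CI use,
# so the committed .terraform.lock.hcl verifies everywhere
terraform providers lock \
  -platform=linux_amd64 \
  -platform=darwin_amd64 \
  -platform=darwin_arm64
```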

5 — The Locals Pattern

Locals are computed values derived from variables and data sources. Use them to centralise logic that would otherwise be repeated across resource blocks. The rule: if an expression appears in more than one place, it belongs in a local.

# locals.tf — the pattern in full

locals {
  # ── NAMING ─────────────────────────────────────────────────────────────────

  # One name prefix — used everywhere
  name_prefix = "${var.project}-${var.environment}"

  # Region shortcode — some naming schemes use abbreviated region names
  region_short = {
    "us-east-1"      = "use1"
    "us-west-2"      = "usw2"
    "eu-west-1"      = "euw1"
    "ap-southeast-1" = "apse1"
  }[var.region]  # Map lookup — returns the shortcode for the current region

  # ── SIZING ─────────────────────────────────────────────────────────────────

  # Environment-specific sizing — one map, not scattered ternaries everywhere
  sizing = {
    dev     = { instance_type = "t2.micro",  min_size = 1, max_size = 2  }
    staging = { instance_type = "t3.small",  min_size = 1, max_size = 4  }
    prod    = { instance_type = "t3.medium", min_size = 2, max_size = 10 }
  }

  # Access the current environment's sizing config
  # try() protects against missing environment keys — falls back to dev config
  env_sizing = try(local.sizing[var.environment], local.sizing["dev"])

  # ── BOOLEAN FLAGS ──────────────────────────────────────────────────────────

  # Single truth for "is this production" — used in multiple lifecycle rules and conditions
  is_prod = var.environment == "prod"

  # ── TAGS ───────────────────────────────────────────────────────────────────

  common_tags = {
    Project     = var.project
    Environment = var.environment
    ManagedBy   = "Terraform"
    Team        = var.team
  }

  # ── DERIVED ARNS AND IDS ───────────────────────────────────────────────────

  # Build the KMS key ARN from components — avoids string concatenation in resource blocks
  kms_key_arn = "arn:aws:kms:${var.region}:${data.aws_caller_identity.current.account_id}:alias/${local.name_prefix}-key"
}

# In main.tf — locals make resource blocks clean and intention-revealing
resource "aws_autoscaling_group" "web" {
  min_size         = local.env_sizing.min_size   # Not a ternary, not a variable lookup — a local
  max_size         = local.env_sizing.max_size
  # ...

  lifecycle {
    prevent_destroy = local.is_prod  # Clean — is this prod? If so, protect it.
  }
}
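The region_short map earns its keep in globally-unique names, where every character counts. A sketch with hypothetical names:

```hcl
# Sketch — S3 bucket names are global, so the region shortcode
# disambiguates per-region buckets without bloating the name
resource "aws_s3_bucket" "artifacts" {
  bucket = "${local.name_prefix}-${local.region_short}-artifacts"
  # e.g. "acme-prod-use1-artifacts"

  tags = merge(local.common_tags, {
    Name = "${local.name_prefix}-${local.region_short}-artifacts"
  })
}
```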

6 — What Belongs in Git and What Does Not

Always commit:
  - All .tf files
  - .terraform.lock.hcl
  - .gitignore
  - README.md
  - moved blocks and import blocks

Never commit:
  - terraform.tfstate
  - *.tfstate.backup
  - *.tfplan files
  - .terraform/ directory
  - Any file with real credentials

Case by case:
  - terraform.tfvars (if no secrets)
  - backend-dev.hcl (if no credentials)
  - *.auto.tfvars

7 — The Pre-Apply Checklist

Every engineer who has caused a production incident with Terraform skipped at least one of these steps. Run through the checklist mentally before every apply, and literally before any apply to prod.

1. Verify the active workspace and backend

Run terraform workspace show and confirm the output matches where you intend to deploy. Check the backend key in versions.tf. The wrong workspace or the wrong backend key is the single most common cause of accidental production changes.
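This check can be scripted so it cannot be skipped. A minimal guard sketch — the expected workspace name is an assumption:

```shell
# Hypothetical pre-apply guard — abort unless the active workspace matches
expected="prod"
actual="$(terraform workspace show)"
if [ "$actual" != "$expected" ]; then
  echo "Refusing to apply: workspace is '$actual', expected '$expected'" >&2
  exit 1
fi
```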

2. Read every line of the plan output

The plan is a contract. Every -/+ means a resource will be destroyed and recreated. Every - means a resource is being destroyed. Understand every line before typing yes. If the plan is 400 lines long, read all 400 lines — or investigate why it is so long before applying.

3. Confirm destroy count is expected

The plan summary line shows X to add, Y to change, Z to destroy. If Z is greater than zero and you were not expecting any destroys, stop and investigate before proceeding. Unexpected destroys are the most common source of production incidents.

4. Check that sensitive values are redacted

Any variable or resource attribute that contains credentials, passwords, or private keys should appear as (sensitive value) in the plan output. If you see a real password printed in plain text in the plan, the sensitive marking is missing — do not apply until it is fixed.

5. Save the plan and apply the saved plan

For any apply to a shared environment: terraform plan -out=tfplan then terraform apply tfplan. Applying a saved plan guarantees that exactly what you reviewed is what gets executed — and if the state has changed between plan and apply, Terraform refuses to run the stale plan rather than applying something you never saw. In CI/CD this is mandatory. In manual applies to shared environments it is strongly recommended.
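In CI this pairs naturally with plan's -detailed-exitcode flag, which distinguishes "no changes" from "changes present". A sketch of the gate:

```shell
# Sketch of a CI plan/apply gate using a saved plan
terraform plan -out=tfplan -detailed-exitcode
# exit 0 → no changes, 1 → error, 2 → changes present
# (run without `set -e`, or handle exit code 2 explicitly, since it is not a failure)
terraform apply tfplan
```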

8 — Six Habits of Effective Terraform Engineers

Run terraform fmt before every commit

terraform fmt -recursive formats all .tf files consistently — correct indentation, aligned = signs, standardised spacing. Run it in a pre-commit hook or CI check. Unformatted code in PRs wastes reviewer time on style rather than substance.

Run terraform validate in CI before plan

terraform validate catches syntax errors, type mismatches, and missing required arguments without making any API calls. It runs in under a second. Put it before plan in every CI pipeline so syntax errors fail fast before a slow plan even starts.
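Both habits fit in a short CI step — fmt's -check flag exits non-zero when files need formatting instead of rewriting them:

```shell
# Fail fast on style and syntax before running a slow plan
terraform fmt -check -recursive    # non-zero exit if any file needs formatting
terraform init -backend=false      # download providers only — no state access needed
terraform validate
```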

Keep resource blocks under 50 lines

If a resource block is getting long — 60, 80, 100 lines — it is usually a sign that some logic belongs in locals, some arguments belong in a separate dependent resource, or the resource should be extracted into a module. Long resource blocks are hard to review and hard to reason about.

Never use count for resources that can have items removed from non-tail positions

You learned this in Lesson 22 — and it is worth restating as a habit. If a list can have items removed from anywhere but the end, use for_each. Index instability with count causes unintended destroys that are hard to explain and harder to recover from.

Write descriptions on every variable and output

The description field on variable and output blocks appears in terraform console, in generated documentation, and in error messages when required variables are not provided. A variable with no description forces anyone using the module to read the source code to understand what it does. Write the description as if you will not be available to explain it.
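A sketch of what that looks like on both sides of a module boundary — the names here are hypothetical:

```hcl
variable "cost_center" {
  type        = string
  description = "Finance cost-center code — applied to every resource via the CostCenter tag"
}

output "vpc_id" {
  value       = aws_vpc.main.id
  description = "ID of the VPC this configuration creates — consumed by downstream stacks"
}
```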

Test in dev before prod — always, with no exceptions

Every configuration change — including "trivial" tag updates and "obvious" security group rule additions — goes through dev and staging before prod. The incident report for every Terraform production issue ever written contains the phrase "it seemed like a small change." The pre-apply checklist prevents this, but only if the habit is genuinely consistent.

The single most important Terraform best practice

Remote state in a versioned, encrypted, access-controlled backend with DynamoDB locking. Everything else in this lesson makes Terraform better. Without this one, Terraform is an accident waiting to happen the moment a second person joins the project. If you only implement one thing from this lesson: move all project state to a remote backend today.
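For reference, a minimal sketch of such a backend — the bucket and table names are placeholders, and both must exist before terraform init:

```hcl
terraform {
  backend "s3" {
    bucket         = "acme-terraform-state"            # versioned + encrypted bucket (hypothetical name)
    key            = "payments/prod/terraform.tfstate" # one key per project and environment
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"                 # state locking table (hypothetical name)
  }
}
```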

Practice Questions

1. Which file records exact provider version hashes, is generated by terraform init, and should always be committed to Git?



2. Which command formats all .tf files in the current directory and all subdirectories consistently?



3. Which version constraint operator allows patch and minor updates within a major version — for example, >= 5.0, < 6.0 — and is the recommended operator for provider version pinning?



Quiz

1. What is the difference between data.tf and main.tf in the standard project file structure?


2. What is the advantage of using default_tags in the provider block over merge(local.common_tags, ...) in each resource block?


3. Before applying to a shared environment, what is the safest workflow to ensure exactly what you reviewed gets executed?


Section II Complete

Terraform Core Concepts

You have covered all fifteen core concept lessons — variables, outputs, data sources, state, remote backends, locking, state commands, import, lifecycle, dependencies, count and for_each, dynamic blocks, workspaces, and best practices. You now have the full foundation.

Coming up in Section III — Modules, Security and Cloud Providers

Lessons 26–35 cover building reusable modules, the Terraform module registry, module composition patterns, security hardening with Sentinel and OPA, and provider-specific deep dives into AWS, Azure, and GCP.

Next up → Lesson 26: Introduction to Modules