Ansible Course
Playbook Best Practices
In this lesson
Playbook best practices are the habits, conventions, and structural decisions that make the difference between automation that works once and automation that a team can maintain, extend, and trust for years. This lesson consolidates the most important practical lessons from Section II into a definitive reference — covering project structure at scale, naming conventions that make playbooks self-documenting, performance optimisations that matter on large fleets, security practices that prevent credential leaks, and the Section II recap that closes out the playbook fundamentals phase of this course. Nothing in this lesson is new theory — it is applied wisdom from everything you have built in Lessons 11 through 24.
Production Project Structure
A production Ansible project is larger than the practice structures used in Lessons 10 and 13. It needs to support multiple environments, a team of engineers, CI/CD integration, and a growing library of roles. The structure below scales from a two-person startup to a 50-engineer platform team without reorganisation.
myproject/
├── ansible.cfg                  # project-level config, committed to Git
├── site.yml                     # master playbook, provisions the full stack
├── deploy.yml                   # deployment-only playbook (no provisioning)
│
├── inventory/
│   ├── staging/
│   │   ├── hosts.ini            # staging host list
│   │   └── group_vars/
│   │       ├── all.yml          # env: staging, log_level: debug
│   │       ├── webservers.yml   # staging nginx settings
│   │       └── databases.yml    # staging postgresql settings
│   └── production/
│       ├── hosts.ini            # production host list
│       └── group_vars/
│           ├── all.yml          # env: production, log_level: warn
│           ├── webservers.yml   # production nginx settings
│           └── databases.yml    # production postgresql settings
│
├── roles/
│   ├── common/                  # base config for every host
│   ├── nginx/                   # web server role
│   ├── postgresql/              # database role
│   └── app_deploy/              # application deployment role
│
├── playbooks/
│   ├── hardening.yml            # security hardening (run separately)
│   ├── monitoring.yml           # monitoring agent setup
│   └── rotate_secrets.yml       # credential rotation (tagged never)
│
├── files/                       # static files used across roles
├── templates/                   # project-level templates (rarely needed)
│
├── .vault_pass                  # NEVER COMMIT, listed in .gitignore
├── .gitignore                   # must include .vault_pass and *.retry
└── README.md                    # how to run this project, always maintained
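As a rough sketch of how the master playbook might tie the roles directory to the inventory groups: the group names webservers and databases are inferred from the group_vars file names in the tree, and the mapping of roles to groups is an assumption made for illustration, not something the layout itself dictates.

```yaml
---
# site.yml: hypothetical master playbook for the layout above.
# Group names come from the group_vars file names; which role
# runs on which group is assumed for this example.

- name: Apply base configuration to every host
  hosts: all
  become: false
  roles:
    - role: common
      become: true

- name: Configure web servers
  hosts: webservers
  become: false
  roles:
    - role: nginx
      become: true

- name: Configure database servers
  hosts: databases
  become: false
  roles:
    - role: postgresql
      become: true
```

Pointing the same playbook at a different environment is then only a matter of the inventory argument, for example `ansible-playbook -i inventory/staging/ site.yml`, because each environment carries its own group_vars next to its host list.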
Three non-negotiable project files
ansible.cfg in the project root ensures every engineer uses the same configuration regardless of what is in their home directory. .gitignore must list .vault_pass, *.retry, and any other files that must never be committed. README.md must be current: if it does not describe how to run the project and which inventories exist, the next engineer will spend an hour figuring out what you spent five minutes setting up.
Naming Conventions
Consistent naming is not cosmetic: it is the difference between a playbook that documents itself and one that requires an expert to interpret. These five conventions appear in most well-maintained Ansible projects, and ansible-lint can check several of them automatically, including task naming, variable naming, and role-variable prefixes.
Task names are imperative verb phrases
Install Nginx web server, Deploy application config, Create deploy user. Every task name reads like an instruction. Never: nginx, package, or done.
Variables are lowercase_with_underscores
nginx_port, db_max_connections, deploy_user. Never camelCase, never UPPER_CASE for user-defined variables. Ansible's built-in facts use ansible_*.
Role variables are prefixed with the role name
nginx_port not port, postgresql_version not version. This prevents name collisions when multiple roles are applied in the same play.
Playbooks are named by function, not by host
deploy.yml, hardening.yml, rotate_secrets.yml. Never web01.yml or server.yml. Playbooks describe actions, not targets.
Template files use the destination filename plus .j2
The template for /etc/nginx/nginx.conf is named nginx.conf.j2. The template for /etc/postgresql/15/main/postgresql.conf is named postgresql.conf.j2. The naming makes it immediately clear what the template renders into, and it makes grep and IDE search useful.
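The five conventions above can be seen together in a small role fragment. This is a sketch only: the variable values are placeholders, and the two files are shown in one listing, separated by `---`, purely for compactness.

```yaml
---
# roles/nginx/defaults/main.yml (placeholder values)
# Role-prefixed, lowercase_with_underscores variable names.
nginx_port: 80
nginx_worker_connections: 1024

---
# roles/nginx/tasks/main.yml
# Task names are imperative verb phrases.
- name: Install Nginx web server
  ansible.builtin.package:
    name: nginx
    state: present

- name: Deploy Nginx main configuration
  ansible.builtin.template:
    src: nginx.conf.j2            # destination filename plus .j2
    dest: /etc/nginx/nginx.conf
    owner: root
    group: root
    mode: "0644"
```

A playbook that applies this role would be named for what it does, such as deploy.yml, rather than for the host it targets.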
Performance Habits
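On a 5-host lab, performance rarely matters. On a 200-host production fleet, a 10-second-per-host overhead adds up to 2,000 seconds, about 33 minutes of cumulative host time; forks spread that across parallel workers, but at the default of 5 forks it is still close to 7 minutes of wall-clock waiting. These habits can collectively cut playbook run times by 30–60% on large inventories with no change to correctness.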
Habit 1
Enable SSH pipelining
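Set pipelining = True in ansible.cfg under [ssh_connection]. Pipelining reduces the number of SSH operations needed to run each module: instead of copying the module to a temporary file on the host and then executing it, Ansible pipes it straight to the remote Python interpreter. This cuts per-task overhead from roughly 150 ms to roughly 30 ms. When become uses sudo, pipelining requires that requiretty is disabled in sudoers (Defaults !requiretty), which is the default on most modern distributions.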
Habit 2
Disable fact gathering on plays that do not need facts
Set gather_facts: false on any play whose tasks do not reference ansible_* variables. Fact gathering takes 1–3 seconds per host, so on a 200-host fleet it adds 3–10 minutes of cumulative host time for plays that never use a single fact.
Habit 3
Use native list passing for package installs
Pass a list directly to ansible.builtin.package's name: parameter rather than looping. This installs all packages in a single module invocation instead of one SSH operation per package. For 10 packages, this is 10× fewer round trips.
Habit 4
Increase forks for large fleets
The default fork count of 5 means Ansible processes 5 hosts at a time. Set forks = 20 (or higher) in ansible.cfg for large inventories. The right value depends on your control node's CPU count and network bandwidth, so test and tune, but 20 is a safe starting point for most environments.
Habit 5
Cache facts with fact caching
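Enable fact_caching = jsonfile with a reasonable fact_caching_timeout in ansible.cfg, and set gathering = smart so that Ansible skips the setup module for hosts whose facts are already cached. With the default gathering policy, facts are still collected at the start of every play even when a cache is configured. With both settings in place, cached facts eliminate the setup round-trip on subsequent runs within the timeout window, which is ideal for CI pipelines that run multiple playbooks against the same hosts in sequence.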
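Habits 2 and 3 live in the playbook rather than in ansible.cfg. A minimal sketch of both, assuming a webservers group and using a hypothetical directory path and package list:

```yaml
---
- name: Prepare application directories on the web fleet
  hosts: webservers
  gather_facts: false          # Habit 2: no task in this play reads an ansible_* fact
  become: true
  tasks:
    - name: Create application deploy directory
      ansible.builtin.file:
        path: /opt/myapp       # hypothetical path
        state: directory
        mode: "0755"

- name: Install base packages on the web fleet
  hosts: webservers
  become: true
  tasks:
    - name: Install base packages
      ansible.builtin.package:
        name:                  # Habit 3: one module call for the whole list
          - curl
          - git
          - rsync
        state: present
```

The second play leaves fact gathering at its default because the generic package module relies on the detected package manager to pick its backend.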
Performance-optimised ansible.cfg
# Comments sit on their own lines: in ansible.cfg, a # after a value
# is read as part of the value.
[defaults]
inventory = ./inventory/production
remote_user = ansible
# parallel host count, tune to your infrastructure
forks = 20
# skip fact gathering for hosts whose facts are already cached
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_facts_cache
# cache facts for 1 hour
fact_caching_timeout = 3600
# cleaner output format
stdout_callback = yaml
# show total run time and per-task timing
callbacks_enabled = timer, profile_tasks

[ssh_connection]
# fewer SSH operations per task
pipelining = True
# connection multiplexing
ssh_args = -C -o ControlMaster=auto -o ControlPersist=60s
Security in Playbooks
Security is not a feature you add to automation — it is a property you lose if you are not deliberate about it from the start. These six practices are the minimum baseline for any Ansible project that handles real credentials or runs against production infrastructure.
Encrypt all secrets with Ansible Vault
Every password, API key, and private key that must be committed to Git must be encrypted with ansible-vault encrypt_string or stored in an encrypted vault file. The vault password itself goes in .vault_pass, which is listed in .gitignore and never committed. Covered in depth in Lesson 28.
Use no_log: true on every task that handles credentials
Any task whose arguments contain a password, token, or private key must set no_log: true. Otherwise the value appears in terminal output, CI logs, and any log aggregation systems connected to your automation pipeline.
Use SSH key authentication, never password authentication
Configure private_key_file in ansible.cfg and disable SSH password authentication on all managed nodes. Password-based SSH is vulnerable to brute force and creates audit trail gaps. Key-based auth is both more secure and more convenient in automation.
Validate config files before writing with validate:
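Use the validate parameter on template, copy, and lineinfile tasks that modify critical config files such as /etc/sudoers, /etc/ssh/sshd_config, and nginx.conf. Ansible runs the validation command against a temporary copy and only replaces the real file if the command succeeds. A syntax error in sudoers or sshd_config can lock you out of the server entirely, and a broken nginx.conf can take the site down on the next reload.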
Apply least-privilege become
Set become: false at the play level and only escalate to become: true at the task level for tasks that genuinely need root. Running an entire play as root when only 3 out of 20 tasks need it violates least-privilege and increases the blast radius of any task error.
Require peer review for playbooks that run against production
Any playbook that modifies production infrastructure should go through a pull request review before it is run. The review forces a second pair of eyes on the --check --diff output and catches logic errors that the author missed. This is a process control, not a technical one, but it prevents more incidents than any technical safeguard.
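Three of the technical practices above (validate, no_log, and task-level become) can be sketched in a single play. The account name, sudoers rule, and the vault-supplied variable app_user_password_hash are hypothetical and exist only for this example.

```yaml
---
- name: Configure the deploy account on web servers
  hosts: webservers
  become: false                      # least-privilege default for the play
  tasks:
    - name: Check that the application directory exists
      ansible.builtin.stat:
        path: /opt/myapp             # hypothetical path, readable without root
      register: app_dir

    - name: Set password for the deploy account
      ansible.builtin.user:
        name: deploy                 # hypothetical account name
        password: "{{ app_user_password_hash }}"   # assumed to come from a vault file
      become: true                   # escalate for this task only
      no_log: true                   # keep the hash out of terminal and CI logs

    - name: Allow the deploy account to restart the application
      ansible.builtin.copy:
        content: "deploy ALL=(root) NOPASSWD: /usr/bin/systemctl restart myapp\n"
        dest: /etc/sudoers.d/deploy
        owner: root
        group: root
        mode: "0440"
        validate: /usr/sbin/visudo -cf %s   # file is only installed if visudo accepts it
      become: true
```

If the sudoers content contained a syntax error, the validate command would fail, the task would fail, and the existing file on the host would be left untouched.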
The Idiomatic Playbook
Every practice from Lessons 11 through 24 is embodied in the following playbook. It is not a complete production playbook — it is a reference showing correct usage of every major convention in context. Read it top to bottom as a checklist of habits.
---
# deploy.yml: application deployment playbook
# Usage: ansible-playbook deploy.yml -i inventory/production/ -e "version=2.4.1"
# Tags: config, nginx, deploy

- name: Deploy application to web fleet
  hosts: webservers
  become: false                 # least-privilege: only escalate per task
  gather_facts: true            # needed: ansible_default_ipv4, ansible_date_time
  serial: "25%"                 # rolling update, 25% of hosts at a time
  max_fail_percentage: 0        # abort if any host fails

  vars_files:
    - vars/app.yml              # non-secret app variables
    - vars/secrets.yml          # vault-encrypted credentials

  pre_tasks:
    - name: Verify minimum Ansible version
      ansible.builtin.assert:
        that: "ansible_version.full is version('2.14', '>=')"
        msg: "Ansible 2.14+ required, found {{ ansible_version.full }}"
      tags: always

  roles:
    - role: nginx
      become: true              # escalate for this role only
      tags: [config, nginx]

  tasks:
    # unarchive requires the destination directory to exist already
    - name: Create release directory
      ansible.builtin.file:
        path: "{{ app_deploy_dir }}/releases/{{ version }}"
        state: directory
        mode: "0755"
      become: true
      tags: deploy

    - name: Deploy application release archive
      ansible.builtin.unarchive:
        src: "releases/{{ version }}.tar.gz"
        dest: "{{ app_deploy_dir }}/releases/{{ version }}"
        remote_src: false
      become: true
      tags: deploy

    - name: Update current symlink to new release
      ansible.builtin.file:
        src: "{{ app_deploy_dir }}/releases/{{ version }}"
        dest: "{{ app_deploy_dir }}/current"
        state: link
        force: true
      become: true
      tags: deploy
      notify: Restart application

    # run the restart handler now, so the health check tests the new release
    - name: Flush handlers before the health check
      ansible.builtin.meta: flush_handlers
      tags: deploy

    - name: Verify application is healthy
      ansible.builtin.uri:
        url: "http://{{ ansible_default_ipv4.address }}:{{ app_port }}/health"
        status_code: 200
      register: health
      retries: 5
      delay: 6
      until: health.status == 200
      tags: deploy

  post_tasks:
    - name: Record successful deployment
      ansible.builtin.lineinfile:
        path: /var/log/deploy_history.log
        line: "{{ ansible_date_time.iso8601 }} - {{ version }} - deployed by {{ lookup('env', 'USER') }}"
        create: true
        mode: "0644"
      become: true
      tags: always

  handlers:
    # handlers ignore tags, so none is set here
    - name: Restart application
      ansible.builtin.service:
        name: myapp
        state: restarted
      become: true
Section II Recap
You have completed Section II: Ansible Playbooks. This section took you from a blank .yml file to a full multi-role project with handlers, templates, loops, conditionals, tags, and error handling. Here is what each lesson contributed to your toolkit.
Section II — what you now know
Never Run a New Playbook Directly Against Production Without --check --diff First
This is the single rule that, if you follow it every time without exception, will prevent the most costly class of automation incident. Always run ansible-playbook --check --diff against a production-equivalent environment before applying any playbook change to the real fleet. The diff output shows what would change in every file Ansible manages, although tasks whose modules do not support check mode are skipped and so show nothing. If the diff contains anything unexpected, stop. If the diff looks correct, you may proceed with confidence. No exceptions, no matter how small the change appears.
Key Takeaways
.gitignore that excludes secrets.
Use pre_tasks to assert preconditions: verify Ansible version, required variables, and environment assumptions before any task that could modify infrastructure runs.
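The idiomatic playbook in this lesson asserts the Ansible version; the same pattern extends to required variables. A minimal sketch, assuming the version and app_deploy_dir variables used in that playbook:

```yaml
---
- name: Deploy application to web fleet
  hosts: webservers
  gather_facts: false
  pre_tasks:
    - name: Verify required deployment variables are set
      ansible.builtin.assert:
        that:
          - version is defined
          - version | string | length > 0
          - app_deploy_dir is defined
        fail_msg: "Pass -e version=<release> and define app_deploy_dir before deploying"
        quiet: true
      tags: always               # runs even when the play is limited by --tags
```

Because pre_tasks run before roles and tasks, a missing variable stops the play before anything on the host has been modified.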
Teacher's Note
Before starting Section III, audit one of your own playbooks against the idiomatic reference in this lesson. Count how many of the conventions it follows and list the ones it does not. Then spend 20 minutes bringing it up to standard. This exercise consolidates everything from Section II more effectively than any amount of re-reading.
Practice Questions
1. Which ansible.cfg setting under [ssh_connection] reduces per-task SSH overhead by streaming module execution over a single connection rather than opening a new one per task?
2. In the idiomatic playbook, which play section runs assertion tasks — such as verifying the Ansible version — before any roles or regular tasks execute?
3. You are writing an Nginx role and need a variable that sets the listening port. Following role naming conventions, what should this variable be called?
Quiz
1. A play has 20 tasks but only 4 of them require root access. What is the most secure way to configure privilege escalation?
2. A playbook runs against 150 hosts but seems to process them slowly, one batch at a time. The control node has 16 CPUs and plenty of bandwidth. What is the most likely cause and fix?
3. A template task deploys /etc/nginx/nginx.conf with validate: "nginx -t -c %s". The template has a syntax error. What happens?
Up Next · Lesson 26 — Section III
Ansible Galaxy
Section III begins now. Discover Ansible Galaxy — the community hub for sharing and downloading roles and collections — and learn how to use, publish, and manage Galaxy content in your own projects.