Ansible Lesson 35 – Performance Optimization | Dataplexa
Section III · Lesson 35

Performance Optimization

In this lesson

SSH pipelining · Fact caching · Async tasks · Profiling slow plays · Forks & batching

Performance optimization matters the moment your fleet grows beyond a handful of hosts. A playbook that takes 90 seconds on 5 servers takes 30 minutes on 100 servers if nothing is tuned — not because Ansible is slow, but because its defaults are conservative. Ansible is designed to be safe and correct first; fast second. Tuning it means deliberately relaxing those conservative defaults where safety is not the concern, and structuring playbooks to eliminate unnecessary work. This lesson covers every meaningful performance lever — SSH pipelining, fact caching, forks, async tasks, and per-task profiling — with the expected impact of each and the configuration required to enable it.

Why Playbooks Are Slow — The Root Causes

🐌

SSH overhead per task

By default, Ansible opens a new SSH connection for every task. At ~150ms per connection, a 20-task playbook on 50 hosts spends about 2.5 minutes of cumulative time on SSH handshakes alone.

🐌

Fact gathering on every run

The setup module takes 1–3 seconds per host. On 200 hosts that is 3–10 minutes of pure overhead before the first real task runs — even if no facts are used.

🐌

Low fork count

The default of 5 forks means Ansible processes 5 hosts at a time. On 200 hosts every task runs in 40 serial batches — even on a 32-core control node sitting idle.

The Formula

Total run time ≈ (tasks × SSH overhead per task × hosts / forks) + (fact gathering time × hosts) + (actual task execution time). The first two terms are almost entirely eliminable with pipelining and fact caching. The third term is the work you actually need to do. Good performance tuning makes the first two terms negligible so total time approaches actual work time.
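As a back-of-envelope sketch of the formula, using illustrative inputs that match the examples later in this lesson (20 tasks, 100 hosts, 20 forks, ~150ms SSH overhead per task, ~1.2s fact gathering per host):

```shell
# Estimate the two overhead terms from the formula above.
# All inputs are illustrative; integer arithmetic keeps it plain shell.
tasks=20; hosts=100; forks=20; ssh_ms=150
# First term: tasks × SSH overhead × hosts / forks (wall-clock seconds)
ssh_overhead=$(( tasks * ssh_ms * hosts / forks / 1000 ))
# Second term: fact gathering time × hosts (cumulative seconds, 1.2s × hosts)
gather=$(( 12 * hosts / 10 ))
echo "SSH overhead: ${ssh_overhead}s  fact gathering: ${gather}s"
```

Both terms dwarf most real task work, which is exactly why pipelining and fact caching pay off first.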

SSH Pipelining — Highest Impact, Zero Risk

SSH pipelining eliminates the separate SSH connection that Ansible normally opens per task to upload the module file. Instead, it streams the module code over the existing connection. This reduces per-task SSH overhead from ~150ms to ~30ms — a 5× improvement that compounds across every task on every host.

# ansible.cfg — enable pipelining
[ssh_connection]
pipelining = True

# Prerequisite: requiretty must be disabled in /etc/sudoers on managed nodes
# Add this line to sudoers (via visudo):
# Defaults !requiretty
# This is already the default on most modern distributions (Ubuntu 20+, RHEL 8+)
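Pipelining can also be switched on per host group through the standard `ansible_pipelining` connection variable — useful when you cannot ship an ansible.cfg, for example in some CI environments. A sketch using YAML group vars (the file path is illustrative):

```yaml
# inventory/production/group_vars/all.yml  (path is illustrative)
# ansible_pipelining is the standard connection variable equivalent
# of pipelining = True in ansible.cfg
ansible_pipelining: true
```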

Without pipelining

~150ms

per task per host (SSH open → upload → run → close)

With pipelining

~30ms

per task per host (stream over existing connection)

On a 20-task playbook across 100 hosts, pipelining removes ~120ms per task per host — roughly 4 minutes of cumulative handshake overhead, or about 12 seconds of wall-clock time at 20 forks.

Fact Caching

Fact caching stores gathered facts on the control node and reuses them on subsequent runs within a configurable timeout window. CI pipelines that run multiple playbooks against the same hosts within an hour pay the fact-gathering cost once instead of on every run.

# ansible.cfg — enable fact caching
[defaults]
gathering               = smart   # reuse cached facts — without this, setup still runs every play
fact_caching            = jsonfile
fact_caching_connection = /tmp/ansible_facts_cache
fact_caching_timeout    = 3600    # cache facts for 1 hour (seconds)

# Alternative: redis backend for shared caches across multiple control nodes
# fact_caching            = redis
# fact_caching_connection = localhost:6379:0
# Per-play: disable fact gathering for plays that don't use any facts
- name: Rotate application logs
  hosts: appservers
  gather_facts: false    # saves 1-3s per host — only skip if no ansible_* vars used
  tasks:
    - name: Find old log files
      ansible.builtin.find:
        paths: /var/log/app
        age: 30d
        recurse: true
      register: old_logs

    - name: Remove old log files
      ansible.builtin.file:
        path: "{{ item.path }}"
        state: absent
      loop: "{{ old_logs.files }}"

# Per-play: use cached facts instead of re-gathering
- name: Configure servers using cached facts
  hosts: webservers
  gather_facts: true     # with gathering = smart, reuses the cache if not expired
  tasks:
    - name: Set shared_buffers based on cached RAM fact
      ansible.builtin.template:
        src: postgresql.conf.j2
        dest: /etc/postgresql/15/main/postgresql.conf
      # ansible_memtotal_mb comes from the cached fact — no SSH round trip

Forks — Parallel Host Processing

Forks control how many hosts Ansible processes in parallel per task. The default of 5 is deliberately conservative — appropriate for a laptop on a slow network, not for a CI server targeting a large fleet. The right value depends on control node CPU count and network capacity.

# ansible.cfg
[defaults]
forks = 20    # process 20 hosts in parallel per task

# Rule of thumb:
# Small fleet (< 50 hosts):   forks = 10-20
# Medium fleet (50-200):      forks = 20-50
# Large fleet (200+ hosts):   forks = 50-100 (test and tune — CPU/network bound)
# Never set forks higher than your control node's CPU core count × 4
# Before tuning: forks=5, pipelining=False, 100 hosts, 15 tasks
real    32m14s

# After tuning: forks=20, pipelining=True, fact caching enabled
real     4m38s

# Impact breakdown:
# forks 5→20:        ~4x parallel throughput
# pipelining:        ~5x reduction in per-task SSH overhead
# fact caching:      ~2m saved (100 hosts × 1.2s avg gather time)

What just happened?

Three configuration changes — pipelining, increased forks, and fact caching — reduced a 32-minute run to under 5 minutes on the same hardware and the same playbook. None of these changes modified the playbook logic or reduced the work done. They only eliminated overhead.

Async Tasks — Fire and Forget Long Operations

Some tasks take a long time — package installs, database backups, large file transfers. By default Ansible blocks the connection for the full duration. async runs the task in the background on the managed node and lets Ansible poll for completion, freeing the connection for other work or other hosts.

# Fire-and-forget with polling — best for long single tasks
- name: Run database backup (takes up to 10 minutes)
  ansible.builtin.command:
    cmd: /usr/local/bin/backup-db.sh
  async: 600        # task is allowed up to 600 seconds (10 min) to complete
  poll: 30          # Ansible checks back every 30 seconds
  register: backup_job

# Fire and forget — start the upgrade everywhere, check back later
- name: Upgrade packages on all hosts (async — don't wait)
  ansible.builtin.package:
    name: "*"
    state: latest
  async: 300
  poll: 0           # poll: 0 = fire and forget, don't wait at all
  register: upgrade_jobs

- name: Wait for the upgrade to complete
  ansible.builtin.async_status:
    jid: "{{ upgrade_jobs.ansible_job_id }}"
  register: upgrade_results
  until: upgrade_results.finished
  retries: 30
  delay: 10

# Native async — start a slow download on every host, then poll for completion
- name: Download large artifact on all hosts simultaneously
  ansible.builtin.get_url:
    url: "https://releases.example.com/app-{{ app_version }}.tar.gz"
    dest: "/tmp/app-{{ app_version }}.tar.gz"
  async: 120
  poll: 0
  register: download_jobs

- name: Wait for the download to finish
  ansible.builtin.async_status:
    jid: "{{ download_jobs.ansible_job_id }}"
  register: download_results
  until: download_results.finished
  retries: 20
  delay: 10
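The same fire-and-forget pattern is available from the ad-hoc CLI via -B (async timeout in seconds) and -P (poll interval). A sketch — the script path is illustrative, and the job id comes from the first command's output:

```shell
# Start the backup in the background on every host (-B 600 = allow 10 min),
# returning immediately (-P 0 = do not poll)
ansible all -B 600 -P 0 -m ansible.builtin.command -a "/usr/local/bin/backup-db.sh"

# Later, check a job's status with the async_status module and its job id
ansible all -m ansible.builtin.async_status -a "jid=<job id from the output above>"
```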

Profiling with Callback Plugins

Before optimising, measure. The profile_tasks and timer callback plugins print per-task timing at the end of every run — showing exactly which tasks are consuming the most time and therefore where optimisation effort has the highest return.

# ansible.cfg — enable timing callbacks
[defaults]
callbacks_enabled = timer, profile_tasks

# timer:         prints total playbook run time at the end
# profile_tasks: prints per-task timing sorted by duration
Thursday 21 November 2024  14:32:11 +0000 (0:04:38.122)  0:04:38.122 ********

Playbook Run Time ============================================================
  site.yml:

  Play: Configure web servers
  Task                                          Hosts  Time
  -----------------------------------------------------------------------
  Gathering Facts                               20     0:01:43.221  <-- !!!
  Install required packages                     20     0:01:12.043
  Deploy Nginx configuration                    20     0:00:18.401
  Create deploy user                            20     0:00:14.882
  Deploy application release                    20     0:00:54.210
  Ensure Nginx is started                       20     0:00:08.122
  -----------------------------------------------------------------------
  Total                                               0:04:38.122

What just happened?

profile_tasks immediately reveals that Gathering Facts takes 1m 43s — more than a third of total run time — across 20 hosts. This is the target: enable fact caching and eliminate this cost on subsequent runs. Without the profiler, it is easy to spend time optimising the wrong tasks. Always profile first, then optimise.

Native List Passing for Packages

Looping over packages with loop: installs them one at a time — one SSH round trip per package. Passing a list directly to the name: parameter installs all packages in a single module invocation. For 10 packages this is a 10× reduction in SSH operations for that task.

❌ Slow — one round trip per package
- name: Install packages
  ansible.builtin.package:
    name: "{{ item }}"
    state: present
  loop:
    - nginx
    - git
    - curl
    - python3
    - htop
  # 5 separate apt/dnf calls
✓ Fast — single package manager call
- name: Install packages
  ansible.builtin.package:
    name:
      - nginx
      - git
      - curl
      - python3
      - htop
    state: present
  # 1 apt/dnf call for all 5

The Optimised ansible.cfg

All performance settings in one place — a production-ready ansible.cfg that applies every optimisation from this lesson.

[defaults]
inventory               = ./inventory/production
remote_user             = ansible
forks                   = 20                    # parallel host processing
gathering               = smart                 # only gather facts if not cached
fact_caching            = jsonfile
fact_caching_connection = /tmp/ansible_facts_cache
fact_caching_timeout    = 3600                  # 1 hour cache
stdout_callback         = yaml                  # cleaner output
callbacks_enabled       = timer, profile_tasks  # per-task timing on every run

[ssh_connection]
pipelining              = True                  # stream modules over existing SSH
# -C: compression (helps on slow links)
# ControlMaster=auto: reuse SSH connections across invocations
# ControlPersist=60s: keep the master connection open for 60 seconds
ssh_args                = -C -o ControlMaster=auto -o ControlPersist=60s

Performance Optimisation Summary

Optimisation reference — impact and effort

SSH pipelining · Impact: High · One line in ansible.cfg. Eliminates separate SSH connections per task. ~5× reduction in per-task overhead. Enable this first, always.
Increase forks · Impact: High · One line in ansible.cfg. Linear improvement up to the control node's CPU limit. Set to 20 as a safe default, tune upward from there.
Fact caching · Impact: High · A few lines in ansible.cfg. Eliminates 1–3s per host on subsequent runs. Highest impact when running multiple playbooks against the same fleet.
gather_facts: false · Impact: Medium · Per-play attribute. Saves fact-gathering overhead on plays that never use ansible_* variables. Always profile first to confirm no facts are needed.
Native list packages · Impact: Medium · Per-task change. Pass a list to name: instead of looping. Reduces N package installs to 1 package manager invocation per play.
Async long tasks · Impact: Medium · Per-task change on known slow operations. Allows slow tasks to run in the background while the Ansible connection is freed for other hosts. Most valuable on operations over 30 seconds.
SSH ControlPersist · Impact: Low · One line in ansible.cfg. Keeps SSH master connections open between plays. Meaningful only when pipelining is already enabled and multiple plays target the same hosts.

Profile Before Optimising — Never Guess

Enable profile_tasks in callbacks_enabled and run the playbook once before making any changes. The profile output will tell you exactly where time is going. In most cases the answer is fact gathering or SSH overhead — both fixed in minutes. Do not optimise task logic, introduce async, or restructure plays until you have confirmed with data that those are actually the slow parts.
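For a one-off investigation you can enable the callbacks for a single run through the environment instead of editing ansible.cfg. ANSIBLE_CALLBACKS_ENABLED is the documented environment variable; site.yml is a placeholder playbook name:

```shell
# Enable timing callbacks for this invocation only
ANSIBLE_CALLBACKS_ENABLED="timer,profile_tasks" ansible-playbook site.yml
```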

Key Takeaways

Enable pipelining first — it is the highest impact, lowest risk change — one line in ansible.cfg, no playbook changes, ~5× reduction in per-task SSH overhead across every play.
Set forks to at least 20 — the default of 5 is appropriate for development, not production. Tune upward based on control node CPU count and network capacity.
Enable fact caching with gathering = smart — Ansible only re-gathers facts if the cache has expired or no cached facts exist, eliminating the setup module cost on repeated runs.
Use profile_tasks on every run during performance investigation — the timer output tells you exactly which tasks to fix before you invest any effort in optimisation.
Pass package lists directly to the module, not via loop — N packages via loop = N SSH operations; N packages via list = 1 operation. The most common playbook-level performance mistake.

Teacher's Note

Enable profile_tasks and run your lab playbook against 5+ hosts. Read the per-task timing output and identify the slowest task. Then enable pipelining and fact caching, run it again, and compare the two profiles side by side. The numbers in the profile output — not theory — are what will make these optimisations permanent habits.

Practice Questions

1. Which single ansible.cfg setting under [ssh_connection] eliminates the separate SSH connection Ansible normally opens per task to upload module code?



2. Which callback plugin prints per-task execution time at the end of every run, showing exactly which tasks are consuming the most time?



3. A play contains only find and file tasks that do not reference any ansible_* variables. Which play-level setting eliminates the 1–3 second fact-gathering overhead?



Quiz

1. What exactly does setting pipelining = True in ansible.cfg do and what is the typical per-task performance improvement?


2. A play installs 8 packages using a loop: over the package module. What is the fastest way to reduce the SSH overhead of this task?


3. A task runs a database backup that takes 4 minutes. The playbook has 50 other hosts to process while the backup runs. What is the optimal async pattern?


Up Next · Lesson 36

Ansible at Scale

Take Ansible from dozens of hosts to thousands — dynamic inventories, AWX and Ansible Tower, pull mode with ansible-pull, and architectural patterns for fleet management at enterprise scale.