Ansible Lesson 15 – Facts and Gathering Facts | Dataplexa
Section II · Lesson 15

Facts and Gathering Facts

In this lesson

What facts are · Gathering & filtering · Essential fact variables · Facts in conditions · Custom facts

Facts are variables that Ansible automatically collects from each managed node at the start of each play, unless gathering is disabled. They describe the current state of the system: its operating system, hostname, IP addresses, CPU count, memory, disk layout, network interfaces, and hundreds of other properties. Facts are what allow a single playbook to behave intelligently across different servers: choosing the right package manager for the detected OS, using the correct network interface IP, or skipping tasks that do not apply to a particular architecture. Understanding what facts exist and how to use them is one of the most practical skills in this course.

How Fact Gathering Works

Before the first task in a play runs, Ansible automatically executes the ansible.builtin.setup module on every targeted host. This is the Gathering Facts step you see at the top of every playbook run output. The module returns a large JSON object containing hundreds of key-value pairs — these become available as variables throughout the entire play.

Phase 1

Ansible connects to each managed node

SSH connections are opened to all hosts in the play's target group, in parallel up to the configured fork count.

Phase 2

The setup module runs on each host

Ansible transfers and executes the setup module, which interrogates the OS, hardware, network stack, Python environment, and dozens of other subsystems to build the facts dictionary.

Phase 3

Facts are returned to the control node

The control node receives the full facts dictionary for each host. Facts are host-specific — ansible_hostname returns a different value for each server in the play.

Phase 4

Facts are available as variables for the rest of the play

Every task in the play can reference facts using the {{ ansible_* }} namespace. Facts are also accessible in templates, conditionals, and loop expressions.
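
As a minimal sketch of Phase 4, the play below prints the same three facts in two ways: through the injected ansible_* variables, and through the ansible_facts dictionary, whose keys drop the ansible_ prefix. It assumes the default inject_facts_as_vars setting is left on; the play and task names are illustrative only.

```yaml
---
- name: Show a few gathered facts
  hosts: all
  tasks:
    # Injected top-level variables (available when inject_facts_as_vars is enabled)
    - name: Print hostname and OS through the ansible_* variables
      ansible.builtin.debug:
        msg: "{{ ansible_hostname }} runs {{ ansible_distribution }} {{ ansible_distribution_version }}"

    # The same values read from the ansible_facts dictionary (no ansible_ prefix on keys)
    - name: Print the same values through the ansible_facts dictionary
      ansible.builtin.debug:
        msg: "{{ ansible_facts['hostname'] }} runs {{ ansible_facts['distribution'] }} {{ ansible_facts['distribution_version'] }}"
```

Both tasks should print the same message for a given host, because both forms read the data collected by the same Gathering Facts step.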

View all facts for a host with an ad-hoc command

# Collect and display all facts from a single host
ansible web01.example.com -m ansible.builtin.setup

# Filter to just network facts
ansible web01.example.com -m ansible.builtin.setup -a "filter=ansible_default_ipv4"

# Filter using a wildcard pattern
ansible web01.example.com -m ansible.builtin.setup -a "filter=ansible_*_mb"

Output of the filter=ansible_default_ipv4 command:

web01.example.com | SUCCESS => {
    "ansible_facts": {
        "ansible_default_ipv4": {
            "address": "192.168.1.10",
            "alias": "eth0",
            "broadcast": "192.168.1.255",
            "gateway": "192.168.1.1",
            "interface": "eth0",
            "macaddress": "52:54:00:ab:cd:ef",
            "mtu": 1500,
            "netmask": "255.255.255.0",
            "network": "192.168.1.0",
            "type": "ether"
        }
    },
    "changed": false
}

What just happened?

The filter argument narrowed the output to just the default IPv4 interface facts. Without a filter, the full output is hundreds of lines. Use filters when you know which fact you are looking for — or dump everything on a new host type to explore what is available.
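
The same filter can also be used from inside a play. The sketch below is one possible way to do it: automatic gathering is switched off, and the setup module is called as a normal task with the filter shown in the ad-hoc command. The play name is hypothetical; the host name is the one used in this lesson.

```yaml
---
- name: Inspect one fact from inside a play
  hosts: web01.example.com
  gather_facts: false            # skip the automatic Gathering Facts step
  tasks:
    - name: Collect only the default IPv4 facts
      ansible.builtin.setup:
        filter: ansible_default_ipv4

    - name: Show the address that was collected
      ansible.builtin.debug:
        var: ansible_default_ipv4.address
```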

Essential Facts to Know

Out of the hundreds of available facts, a core set appears in almost every real-world playbook. These are the facts you will reference most frequently for conditionals, templates, and dynamic configuration.

🖥️

ansible_distribution / ansible_distribution_version

The OS name and version, for example "Ubuntu", "CentOS", or "Debian" for the name and "22.04" or "8" for the version. These are the most commonly used facts for conditional package management and OS-specific configuration.

🏠

ansible_os_family

Broad OS family grouping — "Debian" for Ubuntu/Debian, "RedHat" for RHEL/CentOS/Fedora. Use this when you need to handle two families rather than checking every specific distribution.

🌐

ansible_default_ipv4.address

The primary IPv4 address of the managed node. Invaluable for generating configuration files that need the server's own IP — load balancer configs, application bind addresses, monitoring agent configs.

📛

ansible_hostname / ansible_fqdn

Short hostname (web01) and fully qualified domain name (web01.example.com). Used in config file generation, log file naming, and SSL certificate subject names.

💾

ansible_memtotal_mb / ansible_processor_vcpus

Total RAM in megabytes and vCPU count. Used to size application configuration dynamically — JVM heap, worker process count, database buffer pool — so the same playbook configures resources proportionally on small and large servers.

🏗️

ansible_architecture

CPU architecture — "x86_64", "aarch64", "armv7l". Critical when downloading binaries or packages where the download URL varies by architecture.

🐍

ansible_python_version

The Python version on the managed node. Useful when your application or tasks have Python version requirements — use this fact to gate tasks or fail early with a clear error message.
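
To make a few of the facts above concrete, here is a short sketch of three tasks that use ansible_default_ipv4.address, ansible_architecture, and ansible_python_version. The file path /etc/myapp/app.conf, the download URL, the arch_map variable, and the Python 3.8 threshold are all hypothetical placeholders chosen for illustration.

```yaml
# Bind an application to the node's own primary IPv4 address
- name: Set the bind address from the default IPv4 fact
  ansible.builtin.lineinfile:
    path: /etc/myapp/app.conf          # hypothetical config file
    regexp: '^bind_address='
    line: "bind_address={{ ansible_default_ipv4.address }}"
    create: true
    mode: "0644"

# Pick a download that matches the CPU architecture
- name: Download the binary for this architecture
  vars:
    arch_map:                          # hypothetical mapping from fact value to vendor naming
      x86_64: amd64
      aarch64: arm64
  ansible.builtin.get_url:
    url: "https://downloads.example.com/tool-{{ arch_map[ansible_architecture] }}.tar.gz"
    dest: /tmp/tool.tar.gz
    mode: "0644"

# Gate the rest of the play on the managed node's Python version
- name: Fail early if Python is too old
  ansible.builtin.assert:
    that:
      - ansible_python_version is version('3.8', '>=')
    fail_msg: "Python 3.8 or newer is required, found {{ ansible_python_version }}"
```

Note that the download task will fail with an undefined-variable error on an architecture that is missing from arch_map, which is usually preferable to silently fetching the wrong binary.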

The Medical Chart Analogy

Facts are like a patient's medical chart that a doctor reads before prescribing treatment. The chart (facts) tells the doctor exactly what they are working with — age, weight, allergies, current medications — so the treatment (playbook tasks) can be tailored precisely rather than applied generically. A playbook without facts is like a doctor prescribing without reading the chart: technically possible, but dangerously imprecise.

Using Facts in Tasks and Templates

Facts integrate naturally into tasks and Jinja2 templates using the same {{ variable }} syntax as regular variables. The examples below show the most common patterns you will encounter in real playbooks.

Pattern 1 — OS-conditional tasks

- name: Install Nginx on Debian-based systems
  ansible.builtin.apt:
    name: nginx
    state: present
  when: ansible_os_family == "Debian"

- name: Install Nginx on RedHat-based systems
  ansible.builtin.dnf:
    name: nginx
    state: present
  when: ansible_os_family == "RedHat"

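# Or more elegantly, let the generic package module pick the package manager:
- name: Install Nginx on any supported system
  ansible.builtin.package:
    name: nginx
    state: present

# Fail early when the detected OS is outside the supported set.
# Items in a `when` list are ANDed, so the two checks are joined with `or`.
- name: Fail early if OS is not supported
  ansible.builtin.fail:
    msg: "This playbook only supports Ubuntu 20.04 and 22.04"
  when: ansible_distribution != "Ubuntu" or ansible_distribution_major_version not in ["20", "22"]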

Pattern 2 — Dynamic configuration based on hardware facts

- name: Configure Nginx worker processes based on CPU count
  ansible.builtin.template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf

# In nginx.conf.j2:
# worker_processes {{ ansible_processor_vcpus }};
# This renders as "worker_processes 4;" on a 4-core server
# and "worker_processes 8;" on an 8-core server — automatically

Pattern 3 — Computing derived values from facts

- name: Set JVM heap size to 50% of total RAM
  ansible.builtin.set_fact:
    jvm_heap_mb: "{{ (ansible_memtotal_mb * 0.5) | int }}"
    # On a 4096 MB server: jvm_heap_mb = 2048
    # On an 8192 MB server: jvm_heap_mb = 4096
    # The same playbook configures both correctly

- name: Deploy JVM configuration
  ansible.builtin.template:
    src: jvm.options.j2
    dest: /etc/app/jvm.options
  # jvm.options.j2 contains: -Xmx{{ jvm_heap_mb }}m
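
A common refinement of Pattern 3 is to cap the derived value so that very large servers do not receive an oversized heap. The sketch below assumes a hypothetical ceiling of 8192 MB and uses the min filter to pick the smaller of the two numbers.

```yaml
- name: Set JVM heap to 50% of total RAM, capped at 8192 MB
  ansible.builtin.set_fact:
    # Build a two-item list [half of RAM, cap] and keep the smaller value
    jvm_heap_mb: "{{ [(ansible_memtotal_mb * 0.5) | int, 8192] | min }}"
    # With ansible_memtotal_mb = 4096:  min(2048, 8192)  = 2048
    # With ansible_memtotal_mb = 32768: min(16384, 8192) = 8192
```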

Controlling Fact Gathering

Fact gathering adds a few seconds to every play — on a large fleet this compounds quickly. Ansible gives you precise control over when and how facts are collected, so you can optimise playbook performance without losing the facts you need.

Enable gathering (default)

gather_facts: true in the play
Use when tasks rely on OS, network, or hardware facts
Always enable for plays that configure services or install packages

Disable gathering (performance)

gather_facts: false in the play
Use for health checks, simple API calls, or local plays on the control node
Saves 1–5 seconds per host on large fleets

# Disable gathering entirely for a fast utility play
- name: Check service health across fleet
  hosts: all
  gather_facts: false              # no facts needed — just checking service status
  tasks:
    - name: Check Nginx is running
      ansible.builtin.service_facts:
      register: service_state

    - name: Report Nginx status
      ansible.builtin.debug:
        msg: "Nginx is {{ service_state.ansible_facts.services['nginx.service'].state }}"

---
# Gather only specific fact subsets to save time
- name: Deploy with minimal fact collection
  hosts: webservers
  gather_facts: true
  gather_subset:
    - "!all"           # exclude everything except the minimal subset
    - "network"        # then re-enable only network facts
    - "hardware"       # and hardware facts
  tasks:
    - name: Use network and hardware facts only
      ansible.builtin.debug:
        msg: "{{ ansible_default_ipv4.address }} — {{ ansible_memtotal_mb }} MB RAM"
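
A third option sits between the two plays above: disable automatic gathering, then call the setup module as an ordinary task at the point where facts are first needed. The following is a sketch under that assumption; the play and task names are illustrative.

```yaml
---
- name: Gather facts only when a later task needs them
  hosts: webservers
  gather_facts: false                  # no automatic Gathering Facts step
  tasks:
    - name: Run a quick check that needs no facts
      ansible.builtin.ping:

    - name: Collect network facts on demand
      ansible.builtin.setup:
        gather_subset:
          - "!all"                     # keep only the minimal subset
          - network                    # plus network facts

    - name: Use the freshly gathered fact
      ansible.builtin.debug:
        msg: "Primary address is {{ ansible_default_ipv4.address }}"
```

This keeps the early tasks fast while still making network facts available to everything that runs after the setup task.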

Custom Facts

Ansible lets you define your own custom facts on managed nodes: static INI or JSON files, or executable scripts that print JSON, placed in /etc/ansible/facts.d/ with a .fact extension. They are collected automatically alongside built-in facts during the Gathering Facts phase and appear under the ansible_local namespace.

This is powerful for encoding environment-specific information that does not exist in the OS — the application version currently installed, the deployment environment name, the cluster role of a node, or any custom property your automation needs to know.

# Step 1 — deploy a custom facts file to the managed node
- name: Create custom facts directory
  ansible.builtin.file:
    path: /etc/ansible/facts.d
    state: directory
    mode: "0755"

- name: Deploy application custom facts
  ansible.builtin.copy:
    content: |
      [application]
      name=myapp
      version=2.4.1
      environment=production
      deploy_user=deploy
    dest: /etc/ansible/facts.d/application.fact
    mode: "0644"

# Step 2 — refresh facts in the same play to make them available immediately
- name: Reload facts after deploying custom facts file
  ansible.builtin.setup:
    filter: ansible_local

# Step 3 — use the custom facts in subsequent tasks
- name: Log the deployed application version
  ansible.builtin.debug:
    msg: >-
      Deployed {{ ansible_local.application.application.name }}
      version {{ ansible_local.application.application.version }}
      to {{ ansible_local.application.application.environment }}
TASK [Log the deployed application version] ***********************************
ok: [web01.example.com] => {
    "msg": "Deployed myapp version 2.4.1 to production"
}

What just happened?

The custom facts file was deployed to /etc/ansible/facts.d/application.fact, then facts were refreshed with ansible.builtin.setup so the new custom facts were available immediately — without waiting for the next playbook run. Custom facts are a lightweight way to persist metadata on managed nodes that Ansible can read back on every subsequent run.
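
Custom facts can also be executable scripts rather than static files. The sketch below assumes the facts directory from Step 1 already exists and deploys a hypothetical kernel.fact script; an executable fact must be marked executable and print JSON on standard output.

```yaml
- name: Deploy an executable custom fact
  ansible.builtin.copy:
    dest: /etc/ansible/facts.d/kernel.fact   # hypothetical fact name
    mode: "0755"                             # must be executable to be run as a script
    content: |
      #!/bin/sh
      # Executable facts print a JSON document on stdout
      printf '{"release": "%s"}\n' "$(uname -r)"

- name: Reload custom facts
  ansible.builtin.setup:
    filter: ansible_local

- name: Show the value reported by the script
  ansible.builtin.debug:
    var: ansible_local.kernel.release
```

Because the script runs on every fact-gathering pass, the value reflects the node's state at gather time rather than at deploy time.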

Never Rely on Facts From a Different Host in a Task

Facts are host-specific — when Ansible runs a task on web01, ansible_hostname returns web01's hostname, not any other host's. A common mistake is trying to reference another host's facts directly in a task. If you need data from one host in a task running on another, use hostvars['other_host']['ansible_default_ipv4']['address'] — the hostvars magic variable gives access to any host's facts as long as that host has already had its facts gathered in the current playbook run.
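
The hostvars pattern can be sketched as two plays: one that does nothing except gather facts from the database hosts, and one that uses those facts on the web servers. The group names dbservers and webservers, the host db01.example.com, and the file /etc/myapp/app.conf are hypothetical.

```yaml
---
# Play 1: gather facts from the database hosts so hostvars is populated
- name: Gather facts from the database hosts first
  hosts: dbservers
  gather_facts: true

# Play 2: read db01's facts from a task that runs on the web servers
- name: Configure web servers with the database address
  hosts: webservers
  tasks:
    - name: Write the database host into the application config
      ansible.builtin.lineinfile:
        path: /etc/myapp/app.conf
        regexp: '^db_host='
        line: "db_host={{ hostvars['db01.example.com']['ansible_default_ipv4']['address'] }}"
        create: true
        mode: "0644"
```

If the first play is omitted and db01.example.com is not otherwise targeted in the run, the lookup in the second play is expected to fail because that host has no gathered facts.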

Key Takeaways

Facts are collected automatically before the first task — the Gathering Facts step runs the setup module on every targeted host and makes hundreds of system variables available to the entire play.
ansible_os_family is the go-to fact for cross-distro playbooks — use it to branch between Debian and RedHat package managers without checking every specific distribution name.
Hardware facts enable truly adaptive configuration — use ansible_processor_vcpus and ansible_memtotal_mb to size application resources proportionally without hard-coding values per server.
Disable fact gathering for plays that do not need it — set gather_facts: false on utility plays, health checks, and local plays to eliminate unnecessary collection overhead on large fleets.
Custom facts in /etc/ansible/facts.d/ appear under ansible_local — use them to persist environment-specific metadata on managed nodes that your automation can read back on every subsequent run.

Teacher's Note

Run ansible all -m ansible.builtin.setup | less against your lab inventory right now and scroll through the output. You will discover facts you did not know existed — and immediately think of three ways to use them. That moment of discovery is worth more than reading any list of fact names.

Practice Questions

1. Which fact variable groups Linux distributions into broad families like "Debian" and "RedHat" — making it the easiest way to branch between package managers?



2. Custom facts deployed to /etc/ansible/facts.d/ on a managed node are accessible under which Ansible fact namespace?



3. What play-level attribute and value disables automatic fact collection — useful for health-check plays that do not need any host information?



Quiz

1. A task running on web01 needs the IP address of db01 to write a database connection string. How do you access db01's facts inside a task running on web01?


2. A playbook targeting 200 hosts only needs network facts. Which approach balances fact availability with performance?


3. You want to set Nginx's worker_processes equal to the number of CPU cores on each server automatically. Which fact and approach is correct?

