Ansible Course
Facts and Gathering Facts
In this lesson
Facts are variables that Ansible collects automatically from each managed node at the start of every play. They describe the current state of the system: its operating system, hostname, IP addresses, CPU count, memory, disk layout, network interfaces, and hundreds of other properties. Facts are what allow a single playbook to behave intelligently across different servers: using the right package manager for the detected OS, picking the correct network interface IP, or skipping tasks that do not apply to a particular architecture. Understanding what facts exist and how to use them is one of the most practical skills in this course.
How Fact Gathering Works
Before the first task in a play runs, Ansible automatically executes the ansible.builtin.setup module on every targeted host. This is the Gathering Facts step you see at the top of each play in the playbook output. The module returns a large JSON object containing hundreds of key-value pairs, which become available as variables throughout the entire play.
Phase 1
Ansible connects to each managed node
SSH connections are opened to all hosts in the play's target group, in parallel up to the configured fork count.
Phase 2
The setup module runs on each host
Ansible transfers and executes the setup module, which interrogates the OS, hardware, network stack, Python environment, and dozens of other subsystems to build the facts dictionary.
Phase 3
Facts are returned to the control node
The control node receives the full facts dictionary for each host. Facts are host-specific: ansible_hostname returns a different value for each server in the play.
Phase 4
Facts are available as variables for the rest of the play
Every task in the play can reference facts using the {{ ansible_* }} namespace. Facts are also accessible in templates, conditionals, and loop expressions.
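As a minimal sketch of Phase 4, the play below prints a few gathered facts for each host. The play name and the `all` host pattern are placeholders. The second task assumes the usual equivalence between the top-level ansible_* variables and the `ansible_facts` dictionary, whose keys drop the `ansible_` prefix.

```yaml
---
# Hypothetical play: the name and host pattern are illustrative only.
- name: Show a few gathered facts
  hosts: all
  tasks:
    - name: Reference facts as top-level ansible_* variables
      ansible.builtin.debug:
        msg: "{{ ansible_hostname }} runs {{ ansible_distribution }} {{ ansible_distribution_version }}"

    - name: Reference the same facts through the ansible_facts dictionary
      ansible.builtin.debug:
        msg: "{{ ansible_facts['hostname'] }} runs {{ ansible_facts['distribution'] }}"
```

Both tasks read values collected during the Gathering Facts step, so each host reports its own hostname and distribution.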
View all facts for a host with an ad-hoc command
# Collect and display all facts from a single host
ansible web01.example.com -m ansible.builtin.setup
# Filter to just network facts
ansible web01.example.com -m ansible.builtin.setup -a "filter=ansible_default_ipv4"
# Filter using a wildcard pattern
ansible web01.example.com -m ansible.builtin.setup -a "filter=ansible_*_mb"
web01.example.com | SUCCESS => {
    "ansible_facts": {
        "ansible_default_ipv4": {
            "address": "192.168.1.10",
            "alias": "eth0",
            "broadcast": "192.168.1.255",
            "gateway": "192.168.1.1",
            "interface": "eth0",
            "macaddress": "52:54:00:ab:cd:ef",
            "mtu": 1500,
            "netmask": "255.255.255.0",
            "network": "192.168.1.0",
            "type": "ether"
        }
    },
    "changed": false
}
What just happened?
The filter argument narrowed the output to just the default IPv4 interface facts. Without a filter, the full output is hundreds of lines. Use filters when you know which fact you are looking for, or dump everything on a new host type to explore what is available.
Essential Facts to Know
Out of the hundreds of available facts, a core set appears in almost every real-world playbook. These are the facts you will reference most frequently for conditionals, templates, and dynamic configuration.
ansible_distribution / ansible_distribution_version
The OS name and version, for example "Ubuntu", "CentOS", or "Debian" for the name and "22.04" or "8" for the version. These are the most commonly used facts for conditional package management and OS-specific configuration.
ansible_os_family
Broad OS family grouping: "Debian" for Ubuntu/Debian, "RedHat" for RHEL/CentOS/Fedora. Use this when you need to handle two families rather than checking every specific distribution.
ansible_default_ipv4.address
The primary IPv4 address of the managed node. Invaluable for generating configuration files that need the server's own IP — load balancer configs, application bind addresses, monitoring agent configs.
ansible_hostname / ansible_fqdn
Short hostname (web01) and fully qualified domain name (web01.example.com). Used in config file generation, log file naming, and SSL certificate subject names.
ansible_memtotal_mb / ansible_processor_vcpus
Total RAM in megabytes and vCPU count. Used to size application configuration dynamically — JVM heap, worker process count, database buffer pool — so the same playbook configures resources proportionally on small and large servers.
ansible_architecture
CPU architecture: "x86_64", "aarch64", "armv7l". Critical when downloading binaries or packages where the download URL varies by architecture.
ansible_python_version
The Python version on the managed node. Useful when your application or tasks have Python version requirements — use this fact to gate tasks or fail early with a clear error message.
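Two of the facts above can be sketched in use as follows. The download URL, the `arch_map` variable, the destination path, and the 3.8 version threshold are all hypothetical and chosen only for illustration. The tasks assume facts have already been gathered.

```yaml
- name: Fail early on an unsupported Python version
  ansible.builtin.assert:
    that:
      # 3.8 is an assumed minimum, not a requirement stated in this lesson
      - ansible_python_version is version('3.8', '>=')
    fail_msg: "Python {{ ansible_python_version }} is too old for this playbook"

- name: Download the binary that matches the CPU architecture
  vars:
    # Hypothetical mapping from the Ansible fact to a vendor's file naming
    arch_map:
      x86_64: amd64
      aarch64: arm64
      armv7l: armv7
  ansible.builtin.get_url:
    url: "https://downloads.example.com/tool-linux-{{ arch_map[ansible_architecture] }}.tar.gz"
    dest: /tmp/tool.tar.gz
    mode: "0644"
```

Keeping the mapping in a variable means an architecture that is not listed fails with an undefined-key error rather than silently downloading the wrong file.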
The Medical Chart Analogy
Facts are like a patient's medical chart that a doctor reads before prescribing treatment. The chart (facts) tells the doctor exactly what they are working with — age, weight, allergies, current medications — so the treatment (playbook tasks) can be tailored precisely rather than applied generically. A playbook without facts is like a doctor prescribing without reading the chart: technically possible, but dangerously imprecise.
Using Facts in Tasks and Templates
Facts integrate naturally into tasks and Jinja2 templates using the same {{ variable }} syntax as regular variables. The examples below show the most common patterns you will encounter in real playbooks.
Pattern 1 — OS-conditional tasks
- name: Install Nginx on Debian-based systems
  ansible.builtin.apt:
    name: nginx
    state: present
  when: ansible_os_family == "Debian"

- name: Install Nginx on RedHat-based systems
  ansible.builtin.dnf:
    name: nginx
    state: present
  when: ansible_os_family == "RedHat"

# A separate fact-driven guard: fail early on hosts outside the supported set.
# The checks are joined with "or"; a list under "when" means "and", which
# would let unsupported Ubuntu releases through.
- name: Fail early if OS is not supported
  ansible.builtin.fail:
    msg: "This playbook only supports Ubuntu 20.04 and 22.04"
  when: ansible_distribution != "Ubuntu" or ansible_distribution_major_version not in ["20", "22"]
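The two OS-specific install tasks can often be collapsed into one with the generic package module. This is a sketch that assumes the package is called nginx on every targeted distribution, which is not guaranteed for other software.

```yaml
- name: Install Nginx with whichever package manager the host uses
  ansible.builtin.package:
    name: nginx
    state: present
  # Restrict the task to the two families this lesson covers
  when: ansible_os_family in ["Debian", "RedHat"]
```

The module picks the backend (apt, dnf, and so on) from the gathered package manager fact, so the play needs facts enabled for this to work without extra configuration.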
Pattern 2 — Dynamic configuration based on hardware facts
- name: Configure Nginx worker processes based on CPU count
  ansible.builtin.template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf

# In nginx.conf.j2:
#   worker_processes {{ ansible_processor_vcpus }};
# This renders as "worker_processes 4;" on a 4-core server
# and "worker_processes 8;" on an 8-core server, automatically
Pattern 3 — Computing derived values from facts
- name: Set JVM heap size to 50% of total RAM
  ansible.builtin.set_fact:
    jvm_heap_mb: "{{ (ansible_memtotal_mb * 0.5) | int }}"

# On a 4096 MB server: jvm_heap_mb = 2048
# On an 8192 MB server: jvm_heap_mb = 4096
# The same playbook configures both correctly

- name: Deploy JVM configuration
  ansible.builtin.template:
    src: jvm.options.j2
    dest: /etc/app/jvm.options

# jvm.options.j2 contains: -Xmx{{ jvm_heap_mb }}m
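A derived value sometimes needs an upper bound so that very large hosts do not receive an oversized setting. The variant below is a sketch of the same 50% rule with a cap. The 8192 MB ceiling is an arbitrary assumption, not a recommendation.

```yaml
- name: Set JVM heap to 50% of total RAM, capped at 8192 MB
  ansible.builtin.set_fact:
    # The min filter returns the smaller of the computed value and the cap
    jvm_heap_mb: "{{ [(ansible_memtotal_mb * 0.5) | int, 8192] | min }}"
```

Hosts below 16 GB of RAM keep the proportional value, while larger hosts stop at the cap.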
Controlling Fact Gathering
Fact gathering adds a few seconds to every play — on a large fleet this compounds quickly. Ansible gives you precise control over when and how facts are collected, so you can optimise playbook performance without losing the facts you need.
# Disable gathering entirely for a fast utility play
- name: Check service health across fleet
  hosts: all
  gather_facts: false    # no setup facts needed, just checking service status
  tasks:
    - name: Check Nginx is running
      ansible.builtin.service_facts:
      register: service_state

    - name: Report Nginx status
      ansible.builtin.debug:
        msg: "Nginx is {{ service_state.ansible_facts.services['nginx.service'].state }}"
---
# Gather only specific fact subsets to save time
- name: Deploy with minimal fact collection
  hosts: webservers
  gather_facts: true
  gather_subset:
    - "!all"       # drop the default subsets (a minimal set is still collected)
    - "network"    # then re-enable only network facts
    - "hardware"   # and hardware facts
  tasks:
    - name: Use network and hardware facts only
      ansible.builtin.debug:
        msg: "{{ ansible_default_ipv4.address }}, {{ ansible_memtotal_mb }} MB RAM"
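A third option is to skip automatic gathering and collect facts on demand, only at the point where a task needs them. The play below is a sketch of that approach. The play name and the ping task are placeholders.

```yaml
---
- name: Gather facts only when a later task needs them
  hosts: webservers
  gather_facts: false
  tasks:
    - name: Run a task that needs no facts
      ansible.builtin.ping:

    - name: Collect only network facts on demand
      ansible.builtin.setup:
        gather_subset:
          - "!all"
          - network

    - name: Use the freshly gathered fact
      ansible.builtin.debug:
        msg: "Default IPv4 address: {{ ansible_default_ipv4.address }}"
```

Calling the setup module as a regular task gives the same facts as the automatic step, but the cost is paid only by plays that reach that task.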
Custom Facts
Ansible lets you define your own custom facts on managed nodes: small scripts or static files with a .fact extension, placed in /etc/ansible/facts.d/, that are collected automatically alongside built-in facts during the Gathering Facts phase. Custom facts appear under the ansible_local namespace.
This is powerful for encoding environment-specific information that does not exist in the OS — the application version currently installed, the deployment environment name, the cluster role of a node, or any custom property your automation needs to know.
# Step 1: deploy a custom facts file to the managed node
- name: Create custom facts directory
  ansible.builtin.file:
    path: /etc/ansible/facts.d
    state: directory
    mode: "0755"

- name: Deploy application custom facts
  ansible.builtin.copy:
    content: |
      [application]
      name=myapp
      version=2.4.1
      environment=production
      deploy_user=deploy
    dest: /etc/ansible/facts.d/application.fact
    mode: "0644"

# Step 2: refresh facts in the same play to make them available immediately
- name: Reload facts after deploying custom facts file
  ansible.builtin.setup:
    filter: ansible_local

# Step 3: use the custom facts in subsequent tasks
# (file name, then INI section, then key)
- name: Log the deployed application version
  ansible.builtin.debug:
    msg: >-
      Deployed {{ ansible_local.application.application.name }}
      version {{ ansible_local.application.application.version }}
      to {{ ansible_local.application.application.environment }}
TASK [Log the deployed application version] ***********************************
ok: [web01.example.com] => {
    "msg": "Deployed myapp version 2.4.1 to production"
}
What just happened?
The custom facts file was deployed to /etc/ansible/facts.d/application.fact, then facts were refreshed with ansible.builtin.setup so the new custom facts were available immediately, without waiting for the next playbook run. Custom facts are a lightweight way to persist metadata on managed nodes that Ansible can read back on every subsequent run.
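Custom facts can also be scripts rather than static files. The task below is a sketch of an executable fact. The file name kernel.fact and the `release` key are hypothetical, and the task assumes /etc/ansible/facts.d already exists on the managed node.

```yaml
- name: Deploy an executable custom fact (hypothetical example)
  ansible.builtin.copy:
    dest: /etc/ansible/facts.d/kernel.fact
    mode: "0755"   # the executable bit makes Ansible run the file instead of parsing it
    content: |
      #!/bin/sh
      # An executable .fact file must print JSON on stdout.
      printf '{"release": "%s"}\n' "$(uname -r)"
```

After the next fact refresh, the value would be readable as ansible_local.kernel.release, following the same file-name-first convention as the static example.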
Never Rely on Facts From a Different Host in a Task
Facts are host-specific: when Ansible runs a task on web01, ansible_hostname returns web01's hostname, not any other host's. A common mistake is trying to reference another host's facts directly in a task. If you need data from one host in a task running on another, use hostvars['other_host']['ansible_default_ipv4']['address']. The hostvars magic variable gives access to any host's facts as long as that host has already had its facts gathered in the current playbook run.
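The hostvars pattern can be sketched with two plays: the first exists only to gather facts from the other host, and the second consumes them. The inventory names db01 and web01 and the /etc/app/db.conf path are assumptions for illustration.

```yaml
---
- name: Gather facts from the database host first
  hosts: db01
  gather_facts: true
  tasks: []

- name: Configure web01 with the database address
  hosts: web01
  tasks:
    - name: Write the database connection settings
      ansible.builtin.copy:
        dest: /etc/app/db.conf   # hypothetical path
        mode: "0640"
        content: |
          db_host={{ hostvars['db01']['ansible_default_ipv4']['address'] }}
```

Without the first play, db01 would have no gathered facts in this run and the hostvars lookup would fail with an undefined variable error.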
Key Takeaways
Ansible runs the setup module on every targeted host and makes hundreds of system variables available to the entire play.
ansible_os_family is the go-to fact for cross-distro playbooks. Use it to branch between Debian and RedHat package managers without checking every specific distribution name.
Use ansible_processor_vcpus and ansible_memtotal_mb to size application resources proportionally without hard-coding values per server.
Set gather_facts: false on utility plays, health checks, and local plays to eliminate unnecessary collection overhead on large fleets.
Custom facts in /etc/ansible/facts.d/ appear under ansible_local. Use them to persist environment-specific metadata on managed nodes that your automation can read back on every subsequent run.
Teacher's Note
Run ansible all -m ansible.builtin.setup | less against your lab inventory right now and scroll through the output. You will discover facts you did not know existed and immediately think of three ways to use them. That moment of discovery is worth more than reading any list of fact names.
Practice Questions
1. Which fact variable groups Linux distributions into broad families like "Debian" and "RedHat", making it the easiest way to branch between package managers?
2. Custom facts deployed to /etc/ansible/facts.d/ on a managed node are accessible under which Ansible fact namespace?
3. What play-level attribute and value disables automatic fact collection — useful for health-check plays that do not need any host information?
Quiz
1. A task running on web01 needs the IP address of db01 to write a database connection string. How do you access db01's facts inside a task running on web01?
2. A playbook targeting 200 hosts only needs network facts. Which approach balances fact availability with performance?
3. You want to set Nginx's worker_processes equal to the number of CPU cores on each server automatically. Which fact and approach is correct?
Up Next · Lesson 16
Conditionals and Loops
Learn to write tasks that branch and iterate — running different actions per OS, looping over lists of packages, and combining conditions with loops for precise, adaptive automation.