Ansible Course
Service Management
In this lesson
Service management is the set of Ansible tasks that control the lifecycle of system services — starting and stopping them, enabling and disabling them at boot, reloading configuration without downtime, and deploying custom systemd unit files. In nearly every provisioning and deployment playbook, service management is the final step: packages get installed, configuration files get deployed, and then services get started or restarted to pick up the new state. Done well, service management in Ansible is both safe and precise — restarting exactly the services that need it, exactly when they need it, and never unnecessarily disrupting a running system.
service vs systemd — Choosing the Right Module
Ansible has two modules for managing services: ansible.builtin.service and ansible.builtin.systemd. Both can start, stop, enable, and disable services, but they differ in scope and portability: service is a generic wrapper that works across init systems, while systemd talks to systemd directly and exposes systemd-only features. Knowing which to reach for saves debugging time.
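As a quick side-by-side sketch (the service name myapp is a placeholder), the two modules overlap for basic lifecycle operations, but only the systemd module exposes daemon_reload:

```yaml
# Portable — works on systemd, SysV init, OpenRC, and BSD rc hosts alike
- name: Start the service with the generic module
  ansible.builtin.service:
    name: myapp          # hypothetical service name
    state: started

# systemd-only — daemon_reload has no equivalent in ansible.builtin.service
- name: Re-read unit files, then start (systemd hosts only)
  ansible.builtin.systemd:
    name: myapp
    state: started
    daemon_reload: true
```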
The service Module — State Values
Like the file module, service's behaviour is driven by its state parameter. Understanding the difference between restarted and reloaded is particularly important — the wrong choice can cause unnecessary downtime.
service / systemd state values
started — Ensures the service is running. If it is already running, nothing happens (ok); if it is stopped, Ansible starts it (changed). The most commonly used state.
stopped — Ensures the service is not running. If already stopped, does nothing; if running, stops it. Use for decommissioning services or during maintenance windows.
restarted — Always stops then starts the service, regardless of its current state. Causes a brief downtime. Use in handlers when a config change requires a full process restart to take effect.
reloaded — Asks the service to re-read its configuration without stopping, typically by sending SIGHUP or running the unit's ExecReload= command. Zero-downtime config update. Only works for services that support graceful reload (Nginx, HAProxy, sshd); not all services do.
- name: Ensure Nginx is running and enabled on boot
  ansible.builtin.service:
    name: nginx
    state: started
    enabled: true  # start automatically after every reboot

- name: Stop and disable a legacy service
  ansible.builtin.service:
    name: apache2
    state: stopped
    enabled: false  # do not start on boot either

# In a handler — restart is always triggered by a notify, not inline
handlers:
  - name: Restart Nginx
    ansible.builtin.service:
      name: nginx
      state: restarted  # full restart — brief downtime

  - name: Reload Nginx
    ansible.builtin.service:
      name: nginx
      state: reloaded  # zero-downtime config reload — preferred when supported
The Restaurant Kitchen Analogy
restarted is like closing the kitchen,
sending all the cooks home, and reopening from scratch — the menu changes take effect but
no orders can be served during the transition. reloaded is like handing the
cooks an updated menu while they continue cooking — the new items are available immediately
and not a single order is missed. Always prefer reloaded when the service
supports it. Only fall back to restarted when a full process restart is
genuinely required.
The systemd Module
The systemd module extends service with systemd-specific capabilities. The most important addition is daemon_reload — essential after deploying a new or modified unit file, because systemd must re-read its unit file cache before it can manage the new service.
# After deploying a new or modified unit file, always daemon_reload first
- name: Reload systemd daemon to pick up new unit files
  ansible.builtin.systemd:
    daemon_reload: true

- name: Enable and start the application service
  ansible.builtin.systemd:
    name: myapp.service
    state: started
    enabled: true

# Manage a user-scoped service (not system-wide)
- name: Start user-level timer for the deploy user
  ansible.builtin.systemd:
    name: backup.timer
    state: started
    enabled: true
    scope: user  # user scope — not system scope

# Mask a service to prevent it from being started by any means
- name: Mask postfix to prevent accidental start
  ansible.builtin.systemd:
    name: postfix
    masked: true  # masked services cannot be started even manually
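The reverse operation, unmasking, uses the same parameter. A minimal sketch, continuing the postfix example above:

```yaml
- name: Unmask postfix so it can be managed again
  ansible.builtin.systemd:
    name: postfix
    masked: false  # removes the mask; the unit can then be enabled/started as usual
```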
Deploying Custom Unit Files
A common pattern in application deployment is creating a custom systemd unit file for the application service. This is a three-step sequence that must always follow the same order — the deploy → reload → start pattern. Skipping or reordering any step produces a service that either does not exist in systemd's index or runs the wrong binary.
Step 1
Deploy the unit file with template or copy
- name: Deploy application systemd unit file
  ansible.builtin.template:
    src: myapp.service.j2
    dest: /etc/systemd/system/myapp.service
    owner: root
    group: root
    mode: "0644"
  notify:  # notify both handlers — the second is never triggered otherwise
    - Reload systemd and restart myapp
    - Start myapp service
Step 2
Reload the systemd daemon (in a handler)
handlers:
  # Handlers run in the order they are defined, so the daemon_reload fires
  # before the service is started.
  - name: Reload systemd and restart myapp
    ansible.builtin.systemd:
      daemon_reload: true  # tells systemd to re-read all unit files

  - name: Start myapp service
    ansible.builtin.systemd:
      name: myapp.service
      state: started
      enabled: true
Step 3
The unit file — templates/myapp.service.j2
[Unit]
Description={{ app_name }} Application Service
After=network.target postgresql.service
Requires=postgresql.service

[Service]
Type=simple
User={{ app_user }}
Group={{ app_group }}
WorkingDirectory={{ deploy_dir }}/current
ExecStart={{ deploy_dir }}/current/bin/{{ app_name }} \
    --port {{ app_port }} \
    --config {{ deploy_dir }}/shared/config/app.yml
Restart=on-failure
RestartSec=5s
StandardOutput=journal
StandardError=journal
SyslogIdentifier={{ app_name }}

[Install]
WantedBy=multi-user.target
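As an optional extra check, the deployed unit can be validated for syntax errors before the service is started. A sketch, assuming systemd-analyze is available on the target host:

```yaml
- name: Verify the deployed unit file parses cleanly
  ansible.builtin.command: systemd-analyze verify /etc/systemd/system/myapp.service
  changed_when: false  # a read-only check — never reports a change
```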
Querying Service State with service_facts
The ansible.builtin.service_facts module collects the current state of all system services and makes them available as facts under ansible_facts.services. Use it to write conditional tasks that respond to the actual running state of a service — rather than assuming it.
- name: Gather current service states
  ansible.builtin.service_facts:  # no arguments — collects all services

- name: Show nginx service state
  ansible.builtin.debug:
    msg: "Nginx is {{ ansible_facts.services['nginx.service'].state }}"
  when: "'nginx.service' in ansible_facts.services"

# Conditional task — only restart if the service is currently running
- name: Restart app only if it is currently active
  ansible.builtin.service:
    name: myapp
    state: restarted
  when:
    - "'myapp.service' in ansible_facts.services"
    - ansible_facts.services['myapp.service'].state == "running"
TASK [Gather current service states] ******************************************
ok: [web01.example.com]
TASK [Show nginx service state] ***********************************************
ok: [web01.example.com] => {
"msg": "Nginx is running"
}
TASK [Restart app only if it is currently active] *****************************
changed: [web01.example.com] <-- myapp was running, restarted successfully

What just happened?
service_facts populated
ansible_facts.services with a dictionary keyed by service unit names.
The when condition checked both that the service key exists and that
its state is running before restarting it — avoiding a failed restart
attempt on a host where the service was not installed or was already stopped.
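The same fact dictionary can also be filtered across all services. A sketch that lists every service reported as stopped (field names as returned by service_facts):

```yaml
- name: Gather current service states
  ansible.builtin.service_facts:

- name: List services that are currently stopped
  ansible.builtin.debug:
    msg: "{{ item.key }}"
  loop: "{{ ansible_facts.services | dict2items
            | selectattr('value.state', 'equalto', 'stopped') | list }}"
  loop_control:
    label: "{{ item.key }}"  # keep the loop output readable
```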
Rolling Service Restarts
The scenario: A team runs a fleet of 20 Nginx web servers behind a load balancer. A new config template has been updated and needs to be deployed without taking the entire fleet offline simultaneously. The requirement: deploy to one server at a time, verify it is healthy before moving to the next, and abort the rolling update if any server fails the health check.
---
- name: Rolling Nginx config deployment
  hosts: webservers
  become: true
  serial: 1               # process ONE host at a time — rolling update
  max_fail_percentage: 0  # abort the entire play if ANY host fails

  tasks:
    - name: Deploy updated Nginx configuration
      ansible.builtin.template:
        src: nginx.conf.j2
        dest: /etc/nginx/nginx.conf
        validate: "nginx -t -c %s"  # test config before replacing live file
        backup: true
      notify: Reload Nginx

    - name: Force handler to run now (not at end of play)
      ansible.builtin.meta: flush_handlers

    - name: Wait for Nginx to be healthy on port 80
      ansible.builtin.uri:
        url: "http://{{ ansible_default_ipv4.address }}/health"
        status_code: 200
        timeout: 10
      register: health_check
      retries: 5
      delay: 3
      until: health_check.status == 200

  handlers:
    - name: Reload Nginx
      ansible.builtin.service:
        name: nginx
        state: reloaded  # zero-downtime reload — not restart
# Each host is processed one at a time — output shown for first two hosts

PLAY [Rolling Nginx config deployment] ****************************************

TASK [Deploy updated Nginx configuration] (web01) *****************************
changed: [web01.example.com]

RUNNING HANDLERS [Rolling Nginx config deployment] ****************************
changed: [web01.example.com] <-- Nginx reloaded on web01

TASK [Wait for Nginx to be healthy on port 80] (web01) ************************
ok: [web01.example.com] <-- health check passed

TASK [Deploy updated Nginx configuration] (web02) *****************************
changed: [web02.example.com]

RUNNING HANDLERS [Rolling Nginx config deployment] ****************************
changed: [web02.example.com] <-- Nginx reloaded on web02

TASK [Wait for Nginx to be healthy on port 80] (web02) ************************
ok: [web02.example.com] <-- health check passed

...continues for remaining 18 hosts...
What just happened?
Three mechanics worked together: serial: 1
processed one host at a time, keeping 19 servers live while the 20th was being updated.
ansible.builtin.meta: flush_handlers forced the reload to happen immediately
rather than at the end of the play — so the health check ran against the newly reloaded
server. max_fail_percentage: 0 ensured the entire rolling update would abort
if any single host failed its health check, preventing a broken config from propagating
to the rest of the fleet.
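In a real fleet you would usually also drain each node from the load balancer before reloading it. An illustrative sketch — assuming an HAProxy balancer on a host named lb01.example.com with an admin socket and a backend named web_back (all assumed names), using the community.general.haproxy module:

```yaml
pre_tasks:
  - name: Drain this host from the HAProxy backend before the reload
    community.general.haproxy:
      state: disabled
      host: "{{ inventory_hostname }}"
      backend: web_back              # assumed backend name
      socket: /var/run/haproxy.sock  # assumed admin socket path
    delegate_to: lb01.example.com    # assumed load balancer host

post_tasks:
  - name: Re-enable this host in the backend after the health check passes
    community.general.haproxy:
      state: enabled
      host: "{{ inventory_hostname }}"
      backend: web_back
      socket: /var/run/haproxy.sock
    delegate_to: lb01.example.com
```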
Restart vs Reload — Decision Guide
Choosing between restarted
and reloaded is one of the most consequential decisions in service
management. Use this guide to pick the right one every time.
Use restarted when:
• The service binary itself was replaced (new version deployed)
• A change requires the process to re-initialise from scratch
• The service does not support graceful reload (SIGHUP)
• You need to clear in-memory state (caches, open file handles)
• A new unit file was deployed (after daemon_reload)
Use reloaded when:
• Only a configuration file changed (not the binary)
• The service explicitly supports graceful reload
• Zero downtime is required (Nginx, HAProxy, sshd, rsyslog)
• Existing connections must not be dropped
• You are in a rolling update where only one server reloads at a time
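When you are unsure whether a service supports graceful reload, a defensive pattern is to try the reload and fall back to a full restart if it fails. A sketch using a block/rescue pair inside a handler (myapp is a placeholder name):

```yaml
handlers:
  - name: Reload or restart myapp
    block:
      - name: Try a zero-downtime reload first
        ansible.builtin.service:
          name: myapp        # hypothetical service
          state: reloaded
    rescue:
      - name: Fall back to a full restart if reload is unsupported
        ansible.builtin.service:
          name: myapp
          state: restarted   # brief downtime — only on the fallback path
```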
Always daemon_reload After Deploying or Modifying a Unit File
systemd caches unit file
definitions in memory. If you deploy a new or modified .service file
and then immediately try to start or enable the service without running
daemon_reload: true, systemd will either use the old cached version
or throw a "Unit file changed on disk" warning. Always run
daemon_reload in a handler triggered by the unit file deployment task
— and make sure the handler fires before any task that tries to start the service.
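Note that daemon_reload can also be combined with a state change in a single systemd task — the module re-reads the unit files before acting on the requested state:

```yaml
- name: Restart myapp with a fresh unit-file cache in one task
  ansible.builtin.systemd:
    name: myapp.service
    daemon_reload: true  # performed before the state change
    state: restarted
```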
Key Takeaways
Use service for portability and systemd for systemd-specific features — daemon_reload, user scope, and masking are only available in the systemd module.
Prefer reloaded over restarted for config-only changes on services that support graceful reload — Nginx, HAProxy, sshd, and rsyslog all support it and it causes zero downtime.
Always run daemon_reload after deploying a unit file — systemd will use a stale cached version otherwise. Run it in a handler triggered by the unit file deployment task, before the service start task.
Combine serial with max_fail_percentage: 0 for rolling updates — process one server at a time and abort if any host fails, keeping the rest of the fleet healthy while you investigate.
Use service_facts to make service restarts conditional — checking that a service is actually running before restarting it avoids failed tasks on hosts where the service was not yet installed or was intentionally stopped.
Teacher's Note
Write a handler for Nginx that uses
reloaded, run a playbook that changes the config, and watch Nginx pick up
the new config without dropping a single request. Then change it to restarted
and run it again while tailing the access log. The difference in behaviour will make the
reload vs restart distinction permanent in your memory.
Practice Questions
1. After deploying a new systemd unit
file, which ansible.builtin.systemd parameter must be set to
true before the service can be started?
2. Which play-level attribute controls how many hosts are processed at a time — enabling rolling deployments that update one server while the rest remain live?
3. After running
ansible.builtin.service_facts, under which variable namespace are
the collected service states available?
Quiz
1. What is the key operational difference
between state: reloaded and state: restarted?
2. In the rolling update playbook, why
was ansible.builtin.meta: flush_handlers used after the deploy task?
3. You are deploying a service config change to 50 production servers and need to ensure no more than one server is restarted at a time, with an immediate abort if any restart fails. Which play attributes achieve this?
Up Next · Lesson 20
User and Permission Management
Learn to manage users, groups, SSH keys, and file permissions at scale — the foundation of secure, auditable access control across your entire fleet.