Ansible Lesson 19 – Service Management | Dataplexa
Section II · Lesson 19

Service Management

In this lesson

service module · systemd module · service_facts · Unit file deployment · Rolling restarts

Service management is the set of Ansible tasks that control the lifecycle of system services — starting and stopping them, enabling and disabling them at boot, reloading configuration without downtime, and deploying custom systemd unit files. In nearly every provisioning and deployment playbook, service management is the final step: packages get installed, configuration files get deployed, and then services get started or restarted to pick up the new state. Done well, service management in Ansible is both safe and precise — restarting exactly the services that need it, exactly when they need it, and never unnecessarily disrupting a running system.

service vs systemd — Choosing the Right Module

Ansible has two modules for managing services: ansible.builtin.service and ansible.builtin.systemd. Both can start, stop, enable, and disable services — but they differ in scope and portability. Knowing which to reach for saves debugging time.

ansible.builtin.service
• Works across systemd, SysV init, upstart, and BSD init systems
• Use when your playbook targets mixed OS environments
• Limited to: start, stop, restart, reload, enable, disable
• Best for cross-platform playbooks targeting diverse infrastructure

ansible.builtin.systemd
• systemd-only: covers all modern Linux distros (RHEL 7+, Ubuntu 15.04+, Debian 8+)
• Use when you need systemd-specific features
• Adds: daemon_reload, scope (user/system), unit file management, masking
• Best for modern Linux fleets where systemd is guaranteed
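
As a minimal sketch of the difference (the service name here is an assumption; on RHEL-family hosts the cron daemon is named crond):

```yaml
# Portable: works on systemd, SysV init, upstart, and BSD hosts alike
- name: Ensure cron is running (cross-platform)
  ansible.builtin.service:
    name: cron            # assumed name; "crond" on RHEL-family hosts
    state: started
    enabled: true

# systemd-only: same result, plus access to systemd-specific parameters
- name: Ensure cron is running (systemd fleets)
  ansible.builtin.systemd:
    name: cron
    state: started
    enabled: true
```

Both tasks converge on the same state here; the systemd version simply unlocks extra parameters (daemon_reload, scope, masked) when you need them.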

The service Module — State Values

Like the file module, service's behaviour is driven by its state parameter. Understanding the difference between restarted and reloaded is particularly important — the wrong choice can cause unnecessary downtime.

service / systemd state values

started Ensures the service is running. If already running, does nothing (ok). If stopped, starts it (changed). The most commonly used state.
stopped Ensures the service is not running. If already stopped, does nothing. If running, stops it. Use for decommissioning services or during maintenance windows.
restarted Always stops then starts the service — regardless of its current state. Causes a brief downtime. Use in handlers when a config change requires a full process restart to take effect.
reloaded Asks the service to re-read its configuration without stopping it. Under systemd this runs the unit's ExecReload command (often a SIGHUP); under other init systems it calls the service script's reload action. Zero-downtime config update. Only works for services that support graceful reload (Nginx, HAProxy, sshd). Not all services support this.
- name: Ensure Nginx is running and enabled on boot
  ansible.builtin.service:
    name: nginx
    state: started
    enabled: true         # start automatically after every reboot

- name: Stop and disable a legacy service
  ansible.builtin.service:
    name: apache2
    state: stopped
    enabled: false        # do not start on boot either

# In a handler — restart is always triggered by a notify, not inline
handlers:
  - name: Restart Nginx
    ansible.builtin.service:
      name: nginx
      state: restarted    # full restart — brief downtime

  - name: Reload Nginx
    ansible.builtin.service:
      name: nginx
      state: reloaded     # zero-downtime config reload — preferred when supported

The Restaurant Kitchen Analogy

restarted is like closing the kitchen, sending all the cooks home, and reopening from scratch — the menu changes take effect but no orders can be served during the transition. reloaded is like handing the cooks an updated menu while they continue cooking — the new items are available immediately and not a single order is missed. Always prefer reloaded when the service supports it. Only fall back to restarted when a full process restart is genuinely required.

The systemd Module

The systemd module extends service with systemd-specific capabilities. The most important addition is daemon_reload — essential after deploying a new or modified unit file, because systemd must re-read its unit file cache before it can manage the new service.

# After deploying a new or modified unit file, always daemon_reload first
- name: Reload systemd daemon to pick up new unit files
  ansible.builtin.systemd:
    daemon_reload: true

- name: Enable and start the application service
  ansible.builtin.systemd:
    name: myapp.service
    state: started
    enabled: true

# Manage a user-scoped service (not system-wide)
- name: Start user-level timer for the deploy user
  ansible.builtin.systemd:
    name: backup.timer
    state: started
    enabled: true
    scope: user            # user scope — not system scope

# Mask a service to prevent it from being started by any means
- name: Mask postfix to prevent accidental start
  ansible.builtin.systemd:
    name: postfix
    masked: true           # masked services cannot be started even manually
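
The reverse operation, sketched under the same assumptions, unmasks the service so it can be managed again:

```yaml
# Unmask a previously masked service before re-enabling it
- name: Unmask postfix so it can be managed again
  ansible.builtin.systemd:
    name: postfix
    masked: false        # removes the /dev/null symlink created by masking

- name: Start postfix after unmasking
  ansible.builtin.systemd:
    name: postfix
    state: started
    enabled: true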

Deploying Custom Unit Files

A common pattern in application deployment is creating a custom systemd unit file for the application service. This is a three-step sequence that must always follow the same order — the deploy → reload → start pattern. Skipping or reordering any step produces a service that either does not exist in systemd's index or runs from a stale, cached unit definition.

Step 1

Deploy the unit file with template or copy

- name: Deploy application systemd unit file
  ansible.builtin.template:
    src: myapp.service.j2
    dest: /etc/systemd/system/myapp.service
    owner: root
    group: root
    mode: "0644"
  notify:                    # handlers run in the order they are defined
    - Reload systemd
    - Restart myapp

Step 2

Reload the systemd daemon, then restart the service (in handlers)

handlers:
  - name: Reload systemd
    ansible.builtin.systemd:
      daemon_reload: true    # tells systemd to re-read all unit files

  - name: Restart myapp
    ansible.builtin.systemd:
      name: myapp.service
      state: restarted       # picks up the new unit definition
      enabled: true

Step 3

The unit file — templates/myapp.service.j2

[Unit]
Description={{ app_name }} Application Service
After=network.target postgresql.service
Requires=postgresql.service

[Service]
Type=simple
User={{ app_user }}
Group={{ app_group }}
WorkingDirectory={{ deploy_dir }}/current
ExecStart={{ deploy_dir }}/current/bin/{{ app_name }} \
    --port {{ app_port }} \
    --config {{ deploy_dir }}/shared/config/app.yml
Restart=on-failure
RestartSec=5s
StandardOutput=journal
StandardError=journal
SyslogIdentifier={{ app_name }}

[Install]
WantedBy=multi-user.target

Querying Service State with service_facts

The ansible.builtin.service_facts module collects the current state of all system services and makes them available as facts under ansible_facts.services. Use it to write conditional tasks that respond to the actual running state of a service — rather than assuming it.

- name: Gather current service states
  ansible.builtin.service_facts:         # no arguments — collects all services

- name: Show nginx service state
  ansible.builtin.debug:
    msg: "Nginx is {{ ansible_facts.services['nginx.service'].state }}"
  when: "'nginx.service' in ansible_facts.services"

# Conditional task — only restart if the service is currently running
- name: Restart app only if it is currently active
  ansible.builtin.service:
    name: myapp
    state: restarted
  when:
    - "'myapp.service' in ansible_facts.services"
    - ansible_facts.services['myapp.service'].state == "running"

TASK [Gather current service states] ******************************************
ok: [web01.example.com]

TASK [Show nginx service state] ***********************************************
ok: [web01.example.com] => {
    "msg": "Nginx is running"
}

TASK [Restart app only if it is currently active] *****************************
changed: [web01.example.com]   <-- myapp was running, restarted successfully

What just happened?

service_facts populated ansible_facts.services with a dictionary keyed by service unit names. The when condition checked both that the service key exists and that its state is running before restarting it — avoiding a failed restart attempt on a host where the service was not installed or was already stopped.
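
Building on the same facts, here is one possible sketch that reports every currently running service (the filter chain is an assumption about how you might query the dictionary, not the only way):

```yaml
- name: Gather current service states
  ansible.builtin.service_facts:

- name: List all services currently in the "running" state
  ansible.builtin.debug:
    msg: "{{ ansible_facts.services | dict2items
             | selectattr('value.state', 'equalto', 'running')
             | map(attribute='key') | list }}"
```

dict2items turns the services dictionary into a list of key/value pairs, which selectattr can then filter on the nested state field.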

Rolling Service Restarts

The scenario: A team runs a fleet of 20 Nginx web servers behind a load balancer. A new config template has been updated and needs to be deployed without taking the entire fleet offline simultaneously. The requirement: deploy to one server at a time, verify it is healthy before moving to the next, and abort the rolling update if any server fails the health check.

---
- name: Rolling Nginx config deployment
  hosts: webservers
  become: true
  serial: 1                    # process ONE host at a time — rolling update
  max_fail_percentage: 0       # abort the entire play if ANY host fails

  tasks:
    - name: Deploy updated Nginx configuration
      ansible.builtin.template:
        src: nginx.conf.j2
        dest: /etc/nginx/nginx.conf
        validate: "nginx -t -c %s"   # test config before replacing live file
        backup: true
      notify: Reload Nginx

    - name: Force handler to run now (not at end of play)
      ansible.builtin.meta: flush_handlers

    - name: Wait for Nginx to be healthy on port 80
      ansible.builtin.uri:
        url: "http://{{ ansible_default_ipv4.address }}/health"
        status_code: 200
        timeout: 10
      register: health_check
      retries: 5
      delay: 3
      until: health_check.status == 200

  handlers:
    - name: Reload Nginx
      ansible.builtin.service:
        name: nginx
        state: reloaded          # zero-downtime reload — not restart

# Each host is processed one at a time — output shown for first two hosts

PLAY [Rolling Nginx config deployment] ****************************************

TASK [Deploy updated Nginx configuration] (web01) *****************************
changed: [web01.example.com]

RUNNING HANDLERS [Rolling Nginx config deployment] ****************************
changed: [web01.example.com]   <-- Nginx reloaded on web01

TASK [Wait for Nginx to be healthy on port 80] (web01) ************************
ok: [web01.example.com]        <-- health check passed

TASK [Deploy updated Nginx configuration] (web02) *****************************
changed: [web02.example.com]

RUNNING HANDLERS [Rolling Nginx config deployment] ****************************
changed: [web02.example.com]   <-- Nginx reloaded on web02

TASK [Wait for Nginx to be healthy on port 80] (web02) ************************
ok: [web02.example.com]        <-- health check passed
...continues for remaining 18 hosts...

What just happened?

Three mechanics worked together: serial: 1 processed one host at a time, keeping 19 servers live while the 20th was being updated. ansible.builtin.meta: flush_handlers forced the reload to happen immediately rather than at the end of the play — so the health check ran against the newly reloaded server. max_fail_percentage: 0 ensured the entire rolling update would abort if any single host failed its health check, preventing a broken config from propagating to the rest of the fleet.

Restart vs Reload — Decision Guide

Choosing between restarted and reloaded is one of the most consequential decisions in service management. Use this guide to pick the right one every time.

Use restarted when…

• The service binary itself was replaced (new version deployed)
• A change requires the process to re-initialise from scratch
• The service does not support graceful reload (SIGHUP)
• You need to clear in-memory state (caches, open file handles)
• A new unit file was deployed (after daemon_reload)

Use reloaded when…

• Only a configuration file changed (not the binary)
• The service explicitly supports graceful reload
• Zero downtime is required (Nginx, HAProxy, sshd, rsyslog)
• Existing connections must not be dropped
• You are in a rolling update where only one server reloads at a time
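
Putting the guide into practice, a sketch pairing each change type with the matching handler (task and handler names are illustrative):

```yaml
tasks:
  - name: Deploy Nginx configuration        # config-only change
    ansible.builtin.template:
      src: nginx.conf.j2
      dest: /etc/nginx/nginx.conf
    notify: Reload Nginx                    # graceful, zero-downtime

  - name: Upgrade the Nginx package         # binary replaced
    ansible.builtin.package:
      name: nginx
      state: latest
    notify: Restart Nginx                   # full restart required

handlers:
  - name: Reload Nginx
    ansible.builtin.service:
      name: nginx
      state: reloaded

  - name: Restart Nginx
    ansible.builtin.service:
      name: nginx
      state: restarted
```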

Always daemon_reload After Deploying or Modifying a Unit File

systemd caches unit file definitions in memory. If you deploy a new or modified .service file and then immediately try to start or enable the service without running daemon_reload: true, systemd will either use the old cached version or throw a "Unit file changed on disk" warning. Always run daemon_reload in a handler triggered by the unit file deployment task — and make sure the handler fires before any task that tries to start the service.
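
The systemd module also accepts daemon_reload alongside name and state in a single task; the daemon is reloaded before the requested action is applied. A compact alternative sketch (the service name is an assumption):

```yaml
# daemon_reload runs before the state change within the same task
- name: Reload daemon, then restart and enable myapp
  ansible.builtin.systemd:
    name: myapp.service
    state: restarted
    enabled: true
    daemon_reload: true   # re-reads unit files first, then restarts
```

The handler-based pattern shown earlier is still preferable when multiple tasks can change the unit file, since handlers deduplicate the reload.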

Key Takeaways

Use service for portability and systemd for systemd-specific features: daemon_reload, user scope, and masking are only available in the systemd module.
Prefer reloaded over restarted for config-only changes on services that support graceful reload — Nginx, HAProxy, sshd, and rsyslog all support it and it causes zero downtime.
Always daemon_reload after deploying a unit file — systemd will use a stale cached version otherwise. Run it in a handler triggered by the unit file deployment task, before the service start task.
Use serial with max_fail_percentage: 0 for rolling updates — process one server at a time and abort if any host fails, keeping the rest of the fleet healthy while you investigate.
Use service_facts to make service restarts conditional — checking that a service is actually running before restarting it avoids failed tasks on hosts where the service was not yet installed or was intentionally stopped.

Teacher's Note

Write a handler for Nginx that uses reloaded, run a playbook that changes the config, and watch Nginx pick up the new config without dropping a single request. Then change it to restarted and run it again while tailing the access log. The difference in behaviour will make the reload vs restart distinction permanent in your memory.

Practice Questions

1. After deploying a new systemd unit file, which ansible.builtin.systemd parameter must be set to true before the service can be started?



2. Which play-level attribute controls how many hosts are processed at a time — enabling rolling deployments that update one server while the rest remain live?



3. After running ansible.builtin.service_facts, under which variable namespace are the collected service states available?



Quiz

1. What is the key operational difference between state: reloaded and state: restarted?


2. In the rolling update playbook, why was ansible.builtin.meta: flush_handlers used after the deploy task?


3. You are deploying a service config change to 50 production servers and need to ensure no more than one server is restarted at a time, with an immediate abort if any restart fails. Which play attributes achieve this?


Up Next · Lesson 20

User and Permission Management

Learn to manage users, groups, SSH keys, and file permissions at scale — the foundation of secure, auditable access control across your entire fleet.