Linux Administration Lesson 15 – Process Management | Dataplexa
Section II — User, Process & Package Management

Process Management

In this lesson

ps and pgrep
top and htop
Signals and kill
nice and renice
Foreground and background jobs

Process management is the ability to inspect, control, and prioritise the programs running on a Linux system at any given moment. Every running program — from a web server to a shell command — is a process with a unique identifier, an owner, a resource footprint, and a state. Knowing how to find runaway processes, send them the right signal, and adjust their scheduling priority is an essential day-to-day skill for any Linux administrator.

How Linux Represents Processes

Every process on Linux is assigned a Process ID (PID), a unique integer used to reference it in all management commands. Processes form a tree: every process except PID 1 (systemd on most modern distributions) was created by a parent process, from which it inherits environment variables, open file descriptors, and signal dispositions. Understanding this parent-child relationship helps explain why killing a parent process often orphans or kills its children.

systemd  PID 1 · PPID 0
  sshd  PID 892 · PPID 1
    bash (alice)  PID 4821 · PPID 892
      vim  PID 5100 · PPID 4821
  nginx (master)  PID 1235 · PPID 1
    nginx (worker)  PID 1236 · PPID 1235
    nginx (worker)  PID 1237 · PPID 1235
  cron  PID 1102 · PPID 1

PPID = Parent Process ID · Every process inherits environment from its parent

Fig 1 — The Linux process tree: every process has a PID and a PPID linking it to its parent
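The PID and PPID links in the figure can be followed by hand. The sketch below is a minimal illustration, assuming a Linux system with /proc mounted; the helper name parent_of is hypothetical. It reads the PPid field from /proc/PID/status and walks from the current shell up to the root of the tree. The same values appear in the PPID column of ps -ef.

```shell
# parent_of PID: print the parent PID recorded in /proc/PID/status (Linux-specific).
parent_of() {
    while read -r key value rest; do
        if [ "$key" = "PPid:" ]; then
            echo "$value"
            return 0
        fi
    done < "/proc/$1/status"
    return 1
}

# Walk from this shell towards the root of the process tree.
# The walk ends when the parent PID is 0 (PID 1, or the top of a container).
pid=$$
while [ "${pid:-0}" -gt 0 ]; do
    read -r name < "/proc/$pid/comm"
    echo "PID $pid ($name)"
    pid=$(parent_of "$pid") || break
done
```

Inside a container the walk usually stops after only a few steps, because the container's first process reports a PPID of 0.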

Process States
Code | State | Meaning
R | Running | Actively executing on a CPU, or in the run queue ready to execute.
S | Sleeping | Waiting for an event (I/O, timer, signal). Normal for most idle daemons.
D | Uninterruptible sleep | Waiting on I/O, typically disk or network storage. Signals, including SIGKILL, are not acted on until the wait ends. Many D state processes indicate an I/O bottleneck or hung disk.
Z | Zombie | Process has exited but its entry remains until the parent reads the exit status. Small numbers are normal; large numbers indicate a bug in the parent.
T | Stopped | Paused, either by a SIGSTOP signal or by the user pressing Ctrl+Z (which sends SIGTSTP).
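A state change can be observed directly. The sketch below is only an illustration, assuming Linux with /proc mounted; state_of is a hypothetical helper that reads the State field from /proc/PID/status, the same one-letter code that ps -o stat= -p PID reports. It starts a throwaway sleep process, pauses it, and resumes it.

```shell
# state_of PID: print the one-letter state code from /proc/PID/status.
state_of() {
    while read -r key value rest; do
        if [ "$key" = "State:" ]; then
            echo "$value"
            return 0
        fi
    done < "/proc/$1/status"
    return 1
}

sleep 30 &                      # throwaway process to experiment on
pid=$!
sleep 0.2

kill -STOP "$pid"               # pause it
sleep 0.2
stopped_state=$(state_of "$pid")
echo "after SIGSTOP: $stopped_state"

kill -CONT "$pid"               # resume it
sleep 0.2
resumed_state=$(state_of "$pid")
echo "after SIGCONT: $resumed_state"

kill -TERM "$pid"               # clean up the throwaway process
wait "$pid" 2>/dev/null || true
```

The short sleeps are there because signal delivery is asynchronous: the state shown in /proc changes a moment after kill returns.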

Viewing Processes with ps and pgrep

ps (process status) takes a snapshot of running processes at the moment it is called. It has a notoriously complex set of flags inherited from both BSD and SysV traditions — but in practice, two invocations cover nearly all use cases.

# The universal snapshot: every process, full detail
ps aux

# BSD-style flags: a = all users, u = user-oriented format (user, CPU, memory, PID, command),
# x = include processes without a terminal; head keeps only the first 20 lines
ps aux | head -20

# SysV-style: every process with parent PID, useful for tree relationships
ps -ef

# Show only processes owned by a specific user
ps -u alice

# Show a specific process by PID
ps -p 1235

# Show a process tree — visualise parent-child relationships
ps auxf

# Filter with grep — find nginx processes (excluding the grep process itself)
ps aux | grep '[n]ginx'

What just happened? The regular expression [n]ginx matches the text nginx, but grep's own command line contains the literal characters [n]ginx, which the expression does not match, so the grep process does not appear in the output. In the STAT column, Ss on the nginx master means it is sleeping and is also a session leader; the workers show S, sleeping while they wait for connections.
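The character class trick can be checked without a running nginx. The sketch below feeds two hypothetical ps listings (invented sample lines, not real output) through grep and counts the matches; the first line of each listing stands in for the grep process itself.

```shell
# Hypothetical ps lines; the first line in each listing is the grep process itself.
listing_plain='alice 5200 grep nginx
root 1235 nginx: master process
nginx 1236 nginx: worker process'

listing_bracket='alice 5200 grep [n]ginx
root 1235 nginx: master process
nginx 1236 nginx: worker process'

# Plain pattern: grep's own command line contains "nginx", so it is counted too.
plain_count=$(printf '%s\n' "$listing_plain" | grep -c 'nginx')

# Bracket pattern: "grep [n]ginx" has no literal "nginx" substring, so it is skipped.
bracket_count=$(printf '%s\n' "$listing_bracket" | grep -c '[n]ginx')

echo "plain pattern matched $plain_count lines"
echo "bracket pattern matched $bracket_count lines"
```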

# pgrep — find PIDs by process name (cleaner than ps | grep)
pgrep nginx

# pgrep showing each matching PID with its full command line (-l = list name, -a = list full command line)
pgrep -la nginx

# pgrep filtered by user
pgrep -u alice

# pgrep with parent PID — find all children of a given process
pgrep -P 1235

# pidof — find the PID of a named program (simpler than pgrep for exact names)
pidof nginx
pidof sshd

What just happened? pgrep -P 1235 returned only the PIDs of nginx's worker children. pidof sshd returned three PIDs — the main sshd listener plus two active SSH sessions. Both commands output clean PID lists suitable for piping directly into kill or other process management commands.

Real-Time Monitoring with top and htop

While ps gives a snapshot, top provides a continuously refreshing view of system-wide resource consumption. It shows which processes are consuming the most CPU and memory in real time, making it the first tool to reach for when a system feels slow or a runaway process is suspected.

top — built-in, always available

top - 11:42:07 up 2:26, 2 users, load: 0.12
Tasks: 142 total, 1 running, 141 sleeping
%Cpu(s): 2.1 us, 0.5 sy, 0.0 ni, 97.1 id
MiB Mem: 3934.8 total, 412.1 free
PID USER %CPU %MEM COMMAND
1235 nginx 0.3 0.1 nginx
892 sshd 0.0 0.0 sshd
1102 root 0.0 0.0 cron

Key shortcuts: M sort by mem · P sort by CPU · k kill PID · q quit · 1 per-CPU view

htop — enhanced, install separately

CPU[||| 12%] Mem[||||||| 1.2G]
CPU[| 3%] Swp[ 0K]
PID USER CPU% MEM% COMMAND
1235 nginx 0.3 0.1 nginx: master
892 sshd 0.0 0.0 sshd: alice
1102 root 0.0 0.0 /usr/sbin/cron

Advantages: mouse support · colour bars · tree view built-in · F9 send signal · F6 sort column · scroll freely

# Launch top (press q to quit)
top

# Launch top sorted by memory usage immediately
top -o %MEM

# Launch top showing only a specific user's processes
top -u alice

# Install htop (not installed by default on all distros)
sudo apt install htop -y        # Debian/Ubuntu
sudo dnf install htop -y        # RHEL/Rocky

# Launch htop
htop

# Non-interactive: run top once and output to stdout — useful in scripts
top -b -n 1 | head -20

Analogy: ps is like taking a photograph of a crowd — you see everyone frozen at that instant. top is like watching a live video feed — you see who is moving, who is consuming energy, and who suddenly starts running.

Signals and kill — Communicating with Processes

In Linux, you communicate with a running process by sending it a signal: a numbered notification that triggers a specific behaviour. The kill command sends a signal to a process identified by its PID. The name is misleading, because most signals are not about termination at all.

SIGHUP (1)
Hangup — reload configuration

Originally meant "terminal disconnected". Most daemons implement it as a graceful config reload. Equivalent to systemctl reload for many services.

SIGINT (2)
Interrupt — polite stop request

What Ctrl+C sends. The process can catch this signal and run cleanup code before exiting.

SIGTERM (15)
Terminate — graceful shutdown request

The default signal sent by kill when no signal is specified. The process can catch it and shut down cleanly. Always try this first.

SIGKILL (9)
Kill — immediate, uncatchable termination

Handled by the kernel directly — the process cannot catch, block, or ignore it. No cleanup is possible. Use only after SIGTERM has failed, and understand that open files may be left in an inconsistent state.

SIGSTOP (19)
Stop — pause execution

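Suspends the process without terminating it, like pressing pause. Resume with SIGCONT (18). Like SIGKILL, it cannot be caught, blocked, or ignored, so it always works. The numbers 19 and 18 apply to x86 and ARM; they differ on some other architectures, so prefer signal names in scripts.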
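The difference between a catchable and an uncatchable signal is easiest to see from the receiving side. The sketch below is a minimal illustration, not a production pattern: a hypothetical worker function installs a SIGTERM handler with the shell's trap builtin, and the handler writes a marker file before exiting. The names worker and marker are assumptions made for the example.

```shell
marker=$(mktemp)

worker() {
    # The handler runs when SIGTERM arrives: record the cleanup, stop the child, exit.
    trap 'echo "cleaned up" > "$marker"; kill "$child" 2>/dev/null; exit 0' TERM
    sleep 30 &
    child=$!
    wait "$child"               # wait is interrupted as soon as the signal arrives
}

worker &
worker_pid=$!
sleep 0.5                       # give the worker time to install its handler

kill -TERM "$worker_pid"        # graceful request: the handler gets to run
wait "$worker_pid"

result=$(cat "$marker")
rm -f "$marker"
echo "worker reported: $result"
```

Replacing kill -TERM with kill -KILL in this sketch would leave the marker file empty, because the handler never gets a chance to run.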

# List all available signals
kill -l

# Send SIGTERM (graceful termination) to a PID — try this first
kill 5100
kill -15 5100       # same thing, explicit signal number
kill -SIGTERM 5100  # same thing, explicit signal name

# Send SIGKILL only if SIGTERM has not worked after a few seconds
kill -9 5100
kill -SIGKILL 5100

# Kill by process name instead of PID — sends SIGTERM to all matching processes
pkill nginx

# Kill all processes owned by a user
pkill -u baduser

# Send SIGHUP to reload config without restarting
kill -HUP 1235
kill -1 1235        # same thing

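# Signal an entire process group: a negative number is read as a process group ID (PGID)
# Children are reached only while they remain in group 1235 (check with: ps -eo pid,pgid,comm)
kill -TERM -1235    # negative value targets the process group, not a single PID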

What just happened? Both kill commands produced no output on success — silence is the expected result. SIGTERM gave the process a chance to clean up (close files, flush buffers, release locks) before exiting. SIGKILL bypassed all of that — the kernel simply removed the process immediately, which is why it is the last resort rather than the first choice.
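Because kill itself is silent, the exit status of the terminated process is the simplest evidence of what happened. The sketch below assumes the target is a child of the current shell, since wait only works on the shell's own children. A process ended by a signal reports 128 plus the signal number, and kill -l translates that status back into a signal name.

```shell
sleep 60 &                      # throwaway child process
pid=$!

kill -TERM "$pid"               # no output on success

status=0
wait "$pid" || status=$?        # collect the child's exit status

echo "exit status: $status"                     # 128 + 15
echo "signal name: $(kill -l "$status")"
```

The same check after kill -9 reports 137, which is 128 + 9.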

Process Priority with nice and renice

Linux uses a niceness value ranging from -20 (highest CPU priority) to +19 (lowest CPU priority) to influence how the scheduler allocates CPU time between competing processes. A process with a lower nice value gets more CPU time when the system is under load. The name "nice" reflects the idea that a high-nice process is being "nice" to other processes by yielding CPU time.

Scale: -20 (highest priority) · -10 · 0 (default) · +10 · +19 (lowest priority). Negative values: root only, more CPU. Positive values: any user, less CPU.

Fig 2 — The niceness scale: lower value = higher CPU priority. Only root can set negative values.

# Start a command with a specific nice value (lower priority for a background job)
nice -n 10 tar -czf /backup/archive.tar.gz /var/data/

# Start a high-priority process (root only — negative nice values)
sudo nice -n -5 /opt/critical-service/bin/server

# Change the nice value of an already-running process
sudo renice -n 15 -p 5100         # lower priority of PID 5100
sudo renice -n -5 -p 1235         # raise priority (root only)

# Change the nice value of all processes owned by a user
sudo renice -n 10 -u batch_user

# View current nice values in ps output (NI column)
ps -eo pid,user,ni,comm --sort=-ni | head -15

# View nice values in top — the NI column
# Press r in top to renice interactively

What just happened? The NI column shows the niceness of each process. Ordinary processes, including systemd (PID 1), normally run at the default of 0; on a typical system the entries at -20 are high-priority kernel threads. Background maintenance jobs such as updatedb (which indexes the filesystem) are usually started at a high nice value so that they do not noticeably slow down interactive work when they run.
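One detail worth checking by hand is that nice -n applies an adjustment relative to the niceness of the calling shell rather than setting an absolute value. The sketch below relies on the fact that nice with no command prints the current niceness; the variable names are only illustrative.

```shell
base=$(nice)                    # niceness of the current shell, usually 0
lowered=$(nice -n 5 nice)       # the inner nice prints the niceness it was started with

echo "shell niceness: $base"
echo "child started with nice -n 5: $lowered"
```

In a shell that already runs at niceness 10, the second value would be 15, and results are capped at 19.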

Foreground, Background, and Job Control

When you run a command in a terminal, it occupies the foreground — blocking the prompt until it finishes. Linux allows you to move long-running processes to the background, freeing the terminal for other work, or to detach them entirely so they survive when the terminal closes.

Ctrl+Z

Suspend the foreground process. The terminal sends it SIGTSTP (a catchable relative of SIGSTOP), and it becomes a stopped job in the background. The terminal prompt returns.

bg %1

Resume a stopped job in the background — it continues running but no longer blocks the terminal.

fg %1

Bring a background job back to the foreground — it regains the terminal and blocks the prompt again.

cmd &

Appending & starts a command directly in the background without needing Ctrl+Z first.

nohup

Runs a command immune to SIGHUP — the process continues after the terminal closes or the SSH session disconnects.

# Start a long job in the background immediately
tar -czf /backup/full.tar.gz /var/data/ &

# List all current jobs in this shell
jobs

# Suspend a running foreground process, then resume it in the background
# (while a command is running, press Ctrl+Z)
# [1]+  Stopped   tar -czf /backup/full.tar.gz /var/data/
bg %1

# Bring a background job back to the foreground
fg %1

# Run a command that survives terminal disconnect
nohup long-running-script.sh > /tmp/output.log 2>&1 &

# Disown a running background job — detaches it from the shell entirely
long-running-script.sh &
disown %1

What just happened? Running tar & printed the job number [1] and the PID (5301 in this run) immediately, giving the shell back. The nohup command line redirected both stdout and stderr to a log file so that the output lands in a known place. Without this redirection, nohup appends output that would have gone to the terminal to a file named nohup.out in the current directory, or in the home directory if the current directory is not writable.
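Inside a script, where interactive job control with fg and bg is normally unavailable, the usual companions to & are the $! variable (the PID of the most recent background command) and the wait builtin. The sketch below is a minimal illustration with two stand-in tasks; the subshells and their exit codes are invented for the example.

```shell
# Two stand-in background tasks: one succeeds, one fails with exit code 3.
( sleep 0.3; exit 0 ) &
ok_pid=$!
( sleep 0.3; exit 3 ) &
fail_pid=$!

# wait PID blocks until that job finishes and returns its exit status.
ok_status=0
wait "$ok_pid" || ok_status=$?
fail_status=0
wait "$fail_pid" || fail_status=$?

echo "first task exited with $ok_status"
echo "second task exited with $fail_status"
```

Both tasks run concurrently, so the whole script takes about as long as the slower one rather than the sum of the two.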

Never Use kill -9 as Your First Response

SIGKILL gives the process no opportunity to flush buffers, close database connections, release file locks, or write a clean shutdown state. On a database process this can cause data corruption; on a service holding a lock file it can leave the lock in place preventing restart. Always send SIGTERM first and wait 5–10 seconds. Only escalate to SIGKILL if the process genuinely does not respond.
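The SIGTERM-then-wait-then-SIGKILL routine can be wrapped in a small function. The sketch below is one possible shape, not a standard tool: terminate_gracefully is a hypothetical helper, and the demo process deliberately ignores SIGTERM so that the escalation path is exercised. It uses kill -0, which sends no signal and only tests whether the PID can still be signalled.

```shell
# terminate_gracefully PID [TIMEOUT_SECONDS]
# Hypothetical helper: SIGTERM first, SIGKILL only if the process outlives the timeout.
terminate_gracefully() {
    local pid=$1 timeout=${2:-10} waited=0
    kill -TERM "$pid" 2>/dev/null || { echo "cannot signal PID $pid" >&2; return 1; }
    while [ "$waited" -lt "$timeout" ]; do
        kill -0 "$pid" 2>/dev/null || { echo "PID $pid exited after SIGTERM"; return 0; }
        sleep 1
        waited=$((waited + 1))
    done
    kill -0 "$pid" 2>/dev/null || { echo "PID $pid exited after SIGTERM"; return 0; }
    echo "PID $pid still running after ${timeout}s, sending SIGKILL"
    kill -KILL "$pid" 2>/dev/null || true
}

# Demo: a process that ignores SIGTERM (ignored signals stay ignored across exec).
sh -c 'trap "" TERM; exec sleep 60' &
stubborn=$!
sleep 0.5                                # give the child time to set its trap

terminate_gracefully "$stubborn" 2       # short timeout to keep the demo quick
wait "$stubborn" 2>/dev/null || true     # reap the child so its PID disappears
```

One caveat: kill -0 also succeeds for a zombie whose parent has not yet collected it, so for processes that are not children of the current shell the loop may wait out the full timeout even though the program has already exited.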

Lesson Checklist

I can read the process state codes in ps output — R, S, D, Z, T — and know which states indicate a potential problem
I use ps aux and pgrep -la to find processes, and top to identify resource hogs in real time
I always send SIGTERM first and wait before escalating to SIGKILL, and I know the difference between kill (by PID) and pkill (by name)
I can use nice to start a low-priority job and renice to adjust the priority of a running process
I can move processes between foreground and background using Ctrl+Z, bg, fg, and nohup, and know when to use each

Teacher's Note

The grep trick grep '[n]ginx' comes up in nearly every shell scripting interview and real-world script. Learn it once: wrapping the first character in square brackets creates a character class that matches the same string but does not match the grep command itself — removing the need for grep -v grep.

Practice Questions

1. A user reports the system feels sluggish. Describe the sequence of commands you would run to identify which process is consuming the most CPU, find its PID and owner, and then gracefully terminate it — including what you would do if the graceful termination does not work within 10 seconds.

2. You need to run a large database backup script (backup.sh) over SSH. The script takes several hours. Write the command to start it so that it continues running even if your SSH session disconnects, and explain what each part of the command does.

3. Explain what it means when ps aux shows several processes in state D. Why can these processes not be killed with kill -9, and what does a high number of D state processes typically indicate about the system?

Lesson Quiz

1. What is the key difference between SIGTERM and SIGKILL?

2. You start a CPU-intensive compression job with nice -n 15 gzip largefile.log. What does the niceness value of 15 mean in practice?

3. You press Ctrl+Z while a command is running. What happens to the process, and what command would you run to resume it running in the background?
