Linux Administration Lesson 15 – Process Management | Dataplexa

Section II — User, Process & Package Management

Process Management

In this lesson

ps and pgrep
top and htop
Signals and kill
nice and renice
Foreground and background jobs

Process management is the ability to inspect, control, and prioritise the programs running on a Linux system at any given moment. Every running program — from a web server to a shell command — is a process with a unique identifier, an owner, a resource footprint, and a state. Knowing how to find runaway processes, send them the right signal, and adjust their scheduling priority is an essential day-to-day skill for any Linux administrator.

How Linux Represents Processes

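Every process on Linux is assigned a Process ID (PID), a unique integer used to reference it in all management commands. Processes form a tree. Apart from PID 1 (systemd on most modern distributions) and PID 2 (kthreadd, the parent of kernel threads), which the kernel starts itself, every process is created by a parent process, and its Parent Process ID (PPID) records which one. A child inherits its environment variables, open file descriptors, and signal dispositions from the parent. This relationship also explains what happens when a parent dies: its children are usually not killed but orphaned, and the kernel re-parents them to PID 1 (or to a designated subreaper process). They die with the parent only if the signal reaches them too, for example when it is sent to the whole process group.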
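To see the parent-child link for yourself, the sketch below starts a background `sleep` and compares its PPID with the shell's own PID (`$$`). It reads the `PPid:` line of `/proc/<pid>/status`, the same value `ps -o ppid= -p <pid>` prints, so it also runs where procps is not installed; the helper name `ppid_of` is just for this example. The second half orphans a `sleep` by letting the inner shell that started it exit. Which process adopts the orphan depends on the system (PID 1, or a subreaper such as a container's init), so the script only checks that the dead parent is no longer listed.

```shell
#!/usr/bin/env bash
# ppid_of PID: print a process's parent PID from the kernel's /proc view
ppid_of() {
  awk '/^PPid:/ { print $2 }' "/proc/$1/status"
}

# 1) A background child's parent is the shell that started it
sleep 30 >/dev/null 2>&1 &
child=$!
child_ppid=$(ppid_of "$child")
echo "shell PID: $$, child PID: $child, child's PPID: $child_ppid"

# 2) Orphaning: an inner shell starts a sleep, prints both PIDs, then exits
read -r dead_parent orphan < <(bash -c 'sleep 30 >/dev/null 2>&1 & echo "$$ $!"')
sleep 0.5   # allow time for the inner shell to exit and the orphan to be re-parented
new_ppid=$(ppid_of "$orphan")
echo "orphan $orphan: parent was $dead_parent, now adopted by $new_ppid"

# Clean up both demo processes
kill "$child" "$orphan"
```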

Fig 1 — The Linux process tree: every process has a PID and a PPID linking it to its parent

Process States

Code	State	Meaning
`R`	Running	Actively executing on a CPU, or in the run queue ready to execute.
`S`	Sleeping	Waiting for an event (I/O, timer, signal). Normal for most idle daemons.
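`D`	Uninterruptible sleep	Blocked inside the kernel, usually waiting for disk or network-filesystem I/O. Signals, even `SIGKILL`, are not acted on until the wait ends. Many `D` state processes indicate an I/O bottleneck or a hung disk or NFS mount.
`Z`	Zombie	Process has exited but its entry remains until the parent reads the exit status with `wait()`. A few short-lived zombies are normal; a large or growing number means the parent is not reaping its children, which is a bug in the parent.
`T`	Stopped	Paused by a stop signal: `SIGSTOP`, or the `SIGTSTP` sent when the user presses `Ctrl+Z`. A lowercase `t` means stopped by a debugger during tracing.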
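The state letters can be watched changing on a throwaway process. This sketch pauses and resumes a `sleep` with `SIGSTOP` and `SIGCONT` and reads the state from the `State:` line of `/proc/<pid>/status`, which uses the same letters as the `STAT` column of `ps`. The short `sleep 0.2` pauses assume the signal has taken effect by then, and the helper `state_of` is hypothetical.

```shell
#!/usr/bin/env bash
# state_of PID: print the one-letter state code from /proc/PID/status
state_of() {
  awk '/^State:/ { print $2 }' "/proc/$1/status"
}

sleep 30 >/dev/null 2>&1 &
pid=$!
sleep 0.2
before=$(state_of "$pid")      # S: sleep is waiting on a timer

kill -STOP "$pid"              # pause it (Ctrl+Z's SIGTSTP also ends in T)
sleep 0.2
stopped=$(state_of "$pid")     # T: stopped

kill -CONT "$pid"              # resume it
sleep 0.2
resumed=$(state_of "$pid")     # S again

echo "before=$before stopped=$stopped resumed=$resumed"
kill "$pid"                    # clean up
```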

Viewing Processes with ps and pgrep

ps (process status) takes a snapshot of running processes at the moment it is called. It has a notoriously complex set of flags inherited from both BSD and SysV traditions — but in practice, two invocations cover nearly all use cases.

# BSD-style snapshot: every process with user, CPU, memory, PID, and command
# a = all users, u = user-oriented format, x = include processes without a terminal
ps aux

# Same snapshot, limited to the header and the first 19 processes
ps aux | head -20

# SysV-style: every process with parent PID, useful for tree relationships
ps -ef

# Show only processes owned by a specific user
ps -u alice

# Show a specific process by PID
ps -p 1235

# Show a process tree — visualise parent-child relationships
ps auxf

# Filter with grep — find nginx processes (excluding the grep process itself)
ps aux | grep '[n]ginx'

# ps aux | grep '[n]ginx'
USER       PID  %CPU %MEM    VSZ   RSS TTY   STAT START   TIME COMMAND
root      1235   0.0  0.1  55360  4096 ?     Ss   09:15   0:00 nginx: master process
www-data  1236   0.0  0.2  55796  8192 ?     S    09:15   0:02 nginx: worker process
www-data  1237   0.0  0.2  55796  8192 ?     S    09:15   0:01 nginx: worker process

# ps auxf (tree view, truncated)
root      1235  nginx: master process /usr/sbin/nginx
www-data  1236   \_ nginx: worker process
www-data  1237   \_ nginx: worker process

What just happened? The grep pattern '[n]ginx' uses a character-class trick: it still matches the text nginx, but the grep process's own command line contains the literal text [n]ginx, which the pattern does not match, so grep does not list itself. The Ss state on the master means it is sleeping (S) and is also a session leader (s). The workers show S: they are sleeping normally while waiting for connections.
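The grep trick is easy to check without a running nginx. In this sketch the two variables stand in for `ps aux` output captured while each grep was running, so the grep process itself appears in the list; the users, PIDs, and lines are invented for illustration.

```shell
#!/usr/bin/env bash
# Simulated ps output captured while each grep was running
# (invented users and PIDs; the grep process appears in its own list)
plain_ps=$'root   1235  nginx: master process\nalice  4321  grep nginx'
trick_ps=$'root   1235  nginx: master process\nalice  4322  grep [n]ginx'

# Plain pattern: the grep line contains "nginx", so it matches too
plain_count=$(printf '%s\n' "$plain_ps" | grep -c 'nginx')     # 2

# Bracket pattern: the grep line contains "[n]ginx", which is not "nginx"
trick_hits=$(printf '%s\n' "$trick_ps" | grep '[n]ginx')       # only the master line

echo "plain pattern matched $plain_count lines"
echo "bracket pattern matched: $trick_hits"
```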

# pgrep — find PIDs by process name (cleaner than ps | grep)
pgrep nginx

# pgrep with full process name match and list format
pgrep -la nginx

# pgrep filtered by user
pgrep -u alice

# pgrep with parent PID — find all children of a given process
pgrep -P 1235

# pidof — find the PID of a named program (simpler than pgrep for exact names)
pidof nginx
pidof sshd

# pgrep -la nginx
1235 nginx: master process /usr/sbin/nginx -g daemon on; master_process on;
1236 nginx: worker process
1237 nginx: worker process

# pgrep -P 1235
1236
1237

# pidof sshd
892 1044 1089

What just happened? pgrep -P 1235 returned only the PIDs of nginx's worker children. pidof sshd returned three PIDs — the main sshd listener plus two active SSH sessions. Both commands output clean PID lists suitable for piping directly into kill or other process management commands.
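Under the hood, `pgrep -P` compares every process's parent PID with the one you asked for. The sketch below does that walk over `/proc` by hand with a hypothetical `children_of` function, purely to show the idea; on a real system just use `pgrep -P`. It writes to a temporary file instead of using `$(...)` because a command substitution would itself be a child of the shell and show up in the results.

```shell
#!/usr/bin/env bash
# children_of PARENT: print the PIDs whose parent PID is PARENT,
# a rough by-hand version of what pgrep -P does
children_of() {
  local parent=$1 dir ppid
  for dir in /proc/[0-9]*; do
    # A process may exit between the glob and the read; skip it if so
    ppid=$(awk '/^PPid:/ { print $2 }' "$dir/status" 2>/dev/null) || continue
    if [ "$ppid" = "$parent" ]; then
      echo "${dir#/proc/}"
    fi
  done
}

tmp=$(mktemp)
sleep 30 >/dev/null 2>&1 &
first=$!
sleep 30 >/dev/null 2>&1 &
second=$!

# Call the function directly (no $(...)), so no extra subshell child
# exists while /proc is scanned
children_of "$$" >"$tmp"
found=$(sort -n "$tmp" | tr '\n' ' ')
expected=$(printf '%s\n' "$first" "$second" | sort -n | tr '\n' ' ')
echo "children of shell $$: $found"

kill "$first" "$second"
rm -f "$tmp"
```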

Real-Time Monitoring with top and htop

While ps gives a snapshot, top provides a continuously refreshing view of system-wide resource consumption. It shows which processes are consuming the most CPU and memory in real time, making it the first tool to reach for when a system feels slow or a runaway process is suspected.

top — built-in, always available
top - 11:42:07 up 2:26, 2 users, load: 0.12
Tasks: 142 total, 1 running, 141 sleeping
%Cpu(s): 2.1 us, 0.5 sy, 0.0 ni, 97.1 id
MiB Mem:  3934.8 total,  412.1 free
PID  USER   %CPU %MEM  COMMAND
1235 root    0.3  0.1  nginx
892  root    0.0  0.0  sshd
1102 root    0.0  0.0  cron
Key shortcuts: M sort by mem · P sort by CPU · k kill PID · q quit · 1 per-CPU view

htop — enhanced, install separately

CPU[|||         12%]   Mem[|||||||    1.2G]
CPU[|           3%]    Swp[          0K]
PID  USER   CPU%  MEM%  COMMAND
1235 root    0.3   0.1  nginx: master
 892 root    0.0   0.0  sshd: alice
1102 root    0.0   0.0  /usr/sbin/cron
Advantages: mouse support · colour bars · tree view built-in · F9 send signal · F6 sort column · scroll freely

# Launch top (press q to quit)
top

# Launch top sorted by memory usage immediately
top -o %MEM

# Launch top showing only a specific user's processes
top -u alice

# Install htop (not installed by default on all distros)
sudo apt install htop -y        # Debian/Ubuntu
sudo dnf install htop -y        # RHEL/Rocky

# Launch htop
htop

# Non-interactive: run top once and output to stdout — useful in scripts
top -b -n 1 | head -20
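The load average figures in the first line of top come from `/proc/loadavg`, which a script can read directly instead of scraping `top -b -n 1`. This sketch compares the 1-minute load with the number of CPUs reported by `nproc`. Treating load above the CPU count as "busy" is only a rough rule of thumb: on Linux the load also counts processes in state `D`, so high load with idle CPUs usually points at I/O rather than CPU.

```shell
#!/usr/bin/env bash
# The first three fields of /proc/loadavg are the 1-, 5- and 15-minute
# load averages that top shows in its header line
read -r load1 load5 load15 _ < /proc/loadavg
cpus=$(nproc)

# Bash has no floating point, so let awk do the comparison
if awk -v load="$load1" -v cpus="$cpus" 'BEGIN { exit !(load > cpus) }'; then
  verdict="busy"
else
  verdict="ok"
fi
echo "load: $load1 $load5 $load15 on $cpus CPU(s): $verdict"
```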

Analogy: ps is like taking a photograph of a crowd — you see everyone frozen at that instant. top is like watching a live video feed — you see who is moving, who is consuming energy, and who suddenly starts running.

Signals and kill — Communicating with Processes

In Linux, you communicate with a running process by sending it a signal: a numbered notification that triggers a specific behaviour. The kill command sends a signal to a process by PID. Its name is misleading, because a signal does not have to end the process: SIGHUP can ask a daemon to reload its configuration, and SIGSTOP and SIGCONT pause and resume it.

SIGHUP (1)

Hangup — reload configuration

Originally meant "terminal disconnected". Most daemons implement it as a graceful config reload. Equivalent to systemctl reload for many services.

SIGINT (2)

Interrupt — polite stop request

What Ctrl+C sends. The process can catch this signal and run cleanup code before exiting.

SIGTERM (15)

Terminate — graceful shutdown request

The default signal sent by kill when no signal is specified. The process can catch it and shut down cleanly. Always try this first.

SIGKILL (9)

Kill — immediate, uncatchable termination

Handled by the kernel directly — the process cannot catch, block, or ignore it. No cleanup is possible. Use only after SIGTERM has failed, and understand that open files may be left in an inconsistent state.

SIGSTOP (19)

Stop — pause execution

Suspends the process without terminating it, like pressing pause. Resume it with SIGCONT (18). Like SIGKILL, it cannot be caught, blocked, or ignored.
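The numbers and names in the signal cards above can be translated with the shell's built-in `kill -l`. The sketch also shows how a signal death looks to the parent: bash reports the exit status as 128 plus the signal number, so a process ended by `SIGKILL` (9) returns 137, and `kill -l 137` turns that back into a name. This assumes bash's builtin `kill`; the standalone `/bin/kill` may format its output differently.

```shell
#!/usr/bin/env bash
# Translate between signal numbers and names with bash's builtin kill
term_name=$(kill -l 15)          # TERM
kill_number=$(kill -l KILL)      # 9

# A process killed by a signal is reported with exit status 128 + signal number
sleep 30 >/dev/null 2>&1 &
pid=$!
kill -KILL "$pid"
wait "$pid"
status=$?                        # 137 = 128 + 9
status_name=$(kill -l "$status") # KILL

echo "15 is $term_name, KILL is $kill_number, status $status means $status_name"
```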

# List all available signals
kill -l

# Send SIGTERM (graceful termination) to a PID — try this first
kill 5100
kill -15 5100       # same thing, explicit signal number
kill -SIGTERM 5100  # same thing, explicit signal name

# Send SIGKILL only if SIGTERM has not worked after a few seconds
kill -9 5100
kill -SIGKILL 5100

# Kill by process name instead of PID — sends SIGTERM to all matching processes
pkill nginx

# Kill all processes owned by a user
pkill -u baduser

# Send SIGHUP to reload config without restarting
kill -HUP 1235
kill -1 1235        # same thing

# Signal every process in process group 1235, e.g. a daemon and the children that share its group
kill -TERM -- -1235    # a negative PID means a process group ID; '--' stops -1235 being read as an option

# kill -l
 1) SIGHUP       2) SIGINT       3) SIGQUIT      4) SIGILL
 5) SIGTRAP      6) SIGABRT      7) SIGBUS       8) SIGFPE
 9) SIGKILL     10) SIGUSR1     11) SIGSEGV     12) SIGUSR2
13) SIGPIPE     14) SIGALRM     15) SIGTERM     ...

# kill 5100
# (no output: SIGTERM sent; the process exits cleanly)

# kill -9 5100
# (no output: needed only if SIGTERM was ignored; the kernel terminates the process immediately)

# ps -p 5100
  PID TTY          TIME CMD
# (header only: PID 5100 no longer exists)

What just happened? Both kill commands produced no output on success — silence is the expected result. SIGTERM gave the process a chance to clean up (close files, flush buffers, release locks) before exiting. SIGKILL bypassed all of that — the kernel simply removed the process immediately, which is why it is the last resort rather than the first choice.
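The difference between the two signals shows up clearly with a process that has something to clean up. In this sketch a hypothetical `start_worker` function launches a background subshell that creates a lock file and uses `trap` to delete it on `SIGTERM`. The worker stopped with `SIGTERM` removes its lock and exits with status 0; the one stopped with `SIGKILL` never runs the trap, leaves its lock behind, and is reported with status 137 (128 + 9).

```shell
#!/usr/bin/env bash
lockdir=$(mktemp -d)

# start_worker NAME: run a background subshell that holds NAME.lock and
# removes it if it receives SIGTERM. Sets worker_pid.
start_worker() {
  local lock="$lockdir/$1.lock"
  (
    trap 'rm -f "$lock"; exit 0' TERM   # cleanup handler for SIGTERM
    touch "$lock"
    while :; do sleep 0.1; done         # pretend to work
  ) &
  worker_pid=$!
  # The lock appears only after the trap is installed, so wait for it
  until [ -e "$lock" ]; do sleep 0.05; done
}

start_worker polite
kill -TERM "$worker_pid"
wait "$worker_pid"; term_status=$?      # 0: the trap ran and exited cleanly

start_worker forced
kill -KILL "$worker_pid"
wait "$worker_pid"; kill_status=$?      # 137: killed, the trap never ran

leftover=$(ls "$lockdir")               # forced.lock: the stale lock is still there
rm -rf "$lockdir"
echo "SIGTERM status: $term_status, SIGKILL status: $kill_status, leftover: $leftover"
```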

Process Priority with nice and renice

Linux uses a niceness value ranging from -20 (highest CPU priority) to +19 (lowest CPU priority) to influence how the scheduler allocates CPU time between competing processes. A process with a lower nice value gets more CPU time when the system is under load. The name "nice" reflects the idea that a high-nice process is being "nice" to other processes by yielding CPU time.

Fig 2 — The niceness scale: lower value = higher CPU priority. Only root can set negative values.

# Start a command with a specific nice value (lower priority for a background job)
nice -n 10 tar -czf /backup/archive.tar.gz /var/data/

# Start a high-priority process (root only — negative nice values)
sudo nice -n -5 /opt/critical-service/bin/server

# Change the nice value of an already-running process
sudo renice -n 15 -p 5100         # lower priority of PID 5100
sudo renice -n -5 -p 1235         # raise priority (root only)

# Change the nice value of all processes owned by a user
sudo renice -n 10 -u batch_user

# View current nice values in ps output (NI column)
ps -eo pid,user,ni,comm --sort=-ni | head -15

# View nice values in top — the NI column
# Press r in top to renice interactively

# ps -eo pid,user,ni,comm --sort=-ni | head -15   (truncated)
  PID USER       NI COMMAND
 5200 alice      19 updatedb
 5100 alice      10 tar
 1235 root        0 nginx
  892 root        0 sshd
 1102 root        0 cron
    1 root        0 systemd

# sudo renice -n 15 -p 5100
5100 (process ID) old priority 10, new priority 15

What just happened? Because --sort=-ni sorts the NI column in descending order, the lowest-priority processes appear first. The tar job started with nice -n 10 shows NI 10, while ordinary daemons, including systemd (PID 1), run at the default value of 0. The updatedb process (which indexes the filesystem for locate) is normally launched at a high nice value by its distribution's scheduled job, so it does not noticeably slow down interactive work when it runs. The renice call then lowered tar's priority further, from 10 to 15.
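You can also check niceness without ps: run with no command, coreutils `nice` prints the niceness it is running at, and field 19 of `/proc/<pid>/stat` holds a running process's NI value. The sketch below assumes the shell starts at some niceness `base` (normally 0); `nice -n 10` adds 10 to it, and the kernel caps the result at 19.

```shell
#!/usr/bin/env bash
# With no command, nice prints the niceness it is running at
base=$(nice)

# Run nice itself under "nice -n 10": it reports base + 10, capped at 19
lowered=$(nice -n 10 nice)

# Start a background job at lower priority, then read its NI value.
# Field 19 of /proc/PID/stat is the nice value (this field count holds while
# the command name in field 2 contains no spaces, as with "sleep").
nice -n 10 sleep 30 >/dev/null 2>&1 &
pid=$!
sleep 0.2                     # give nice time to adjust and exec sleep
job_ni=$(awk '{ print $19 }' "/proc/$pid/stat")
kill "$pid"

echo "shell: $base  under nice -n 10: $lowered  background job: $job_ni"
```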

Foreground, Background, and Job Control

When you run a command in a terminal, it occupies the foreground — blocking the prompt until it finishes. Linux allows you to move long-running processes to the background, freeing the terminal for other work, or to detach them entirely so they survive when the terminal closes.

Ctrl+Z

Suspend the foreground process: the terminal sends it SIGTSTP, which pauses it and leaves it in the shell's job list as a stopped job. The terminal prompt returns.

bg %1

Resume a stopped job in the background — it continues running but no longer blocks the terminal.

fg %1

Bring a background job back to the foreground — it regains the terminal and blocks the prompt again.

cmd &

Appending & starts a command directly in the background without needing Ctrl+Z first.

nohup

Runs a command immune to SIGHUP — the process continues after the terminal closes or the SSH session disconnects.

# Start a long job in the background immediately
tar -czf /backup/full.tar.gz /var/data/ &

# List all current jobs in this shell
jobs

# Suspend a running foreground process, then resume it in the background
# (while a command is running, press Ctrl+Z)
# [1]+  Stopped   tar -czf /backup/full.tar.gz /var/data/
bg %1

# Bring a background job back to the foreground
fg %1

# Run a command that survives terminal disconnect
nohup long-running-script.sh > /tmp/output.log 2>&1 &

# Disown a running background job: the shell drops it from its job table and will not
# send it SIGHUP when the shell exits (use the job number that jobs reports)
long-running-script.sh &
disown %1

# tar -czf /backup/full.tar.gz /var/data/ &
[1] 5301

# jobs
[1]+  Running    tar -czf /backup/full.tar.gz /var/data/

# jobs (after completion)
[1]+  Done       tar -czf /backup/full.tar.gz /var/data/

# nohup long-running-script.sh > /tmp/output.log 2>&1 &
[1] 5410
# (nothing else on screen: nohup's "ignoring input" notice, if any, is written to /tmp/output.log because stderr is redirected there)

What just happened? Running tar & returned the job number [1] and its PID 5301 immediately, giving the shell back. The nohup command redirected both stdout and stderr to a log file, because once the session closes there is no terminal left to receive output. Without that redirection, nohup itself would append stdout to nohup.out in the current directory (or $HOME/nohup.out if the current directory is not writable).
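Job control with `jobs`, `fg`, `bg`, and `Ctrl+Z` is an interactive feature and is switched off by default in scripts, so scripts use `&`, `$!`, and `wait` instead. This sketch runs a stand-in job under `nohup` with both output streams redirected to a log, keeps the PID from `$!`, and uses `wait` to collect its exit status. The inline `sh -c` command is a placeholder for a real long-running script such as `backup.sh`.

```shell
#!/usr/bin/env bash
log=$(mktemp)

# Start a stand-in "long" job in the background under nohup, with stdout and
# stderr both going to the log; $! holds its PID (the 5301 in "[1] 5301")
nohup sh -c 'echo "backup step done"; exit 3' >"$log" 2>&1 &
job_pid=$!
echo "started background job $job_pid"

# ... the script is free to do other work here ...

# wait blocks until that PID exits, then returns the job's exit status
wait "$job_pid"
job_status=$?

log_text=$(cat "$log")
rm -f "$log"
echo "job $job_pid exited with status $job_status; log says: $log_text"
```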

Never Use kill -9 as Your First Response

SIGKILL gives the process no opportunity to flush buffers, close database connections, release file locks, or write a clean shutdown state. On a database process this can cause data corruption; on a service holding a lock file it can leave the lock in place preventing restart. Always send SIGTERM first and wait 5–10 seconds. Only escalate to SIGKILL if the process genuinely does not respond.
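The SIGTERM-first rule can be wrapped in a small helper. This sketch of a hypothetical `stop_gracefully` function sends `SIGTERM`, polls every second, and escalates to `SIGKILL` only once the timeout (10 seconds by default) runs out. It treats a zombie as stopped, because a zombie has already exited; a plain `kill -0` check would still report it as present. The demo uses a 2-second timeout and a `sleep` that ignores `SIGTERM` to force the escalation.

```shell
#!/usr/bin/env bash
# is_running PID: true if the process exists and is not a zombie
# (kill -0 alone would also succeed for an exited but unreaped zombie)
is_running() {
  local state
  state=$(awk '/^State:/ { print $2 }' "/proc/$1/status" 2>/dev/null) || return 1
  [ -n "$state" ] && [ "$state" != "Z" ]
}

# stop_gracefully PID [TIMEOUT]: send SIGTERM, wait up to TIMEOUT seconds
# (default 10), then escalate to SIGKILL. Prints the signal that ended it.
stop_gracefully() {
  local pid=$1 timeout=${2:-10} waited=0
  kill -TERM "$pid" 2>/dev/null || return 1   # no such process, or not permitted
  while is_running "$pid"; do
    if [ "$waited" -ge "$timeout" ]; then
      kill -KILL "$pid" 2>/dev/null
      echo "KILL"
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "TERM"
}

# Demo: one process that obeys SIGTERM and one that ignores it
sleep 30 >/dev/null 2>&1 &
polite=$!
( trap '' TERM; exec sleep 30 ) >/dev/null 2>&1 &
stubborn=$!
sleep 0.5   # let the subshell set "ignore SIGTERM" before it execs sleep

polite_result=$(stop_gracefully "$polite" 2)
stubborn_result=$(stop_gracefully "$stubborn" 2)
wait        # reap both demo children
echo "polite: $polite_result  stubborn: $stubborn_result"
```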

Lesson Checklist

✔ I can read the process state codes in ps output — R, S, D, Z, T — and know which states indicate a potential problem

✔ I use ps aux and pgrep -la to find processes, and top to identify resource hogs in real time

✔ I always send SIGTERM first and wait before escalating to SIGKILL, and I know the difference between kill (by PID) and pkill (by name)

✔ I can use nice to start a low-priority job and renice to adjust the priority of a running process

✔ I can move processes between foreground and background using Ctrl+Z, bg, fg, and nohup, and know when to use each

Teacher's Note

The grep trick grep '[n]ginx' comes up in nearly every shell scripting interview and real-world script. Learn it once: wrapping the first character in square brackets creates a character class, so the pattern still matches the text nginx, but grep's own entry in the process list shows the literal text [n]ginx, which the pattern does not match. That removes the need for grep -v grep.

Practice Questions

1. A user reports the system feels sluggish. Describe the sequence of commands you would run to identify which process is consuming the most CPU, find its PID and owner, and then gracefully terminate it — including what you would do if the graceful termination does not work within 10 seconds.

Run top or ps aux --sort=-%cpu | head -10 to identify the top CPU consumer and its PID. Check the owner with ps -p <PID> -o user,pid,cmd. Send a graceful termination: kill <PID> (SIGTERM). Wait 10 seconds — if the process is still running, force kill it: kill -9 <PID> (SIGKILL). Confirm it is gone with ps -p <PID>.

2. You need to run a large database backup script (backup.sh) over SSH. The script takes several hours. Write the command to start it so that it continues running even if your SSH session disconnects, and explain what each part of the command does.

nohup ./backup.sh > backup.log 2>&1 & — nohup ignores the HUP signal sent when the SSH session closes; > backup.log redirects stdout to a log file; 2>&1 redirects stderr to the same file; & runs the process in the background immediately. Note the PID printed — use it to check progress with ps -p <PID>.

3. Explain what it means when ps aux shows several processes in state D. Why can these processes not be killed with kill -9, and what does a high number of D state processes typically indicate about the system?

State D means uninterruptible sleep: the process is blocked inside the kernel, usually waiting on an I/O operation (typically disk or NFS), and cannot be interrupted. Even SIGKILL has no effect yet, because the kernel acts on pending signals only when the process leaves that wait, so the signal stays pending until the I/O completes or fails. A high number of D-state processes usually indicates a storage problem: a slow or failing disk, a hung NFS mount, or I/O saturation.

Lesson Quiz

1. What is the key difference between SIGTERM and SIGKILL?

SIGTERM immediately removes the process; SIGKILL allows the process to clean up first
SIGTERM can be caught by the process, allowing graceful shutdown; SIGKILL is handled by the kernel and cannot be caught or ignored
SIGTERM only works on root-owned processes; SIGKILL works on any process
They are identical: SIGKILL is just the numeric form of SIGTERM

2. You start a CPU-intensive compression job with nice -n 15 gzip largefile.log. What does the niceness value of 15 mean in practice?

The job will run 15 times faster than normal processes
The job has a lower CPU scheduling priority than default (0), so other processes will be favoured when competing for CPU time
The job has a higher CPU scheduling priority and will preempt other running processes
The job will be paused for 15 seconds between each CPU burst to prevent overheating

3. You press Ctrl+Z while a command is running. What happens to the process, and what command would you run to resume it running in the background?

The process is terminated; run restart %1 to run it again in the background
The process is suspended (SIGTSTP); run bg %1 to resume it in the background
The process is moved to the background automatically; run fg %1 to confirm it is running
The process continues running in the background; run nohup %1 to detach it from the terminal

Up Next

Lesson 16 — Job Scheduling (cron, at)

Automating recurring and one-time tasks with crontab, at, and systemd timers
