Linux Administration
Process Management
In this lesson
Process management is the ability to inspect, control, and prioritise the programs running on a Linux system at any given moment. Every running program — from a web server to a shell command — is a process with a unique identifier, an owner, a resource footprint, and a state. Knowing how to find runaway processes, send them the right signal, and adjust their scheduling priority is an essential day-to-day skill for any Linux administrator.
How Linux Represents Processes
Every process on Linux is assigned a Process ID (PID), an integer that is unique among running processes and is used to reference it in all management commands. Processes form a tree: every process except PID 1 (systemd on most modern distributions) was created by a parent process, from which it inherits environment variables, open file descriptors, and signal dispositions. Understanding this parent-child relationship helps explain why terminating a parent often takes its children down with it or leaves them orphaned, in which case they are typically re-parented to PID 1.
Fig 1 — The Linux process tree: every process has a PID and a PPID linking it to its parent
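The parent link is easy to inspect from a shell. The following is a minimal sketch, assuming a Linux system with /proc mounted (the same place ps reads its data from). The helper name parent_of is made up for this illustration.

```shell
#!/usr/bin/env bash
# parent_of PID: print the parent PID (PPID) recorded in /proc/PID/status.
# parent_of is a hypothetical helper name, not a standard command.
parent_of() {
    awk '/^PPid:/ {print $2}' "/proc/$1/status"
}

echo "this shell:    PID $$"
echo "its parent:    PID $(parent_of "$$")"
echo "shell's view:  PPID variable is $PPID"
```

The last two lines should normally agree, since the shell's PPID variable and the kernel's record describe the same parent.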
| Code | State | Meaning |
|---|---|---|
| R | Running | Actively executing on a CPU, or in the run queue ready to execute. |
| S | Sleeping | Waiting for an event (I/O, timer, signal). Normal for most idle daemons. |
| D | Uninterruptible sleep | Waiting on hardware I/O and cannot be killed. Many D state processes indicate an I/O bottleneck or hung disk. |
| Z | Zombie | Process has exited but its entry remains until the parent reads the exit status. Small numbers are normal; large numbers indicate a bug in the parent. |
| T | Stopped | Paused, either by a SIGSTOP signal or by the user pressing Ctrl+Z. |
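To watch a state change happen, the sketch below starts a harmless sleep process, reads its state code, pauses it, and reads the code again. It assumes /proc is available. The helper name state_of is hypothetical and exists only for this example.

```shell
#!/usr/bin/env bash
# state_of PID: print the one-letter state code from /proc/PID/status.
# state_of is a hypothetical helper name used only in this sketch.
state_of() {
    awk '/^State:/ {print $2}' "/proc/$1/status"
}

sleep 30 &                        # a harmless idle process to observe
pid=$!
sleep 1                           # give it time to start and block on its timer
before=$(state_of "$pid")         # S while it waits on the timer

kill -STOP "$pid"                 # pause it
sleep 1
after=$(state_of "$pid")          # T while it is stopped

echo "PID $pid was in state $before, then $after after SIGSTOP"

kill -KILL "$pid"                 # clean up the demo process
wait "$pid" 2>/dev/null || true   # reap it; a non-zero status is expected here
```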
Viewing Processes with ps and pgrep
ps (process status) takes a snapshot of running processes at the moment it is called. It has a notoriously complex set of flags inherited from both BSD and SysV traditions — but in practice, two invocations cover nearly all use cases.
# BSD-style snapshot: every process with user, CPU, memory, PID, command
# a = all users, u = user-oriented format, x = include processes without a terminal
ps aux
# Same snapshot, limited to the first 20 lines
ps aux | head -20
# SysV-style: every process with parent PID, useful for tree relationships
ps -ef
# Show only processes owned by a specific user
ps -u alice
# Show a specific process by PID
ps -p 1235
# Show a process tree — visualise parent-child relationships
ps auxf
# Filter with grep — find nginx processes (excluding the grep process itself)
ps aux | grep '[n]ginx'
# ps aux | grep '[n]ginx' (column header added for readability; grep itself does not print it)
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
www-data 1235 0.0 0.1 55360 4096 ? Ss 09:15 0:00 nginx: master process
www-data 1236 0.0 0.2 55796 8192 ? S 09:15 0:02 nginx: worker process
www-data 1237 0.0 0.2 55796 8192 ? S 09:15 0:01 nginx: worker process
# ps auxf (tree view, truncated)
root 1235 nginx: master process /usr/sbin/nginx
www-data 1236 \_ nginx: worker process
www-data 1237 \_ nginx: worker process
What just happened? The grep pattern '[n]ginx' uses a character class trick — it matches nginx but not grep [n]ginx itself, so the grep process does not appear in the output. The Ss state on the master means it is sleeping but is also a session leader — the workers show S, normally sleeping while waiting for connections.
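The character class trick can be checked without touching real processes. This sketch feeds grep two hand-written lines that stand in for ps output (the sample text is illustrative, not captured from a live system) and counts the matches for each pattern.

```shell
#!/usr/bin/env bash
# What ps would list when searching with the plain pattern:
# the real process plus the grep that is doing the searching.
plain_lines='nginx: master process /usr/sbin/nginx
grep nginx'

# What ps would list when searching with the bracketed pattern.
bracket_lines='nginx: master process /usr/sbin/nginx
grep [n]ginx'

# The plain pattern also matches the grep command line itself.
plain_hits=$(printf '%s\n' "$plain_lines" | grep -c 'nginx')

# The regex [n]ginx still means "nginx", but the literal text "[n]ginx"
# in grep's own command line does not contain the string "nginx".
bracket_hits=$(printf '%s\n' "$bracket_lines" | grep -c '[n]ginx')

echo "plain pattern matched $plain_hits lines"
echo "bracketed pattern matched $bracket_hits line"
```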
# pgrep — find PIDs by process name (cleaner than ps | grep)
pgrep nginx
# pgrep -a lists each PID with its full command line (-l alone shows only the process name)
pgrep -la nginx
# pgrep filtered by user
pgrep -u alice
# pgrep with parent PID — find all children of a given process
pgrep -P 1235
# pidof — find the PID of a named program (simpler than pgrep for exact names)
pidof nginx
pidof sshd
# pgrep -la nginx
1235 nginx: master process /usr/sbin/nginx -g daemon on; master_process on;
1236 nginx: worker process
1237 nginx: worker process
# pgrep -P 1235
1236
1237
# pidof sshd
892 1044 1089
What just happened? pgrep -P 1235 returned only the PIDs of nginx's worker children. pidof sshd returned three PIDs: the main sshd listener plus two active SSH sessions. Both commands print bare PID lists, which makes them easy to hand to kill or other process management commands through command substitution or xargs (kill takes PIDs as arguments and does not read them from a pipe).
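One way to pass a pgrep result to kill is through xargs. The function below is a sketch under stated assumptions: pgrep (from procps) and an xargs that supports -r, the GNU option that skips the command when the input is empty, are installed. The name signal_children is hypothetical.

```shell
#!/usr/bin/env bash
# signal_children PARENT [SIGNAL]: send SIGNAL (default TERM) to every
# direct child of PARENT. signal_children is a hypothetical helper name.
signal_children() {
    local parent=$1
    local sig=${2:-TERM}
    # pgrep prints one PID per line; xargs turns those lines into
    # arguments for kill. With -r, kill is not run if there are no children.
    pgrep -P "$parent" | xargs -r kill -s "$sig"
}
```

For example, `signal_children 1235 HUP` would send SIGHUP to the two nginx workers from the listing above, assuming PID 1235 is still their parent.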
Real-Time Monitoring with top and htop
While ps gives a snapshot, top provides a continuously refreshing view of system-wide resource consumption. It shows which processes are consuming the most CPU and memory in real time, making it the first tool to reach for when a system feels slow or a runaway process is suspected.
top — built-in, always available
Key shortcuts: M sort by mem · P sort by CPU · k kill PID · q quit · 1 per-CPU view
htop — enhanced, install separately
Advantages: mouse support · colour bars · tree view built-in · F9 send signal · F6 sort column · scroll freely
# Launch top (press q to quit)
top
# Launch top sorted by memory usage immediately
top -o %MEM
# Launch top showing only a specific user's processes
top -u alice
# Install htop (not installed by default on all distros)
sudo apt install htop -y # Debian/Ubuntu
sudo dnf install htop -y # RHEL/Rocky (htop is packaged in EPEL, so enable that repository first)
# Launch htop
htop
# Non-interactive: run top once and output to stdout — useful in scripts
top -b -n 1 | head -20
Analogy: ps is like taking a photograph of a crowd — you see everyone frozen at that instant. top is like watching a live video feed — you see who is moving, who is consuming energy, and who suddenly starts running.
Signals and kill — Communicating with Processes
In Linux, you communicate with a running process by sending it a signal, a numbered notification that triggers a specific behaviour. The kill command sends a signal to a PID. Its name is misleading: many signals have nothing to do with termination.
SIGHUP (1): Originally meant "terminal disconnected". Most daemons implement it as a graceful config reload. Equivalent to systemctl reload for many services.
SIGINT (2): What Ctrl+C sends. The process can catch this signal and run cleanup code before exiting.
SIGTERM (15): The default signal sent by kill when no signal is specified. The process can catch it and shut down cleanly. Always try this first.
SIGKILL (9): Handled by the kernel directly. The process cannot catch, block, or ignore it, so no cleanup is possible. Use only after SIGTERM has failed, and understand that open files may be left in an inconsistent state.
SIGSTOP (19): Suspends the process without terminating it, like pressing pause. Resume with SIGCONT (18). Also uncatchable, so it always works.
# List all available signals
kill -l
# Send SIGTERM (graceful termination) to a PID — try this first
kill 5100
kill -15 5100 # same thing, explicit signal number
kill -SIGTERM 5100 # same thing, explicit signal name
# Send SIGKILL only if SIGTERM has not worked after a few seconds
kill -9 5100
kill -SIGKILL 5100
# Kill by process name instead of PID — sends SIGTERM to all matching processes
pkill nginx
# Kill all processes owned by a user
pkill -u baduser
# Send SIGHUP to reload config without restarting
kill -HUP 1235
kill -1 1235 # same thing
# Signal an entire process group (here, the group whose group ID is 1235)
kill -TERM -1235 # negative PID targets the entire process group
# kill -l
1) SIGHUP 2) SIGINT 3) SIGQUIT 4) SIGILL 5) SIGTRAP
6) SIGABRT 7) SIGBUS 8) SIGFPE 9) SIGKILL 10) SIGUSR1
11) SIGSEGV 12) SIGUSR2 13) SIGPIPE 14) SIGALRM 15) SIGTERM
...
# kill 5100
# (no output: SIGTERM sent; process exits cleanly)
# kill -9 5100
# (no output: process immediately terminated by kernel)
# ps aux | grep '[v]im'
# (empty: process is gone)
What just happened? Both kill commands produced no output on success — silence is the expected result. SIGTERM gave the process a chance to clean up (close files, flush buffers, release locks) before exiting. SIGKILL bypassed all of that — the kernel simply removed the process immediately, which is why it is the last resort rather than the first choice.
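A process only gets that chance to clean up if it installs a handler for the signal. The sketch below shows the receiving side in shell: a background worker traps SIGTERM, removes its scratch file, and exits with status 0. The names worker and scratch_file are illustrative and do not refer to any real service.

```shell
#!/usr/bin/env bash
# worker FILE: create a scratch file, then idle until SIGTERM or SIGINT
# arrives. The handler removes the scratch file and exits cleanly.
worker() {
    local scratch=$1
    : > "$scratch"
    trap 'rm -f "$scratch"; exit 0' TERM INT
    while :; do
        sleep 1 &
        wait $! || true   # wait returns early when a trapped signal arrives
    done
}

scratch_file=$(mktemp)
worker "$scratch_file" &          # run the worker as a background job
worker_pid=$!
sleep 1                           # let it install its handler
kill -TERM "$worker_pid"          # ask it to shut down

worker_status=0
wait "$worker_pid" || worker_status=$?
echo "worker exit status: $worker_status"
[ -e "$scratch_file" ] && echo "scratch file left behind" || echo "scratch file removed"
```

Replacing the kill -TERM line with kill -KILL would skip the handler entirely, and the scratch file would stay on disk.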
Process Priority with nice and renice
Linux uses a niceness value ranging from -20 (highest CPU priority) to +19 (lowest CPU priority) to influence how the scheduler allocates CPU time between competing processes. A process with a lower nice value gets more CPU time when the system is under load. The name "nice" reflects the idea that a high-nice process is being "nice" to other processes by yielding CPU time.
Fig 2 — The niceness scale: lower value = higher CPU priority. Only root can set negative values.
# Start a command with a specific nice value (lower priority for a background job)
nice -n 10 tar -czf /backup/archive.tar.gz /var/data/
# Start a high-priority process (root only — negative nice values)
sudo nice -n -5 /opt/critical-service/bin/server
# Change the nice value of an already-running process
sudo renice -n 15 -p 5100 # lower priority of PID 5100
sudo renice -n -5 -p 1235 # raise priority (root only)
# Change the nice value of all processes owned by a user
sudo renice -n 10 -u batch_user
# View current nice values in ps output (NI column)
ps -eo pid,user,ni,comm --sort=-ni | head -15
# View nice values in top — the NI column
# Press r in top to renice interactively
# ps -eo pid,user,ni,comm --sort=-ni | head -15
PID USER NI COMMAND
5200 alice 19 updatedb
5100 alice 10 tar
1235 www-data 0 nginx
892 root 0 sshd
1102 root 0 cron
1 root 0 systemd
# sudo renice -n 15 -p 5100
5100 (process ID) old priority 10, new priority 15
What just happened? Because of --sort=-ni, the listing runs from the highest niceness (lowest priority) downwards. systemd and the other daemons sit at the default niceness of 0; negative values appear only for processes that were explicitly given elevated priority. The updatedb process (which indexes the filesystem) is intentionally run at nice 19 so it does not noticeably slow down interactive work. The renice output confirms the change by reporting the old and new values.
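One detail worth checking by hand is that nice -n applies an adjustment relative to the niceness of the calling shell. The sketch below relies on the fact that nice with no arguments prints the current niceness, and assumes the shell starts below 19 so that the increment is visible.

```shell
#!/usr/bin/env bash
# nice with no arguments prints the niceness of the calling process.
base=$(nice)

# Start a child with an adjustment of +5 and have it report its own niceness.
bumped=$(nice -n 5 nice)

echo "shell niceness: $base"
echo "child started with nice -n 5: $bumped"
```

In a shell running at the default niceness of 0, the second value is 5. In a shell that is already at 10, it would be 15.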
Foreground, Background, and Job Control
When you run a command in a terminal, it occupies the foreground — blocking the prompt until it finishes. Linux allows you to move long-running processes to the background, freeing the terminal for other work, or to detach them entirely so they survive when the terminal closes.
Ctrl+Z: Suspends the foreground process. The terminal sends it SIGTSTP, the process is paused, and it becomes a stopped job. The terminal prompt returns.
bg: Resumes a stopped job in the background. It continues running but no longer blocks the terminal.
fg: Brings a background job back to the foreground. It regains the terminal and blocks the prompt again.
&: Appending & starts a command directly in the background without needing Ctrl+Z first.
nohup: Runs a command immune to SIGHUP, so the process continues after the terminal closes or the SSH session disconnects.
# Start a long job in the background immediately
tar -czf /backup/full.tar.gz /var/data/ &
# List all current jobs in this shell
jobs
# Suspend a running foreground process, then resume it in the background
# (while a command is running, press Ctrl+Z)
# [1]+ Stopped tar -czf /backup/full.tar.gz /var/data/
bg %1
# Bring a background job back to the foreground
fg %1
# Run a command that survives terminal disconnect
nohup long-running-script.sh > /tmp/output.log 2>&1 &
# Disown a running background job: removes it from the shell's job table
long-running-script.sh &
disown %1
# tar -czf /backup/full.tar.gz /var/data/ &
[1] 5301
# jobs
[1]+ Running tar -czf /backup/full.tar.gz /var/data/
# jobs (after completion)
[1]+ Done tar -czf /backup/full.tar.gz /var/data/
# nohup long-running-script.sh > /tmp/output.log 2>&1 &
[1] 5410
# (nothing else is printed: with stderr redirected, nohup's "ignoring input" notice goes to /tmp/output.log)
What just happened? Running tar & returned the job number [1] and its PID 5301 immediately, giving the shell back. The nohup command redirects both stdout and stderr to a log file so that the output lands somewhere predictable. Without the redirection, nohup does not discard the output: it appends it to a file named nohup.out in the current directory, or in the home directory if the current directory is not writable.
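In scripts, the PID printed next to the job number is available as $!, and wait collects the job's exit status later. A minimal sketch, with a one-second subshell standing in for a long task (the exit status 3 is an arbitrary value chosen for the demonstration):

```shell
#!/usr/bin/env bash
# Start a job in the background and remember its PID.
( sleep 1; exit 3 ) &             # stand-in for a long task that ends with status 3
job_pid=$!                        # $! holds the PID of the most recent background job

echo "started background job with PID $job_pid"

# The shell is free to do other work here while the job runs.

# Collect the job's exit status once it finishes.
job_status=0
wait "$job_pid" || job_status=$?
echo "job $job_pid finished with status $job_status"
```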
Never Use kill -9 as Your First Response
SIGKILL gives the process no opportunity to flush buffers, close database connections, release file locks, or write a clean shutdown state. On a database process this can cause data corruption; on a service holding a lock file it can leave the lock in place preventing restart. Always send SIGTERM first and wait 5–10 seconds. Only escalate to SIGKILL if the process genuinely does not respond.
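The escalation described here can be wrapped in a small function. This is a sketch, not a hardened tool: the name terminate is hypothetical, the default timeout of 10 seconds is taken from the guidance above, and it assumes you own the target process or are root (otherwise kill -0 fails for permission reasons and the process looks gone).

```shell
#!/usr/bin/env bash
# terminate PID [TIMEOUT]: send SIGTERM, poll for up to TIMEOUT seconds
# (default 10), and send SIGKILL only if the process is still there.
# Returns 0 if SIGTERM was enough, 1 if SIGKILL had to be used.
terminate() {
    local pid=$1
    local timeout=${2:-10}
    local waited=0

    kill -TERM "$pid" 2>/dev/null || return 0      # nothing to signal

    while [ "$waited" -lt "$timeout" ]; do
        kill -0 "$pid" 2>/dev/null || return 0     # gone after SIGTERM
        sleep 1
        waited=$((waited + 1))
    done

    kill -0 "$pid" 2>/dev/null || return 0         # final check before escalating
    kill -KILL "$pid" 2>/dev/null                  # last resort
    return 1
}
```

A call such as `terminate 5100 10` mirrors the manual sequence of kill 5100, a short wait, and kill -9 5100 only when needed. kill -0 sends no signal; it only tests whether the PID can still be signalled.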
Lesson Checklist
Read the process state codes in ps output (R, S, D, Z, T) and know which states indicate a potential problem
Use ps aux and pgrep -la to find processes, and top to identify resource hogs in real time
Send signals with kill (by PID) and pkill (by name)
Use nice to start a low-priority job and renice to adjust the priority of a running process
Manage jobs with bg, fg, and nohup, and know when to use each
Teacher's Note
The grep trick grep '[n]ginx' comes up in nearly every shell scripting interview and real-world script. Learn it once: wrapping the first character in square brackets creates a character class that matches the same string but does not match the grep command itself — removing the need for grep -v grep.
Practice Questions
1. A user reports the system feels sluggish. Describe the sequence of commands you would run to identify which process is consuming the most CPU, find its PID and owner, and then gracefully terminate it — including what you would do if the graceful termination does not work within 10 seconds.
Run top or ps aux --sort=-%cpu | head -10 to identify the top CPU consumer and its PID. Check the owner with ps -p <PID> -o user,pid,cmd. Send a graceful termination: kill <PID> (SIGTERM). Wait 10 seconds. If the process is still running, force kill it: kill -9 <PID> (SIGKILL). Confirm it is gone with ps -p <PID>.
2. You need to run a large database backup script (backup.sh) over SSH. The script takes several hours. Write the command to start it so that it continues running even if your SSH session disconnects, and explain what each part of the command does.
nohup ./backup.sh > backup.log 2>&1 & — nohup ignores the HUP signal sent when the SSH session closes; > backup.log redirects stdout to a log file; 2>&1 redirects stderr to the same file; & runs the process in the background immediately. Note the PID printed — use it to check progress with ps -p <PID>.
3. Explain what it means when ps aux shows several processes in state D. Why can these processes not be killed with kill -9, and what does a high number of D state processes typically indicate about the system?
D means uninterruptible sleep: the process is blocked inside the kernel on an I/O operation (typically disk or NFS) and cannot be interrupted. SIGKILL does not take effect because signals are acted on only when the process leaves that blocked state, so the signal stays pending until the I/O completes. A high number of D-state processes usually indicates a storage problem: a slow or failing disk, a hung NFS mount, or I/O saturation.
Lesson Quiz
1. What is the key difference between SIGTERM and SIGKILL?
2. You start a CPU-intensive compression job with nice -n 15 gzip largefile.log. What does the niceness value of 15 mean in practice?
3. You press Ctrl+Z while a command is running. What happens to the process, and what command would you run to resume it running in the background?
Up Next
Lesson 16 — Job Scheduling (cron, at)
Automating recurring and one-time tasks with crontab, at, and systemd timers