Linux Administration
Disk Usage and Cleanup
In this lesson
Disk usage and cleanup is the practice of understanding what is consuming storage on a Linux system, identifying waste and growth hotspots, and reclaiming space safely. A full disk is one of the most disruptive failures on a production server — it stops databases from writing, prevents logs from rotating, and can corrupt running services. Proactive monitoring and disciplined cleanup habits prevent this class of incident entirely.
Checking Disk Space with df
df (disk free) reports the amount of space used and available on each mounted filesystem. It answers the question "how full is each partition?" — giving a system-wide view in seconds. It is always the first command to run when investigating a disk space issue.
# Human-readable sizes — the most common form
df -h
# Include filesystem type in the output
df -hT
# Show a specific path's filesystem usage
df -h /var/log
# Show inode usage instead of block usage (critical for inode exhaustion diagnosis)
df -i
# Exclude tmpfs and other virtual filesystems — show only real disks
df -h -x tmpfs -x devtmpfs

# df -hT
Filesystem     Type   Size  Used Avail Use% Mounted on
/dev/sda3      ext4    98G   41G   52G  45% /
/dev/sda2      ext4   974M  213M  694M  24% /boot
/dev/sdb1      ext4   200G  187G  6.4G  97% /var/log
tmpfs          tmpfs  3.9G     0  3.9G   0% /run
/dev/sdc1      xfs    500G   12G  488G   3% /data

# df -i
Filesystem       Inodes   IUsed   IFree IUse% Mounted on
/dev/sda3       6553600  312400 6241200    5% /
/dev/sdb1       1310720 1310719       1  100% /var/log
What just happened? Two critical situations appear in this output. First, /var/log is at 97% capacity — a log write failure is imminent. Second, the inode output shows /var/log at 100% inode usage with only 1 inode remaining. This means no new files can be created on that partition even if block space exists — both crises need immediate attention.
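This kind of check is easy to automate. Below is a minimal sketch (the 90% default threshold and the warning wording are choices made here, not conventions) that parses df -P, whose POSIX output format guarantees one record per line:

```shell
# disk_alert: print a WARNING line for each real filesystem whose Use%
# meets or exceeds the given threshold (default 90).
# df -P (POSIX format) guarantees exactly one record per line, so awk
# can parse the columns safely.
disk_alert() {
    df -P -x tmpfs -x devtmpfs | awk -v limit="${1:-90}" '
        NR > 1 {
            use = $5; sub(/%/, "", use)      # strip the trailing % sign
            if (use + 0 >= limit)
                printf "WARNING: %s at %s%% (%s)\n", $1, use, $6
        }'
}

# Example: report anything at 90% or more
disk_alert 90
```

Dropped into cron, a one-line call to this function turns the reactive df check into proactive monitoring.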
Measuring Directory Sizes with du
While df tells you how full a filesystem is, du (disk usage) tells you what is using it. By recursively summing the sizes of files and directories, du lets you drill down from filesystem to partition to subdirectory to individual files until you find the source of consumption.
Analogy: df is like reading the fuel gauge in your car — it tells you how much is left. du is like opening the boot and counting the luggage — it tells you what is taking up all the space.
# Show total size of a directory and all its contents
du -sh /var/log
# Show sizes of all immediate subdirectories — one level deep
du -h --max-depth=1 /var/log
# Sort directories by size — largest at the bottom
du -h --max-depth=1 /var | sort -h
# Sort directories by size — largest at the top (most useful for cleanup)
du -h --max-depth=1 /var | sort -rh | head -20
# Show size of every file in a directory recursively, sorted largest first
du -ah /var/log | sort -rh | head -20
# Summarise total usage for multiple paths
du -sh /var/log /var/cache /tmp /home

# du -h --max-depth=1 /var | sort -rh | head -10
181G    /var
168G    /var/log
11G     /var/cache
1.4G    /var/lib
312M    /var/backups
88M     /var/spool
12M     /var/tmp

# du -ah /var/log | sort -rh | head -10
168G    /var/log
94G     /var/log/app.log
42G     /var/log/app.log.1
18G     /var/log/nginx
9G      /var/log/nginx/access.log
5.4G    /var/log/nginx/access.log.1
3.2G    /var/log/journal
What just happened? The combination of du and sort -rh (reverse, human-readable sort) immediately exposed the culprit — a single app.log file consuming 94GB. This is the standard drill-down pattern: start at the full filesystem, descend one directory level at a time with --max-depth=1, sort by size, repeat until you reach the actual offending file.
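The drill-down step can be wrapped in a small helper. A sketch (the function name biggest is made up here), demonstrated on a throwaway directory tree so it is safe to run anywhere:

```shell
# biggest: list the heaviest entries one level below a directory,
# largest first -- the du | sort -rh drill-down step as one function.
biggest() {
    du -h --max-depth=1 "${1:-.}" 2>/dev/null | sort -rh | head -n "${2:-10}"
}

# Demo on a scratch tree so the result is predictable
demo=$(mktemp -d)
mkdir -p "$demo/big" "$demo/small"
dd if=/dev/zero of="$demo/big/file" bs=1024 count=2048 2>/dev/null   # ~2MB
dd if=/dev/zero of="$demo/small/file" bs=1024 count=16 2>/dev/null   # ~16KB
biggest "$demo"          # the "big" subdirectory sorts above "small"
rm -rf "$demo"
```

Calling biggest repeatedly on whichever directory tops the previous call is exactly the descend-sort-repeat loop described above.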
Finding Large and Old Files with find
find is the most powerful tool for locating specific files by size, age, type, or ownership. Combined with -ls or -delete, it can both identify and clean up space consumers in a single pass.
# Find files larger than 1GB anywhere on the system
sudo find / -xdev -size +1G -ls 2>/dev/null
# Find files larger than 100MB in /var, sorted by size
sudo find /var -size +100M -printf "%s\t%p\n" 2>/dev/null | \
sort -rn | awk '{printf "%.1fM\t%s\n", $1/1048576, $2}' | head -20
# Find files not accessed in more than 90 days
find /var/log -atime +90 -type f -ls
# Find and delete log files older than 30 days (preview first with -ls)
find /var/log/archive -name "*.log" -mtime +30 -ls
find /var/log/archive -name "*.log" -mtime +30 -delete
# Find core dump files — often large and forgotten
find / -xdev \( -name "core" -o -name "core.[0-9]*" \) 2>/dev/null | xargs -r ls -lh
# Find files with no owner (orphaned — user deleted but file remains)
sudo find / -xdev -nouser -ls 2>/dev/null

# sudo find /var -size +100M -printf "%s\t%p\n" | sort -rn | head -5
98566144000     /var/log/app.log
45097156608     /var/log/app.log.1
18874368000     /var/log/nginx/access.log
5368709120      /var/log/journal
1073741824      /var/cache/apt/archives/linux-image-6.5.0-1021.amd64.deb

# find /var/log -atime +90 -type f -ls
12348   2048 -rw-r--r--  1 syslog  adm  2097152 Nov 12  2024 /var/log/syslog.4.gz
12349   1024 -rw-r--r--  1 syslog  adm  1048576 Oct 18  2024 /var/log/auth.log.4.gz
What just happened? The -xdev flag restricted the search to the current filesystem, preventing find from crossing mount boundaries and accidentally scanning network mounts or other slow volumes. The old compressed logs found with -atime +90 are prime cleanup candidates — they have not been read in three months, suggesting they are past any reasonable retention window.
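The preview-then-delete pattern is worth rehearsing somewhere harmless before pointing it at /var/log. A sketch using a scratch directory, with touch -d backdating a file so that -mtime +30 matches it:

```shell
# Preview-then-delete on a scratch "archive" directory.
# touch -d backdates a file's mtime so find -mtime +30 will match it.
archive=$(mktemp -d)
touch -d "60 days ago" "$archive/old.log"
touch "$archive/fresh.log"

# Preview pass: -ls prints what WOULD be deleted -- always run this first
find "$archive" -name "*.log" -mtime +30 -ls

# Delete pass: only old.log matches; fresh.log survives
find "$archive" -name "*.log" -mtime +30 -delete
ls "$archive"
rm -rf "$archive"
```

Because -delete reuses the exact tests from the preview command, swapping -ls for -delete is the only change between the dry run and the real run.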
Package Cache and System Cleanup
Linux package managers accumulate downloaded package files in a local cache. After installation these cached .deb and .rpm files are no longer needed but remain on disk indefinitely. On active servers this cache can grow to several gigabytes. Old kernels similarly accumulate and are a common source of /boot partition exhaustion.
Cached .deb files in /var/cache/apt/archives/. Safe to clear after packages are installed.
Previous kernel packages accumulate in /boot. Keep the current and one prior; remove the rest.
The systemd journal can grow without bound. Vacuuming it by time or size reclaims space immediately.
Build artefacts, session files, and crash dumps accumulate in /tmp and /var/tmp over time.
# ── Debian / Ubuntu ──────────────────────────────────────────────
# Show how much space the apt cache is using
du -sh /var/cache/apt/archives/
# Remove cached packages that can no longer be downloaded (obsolete versions)
sudo apt autoclean
# Remove ALL cached package files (safe — they can be re-downloaded)
sudo apt clean
# Remove packages that were auto-installed as dependencies but are no longer needed
sudo apt autoremove --purge
# List installed kernels to identify old ones
dpkg --list | grep linux-image
# Remove a specific old kernel (keep current + 1 previous)
sudo apt purge linux-image-6.5.0-18-generic
# ── RHEL / Rocky ─────────────────────────────────────────────────
# Show dnf cache size
du -sh /var/cache/dnf/
# Clean all dnf cache
sudo dnf clean all
# Remove old kernels — keep only the 2 most recent
sudo dnf remove --oldinstallonly --setopt installonly_limit=2 kernel

# du -sh /var/cache/apt/archives/
3.2G    /var/cache/apt/archives/

# sudo apt autoclean
Del linux-image-6.5.0-15-generic 6.5.0-15.15 [12.4 MB]
Del linux-image-6.5.0-17-generic 6.5.0-17.17 [12.4 MB]

# sudo apt autoremove --purge
The following packages will be REMOVED:
  linux-image-6.5.0-15-generic* linux-headers-6.5.0-15*
  linux-image-6.5.0-17-generic* linux-headers-6.5.0-17*
0 upgraded, 0 newly installed, 4 to remove and 0 not upgraded.
After this operation, 198 MB disk space will be freed.
What just happened? apt autoremove --purge identified four old kernel packages consuming 198MB in /boot. This is the most common cause of a full /boot partition — each kernel upgrade adds ~60MB but nothing removes the old ones automatically unless autoremove is run regularly.
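The keep-current-plus-one-prior policy can be expressed as a small filter. This sketch only prints removal candidates from a supplied version list, it never calls apt purge itself, and the version strings in the example are hypothetical:

```shell
# prune_candidates: given installed kernel versions on stdin (one per
# line) and the running version as $1, print the versions safe to
# remove -- everything except the running kernel and the newest other
# version. Selection logic only; feed the output to apt purge yourself.
prune_candidates() {
    running="$1"
    grep -v "^$running\$" | sort -V | head -n -1
}

# Example with hypothetical version strings; the running kernel would
# normally come from uname -r and the list from dpkg --list.
printf '%s\n' 6.5.0-15-generic 6.5.0-17-generic 6.5.0-18-generic 6.5.0-21-generic |
    prune_candidates 6.5.0-21-generic
# prints 6.5.0-15-generic and 6.5.0-17-generic
```

sort -V (version sort) is what makes this safe: it orders 6.5.0-9 before 6.5.0-18, which a plain lexicographic sort would get wrong.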
Log Management and Journal Cleanup
Logs are the second most common cause of disk exhaustion after data files. Linux manages logs through two parallel systems: logrotate for traditional text log files under /var/log/, and journald for the systemd binary journal. Both need active size management on production systems.
| Directive | Effect |
|---|---|
| daily / weekly / monthly | How often to rotate the log file. |
| rotate N | Keep N rotated copies; older files are deleted. rotate 7 keeps one week of daily logs. |
| compress | Gzip rotated log files. Combined with delaycompress to skip compressing the most recent rotated file. |
| size N | Rotate when the log reaches size N (e.g. 100M, 1G). Overrides time-based rotation. |
| missingok | Do not error if the log file is missing. |
| postrotate / endscript | Run a shell command after rotation — typically to signal the application to re-open its log file handle. |
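Put together, a per-service logrotate file combines several of these directives. A sketch for a hypothetical application: the path app.log, the service name app.service, and the choice of SIGHUP are all assumptions for illustration, not a standard layout.

```text
# /etc/logrotate.d/app -- hypothetical example
/var/log/app.log {
    daily
    size 100M
    rotate 7
    compress
    delaycompress
    missingok
    postrotate
        # Signal the app to re-open its log file handle; the service
        # name and signal here are illustrative assumptions.
        systemctl kill -s HUP app.service
    endscript
}
```

The postrotate signal is the critical piece: without it, the application keeps writing to the rotated (now renamed) file through its old descriptor, which is exactly the deleted-but-open trap covered later in this lesson.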
# View logrotate configuration for a specific service
cat /etc/logrotate.d/nginx
# Force an immediate rotation (useful for testing or emergency cleanup)
sudo logrotate -f /etc/logrotate.d/nginx
# Run logrotate in debug mode — shows what would happen without making changes
sudo logrotate -d /etc/logrotate.conf
# ── systemd journal cleanup ───────────────────────────────────────
# Check current journal disk usage
journalctl --disk-usage
# Remove journal entries older than 2 weeks
sudo journalctl --vacuum-time=2weeks
# Limit journal to a maximum total size
sudo journalctl --vacuum-size=500M
# Set permanent journal size limit in journald config
sudo mkdir -p /etc/systemd/journald.conf.d/
sudo tee /etc/systemd/journald.conf.d/size-limit.conf <<'EOF'
[Journal]
SystemMaxUse=500M
SystemKeepFree=1G
MaxRetentionSec=2weeks
EOF
sudo systemctl restart systemd-journald

# journalctl --disk-usage
Archived and active journals take up 4.2G in the file system.

# sudo journalctl --vacuum-time=2weeks
Vacuuming done, freed 3.1G of archived journals from /var/log/journal.

# sudo journalctl --vacuum-size=500M
Vacuuming done, freed 204.5M of archived journals from /var/log/journal.
What just happened? The journal had accumulated 4.2GB — mostly older archived log files. --vacuum-time=2weeks deleted everything older than 14 days, freeing 3.1GB immediately. The permanent configuration in /etc/systemd/journald.conf.d/ will prevent this from happening again — the journal will now self-limit to 500MB and automatically discard old entries.
Inode Exhaustion and Deleted-but-Open Files
Two subtle disk-space problems trip up even experienced administrators. Inode exhaustion means a filesystem is out of file-tracking slots — no new files can be created even though block space remains. Deleted-but-open files means disk space is not freed after rm because a running process still holds the file open — the space is only released when that process closes or is restarted.
Inode exhaustion
Symptoms: No space left on device but df -h shows free space. Diagnosis: df -i shows 100% inode usage. Cause: millions of tiny files (mail queues, session caches, tmp files).
df -i
# Find dir with most files:
find /var -xdev -printf '%h\n' | \
sort | uniq -c | sort -rn | head
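The many-tiny-files failure mode is easy to reproduce safely. This sketch builds a scratch directory full of zero-byte files, which consume inodes but no data blocks, then locates the densest directory with the same find pipeline:

```shell
# Reproduce the "many tiny files" pattern in a scratch directory, then
# find the densest directory with the pipeline from the text.
scratch=$(mktemp -d)
mkdir -p "$scratch/sessions" "$scratch/quiet"
i=0
while [ "$i" -lt 500 ]; do
    : > "$scratch/sessions/sess.$i"   # zero-byte file: one inode, no blocks
    i=$((i + 1))
done
: > "$scratch/quiet/one-file"

# %h prints each file's parent directory; uniq -c counts files per dir.
# The sessions directory tops the list with 500 entries.
find "$scratch" -type f -printf '%h\n' | sort | uniq -c | sort -rn | head
rm -rf "$scratch"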
Deleted-but-open files
Symptoms: du shows less usage than df. Cause: a process opened a log file, then logrotate deleted it, but the process is still writing to the old open file descriptor — occupying space invisible to du.
# Find deleted files still held open:
sudo lsof +L1 | grep deleted
# Fix: restart the process holding the file
# ── Inode exhaustion diagnosis ────────────────────────────────────
# Check inode usage across all filesystems
df -i
# Find which directory contains the most files (the inode hog)
sudo find /var -xdev -printf '%h\n' 2>/dev/null | \
sort | uniq -c | sort -rn | head -10
# Count files in a specific directory non-recursively
ls /var/spool/exim4/input | wc -l
# ── Deleted-but-open files ────────────────────────────────────────
# List all deleted files currently held open by running processes
sudo lsof +L1
# Filter to just the large ones (column 7 is file size in bytes)
sudo lsof +L1 | awk 'NR>1 && $7 > 104857600 {print $7/1048576"MB", $1, $2, $9}'
# The fix: restart the process that holds the deleted file open
# Example: nginx is holding a deleted access.log
sudo systemctl restart nginx

# sudo lsof +L1
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NLINK NAME
nginx 1235 root 10w REG 8,17 94489280512 0 /var/log/nginx/access.log (deleted)
postgres 3012 postgres 3w REG 8,17 42949672960 0 /var/log/pg/postgresql.log (deleted)
# sudo lsof +L1 | awk 'NR>1 && $7 > 104857600 {print $7/1048576"MB", $1, $2, $9}'
90112MB nginx 1235 /var/log/nginx/access.log (deleted)
40960MB postgres 3012 /var/log/pg/postgresql.log (deleted)
What just happened? lsof +L1 found two files marked (deleted) that together occupy over 130GB of space — but du would show zero for them because they no longer have a directory entry. This is the classic scenario where a disk appears "full" but du / accounts for far less than df reports. Restarting nginx and postgres will cause them to close their file descriptors and release all that space immediately.
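The whole phenomenon can be reproduced in a few lines of shell, entirely inside a temporary file, using /proc to observe the deleted-but-open state:

```shell
# Reproduce a deleted-but-open file: hold an open descriptor, rm the
# file, and observe that /proc still sees it as "(deleted)".
f=$(mktemp)
exec 3>"$f"                  # fd 3 now holds the file open
echo "still being written" >&3
rm "$f"                      # directory entry gone; blocks NOT freed yet

# The kernel marks the open-but-unlinked file in this shell's fd table:
readlink "/proc/$$/fd/3"     # the target path ends with "(deleted)"

exec 3>&-                    # closing the descriptor releases the space
```

Restarting a service is just this final exec 3>&- performed for you: the process exits, the kernel closes every descriptor it held, and the unlinked blocks are returned to the filesystem.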
Never Truncate a Log File with > — Restart the Process Instead
A common emergency response to a full disk is to run > /var/log/app.log to zero out a large log file. This does reclaim the space (shell redirection truncates the file in place) but it carries two traps. Under sudo, the redirection is performed by your own unprivileged shell rather than the elevated command, so it fails with permission denied on root-owned logs. More subtly, if the writing process did not open the log in append mode, its file descriptor keeps its old offset: the next write lands back at that offset, recreating a sparse file whose apparent size immediately jumps back to where it was. The correct approach is to use logrotate -f to rotate the file and signal the process, or to restart the service so it opens a fresh file handle. If you must truncate without a restart, truncate -s 0 /var/log/app.log zeroes the file in place without changing the inode, so open descriptors remain valid and it works cleanly under sudo — but the append-mode caveat applies to it as well, which is why a rotate or restart is still the safer fix.
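That truncate -s 0 empties a file in place, leaving the inode untouched, is easy to verify directly:

```shell
# Show that truncate -s 0 empties a file without replacing it: the
# inode number is unchanged, so any descriptor already open on the
# file stays valid afterwards.
f=$(mktemp)
echo "lots of log data" > "$f"
before=$(stat -c %i "$f")    # inode number before truncation

truncate -s 0 "$f"

after=$(stat -c %i "$f")
[ "$before" = "$after" ] && echo "same inode: $before"
[ ! -s "$f" ] && echo "file is now empty"
rm -f "$f"
```

Contrast this with rm followed by recreating the file, which allocates a brand-new inode and leaves any process still holding the old one writing into unreachable space.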
Lesson Checklist
df -hT for a system-wide space overview and du -h --max-depth=1 | sort -rh to drill down to the space consumer
find with -size, -mtime, and -xdev to locate large, old, and orphaned files on a specific filesystem
apt clean, apt autoremove, and journalctl --vacuum-time to prevent gradual storage accumulation
df -i when a disk appears full but df -h shows free space — inode exhaustion is the likely cause
lsof +L1 to find deleted-but-open files when du and df disagree, restarting the holding process rather than truncating with >
Teacher's Note
The deleted-but-open file scenario (lsof +L1) is responsible for a surprising number of "disk full" incidents where the operator is completely baffled because du -sh / accounts for far less space than df -h reports. Memorise the pattern: df ≠ du? Run lsof +L1. It solves this class of problem every time.
Practice Questions
1. A production server alerts at 95% disk usage on /var. Describe the exact sequence of commands you would run to identify what is consuming the space, working from the filesystem level down to the individual file — including what to check if du and df disagree.
df -h to confirm the full filesystem. Then drill down: sudo du -h --max-depth=1 /var | sort -rh to find the largest subdirectory, then repeat with that directory until you reach the offending files. Use sudo find /var -type f -size +500M to surface large individual files. If df shows high usage but du totals are low, a process has the file open after deletion — find it with sudo lsof +L1 /var and restart that process to release the blocks.
2. An application server is throwing No space left on device errors when creating session files, but df -h shows 40% free space on the partition. What is the likely cause and what command would you run to confirm and diagnose it?
df -i, which shows inode usage per filesystem. If IUse% is at or near 100%, the inode table is full. The fix is to find and remove directories containing huge numbers of small files — a common culprit is a mail queue, session store, or cache directory. Use sudo find /var -xdev -printf '%h\n' 2>/dev/null | sort | uniq -c | sort -rn | head -20 to find the densest directories.
3. Write the command to permanently configure the systemd journal to keep no more than 300MB of logs and retain entries for a maximum of 10 days. Where does this configuration go, and what must you run for it to take effect?
Edit /etc/systemd/journald.conf (or add a drop-in under /etc/systemd/journald.conf.d/, as shown earlier in this lesson) and set SystemMaxUse=300M and MaxRetentionSec=10day under the [Journal] section. Then restart the journal service: sudo systemctl restart systemd-journald. To immediately vacuum existing logs down to the new limits without waiting: sudo journalctl --vacuum-size=300M --vacuum-time=10d.
Lesson Quiz
1. df -h shows 30% free space on /var, but creating a new file there fails with No space left on device. What is the most likely cause?
2. You delete a 50GB log file with rm /var/log/app.log but df -h shows no change in free space. What is the most likely explanation?
3. Which command shows the top disk-consuming subdirectories under /var, one level deep, sorted largest first?
Up Next
Lesson 21 — Log Files and Log Rotation
Reading system logs, configuring logrotate, and managing application logging across a Linux server