Linux Administration
Log Files and Log Rotation
In this lesson
Log files are the written record of everything that happens on a Linux system — service starts and failures, authentication attempts, kernel messages, application errors, and security events. Reading logs is the primary skill in diagnosing any problem on a Linux server. Managing their growth through rotation, compression, and retention policies is what keeps that record useful without consuming all available disk space.
The Linux Log Landscape
Linux log files live under /var/log/. Each file or subdirectory serves a specific purpose. Knowing which file to open first for a given class of problem is the difference between a five-minute diagnosis and an hour of searching.
Fig 1 — Key log files under /var/log/ and what each records
| File / Path | What it records |
|---|---|
| /var/log/syslog (Debian) / /var/log/messages (RHEL) | General system messages — service events, kernel notices, most daemon output. First log to check for any unexplained system behaviour. |
| /var/log/auth.log (Debian) / /var/log/secure (RHEL) | All authentication events — successful and failed logins, sudo usage, SSH sessions, PAM events. Essential for security auditing. |
| /var/log/kern.log | Kernel messages — hardware detection, driver errors, OOM-killer events, filesystem errors. Useful for hardware and stability issues. |
| /var/log/dpkg.log | Complete package install/remove history with timestamps. Useful for answering "what changed on this server last Tuesday?" |
| /var/log/nginx/ and /var/log/apache2/ | access.log records every HTTP request; error.log records failed requests, upstream errors, and config problems. |
| /var/log/journal/ | systemd binary journal — structured, indexed, queryable with journalctl. Contains everything from all services plus kernel messages. |
Reading Logs Effectively
Raw log files are large. Effective log reading means filtering to the relevant time window, severity level, and pattern — rather than reading line by line. The combination of tail, grep, and journalctl covers nearly every real-world log reading scenario.
# Watch a log file in real time as new lines are appended
tail -f /var/log/syslog
tail -f /var/log/nginx/access.log
# Show the last 100 lines of a log
tail -n 100 /var/log/auth.log
# Follow multiple log files simultaneously
tail -f /var/log/syslog /var/log/auth.log
# Search for a pattern in a log file (case-insensitive)
grep -i "error" /var/log/syslog
grep -i "failed" /var/log/auth.log
# Search with context — show 3 lines before and after each match
grep -B 3 -A 3 "segfault" /var/log/kern.log
# Filter logs for a specific time window
grep "Mar 12 14:" /var/log/syslog # all events at 14:xx on Mar 12
grep "Mar 12 1[4-6]:" /var/log/syslog # 14:00 to 16:59
# Count occurrences of a pattern
grep -c "Failed password" /var/log/auth.log
# Find which IPs are generating the most failed SSH logins
grep "Failed password" /var/log/auth.log | \
grep -oP '(\d{1,3}\.){3}\d{1,3}' | \
    sort | uniq -c | sort -rn | head -10

# grep -c "Failed password" /var/log/auth.log
3842
# grep "Failed password" /var/log/auth.log | \
    grep -oP '(\d{1,3}\.){3}\d{1,3}' | sort | uniq -c | sort -rn | head -5
   2847 185.220.101.42
    621 45.33.32.156
    198 103.99.0.122
     94 66.240.205.34
     82 194.165.16.11
What just happened? The pipeline extracted IP addresses from 3,842 failed SSH login attempts and counted them per source. One IP — 185.220.101.42 — is responsible for 74% of all failures, a clear brute-force attack. This pattern of grep | grep -oP | sort | uniq -c | sort -rn is one of the most useful log analysis pipelines in Linux administration.
rsyslog — How System Logs Are Written
rsyslog is the daemon that receives log messages from applications, the kernel, and other services, and routes them to the appropriate files under /var/log/. It classifies messages by facility (the source — kern, auth, mail, daemon, etc.) and severity (the importance — emerg, alert, crit, err, warning, notice, info, debug). Rules in /etc/rsyslog.conf determine where each combination is written.
Fig 2 — rsyslog receives messages, classifies them by facility and severity, then routes to files or remote servers
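The rules themselves pair a facility.severity selector with an action, usually a file path. As a sketch, Debian's stock configuration contains rules roughly like the following (classic sysklogd selector syntax; exact defaults vary by distribution). A severity selector matches that level and everything more severe, and a leading "-" before a path means asynchronous writes:

```
auth,authpriv.*           /var/log/auth.log     # everything from the auth facilities
kern.*                   -/var/log/kern.log     # kernel messages, written asynchronously
mail.err                  /var/log/mail.err     # mail facility, severity err and above
*.emerg                   :omusrmsg:*           # emergencies broadcast to all logged-in users
*.*;auth,authpriv.none   -/var/log/syslog       # everything else, excluding auth messages
```

The last rule is why auth events appear in auth.log but not syslog: the .none severity excludes those facilities from the catch-all.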
# View rsyslog routing rules
cat /etc/rsyslog.conf
ls /etc/rsyslog.d/
# Send a test message to syslog manually
logger "Test message from admin - $(date)"
logger -p auth.warning "Simulated auth warning"
# Verify the test message arrived
tail -5 /var/log/syslog
# Check rsyslog service status
sudo systemctl status rsyslog
# Reload rsyslog after config changes
sudo systemctl reload rsyslog

# logger "Test message from admin - $(date)"
# tail -5 /var/log/syslog
Mar 12 14:22:01 server1 CRON[9821]: (root) CMD (/usr/lib/update-notifier/notify-motd)
Mar 12 14:22:15 server1 alice: Test message from admin - Wed Mar 12 14:22:15 UTC 2025
Mar 12 14:22:31 server1 sshd[9944]: Accepted publickey for alice from 10.0.1.5 port 52341
Mar 12 14:22:45 server1 sudo: alice : TTY=pts/0 ; PWD=/home/alice ; COMMAND=/usr/bin/tail
Mar 12 14:22:50 server1 alice: Simulated auth warning
What just happened? logger injected a message directly into the syslog stream — useful for marking events in log files from scripts (e.g. "deployment started at 14:22"). The syslog output shows all five standard fields: timestamp, hostname, process name, PID, and message. The sudo line also reveals that every sudo command is automatically logged — administrators cannot use sudo without leaving a record.
logrotate — Writing Rotation Configurations
logrotate is the daemon-less utility that periodically renames, compresses, and prunes log files. It runs daily via cron or a systemd timer and processes every configuration file in /etc/logrotate.d/. Writing a correct logrotate config for a custom application is a routine administration task.
Step 1 — Define which log file to manage
The config file starts with the log path (glob patterns work). Drop the file in /etc/logrotate.d/ with a descriptive name — never edit /etc/logrotate.conf directly for application logs.
/var/log/myapp/*.log {
Step 2 — Set rotation frequency and retention
daily + rotate 14 keeps 14 days of history. size 100M rotates once the file exceeds 100MB; note that size takes precedence over the time schedule, so combine daily with maxsize 100M instead if you want rotation daily or sooner once the file grows past 100MB.
daily
rotate 14
size 100M
Step 3 — Add safety and compression directives
missingok prevents errors if the log does not exist. notifempty skips rotation if the file is empty. compress + delaycompress gzips all but the most recent rotated file.
missingok
notifempty
compress
delaycompress
Step 4 — Signal the application to re-open its log file
After rotation, the app still writes to the old (now renamed) file. postrotate sends a signal or restarts the service so it opens the new log file.
postrotate
systemctl reload myapp 2>/dev/null || true
endscript
}
# Complete logrotate config for a custom application
sudo tee /etc/logrotate.d/myapp <<'EOF'
/var/log/myapp/*.log {
daily
rotate 14
size 100M
missingok
notifempty
compress
delaycompress
sharedscripts
postrotate
systemctl reload myapp 2>/dev/null || true
endscript
}
EOF
# Test the config — debug mode shows what would happen without making changes
sudo logrotate -d /etc/logrotate.d/myapp
# Force an immediate rotation to test the config live
sudo logrotate -f /etc/logrotate.d/myapp
# Check the logrotate state file — records when each log was last rotated
cat /var/lib/logrotate/status | grep myapp

# sudo logrotate -d /etc/logrotate.d/myapp
WARNING: logrotate in debug mode does nothing except dump debug information!
reading config file /etc/logrotate.d/myapp
Handling 1 logs

rotating pattern: /var/log/myapp/*.log  after 1 days (14 rotations)
empty log files are not rotated, old logs are removed
considering log /var/log/myapp/app.log
  log needs rotating
rotating log /var/log/myapp/app.log, log->rotateCount is 14
Renaming /var/log/myapp/app.log to /var/log/myapp/app.log.1
running postrotate script

# cat /var/lib/logrotate/status | grep myapp
"/var/log/myapp/app.log" 2025-3-12
What just happened? The -d debug run showed exactly what logrotate would do — renaming the current log, running the postrotate script — without touching any actual files. The state file shows the last rotation date, which logrotate uses to decide whether it is time to rotate again. Always run -d first when writing a new config, then -f to force a live test rotation.
journalctl — Advanced Querying
While Lesson 14 introduced journalctl for service logs, the journal contains everything — including structured fields that enable precision queries impossible with text-file grep. Mastering these queries dramatically reduces time-to-diagnosis during incidents.
- Time and unit filtering: the most common incident investigation pattern. journalctl -u nginx --since "1 hour ago" shows only nginx entries from the last hour.
- Structured metadata: journal entries carry indexed fields. journalctl _UID=1001 shows all log entries generated by alice (UID 1001), across every service.
- JSON output: journalctl -o json-pretty exposes all structured fields. Pipe to jq for powerful field extraction and filtering.
- Kernel messages: journalctl -k is equivalent to dmesg but integrated with journal time filtering. Use it to diagnose hardware errors, OOM kills, and driver problems.
# Show all errors from the current boot
journalctl -p err -b 0
# Show kernel messages from the current boot (hardware/driver issues)
journalctl -k -b 0
# Show all logs for a specific user by UID
journalctl _UID=$(id -u alice)
# Show logs between two specific timestamps
journalctl --since "2025-03-12 14:00" --until "2025-03-12 15:00"
# Show logs for multiple units at once
journalctl -u nginx -u php-fpm --since "30 minutes ago"
# Extract structured fields with JSON output and jq
journalctl -u sshd -o json | jq -r 'select(.PRIORITY == "6") | .MESSAGE' | head -10
# List all unique systemd units that have journal entries
journalctl -F _SYSTEMD_UNIT | sort -u
# Show logs that match a specific executable
journalctl /usr/sbin/sshd

# journalctl -u nginx -u php-fpm --since "30 minutes ago"
Mar 12 14:05:01 server1 nginx[1235]: 2025/03/12 14:05:01 [warn] worker process 1236 exited on signal 11
Mar 12 14:05:01 server1 systemd[1]: nginx.service: A process of this service crashed.
Mar 12 14:05:02 server1 systemd[1]: Restarting A high performance web server...
Mar 12 14:05:02 server1 nginx[9901]: nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
Mar 12 14:05:03 server1 php-fpm[9902]: NOTICE: fpm is running, pid 9902
Mar 12 14:05:03 server1 nginx[9901]: start worker process 9903
What just happened? Querying both nginx and php-fpm together revealed a complete incident timeline in one view: a worker process crashed on signal 11 (segfault), systemd detected it and automatically restarted nginx, which passed its config check and spawned a new worker. Without the multi-unit query, this sequence would have required correlating two separate log files.
Centralised Logging — Shipping Logs Off the Server
On single servers, local log files are sufficient. On fleets of servers, centralised logging is essential — it lets you search across all machines from one place, retain logs after a server is terminated, and detect patterns that only appear when correlating events across multiple systems. rsyslog's forwarding capability and the standardised syslog protocol make this straightforward to configure.
Local logging only
- Logs lost if server is terminated
- Must SSH to each server to investigate
- No cross-server correlation
- Attacker can delete logs to cover tracks
- Sufficient for a single server
Centralised logging
- Logs preserved after server termination
- Single pane of glass across the fleet
- Cross-server timeline correlation
- Tamper-evident off-host copy
- Required for compliance (SOC 2, PCI, HIPAA)
# Forward all logs to a remote syslog server via rsyslog
# Add to /etc/rsyslog.d/50-remote.conf
# A single @ forwards over UDP (fast, no delivery guarantee);
# a double @@ forwards over TCP. Pick one, or both rules will forward twice.
sudo tee /etc/rsyslog.d/50-remote.conf <<'EOF'
# *.* @logs.example.com:514      # UDP variant (left commented out)
*.* @@logs.example.com:514       # TCP (reliable, use this for production)
EOF
sudo systemctl restart rsyslog
# Forward the systemd journal to a remote syslog server
# (requires rsyslog-journal module)
sudo tee -a /etc/rsyslog.conf <<'EOF'
module(load="imjournal")
*.* @@logs.example.com:514
EOF
# Test forwarding with logger
logger "Centralised log forwarding test"
# Common centralised logging destinations:
# - Elasticsearch + Kibana (ELK stack) — self-hosted, powerful
# - Grafana Loki — lightweight, integrates with Prometheus
# - Datadog / Splunk / Papertrail — managed SaaS solutions
# - CloudWatch Logs (AWS) / Cloud Logging (GCP) — cloud-native

Analogy: Local log files are like a notebook kept at each shop counter — useful for that one location, but if the shop burns down, the notebook is gone too. Centralised logging is the head office receiving a copy of every transaction in real time. Even if an individual location is destroyed, the records are preserved and available for investigation.
The postrotate Script Must Signal the Application — Not Just Rotate the File
After logrotate renames app.log to app.log.1, the running application still has the old file open and writes new data into app.log.1. A fresh app.log either never appears or, if logrotate creates it, stays empty. Without a postrotate script that signals the application to close and re-open its log handle (via kill -HUP, systemctl reload, or a custom signal), rotation silently fails to take effect for the active log. Always verify rotation worked with ls -lth /var/log/myapp/ after forcing a test rotation.
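The root cause is that an open file descriptor follows the inode, not the filename, so renaming a file does not redirect a process that already has it open. A minimal demonstration you can run anywhere, using throwaway files under /tmp:

```shell
# Background writer opens /tmp/demo.log once, then keeps writing to that descriptor
( echo "before rotation"; sleep 2; echo "after rotation" ) > /tmp/demo.log &
writer=$!

sleep 1                              # the first line has been written by now
mv /tmp/demo.log /tmp/demo.log.1     # "rotate" while the writer is still running
wait "$writer"

ls /tmp/demo.log 2>/dev/null || echo "no new demo.log appeared"
cat /tmp/demo.log.1                  # both lines landed in the *renamed* file
```

Even though the file was renamed mid-run, the writer's second line still lands in demo.log.1 — exactly what a daemon does to a rotated log until it is told to reopen.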
Lesson Checklist
- I know which log file to check first for authentication events (auth.log / secure), general system events (syslog / messages), and hardware problems (kern.log)
- I can chain grep | grep -oP | sort | uniq -c | sort -rn to extract and count patterns from log files for security and traffic analysis
- I can write a logrotate config with frequency, retention, compression, and a postrotate script, and I always test with -d before -f
- I can query journalctl with structured field filters (_UID=, -u, --since, -p) to extract precise log slices for incident investigation
Teacher's Note
The IP extraction pipeline — grep "Failed password" /var/log/auth.log | grep -oP '(\d{1,3}\.){3}\d{1,3}' | sort | uniq -c | sort -rn — is worth memorising verbatim. It surfaces brute-force attack sources in seconds and works unchanged on any server. The same pattern with different grep patterns can count 404 errors by URL, API failures by endpoint, or any other repeated event in any structured log.
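As a sketch of that generalisation, here is the same pipeline shape run against a small, hypothetical access log (written to /tmp purely for demonstration), extracting the request path instead of an IP:

```shell
# Build a tiny sample access log (hypothetical data, nginx combined-style lines)
cat > /tmp/access.log <<'EOF'
10.0.0.1 - - [12/Mar/2025:14:00:01 +0000] "GET /index.html HTTP/1.1" 200 512
10.0.0.2 - - [12/Mar/2025:14:00:02 +0000] "GET /missing HTTP/1.1" 404 162
10.0.0.3 - - [12/Mar/2025:14:00:03 +0000] "GET /missing HTTP/1.1" 404 162
10.0.0.4 - - [12/Mar/2025:14:00:04 +0000] "GET /old-page HTTP/1.1" 404 162
EOF

# Same shape: filter, extract the field of interest, count, rank
grep '" 404 ' /tmp/access.log | awk '{print $7}' | sort | uniq -c | sort -rn
```

The output ranks /missing first with a count of 2, then /old-page with 1. Only the filter pattern and the field extraction change; the sort | uniq -c | sort -rn tail is identical for every count-and-rank task.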
Practice Questions
1. A web application starts returning 502 errors at 14:30. Write the exact sequence of commands you would run to investigate — including which log files to check, what time filters to apply, and how to correlate nginx and the backend service logs to build a timeline of what happened.
Check the nginx error log first: sudo grep "14:3" /var/log/nginx/error.log or sudo journalctl -u nginx --since "14:25" --until "14:45". Then check the backend: sudo journalctl -u myapp --since "14:25" --until "14:45". For file-based logs: sudo grep "14:3" /var/log/myapp/app.log. Build the timeline by comparing timestamps — 502s in nginx usually mean the backend crashed or stopped accepting connections. Finally, check sudo journalctl -p err --since "14:25" for kernel or OOM events that may have killed the backend process.
2. You write a logrotate config for /var/log/api/*.log and force a test rotation with logrotate -f. After rotation, api.log.1 exists but the new api.log never gets any entries — the service keeps writing to api.log.1. What is missing from your config and how do you fix it?
The config is missing a postrotate script to signal the application to reopen its log file handle. After rotation, the process still holds an open file descriptor to the renamed file (api.log.1). Fix: add a postrotate block that sends the service a signal to reopen logs, e.g. postrotate / systemctl reload myapi / endscript. Alternatively use copytruncate if the app cannot reopen files — this copies the log, then truncates the original in place.
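For applications that cannot be signalled to reopen their log, a copytruncate variant of the config sidesteps the problem entirely, at the cost of possibly losing any lines written between the copy and the truncate. A sketch for the same hypothetical /var/log/api logs:

```
/var/log/api/*.log {
    daily
    rotate 14
    copytruncate    # copy the live file aside, then truncate it in place; no reopen needed
    compress
    delaycompress
    missingok
    notifempty
}
```

Because the original file is truncated rather than renamed, the application's open file descriptor keeps pointing at a valid, now-empty log, so no postrotate signalling is required.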
3. Explain the difference between compress and delaycompress in logrotate. Why is using them together usually better than using compress alone?
compress gzips rotated logs to save disk space. delaycompress postpones compression of the most recently rotated file (log.1) by one rotation cycle. Used together, the just-rotated file stays uncompressed for one period — this matters because some applications may still be writing to the file handle briefly after rotation, and compressing an open file can corrupt it or cause data loss. Delaying compression ensures the file is fully closed before it is gzipped.
Lesson Quiz
1. Which log file is the correct first place to look when investigating failed SSH login attempts on a Debian/Ubuntu system?
2. In a logrotate config, what does rotate 14 combined with daily mean for log retention?
3. What does the logger command do, and why is it useful in shell scripts?
Up Next
Lesson 22 — Monitoring System Resources
Real-time and historical monitoring of CPU, memory, I/O, and network using vmstat, iostat, sar, and more