Linux Administration
The Linux Boot Process
In this lesson
The Linux boot process is the sequence of events that transforms a powered-off machine into a fully running operating system. Each stage hands control to the next — firmware to bootloader, bootloader to kernel, kernel to init system. Understanding this sequence is essential when a system fails to boot, when you need to recover from a broken configuration, or when you want to understand why some services start before others.
The Full Boot Sequence
From power button to login prompt, a Linux system passes through six distinct stages. Each stage can fail independently, which is why understanding their boundaries makes boot problems straightforward to diagnose and isolate.
Fig 1 — The six stages of the Linux boot process, with approximate timing
# View kernel boot messages from the current boot
dmesg | head -50
journalctl -k -b 0 | head -50
# Check how long the last boot took — systemd's boot timing analysis
systemd-analyze
# Show per-unit startup times — find what is slowing boot
systemd-analyze blame | head -20
# Visualise the boot as a dependency chain (generates an SVG)
systemd-analyze plot > /tmp/boot.svg
# Show critical path of the boot (the sequence that determined total boot time)
systemd-analyze critical-chain

# systemd-analyze
Startup finished in 2.841s (firmware) + 3.102s (loader) + 1.243s (kernel) \
+ 2.814s (initrd) + 8.431s (userspace) = 18.431s
graphical.target reached after 8.320s in userspace
# systemd-analyze blame | head -10
4.210s NetworkManager-wait-online.service
2.103s cloud-init.service
1.821s dev-sda3.device
1.312s plymouth-quit-wait.service
0.944s snapd.service
0.821s apt-daily.service
0.714s systemd-journal-flush.service
What just happened? systemd-analyze broke boot time into its constituent stages — firmware took 2.8 seconds (slow UEFI or BIOS POST), and userspace took 8.4 seconds. blame identified NetworkManager-wait-online (4.2 seconds) as the biggest single userspace delay — this service waits for a network connection before boot continues, and can often be safely disabled on servers with predictable network interfaces.
BIOS and UEFI — Firmware Stage
The firmware is the first code that runs when a machine is powered on. It performs the POST (Power-On Self Test), initialises hardware, and locates the bootloader. Modern systems use UEFI (Unified Extensible Firmware Interface), which replaced the decades-old BIOS and brings significant improvements — including a dedicated EFI System Partition and Secure Boot.
BIOS — legacy
- 16-bit code, runs from read-only chip
- Reads the first 512 bytes of a disk (MBR) to find the bootloader
- Disk limit: 2TB maximum with MBR
- No Secure Boot — cannot verify bootloader integrity
- Boot order controlled in BIOS setup screen
- Still found on older servers and VMs
UEFI — modern
- 32/64-bit code with a full driver model
- Reads the EFI System Partition (ESP) — a FAT32 partition, typically mounted at /boot/efi
- No disk size limit with GPT
- Secure Boot — verifies bootloader signature
- Boot entries stored in NVRAM — no boot disk required to manage boot order
- Standard on all modern hardware since ~2012
# Check whether the system booted in UEFI or BIOS mode
[ -d /sys/firmware/efi ] && echo "UEFI mode" || echo "BIOS/Legacy mode"
# List UEFI boot entries stored in NVRAM
sudo efibootmgr -v
# Show the EFI System Partition contents
ls -la /boot/efi/EFI/
# Check if Secure Boot is enabled
sudo mokutil --sb-state
# View firmware variables exposed via efivarfs
ls /sys/firmware/efi/efivars/ | head -10

# [ -d /sys/firmware/efi ] && echo "UEFI mode" || echo "BIOS/Legacy mode"
UEFI mode

# sudo efibootmgr -v
BootCurrent: 0001
Timeout: 1 seconds
BootOrder: 0001,0000,0002
Boot0000* Windows Boot Manager
Boot0001* ubuntu    HD(1,GPT,...)/File(\EFI\ubuntu\shimx64.efi)
Boot0002* UEFI: PXE Boot

# sudo mokutil --sb-state
SecureBoot enabled
What just happened? efibootmgr listed three boot entries in NVRAM — Windows, Ubuntu, and PXE network boot — with Ubuntu as the current default (BootCurrent: 0001). The Ubuntu entry points to shimx64.efi — the Secure Boot shim that chains to GRUB. With Secure Boot enabled, any bootloader that is not signed by a trusted key will be refused by the firmware before it can even run.
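Where mokutil is not installed, the same state can be read straight from efivarfs. A minimal sketch, assuming the standard efivarfs mount point and the standard EFI global-variable GUID (both are the defaults on current kernels):

```shell
# Read the Secure Boot state directly from efivarfs (no mokutil needed).
# The variable's payload is 4 attribute bytes followed by 1 value byte:
# 1 = enabled, 0 = disabled.
var=/sys/firmware/efi/efivars/SecureBoot-8be4df61-93ca-11d2-aa0d-00e098032b8c
if [ -r "$var" ]; then
    # Take the last byte of the file and print it as a decimal number
    state=$(tail -c 1 "$var" | od -An -tu1 | tr -d ' \n')
    if [ "$state" = "1" ]; then
        msg="Secure Boot enabled"
    else
        msg="Secure Boot disabled"
    fi
else
    msg="Not booted via UEFI (or efivarfs not mounted)"
fi
echo "$msg"
```

On a BIOS machine (or inside a container) the variable simply does not exist, which is itself a quick UEFI-vs-BIOS check.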
GRUB2 — The Bootloader
GRUB2 (Grand Unified Bootloader version 2) is the bootloader used by virtually all major Linux distributions. It reads its configuration, displays the boot menu, loads the selected kernel image and initramfs into memory, passes kernel parameters, and transfers execution to the kernel. GRUB2 configuration is generated by the grub-mkconfig tool — you edit template files, then regenerate.
| Setting in /etc/default/grub | Effect |
|---|---|
| GRUB_TIMEOUT=5 | Seconds to display the boot menu before auto-booting the default entry. Set to -1 to wait indefinitely. |
| GRUB_DEFAULT=0 | Which menu entry to boot by default — 0 = first entry. Can also be set to a menu title string. |
| GRUB_CMDLINE_LINUX | Kernel parameters passed at every boot, including recovery mode. Common additions: nomodeset, systemd.unit=rescue.target. |
| GRUB_CMDLINE_LINUX_DEFAULT | Parameters added only for normal boots (not recovery mode). Remove quiet splash here to see verbose boot messages. |
| GRUB_DISABLE_RECOVERY | Set to true to hide recovery mode entries from the boot menu (security hardening on production servers). |
# View GRUB configuration defaults
cat /etc/default/grub
# Edit GRUB settings — always back up first
sudo cp /etc/default/grub /etc/default/grub.bak
sudo nano /etc/default/grub
# Regenerate GRUB config after editing /etc/default/grub
sudo update-grub # Debian/Ubuntu
sudo grub2-mkconfig -o /boot/grub2/grub.cfg # RHEL/Rocky
# View the generated GRUB config (menu entries are here)
cat /boot/grub/grub.cfg | grep menuentry
# List available kernel versions installed
ls /boot/vmlinuz-*
# View current kernel parameters that were used at boot
cat /proc/cmdline

# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-6.5.0-1021-aws root=UUID=a1b2c3d4-1111-2222-3333 \
    ro quiet splash vt.handoff=7

# ls /boot/vmlinuz-*
/boot/vmlinuz-6.5.0-1017-aws  /boot/vmlinuz-6.5.0-1019-aws  /boot/vmlinuz-6.5.0-1021-aws

# cat /boot/grub/grub.cfg | grep "menuentry "
menuentry 'Ubuntu, with Linux 6.5.0-1021-aws' ...
menuentry 'Ubuntu, with Linux 6.5.0-1021-aws (recovery mode)' ...
menuentry 'Ubuntu, with Linux 6.5.0-1019-aws' ...
menuentry 'Ubuntu, with Linux 6.5.0-1017-aws' ...
What just happened? /proc/cmdline showed the exact parameters the running kernel received from GRUB — including the root partition UUID and the quiet splash flags that suppress verbose boot messages. Three kernel versions are installed — the GRUB menu will offer all three, enabling you to boot an older kernel if the newest causes problems.
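Because GRUB_DEFAULT can take a menu title string, and the related grub-reboot tool (Debian/Ubuntu) selects an entry for the next boot only, it is useful to extract the exact titles from grub.cfg. The sketch below does that with sed; the sample config is a stand-in for the real /boot/grub/grub.cfg, which may need root to read:

```shell
# Extract menuentry titles from grub.cfg-style config so the exact string
# can be used with grub-reboot (one-shot) or GRUB_DEFAULT (permanent).
# The sample below stands in for /boot/grub/grub.cfg (hypothetical entries).
cfg="menuentry 'Ubuntu, with Linux 6.5.0-1021-aws' --class ubuntu {
}
menuentry 'Ubuntu, with Linux 6.5.0-1019-aws' --class ubuntu {
}"
# sed pulls the single-quoted title out of each menuentry line
titles=$(printf '%s\n' "$cfg" | sed -n "s/^menuentry '\([^']*\)'.*/\1/p")
printf '%s\n' "$titles"
# One-shot boot of a chosen entry on the next reboot only (Debian/Ubuntu):
#   sudo grub-reboot 'Ubuntu, with Linux 6.5.0-1019-aws' && sudo reboot
```

grub-reboot is handy for testing a kernel remotely: if the older kernel also misbehaves, the following reboot falls back to the normal default automatically.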
Kernel Initialisation and initramfs
Once GRUB loads the kernel (vmlinuz) and the initial RAM filesystem (initramfs) into memory, the kernel decompresses itself and begins hardware detection. The initramfs is a compressed cpio archive that contains a minimal root filesystem — just enough to mount the real root partition, including drivers for the storage controller, LVM, RAID, or disk encryption.
Analogy: The initramfs is like a rescue kit packed into the ambulance. Before the paramedic can reach the hospital (the real root filesystem), they need the basic equipment in that bag — oxygen, a defibrillator, bandages. The initramfs provides the minimum tools the kernel needs to reach the real root: storage drivers, LVM tools, decryption utilities. Once the real root is mounted, the rescue kit is discarded.
# List kernel and initramfs files in /boot
ls -lh /boot/
# View the contents of the initramfs (it is a compressed cpio archive)
lsinitramfs /boot/initrd.img-$(uname -r) | head -30 # Debian/Ubuntu
lsinitrd /boot/initramfs-$(uname -r).img | head -30 # RHEL/Rocky
# Rebuild the initramfs for the current kernel
sudo update-initramfs -u # Debian/Ubuntu
sudo dracut --force # RHEL/Rocky
# Rebuild for all installed kernels
sudo update-initramfs -u -k all # Debian/Ubuntu
# Check current running kernel version
uname -r
# View kernel ring buffer — early boot messages
dmesg | grep -E "Linux version|Kernel|CPU|Memory" | head -10

# ls -lh /boot/
total 196M
-rw-r--r-- 1 root root 260K Mar  1 config-6.5.0-1021-aws
drwxr-xr-x 6 root root 4.0K Mar  1 grub/
-rw------- 1 root root  78M Mar  1 initrd.img-6.5.0-1021-aws
lrwxrwxrwx 1 root root   33 Mar  1 initrd.img -> initrd.img-6.5.0-1021-aws
-rw------- 1 root root  11M Mar  1 vmlinuz-6.5.0-1021-aws
lrwxrwxrwx 1 root root   30 Mar  1 vmlinuz -> vmlinuz-6.5.0-1021-aws

# uname -r
6.5.0-1021-aws

# dmesg | grep "Linux version" | head -1
[    0.000000] Linux version 6.5.0-1021-aws (buildd@lcy02-amd64-022) \
    (gcc version 13.2.0) #21-Ubuntu SMP Thu Feb 15 22:00:06 UTC 2025
What just happened? The initramfs at 78MB is significantly larger than the kernel at 11MB — this is normal: it contains a complete minimal userspace including udev (systemd-udevd), storage drivers, and filesystem tools. The vmlinuz and initrd.img symlinks always point to the newest installed kernel; grub.cfg references each kernel image explicitly, and update-grub regenerates those entries automatically after every kernel upgrade.
Boot Troubleshooting
Most boot failures fall into one of four categories: GRUB cannot find the configuration, the kernel panics during hardware detection, the initramfs cannot mount the root filesystem, or systemd fails to start a critical service. Each has a distinct recovery approach.
GRUB failure. Causes: deleted /boot, changed partition UUID, corrupted GRUB install. Fix: boot from live media, chroot into the system, run grub-install and update-grub.
Kernel panic / root not mountable. Often shows "VFS: Unable to mount root fs". Causes: wrong root UUID in kernel parameters, missing storage driver in the initramfs, corrupted root filesystem. Fix: boot a previous kernel, rebuild the initramfs, or run fsck from live media.
Emergency shell during boot. Typically caused by a broken /etc/fstab entry, a failed required service, or a corrupted filesystem. The emergency shell has root access — fix the fstab, run fsck, or disable the failing unit.
Failed service after boot. The system boots but a service is in a failed state. Check with systemctl list-units --state=failed and journalctl -u servicename -b 0.
# ── Rescue / Recovery boot techniques ────────────────────────────
# Edit kernel parameters temporarily at the GRUB menu:
# 1. At GRUB menu, press 'e' to edit the selected entry
# 2. Find the 'linux' line, navigate to the end
# 3. Add one of these recovery parameters:
# systemd.unit=rescue.target — minimal single-user shell
# systemd.unit=emergency.target — bare minimum (read-only root)
# init=/bin/bash — bypass init entirely
# rd.break — break into initramfs shell before pivot
# 4. Press Ctrl+X to boot with the modified parameters
# ── After reaching the emergency / rescue shell ───────────────────
# Remount root filesystem as read-write for repairs
mount -o remount,rw /
# Check and repair a filesystem (unmount or use live media first)
fsck -y /dev/sda3
# Fix a broken fstab
nano /etc/fstab
# Disable a failing service so the system can boot
systemctl disable problematic.service
# or mask it entirely:
systemctl mask problematic.service
# Regenerate GRUB from a chroot (after booting live media)
sudo mount /dev/sda3 /mnt
sudo mount /dev/sda2 /mnt/boot
sudo mount --bind /dev /mnt/dev
sudo mount --bind /proc /mnt/proc
sudo mount --bind /sys /mnt/sys
sudo chroot /mnt
grub-install /dev/sda
update-grub
exit

# journalctl -b 0 -p err --no-pager | head -15
Mar 12 08:01:02 server1 kernel: EXT4-fs error (device sda3): ext4_find_entry
Mar 12 08:01:04 server1 systemd[1]: Failed to mount /data.
Mar 12 08:01:04 server1 systemd[1]: Dependency failed for Local File Systems.
Mar 12 08:01:04 server1 systemd[1]: Job local-fs.target/start failed.

# systemctl list-units --state=failed
  UNIT               LOAD   ACTIVE SUB    DESCRIPTION
● data.mount         loaded failed failed /data
● postgresql.service loaded failed failed PostgreSQL RDBMS

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state.
SUB    = The low-level unit activation state.
What just happened? The journal showed a cascade: a filesystem error on sda3 caused the /data mount to fail, which caused local-fs.target to fail, which caused PostgreSQL to fail — because it depends on /data being mounted. One filesystem error produced two failed units. Reading the journal from the earliest error upward reveals the actual root cause.
Boot Targets and Optimisation
Once the initramfs hands control to the real root filesystem, systemd (PID 1) begins activating units in dependency order to reach the default boot target. Understanding how to read and optimise this process reduces boot time and helps identify which services are truly needed at startup.
Identify slow units with systemd-analyze blame
Shows each unit's startup time in descending order. Units initialise in parallel, so these times do not sum to the total boot time, but anything over 1 second is a candidate for investigation.
systemd-analyze blame | head -10
Disable services not needed at boot
Services that are enabled but rarely needed (bluetooth, cups, ModemManager) add startup latency on servers. Disable services that are not required for your workload.
sudo systemctl disable bluetooth.service cups.service ModemManager.service
Handle NetworkManager-wait-online on servers
This service waits for a network connection before proceeding — often 4+ seconds. On cloud servers with predictable interfaces, it is usually safe to disable.
sudo systemctl disable NetworkManager-wait-online.service
Mask services that must never run
disable prevents auto-start but still allows manual start. mask creates a symlink to /dev/null — the service cannot be started by any means until unmasked.
sudo systemctl mask avahi-daemon.service
Never Edit /boot/grub/grub.cfg Directly — Always Use update-grub
The file /boot/grub/grub.cfg is generated automatically by update-grub (or grub2-mkconfig) — its first line even says "DO NOT EDIT THIS FILE". Any manual changes will be silently overwritten the next time a kernel is installed or updated. All permanent changes go into /etc/default/grub or a custom script in /etc/grub.d/, followed by regenerating the config. Editing grub.cfg directly is only appropriate for one-time emergency recovery from the GRUB command line.
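For permanent custom menu entries, the stock hook file /etc/grub.d/40_custom exists precisely so you never touch grub.cfg: everything after its exec tail line is copied verbatim into the generated config. A sketch of the file's contents; the fwsetup entry (reboot into the UEFI setup screen) is just an illustrative example, and many distributions already generate a similar entry automatically:

```shell
#!/bin/sh
exec tail -n +3 $0
# Everything below this line is copied verbatim into grub.cfg
# when update-grub / grub2-mkconfig runs.
menuentry "UEFI Firmware Settings" {
    fwsetup
}
```

After editing, regenerate the config with sudo update-grub (Debian/Ubuntu) or sudo grub2-mkconfig -o /boot/grub2/grub.cfg (RHEL/Rocky) and the new entry appears in the menu.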
Lesson Checklist
Check whether the system booted via UEFI or BIOS with [ -d /sys/firmware/efi ]
Edit /etc/default/grub for permanent changes and always run update-grub / grub2-mkconfig afterwards — never edit grub.cfg directly
Append systemd.unit=rescue.target or rd.break to the kernel parameters at the GRUB menu to enter recovery mode without physical access to the machine
Use systemd-analyze blame to identify slow boot services and journalctl -b 0 -p err to find errors from the current boot
Teacher's Note
The most practically valuable technique from this lesson is pressing e at the GRUB menu and appending systemd.unit=rescue.target to the kernel line. This gives you a root shell on any Linux system regardless of what is wrong with the running configuration — broken fstab, failed services, forgotten root password (with init=/bin/bash). Knowing this technique means you can recover from almost any boot failure without physical media.
Practice Questions
1. A server fails to boot and drops to a GRUB rescue prompt with the message error: no such partition. What has likely happened, and describe the complete recovery procedure using a live USB — including which commands you would run once chrooted into the broken system.
Likely cause: the partition layout changed (a partition deleted, resized, or re-ordered) or /boot was damaged, so GRUB can no longer find the partition holding its config. Recovery from a live USB: identify and mount the root partition: sudo mount /dev/sdaX /mnt → mount required filesystems: sudo mount --bind /dev /mnt/dev && sudo mount --bind /proc /mnt/proc && sudo mount --bind /sys /mnt/sys → chroot: sudo chroot /mnt → regenerate GRUB config: update-grub or grub2-mkconfig -o /boot/grub2/grub.cfg → reinstall GRUB to the disk: grub-install /dev/sda → exit and reboot.
2. You want to reduce boot time on a cloud server. systemd-analyze blame shows NetworkManager-wait-online.service taking 4.1 seconds. Explain what this service does, why it can safely be disabled on most cloud servers, and write the command to disable it.
NetworkManager-wait-online blocks network-online.target until NetworkManager reports an active connection, delaying every unit ordered after it. On cloud servers the interface comes up quickly and predictably, and few boot-time services genuinely need the network before starting, so the wait adds latency with no benefit. Disable it with sudo systemctl disable NetworkManager-wait-online.service. Verify the improvement with systemd-analyze after reboot.
3. What is the initramfs, why does Linux need it, and what would happen during boot if the initramfs was missing the driver for the server's NVMe storage controller?
The initramfs is a compressed cpio archive containing a minimal root filesystem — just enough drivers and tools (storage, LVM, RAID, decryption) to mount the real root partition. Linux needs it because those drivers may not be built into the kernel image itself. Without the NVMe driver, the kernel could not see the root device and boot would stop with a panic such as "VFS: Unable to mount root fs"; the fix is to boot an older kernel or live media and rebuild the initramfs (update-initramfs -u or dracut --force) with the driver included.
Lesson Quiz
1. After editing /etc/default/grub to change GRUB_TIMEOUT, you reboot but the timeout has not changed. What did you forget?
2. A server running UEFI shows SecureBoot enabled. A custom-compiled kernel you built fails to load. What is the most likely reason?
3. A system drops to an emergency shell during boot with the message Failed to mount /data. What is the most likely cause and what is the first command you should run?
Up Next
Lesson 25 — Linux Administration Best Practices
The habits, disciplines, and operational standards that define professional Linux administration at scale