Linux Administration
The Linux Boot Process
In this lesson
The Linux boot process is the sequence of events that transforms a powered-off machine into a fully running operating system. Each stage hands control to the next — firmware to bootloader, bootloader to kernel, kernel to init system. Understanding this sequence is essential when a system fails to boot, when you need to recover from a broken configuration, or when you want to understand why some services start before others.
The Full Boot Sequence
From power button to login prompt, a Linux system passes through six distinct stages. Each stage can fail independently, which is why understanding their boundaries makes boot problems straightforward to diagnose and isolate.
Fig 1 — The six stages of the Linux boot process, with approximate timing
# View kernel boot messages from the current boot
dmesg | head -50
journalctl -k -b 0 | head -50
# Check how long the last boot took — systemd's boot timing analysis
systemd-analyze
# Show per-unit startup times — find what is slowing boot
systemd-analyze blame | head -20
# Visualise the boot as a dependency chain (generates an SVG)
systemd-analyze plot > /tmp/boot.svg
# Show critical path of the boot (the sequence that determined total boot time)
systemd-analyze critical-chain

# systemd-analyze
Startup finished in 2.841s (firmware) + 3.102s (loader) + 1.243s (kernel) \
+ 2.814s (initrd) + 8.431s (userspace) = 18.431s
graphical.target reached after 8.320s in userspace
# systemd-analyze blame | head -10
4.210s NetworkManager-wait-online.service
2.103s cloud-init.service
1.821s dev-sda3.device
1.312s plymouth-quit-wait.service
0.944s snapd.service
0.821s apt-daily.service
0.714s systemd-journal-flush.service
What just happened? systemd-analyze broke boot time into its constituent stages — firmware took 2.8 seconds (slow UEFI or BIOS POST), and userspace took 8.4 seconds. blame identified NetworkManager-wait-online (4.2 seconds) as the biggest single userspace delay — this service waits for a network connection before boot continues, and can often be safely disabled on servers with predictable network interfaces.
BIOS and UEFI — Firmware Stage
The firmware is the first code that runs when a machine is powered on. It performs the POST (Power-On Self Test), initialises hardware, and locates the bootloader. Modern systems use UEFI (Unified Extensible Firmware Interface), which replaced the decades-old BIOS and brings significant improvements — including a dedicated EFI System Partition and Secure Boot.
BIOS — legacy
- 16-bit code, runs from read-only chip
- Reads the first 512 bytes of a disk (MBR) to find the bootloader
- Disk limit: 2TB maximum with MBR
- No Secure Boot — cannot verify bootloader integrity
- Boot order controlled in BIOS setup screen
- Still found on older servers and VMs
UEFI — modern
- 32/64-bit code with a full driver model
- Reads the EFI System Partition (ESP) — a FAT32 partition, typically mounted at /boot/efi
- No disk size limit with GPT
- Secure Boot — verifies bootloader signature
- Boot entries stored in NVRAM — no boot disk required to manage boot order
- Standard on all modern hardware since ~2012
# Check whether the system booted in UEFI or BIOS mode
[ -d /sys/firmware/efi ] && echo "UEFI mode" || echo "BIOS/Legacy mode"
# List UEFI boot entries stored in NVRAM
sudo efibootmgr -v
# Show the EFI System Partition contents
ls -la /boot/efi/EFI/
# Check if Secure Boot is enabled
sudo mokutil --sb-state
# View firmware variables exposed via efivarfs
ls /sys/firmware/efi/efivars/ | head -10

# [ -d /sys/firmware/efi ] && echo "UEFI mode" || echo "BIOS/Legacy mode"
UEFI mode

# sudo efibootmgr -v
BootCurrent: 0001
Timeout: 1 seconds
BootOrder: 0001,0000,0002
Boot0000* Windows Boot Manager
Boot0001* ubuntu    HD(1,GPT,...)/File(\EFI\ubuntu\shimx64.efi)
Boot0002* UEFI: PXE Boot

# sudo mokutil --sb-state
SecureBoot enabled
What just happened? efibootmgr listed three boot entries in NVRAM — Windows, Ubuntu, and PXE network boot — with Ubuntu as the current default (BootCurrent: 0001). The Ubuntu entry points to shimx64.efi — the Secure Boot shim that chains to GRUB. With Secure Boot enabled, any bootloader that is not signed by a trusted key will be refused by the firmware before it can even run.
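Where mokutil is not installed, the same state can be read straight from efivarfs. A minimal sketch, assuming the standard efivarfs mount point and the standard EFI global-variable GUID (both are the defaults on current kernels):

```shell
# Read the Secure Boot state directly from efivarfs (no mokutil needed).
# The variable's payload is 4 attribute bytes followed by 1 value byte:
# 1 = enabled, 0 = disabled.
var=/sys/firmware/efi/efivars/SecureBoot-8be4df61-93ca-11d2-aa0d-00e098032b8c
if [ -r "$var" ]; then
    # Take the last byte of the file and print it as a decimal number
    state=$(tail -c 1 "$var" | od -An -tu1 | tr -d ' \n')
    if [ "$state" = "1" ]; then
        msg="Secure Boot enabled"
    else
        msg="Secure Boot disabled"
    fi
else
    msg="Not booted via UEFI (or efivarfs not mounted)"
fi
echo "$msg"
```

On a BIOS machine (or inside a container) the variable simply does not exist, which is itself a quick UEFI-vs-BIOS check.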
GRUB2 — The Bootloader
GRUB2 (Grand Unified Bootloader version 2) is the bootloader used by virtually all major Linux distributions. It reads its configuration, displays the boot menu, loads the selected kernel image and initramfs into memory, passes kernel parameters, and transfers execution to the kernel. GRUB2 configuration is generated by the grub-mkconfig tool — you edit template files, then regenerate.
| Setting in /etc/default/grub | Effect |
|---|---|
| GRUB_TIMEOUT=5 | Seconds to display the boot menu before auto-booting the default entry. Set to -1 to wait indefinitely. |
| GRUB_DEFAULT=0 | Which menu entry to boot by default — 0 = first entry. Can also be set to a menu title string. |
| GRUB_CMDLINE_LINUX | Kernel parameters passed at every boot, including recovery mode. Common additions: nomodeset, systemd.unit=rescue.target. |
| GRUB_CMDLINE_LINUX_DEFAULT | Parameters added only for normal boots (not recovery mode). Remove quiet splash here to see verbose boot messages. |
| GRUB_DISABLE_RECOVERY | Set to true to hide recovery mode entries from the boot menu (security hardening on production servers). |
# View GRUB configuration defaults
cat /etc/default/grub
# Edit GRUB settings — always back up first
sudo cp /etc/default/grub /etc/default/grub.bak
sudo nano /etc/default/grub
# Regenerate GRUB config after editing /etc/default/grub
sudo update-grub # Debian/Ubuntu
sudo grub2-mkconfig -o /boot/grub2/grub.cfg # RHEL/Rocky
# View the generated GRUB config (menu entries are here)
cat /boot/grub/grub.cfg | grep menuentry
# List available kernel versions installed
ls /boot/vmlinuz-*
# View current kernel parameters that were used at boot
cat /proc/cmdline

# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-6.5.0-1021-aws root=UUID=a1b2c3d4-1111-2222-3333 \
    ro quiet splash vt.handoff=7

# ls /boot/vmlinuz-*
/boot/vmlinuz-6.5.0-1017-aws  /boot/vmlinuz-6.5.0-1019-aws  /boot/vmlinuz-6.5.0-1021-aws

# cat /boot/grub/grub.cfg | grep "menuentry "
menuentry 'Ubuntu, with Linux 6.5.0-1021-aws' ...
menuentry 'Ubuntu, with Linux 6.5.0-1021-aws (recovery mode)' ...
menuentry 'Ubuntu, with Linux 6.5.0-1019-aws' ...
menuentry 'Ubuntu, with Linux 6.5.0-1017-aws' ...
What just happened? /proc/cmdline showed the exact parameters the running kernel received from GRUB — including the root partition UUID and the quiet splash flags that suppress verbose boot messages. Three kernel versions are installed — the GRUB menu will offer all three, enabling you to boot an older kernel if the newest causes problems.
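Because GRUB_DEFAULT can take a menu title string, and the related grub-reboot tool (Debian/Ubuntu) selects an entry for the next boot only, it is useful to extract the exact titles from grub.cfg. The sketch below does that with sed; the sample config is a stand-in for the real /boot/grub/grub.cfg, which may need root to read:

```shell
# Extract menuentry titles from grub.cfg-style config so the exact string
# can be used with grub-reboot (one-shot) or GRUB_DEFAULT (permanent).
# The sample below stands in for /boot/grub/grub.cfg (hypothetical entries).
cfg="menuentry 'Ubuntu, with Linux 6.5.0-1021-aws' --class ubuntu {
}
menuentry 'Ubuntu, with Linux 6.5.0-1019-aws' --class ubuntu {
}"
# sed pulls the single-quoted title out of each menuentry line
titles=$(printf '%s\n' "$cfg" | sed -n "s/^menuentry '\([^']*\)'.*/\1/p")
printf '%s\n' "$titles"
# One-shot boot of a chosen entry on the next reboot only (Debian/Ubuntu):
#   sudo grub-reboot 'Ubuntu, with Linux 6.5.0-1019-aws' && sudo reboot
```

grub-reboot is handy for testing a kernel remotely: if the older kernel also misbehaves, the following reboot falls back to the normal default automatically.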
Kernel Initialisation and initramfs
Once GRUB loads the kernel (vmlinuz) and the initial RAM filesystem (initramfs) into memory, the kernel decompresses itself and begins hardware detection. The initramfs is a compressed cpio archive that contains a minimal root filesystem — just enough to mount the real root partition, including drivers for the storage controller, LVM, RAID, or disk encryption.
Analogy: The initramfs is like a rescue kit packed into the ambulance. Before the paramedic can reach the hospital (the real root filesystem), they need the basic equipment in that bag — oxygen, a defibrillator, bandages. The initramfs provides the minimum tools the kernel needs to reach the real root: storage drivers, LVM tools, decryption utilities. Once the real root is mounted, the rescue kit is discarded.
# List kernel and initramfs files in /boot
ls -lh /boot/
# View the contents of the initramfs (it is a compressed cpio archive)
lsinitramfs /boot/initrd.img-$(uname -r) | head -30 # Debian/Ubuntu
lsinitrd /boot/initramfs-$(uname -r).img | head -30 # RHEL/Rocky
# Rebuild the initramfs for the current kernel
sudo update-initramfs -u # Debian/Ubuntu
sudo dracut --force # RHEL/Rocky
# Rebuild for all installed kernels
sudo update-initramfs -u -k all # Debian/Ubuntu
# Check current running kernel version
uname -r
# View kernel ring buffer — early boot messages
dmesg | grep -E "Linux version|Kernel|CPU|Memory" | head -10

# ls -lh /boot/
total 196M
-rw-r--r-- 1 root root 260K Mar  1 config-6.5.0-1021-aws
drwxr-xr-x 6 root root 4.0K Mar  1 grub/
-rw------- 1 root root  78M Mar  1 initrd.img-6.5.0-1021-aws
lrwxrwxrwx 1 root root   33 Mar  1 initrd.img -> initrd.img-6.5.0-1021-aws
-rw------- 1 root root  11M Mar  1 vmlinuz-6.5.0-1021-aws
lrwxrwxrwx 1 root root   30 Mar  1 vmlinuz -> vmlinuz-6.5.0-1021-aws

# uname -r
6.5.0-1021-aws

# dmesg | grep "Linux version" | head -1
[    0.000000] Linux version 6.5.0-1021-aws (buildd@lcy02-amd64-022) \
    (gcc version 13.2.0) #21-Ubuntu SMP Thu Feb 15 22:00:06 UTC 2025
What just happened? The initramfs at 78MB is significantly larger than the kernel at 11MB — this is normal: it contains a complete minimal userspace including udev (systemd-udevd), storage drivers, and filesystem tools. The vmlinuz and initrd.img symlinks always point to the newest installed kernel; grub.cfg references each kernel image explicitly, and update-grub regenerates those entries automatically after every kernel upgrade.
Boot Troubleshooting
Most boot failures fall into one of four categories: GRUB cannot find the configuration, the kernel panics during hardware detection, the initramfs cannot mount the root filesystem, or systemd fails to start a critical service. Each has a distinct recovery approach.
GRUB failure. Causes: deleted /boot, changed partition UUID, corrupted GRUB install. Fix: boot from live media, chroot into the system, run grub-install and update-grub.
Kernel panic / root not mountable. Often shows "VFS: Unable to mount root fs". Causes: wrong root UUID in kernel parameters, missing storage driver in the initramfs, corrupted root filesystem. Fix: boot a previous kernel, rebuild the initramfs, or run fsck from live media.
Emergency shell during boot. Typically caused by a broken /etc/fstab entry, a failed required service, or a corrupted filesystem. The emergency shell has root access — fix the fstab, run fsck, or disable the failing unit.
Failed service after boot. The system boots but a service is in a failed state. Check with systemctl list-units --state=failed and journalctl -u servicename -b 0.
# ── Rescue / Recovery boot techniques ────────────────────────────
# Edit kernel parameters temporarily at the GRUB menu:
# 1. At GRUB menu, press 'e' to edit the selected entry
# 2. Find the 'linux' line, navigate to the end
# 3. Add one of these recovery parameters:
# systemd.unit=rescue.target — minimal single-user shell
# systemd.unit=emergency.target — bare minimum (read-only root)
# init=/bin/bash — bypass init entirely
# rd.break — break into initramfs shell before pivot
# 4. Press Ctrl+X to boot with the modified parameters
# ── After reaching the emergency / rescue shell ───────────────────
# Remount root filesystem as read-write for repairs
mount -o remount,rw /
# Check and repair a filesystem (unmount or use live media first)
fsck -y /dev/sda3
# Fix a broken fstab
nano /etc/fstab
# Disable a failing service so the system can boot
systemctl disable problematic.service
# or mask it entirely:
systemctl mask problematic.service
# Regenerate GRUB from a chroot (after booting live media)
sudo mount /dev/sda3 /mnt
sudo mount /dev/sda2 /mnt/boot
sudo mount --bind /dev /mnt/dev
sudo mount --bind /proc /mnt/proc
sudo mount --bind /sys /mnt/sys
sudo chroot /mnt
grub-install /dev/sda
update-grub
exit

# journalctl -b 0 -p err --no-pager | head -15
Mar 12 08:01:02 server1 kernel: EXT4-fs error (device sda3): ext4_find_entry
Mar 12 08:01:04 server1 systemd[1]: Failed to mount /data.
Mar 12 08:01:04 server1 systemd[1]: Dependency failed for Local File Systems.
Mar 12 08:01:04 server1 systemd[1]: Job local-fs.target/start failed.

# systemctl list-units --state=failed
  UNIT               LOAD   ACTIVE SUB    DESCRIPTION
● data.mount         loaded failed failed /data
● postgresql.service loaded failed failed PostgreSQL RDBMS

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state.
SUB    = The low-level unit activation state.
What just happened? The journal showed a cascade: a filesystem error on sda3 caused the /data mount to fail, which caused local-fs.target to fail, which caused PostgreSQL to fail — because it depends on /data being mounted. One filesystem error produced two failed units. Reading the journal from the earliest error upward reveals the actual root cause.
Boot Targets and Optimisation
Once the initramfs hands control to the real root filesystem, systemd (PID 1) begins activating units in dependency order to reach the default boot target. Understanding how to read and optimise this process reduces boot time and helps identify which services are truly needed at startup.
Identify slow units with systemd-analyze blame
Shows each unit's startup time in descending order. Units initialise in parallel, so these times do not sum to the total boot time, but anything over 1 second is a candidate for investigation.
systemd-analyze blame | head -10
Disable services not needed at boot
Services that are enabled but rarely needed (bluetooth, cups, ModemManager) add startup latency on servers. Disable services that are not required for your workload.
sudo systemctl disable bluetooth.service cups.service ModemManager.service
Handle NetworkManager-wait-online on servers
This service waits for a network connection before proceeding — often 4+ seconds. On cloud servers with predictable interfaces, it is usually safe to disable.
sudo systemctl disable NetworkManager-wait-online.service
Mask services that must never run
disable prevents auto-start but still allows manual start. mask creates a symlink to /dev/null — the service cannot be started by any means until unmasked.
sudo systemctl mask avahi-daemon.service
Never Edit /boot/grub/grub.cfg Directly — Always Use update-grub
The file /boot/grub/grub.cfg is generated automatically by update-grub (or grub2-mkconfig) — its first line even says "DO NOT EDIT THIS FILE". Any manual changes will be silently overwritten the next time a kernel is installed or updated. All permanent changes go into /etc/default/grub or a custom script in /etc/grub.d/, followed by regenerating the config. Editing grub.cfg directly is only appropriate for one-time emergency recovery from the GRUB command line.
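For permanent custom menu entries, the stock hook file /etc/grub.d/40_custom exists precisely so you never touch grub.cfg: everything after its exec tail line is copied verbatim into the generated config. A sketch of the file's contents; the fwsetup entry (reboot into the UEFI setup screen) is just an illustrative example, and many distributions already generate a similar entry automatically:

```shell
#!/bin/sh
exec tail -n +3 $0
# Everything below this line is copied verbatim into grub.cfg
# when update-grub / grub2-mkconfig runs.
menuentry "UEFI Firmware Settings" {
    fwsetup
}
```

After editing, regenerate the config with sudo update-grub (Debian/Ubuntu) or sudo grub2-mkconfig -o /boot/grub2/grub.cfg (RHEL/Rocky) and the new entry appears in the menu.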
Lesson Checklist
Check whether the system booted via UEFI or BIOS with [ -d /sys/firmware/efi ]
Edit /etc/default/grub for permanent changes and always run update-grub / grub2-mkconfig afterwards — never edit grub.cfg directly
Append systemd.unit=rescue.target or rd.break to the kernel parameters at the GRUB menu to enter recovery mode without physical access to the machine
Use systemd-analyze blame to identify slow boot services and journalctl -b 0 -p err to find errors from the current boot
Teacher's Note
The most practically valuable technique from this lesson is pressing e at the GRUB menu and appending systemd.unit=rescue.target to the kernel line. This gives you a root shell on any Linux system regardless of what is wrong with the running configuration — broken fstab, failed services, forgotten root password (with init=/bin/bash). Knowing this technique means you can recover from almost any boot failure without physical media.
Practice Questions
1. A server fails to boot and drops to a GRUB rescue prompt with the message error: no such partition. What has likely happened, and describe the complete recovery procedure using a live USB — including which commands you would run once chrooted into the broken system.
Likely cause: the partition layout changed (a partition deleted, resized, or re-ordered) or /boot was damaged, so GRUB can no longer find the partition holding its config. Recovery from a live USB: identify and mount the root partition: sudo mount /dev/sdaX /mnt → mount required filesystems: sudo mount --bind /dev /mnt/dev && sudo mount --bind /proc /mnt/proc && sudo mount --bind /sys /mnt/sys → chroot: sudo chroot /mnt → regenerate GRUB config: update-grub or grub2-mkconfig -o /boot/grub2/grub.cfg → reinstall GRUB to the disk: grub-install /dev/sda → exit and reboot.
2. You want to reduce boot time on a cloud server. systemd-analyze blame shows NetworkManager-wait-online.service taking 4.1 seconds. Explain what this service does, why it can safely be disabled on most cloud servers, and write the command to disable it.
NetworkManager-wait-online blocks network-online.target until NetworkManager reports an active connection, delaying every unit ordered after it. On cloud servers the interface comes up quickly and predictably, and few boot-time services genuinely need the network before starting, so the wait adds latency with no benefit. Disable it with sudo systemctl disable NetworkManager-wait-online.service. Verify the improvement with systemd-analyze after reboot.
3. What is the initramfs, why does Linux need it, and what would happen during boot if the initramfs was missing the driver for the server's NVMe storage controller?
The initramfs is a compressed cpio archive containing a minimal root filesystem — just enough drivers and tools (storage, LVM, RAID, decryption) to mount the real root partition. Linux needs it because those drivers may not be built into the kernel image itself. Without the NVMe driver, the kernel could not see the root device and boot would stop with a panic such as "VFS: Unable to mount root fs"; the fix is to boot an older kernel or live media and rebuild the initramfs (update-initramfs -u or dracut --force) with the driver included.
Lesson Quiz
1. After editing /etc/default/grub to change GRUB_TIMEOUT, you reboot but the timeout has not changed. What did you forget?
2. A server running UEFI shows SecureBoot enabled. A custom-compiled kernel you built fails to load. What is the most likely reason?
3. A system drops to an emergency shell during boot with the message Failed to mount /data. What is the most likely cause and what is the first command you should run?
Up Next
Lesson 25 — Linux Administration Best Practices
The habits, disciplines, and operational standards that define professional Linux administration at scale