Linux Administration Lesson 24 – Linux Boot Process | Dataplexa
Section II — User, Process & Package Management

The Linux Boot Process

In this lesson

  • BIOS and UEFI
  • GRUB bootloader
  • Kernel and initramfs
  • systemd init
  • Boot troubleshooting

The Linux boot process is the sequence of events that transforms a powered-off machine into a fully running operating system. Each stage hands control to the next — firmware to bootloader, bootloader to kernel, kernel to init system. Understanding this sequence is essential when a system fails to boot, when you need to recover from a broken configuration, or when you want to understand why some services start before others.

The Full Boot Sequence

From power button to login prompt, a Linux system passes through six distinct stages. Each stage can fail independently, which is why understanding their boundaries makes boot problems straightforward to diagnose and isolate.

  • Stage 1, Firmware (BIOS or UEFI): POST check, hardware init, find bootable device
  • Stage 2, Bootloader (GRUB2): show menu, load kernel, load initramfs, pass params
  • Stage 3, Kernel (vmlinuz): decompress, detect hardware, load drivers, mount /
  • Stage 4, initramfs (early userspace): load modules, unlock crypto, mount real /, pivot_root
  • Stage 5, systemd (PID 1): read units, start services, reach multi-user target
  • Stage 6, Login (getty / sshd): prompt user, authenticate, spawn shell

Approximate timing shown in the figure, stage 1 through stage 6: milliseconds, 1–3s, ~1s, 1–2s, 3–15s, then ready.

Fig 1 — The six stages of the Linux boot process, with approximate timing

# View kernel boot messages from the current boot
dmesg | head -50
journalctl -k -b 0 | head -50

# Check how long the last boot took — systemd's boot timing analysis
systemd-analyze

# Show per-unit startup times — find what is slowing boot
systemd-analyze blame | head -20

# Visualise the boot as a timeline of unit start-up (generates an SVG)
systemd-analyze plot > /tmp/boot.svg

# Show critical path of the boot (the sequence that determined total boot time)
systemd-analyze critical-chain

What just happened? systemd-analyze broke boot time into its constituent stages — firmware took 2.8 seconds (slow UEFI or BIOS POST), and userspace took 8.4 seconds. blame identified that NetworkManager-wait-online at 4.2 seconds is the biggest userspace delay — this service waits for a network connection before continuing, and is often safely disabled on servers with predictable network interfaces.
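The one-line summary from systemd-analyze can also be split into per-stage figures. This is useful for logging or for comparing one boot with another. The sketch below assumes the usual summary format, for example "Startup finished in 2.801s (firmware) + ... = 15.770s". The function name boot_stage_times is hypothetical, and the sketch does not parse stages that take a minute or longer.

```shell
# Hypothetical helper (not part of systemd): split the summary line printed
# by systemd-analyze into "stage time" pairs, one per line.
# Assumes the "2.801s (firmware) + ... = 15.770s" layout; stage times of a
# minute or more ("1min 2.3s") are not handled by this sketch.
boot_stage_times() {
  # -o prints each "<time> (<stage>)" match on its own line
  grep -oE '[0-9]+(\.[0-9]+)?m?s \([a-z]+\)' |
    awk '{ gsub(/[()]/, "", $2); print $2, $1 }'   # strip parens, swap order
}

# Usage on a running systemd host:
#   systemd-analyze | head -n 1 | boot_stage_times
```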

BIOS and UEFI — Firmware Stage

The firmware is the first code that runs when a machine is powered on. It performs the POST (Power-On Self Test), initialises hardware, and locates the bootloader. Modern systems use UEFI (Unified Extensible Firmware Interface), which replaced the decades-old BIOS and brings significant improvements — including a dedicated EFI System Partition and Secure Boot.

BIOS — legacy

  • 16-bit real-mode code, runs from a ROM chip on the motherboard
  • Reads the first 512 bytes of a disk (MBR) to find the bootloader
  • Disk limit: 2TB maximum with MBR
  • No Secure Boot — cannot verify bootloader integrity
  • Boot order controlled in BIOS setup screen
  • Still found on older servers and VMs

UEFI — modern

  • 32/64-bit code with a full driver model
  • Reads the EFI System Partition (ESP), a FAT32 partition usually mounted at /boot/efi
  • No practical disk size limit with GPT
  • Secure Boot — verifies bootloader signature
  • Boot entries stored in NVRAM — no boot disk required to manage boot order
  • Standard on mainstream PC hardware since around 2012

# Check whether the system booted in UEFI or BIOS mode
[ -d /sys/firmware/efi ] && echo "UEFI mode" || echo "BIOS/Legacy mode"

# List UEFI boot entries stored in NVRAM
sudo efibootmgr -v

# Show the EFI System Partition contents
ls -la /boot/efi/EFI/

# Check if Secure Boot is enabled
sudo mokutil --sb-state

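# View firmware variables exposed via efivarfs
# (the older /sys/firmware/efi/vars interface is deprecated and absent on many current kernels)
ls /sys/firmware/efi/efivars/ | head -10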

What just happened? efibootmgr listed three boot entries in NVRAM: Windows, Ubuntu, and PXE network boot. BootCurrent: 0001 shows that the Ubuntu entry was used for the current boot. The default for the next boot is whichever entry comes first in BootOrder. The Ubuntu entry points to shimx64.efi, the Secure Boot shim that chains to GRUB. With Secure Boot enabled, the firmware refuses to run any bootloader that is not signed by a trusted key.
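The boot-mode check above can be combined with a Secure Boot check that reads the firmware variable directly. This helps on minimal systems where mokutil is not installed. This is a sketch with hypothetical function names, and it relies on three assumptions:

  • efivarfs is mounted at /sys/firmware/efi/efivars.
  • Each variable file there starts with 4 attribute bytes followed by the data.
  • The SecureBoot variable uses the standard EFI global-variable GUID.

The optional argument only exists so the functions can be pointed at a test directory.

```shell
# Hypothetical helpers. The optional argument is an alternative sysfs root
# (useful for testing); it defaults to /sys.
boot_mode() {
  local sys="${1:-/sys}"
  if [ -d "$sys/firmware/efi" ]; then echo "UEFI"; else echo "BIOS"; fi
}

secure_boot_state() {
  local sys="${1:-/sys}"
  # SecureBoot lives under the EFI global-variable GUID
  local var="$sys/firmware/efi/efivars/SecureBoot-8be4df61-93ca-11d2-aa0d-00e098032b8c"
  if [ ! -r "$var" ]; then echo "unknown"; return; fi
  # Skip the 4 attribute bytes and read the single data byte: 1 = enabled
  case "$(od -An -t u1 -j 4 -N 1 "$var" | tr -d ' ')" in
    1) echo "enabled" ;;
    0) echo "disabled" ;;
    *) echo "unknown" ;;
  esac
}

# Usage:
#   echo "$(boot_mode) mode, Secure Boot $(secure_boot_state)"
```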

GRUB2 — The Bootloader

GRUB2 (Grand Unified Bootloader version 2) is the default bootloader on most major Linux distributions. It does the following:

  • reads its configuration
  • displays the boot menu
  • loads the selected kernel image and initramfs into memory
  • passes kernel parameters
  • transfers execution to the kernel

GRUB2 configuration is generated by grub-mkconfig (grub2-mkconfig on RHEL-family systems, wrapped by update-grub on Debian/Ubuntu). You edit /etc/default/grub and the scripts in /etc/grub.d/, then regenerate.

GRUB2 — Key Configuration Settings
Setting in /etc/default/grub | Effect
GRUB_TIMEOUT=5 | Seconds to display the boot menu before auto-booting the default entry. Set to -1 to wait indefinitely.
GRUB_DEFAULT=0 | Which menu entry to boot by default: 0 = first entry. Can also be set to a menu title string.
GRUB_CMDLINE_LINUX | Kernel parameters passed on every boot, including recovery entries. Use it for options every boot needs, such as nomodeset. One-off options like systemd.unit=rescue.target belong at the GRUB menu instead.
GRUB_CMDLINE_LINUX_DEFAULT | Parameters added only for normal boots (not recovery mode). On Ubuntu the default here is quiet splash; remove it to see verbose boot messages.
GRUB_DISABLE_RECOVERY | Set to true to hide recovery mode entries from the boot menu (security hardening on production servers).

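The settings in the table can also be changed non-interactively, for example from a provisioning script. This is a sketch with a hypothetical helper, set_grub_var. It backs up the file, then replaces the KEY=VALUE line or appends one if the key is missing. It assumes GNU sed. As with a manual edit, grub.cfg still has to be regenerated afterwards.

```shell
# Hypothetical helper: set KEY=VALUE in a GRUB defaults file (default
# /etc/default/grub), keeping a .bak copy first. Uses GNU sed -i.
# VALUE must not contain "|", "&" or "\", which are special in the
# sed replacement below.
set_grub_var() {
  local key="$1" value="$2" file="${3:-/etc/default/grub}"
  cp -- "$file" "$file.bak" || return 1
  if grep -q "^${key}=" "$file"; then
    # Replace the existing (uncommented) assignment
    sed -i "s|^${key}=.*|${key}=${value}|" "$file"
  else
    # Key absent or only present as a comment: append it
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# Usage, as root, followed by update-grub (Debian/Ubuntu)
# or grub2-mkconfig -o /boot/grub2/grub.cfg (RHEL/Rocky):
#   set_grub_var GRUB_TIMEOUT 10
#   set_grub_var GRUB_CMDLINE_LINUX_DEFAULT '""'
```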
# View GRUB configuration defaults
cat /etc/default/grub

# Edit GRUB settings — always back up first
sudo cp /etc/default/grub /etc/default/grub.bak
sudo nano /etc/default/grub

# Regenerate GRUB config after editing /etc/default/grub
sudo update-grub                          # Debian/Ubuntu
sudo grub2-mkconfig -o /boot/grub2/grub.cfg  # RHEL/Rocky

# View the generated GRUB config (menu entries are here)
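grep menuentry /boot/grub/grub.cfg        # Debian/Ubuntu (RHEL/Rocky: /boot/grub2/grub.cfg)
# RHEL/Rocky 8+ keep per-kernel entries in /boot/loader/entries/ instead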

# List available kernel versions installed
ls /boot/vmlinuz-*

# View current kernel parameters that were used at boot
cat /proc/cmdline

What just happened? /proc/cmdline showed the exact parameters the running kernel received from GRUB — including the root partition UUID and the quiet splash flags that suppress verbose boot messages. Three kernel versions are installed — the GRUB menu will offer all three, enabling you to boot an older kernel if the newest causes problems.
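After changing GRUB_CMDLINE_LINUX and rebooting, /proc/cmdline is where you confirm the change took effect. The sketch below uses a hypothetical helper, cmdline_param, to pull one parameter out of it. It assumes space-separated parameters without quoted values, and it prints only the first occurrence. A bare flag such as quiet prints its own name. The exit status is non-zero when the parameter is absent.

```shell
# Hypothetical helper: print the value of one kernel parameter from a
# command line file (default /proc/cmdline).
# Bare flags print their own name; quoted values with spaces are not handled.
cmdline_param() {
  local name="$1" file="${2:-/proc/cmdline}"
  tr ' ' '\n' < "$file" | awk -v n="$name" -F= '
    $1 == n {
      sub(/^[^=]*=?/, "")          # drop "name=" but keep e.g. "UUID=..."
      print ($0 == "" ? n : $0)
      found = 1
      exit
    }
    END { exit !found }'
}

# Usage:
#   cmdline_param root                      # e.g. UUID=...
#   cmdline_param nomodeset || echo "nomodeset not set"
```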

Kernel Initialisation and initramfs

Once GRUB loads the kernel (vmlinuz) and the initial RAM filesystem (initramfs) into memory, the kernel decompresses itself and begins hardware detection. The initramfs is a compressed cpio archive that contains a minimal root filesystem — just enough to mount the real root partition, including drivers for the storage controller, LVM, RAID, or disk encryption.

Analogy: The initramfs is like a rescue kit packed into the ambulance. Before the paramedic can reach the hospital (the real root filesystem), they need the basic equipment in that bag — oxygen, a defibrillator, bandages. The initramfs provides the minimum tools the kernel needs to reach the real root: storage drivers, LVM tools, decryption utilities. Once the real root is mounted, the rescue kit is discarded.

# List kernel and initramfs files in /boot
ls -lh /boot/

# View the contents of the initramfs (it is a compressed cpio archive)
lsinitramfs /boot/initrd.img-$(uname -r) | head -30   # Debian/Ubuntu
lsinitrd /boot/initramfs-$(uname -r).img | head -30    # RHEL/Rocky

# Rebuild the initramfs for the current kernel
sudo update-initramfs -u                               # Debian/Ubuntu
sudo dracut --force                                    # RHEL/Rocky

# Rebuild for all installed kernels
sudo update-initramfs -u -k all                        # Debian/Ubuntu
sudo dracut --regenerate-all --force                   # RHEL/Rocky

# Check current running kernel version
uname -r

# View kernel ring buffer — early boot messages
dmesg | grep -E "Linux version|Kernel|CPU|Memory" | head -10

What just happened? The initramfs at 78MB is significantly larger than the kernel at 11MB, and this is normal. It contains a complete minimal userspace, including udev (systemd-udevd), storage drivers, and filesystem tools.

On Debian/Ubuntu, /boot/vmlinuz and /boot/initrd.img are symlinks to the newest installed kernel and its initramfs. update-grub does not use these symlinks. It scans the versioned /boot/vmlinuz-* files and lists the newest kernel first, so the default entry follows kernel upgrades automatically.
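Before rebooting after a driver or initramfs change, it is worth confirming that the storage driver really made it into the image, for example the NVMe driver on an NVMe-only server. This sketch uses a hypothetical helper, initramfs_has_module. It reads the listing from lsinitramfs or lsinitrd on stdin and looks for a plain or compressed .ko file. Module file names may use "-" where modprobe also accepts "_", so search for the name as it appears on disk.

```shell
# Hypothetical helper: read an initramfs file listing on stdin and succeed
# if the named kernel module (.ko, .ko.gz, .ko.xz or .ko.zst) is present.
initramfs_has_module() {
  local mod="$1"
  # Match the module as a whole path component at the end of a line
  grep -Eq "(^|/)${mod}\.ko(\.(gz|xz|zst))?$"
}

# Usage:
#   lsinitramfs /boot/initrd.img-$(uname -r) | initramfs_has_module nvme   # Debian/Ubuntu
#   lsinitrd /boot/initramfs-$(uname -r).img | initramfs_has_module nvme    # RHEL/Rocky
```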

Boot Troubleshooting

Most boot failures fall into one of four categories: GRUB cannot find the configuration, the kernel panics during hardware detection, the initramfs cannot mount the root filesystem, or systemd fails to start a critical service. Each has a distinct recovery approach.

grub rescue>
GRUB cannot find its config or partition

Causes: deleted /boot, changed partition UUID, corrupted GRUB install. Fix: boot from live media, chroot into the system, run grub-install and update-grub.

Kernel panic
Kernel cannot mount root or hits a fatal error

Often shows "VFS: Unable to mount root fs". Causes: wrong root UUID in kernel parameters, missing storage driver in initramfs, corrupted root filesystem. Fix: boot previous kernel, rebuild initramfs, or run fsck from live media.

Emergency shell
systemd drops to emergency.target

Typically caused by a broken /etc/fstab entry, a failed required service, or a corrupted filesystem. The emergency shell runs as root once you enter the root password. From there you can fix the fstab, run fsck, or disable the failing unit.

Service fails
systemd unit fails during normal startup

System boots but a service is in failed state. Check with systemctl list-units --state=failed and journalctl -u servicename -b 0.
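The four categories above amount to a lookup from the message on screen to a first recovery step. The sketch below is a hypothetical triage helper, classify_boot_error, that matches the messages quoted in this lesson by substring. A real failure can print several messages, so treat its answer as a starting point rather than a diagnosis.

```shell
# Hypothetical helper: map a boot error message to one of the four failure
# categories and print "category: first step".
classify_boot_error() {
  case "$1" in
    *"grub rescue>"*|*"no such partition"*)
      echo "grub: boot live media, chroot, run grub-install and update-grub" ;;
    *"Unable to mount root fs"*|*"Kernel panic"*)
      echo "kernel-panic: boot the previous kernel, rebuild the initramfs, fsck root" ;;
    *"emergency mode"*|*"Failed to mount"*)
      echo "emergency: check /etc/fstab, run fsck, or disable the failing unit" ;;
    *"Failed to start"*)
      echo "service: systemctl list-units --state=failed, then journalctl -u UNIT -b 0" ;;
    *)
      echo "unknown: start with journalctl -b 0 -p err" ;;
  esac
}

# Usage:
#   classify_boot_error "Failed to mount /data."
```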

# ── Rescue / Recovery boot techniques ────────────────────────────

# Edit kernel parameters temporarily at the GRUB menu:
# 1. At GRUB menu, press 'e' to edit the selected entry
# 2. Find the 'linux' line, navigate to the end
# 3. Add one of these recovery parameters:
#    systemd.unit=rescue.target     — minimal single-user shell
#    systemd.unit=emergency.target  — bare minimum (read-only root)
#    init=/bin/bash                 — bypass init entirely
#    rd.break                       — stop in the initramfs shell before switch_root (dracut systems, e.g. RHEL/Rocky)
# 4. Press Ctrl+X to boot with the modified parameters

# ── After reaching the emergency / rescue shell ───────────────────

# Remount root filesystem as read-write for repairs
mount -o remount,rw /

# Check and repair a filesystem (unmount or use live media first)
fsck -y /dev/sda3

# Fix a broken fstab
nano /etc/fstab

# Disable a failing service so the system can boot
systemctl disable problematic.service
# or mask it entirely:
systemctl mask problematic.service

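# Regenerate GRUB from a chroot (after booting live media)
# Example layout: sda3 = root, sda2 = separate /boot; check yours with lsblk
sudo mount /dev/sda3 /mnt
sudo mount /dev/sda2 /mnt/boot
# UEFI only: also mount the ESP, e.g. sudo mount /dev/sda1 /mnt/boot/efi
sudo mount --bind /dev /mnt/dev
sudo mount --bind /proc /mnt/proc
sudo mount --bind /sys /mnt/sys
sudo chroot /mnt
grub-install /dev/sda      # BIOS/MBR; on UEFI, omit the disk argument (ESP must be mounted)
update-grub                # Debian/Ubuntu; RHEL/Rocky: grub2-mkconfig -o /boot/grub2/grub.cfg
exit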

What just happened? The journal showed a cascade:

  • A filesystem error on sda3 caused the /data mount to fail.
  • That caused local-fs.target to fail.
  • PostgreSQL then failed, because it depends on /data being mounted.

One filesystem error produced two failed units. The root cause is the earliest error in the journal, so start there and read forward, rather than working back from the last failure.
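Device names in the chroot sequence above differ between machines, and on UEFI systems the ESP has to be mounted as well. One way to avoid typos is a hypothetical helper, chroot_repair_plan, that only prints the commands so you can check them against lsblk before running them by hand. When an ESP is given, the sketch also mounts efivarfs inside the chroot, which grub-install typically needs in order to update NVRAM boot entries.

```shell
# Hypothetical helper: print (do not run) the mount sequence for a chroot
# repair. Arguments: ROOT_DEV [BOOT_DEV] [ESP_DEV]; pass "" to skip BOOT_DEV.
chroot_repair_plan() {
  local root_dev="$1" boot_dev="${2:-}" esp_dev="${3:-}"
  local fs
  echo "mount $root_dev /mnt"
  if [ -n "$boot_dev" ]; then echo "mount $boot_dev /mnt/boot"; fi
  if [ -n "$esp_dev" ]; then echo "mount $esp_dev /mnt/boot/efi"; fi
  for fs in dev proc sys; do
    echo "mount --bind /$fs /mnt/$fs"
  done
  # UEFI: expose firmware variables inside the chroot for grub-install
  if [ -n "$esp_dev" ]; then
    echo "mount -t efivarfs efivarfs /mnt/sys/firmware/efi/efivars"
  fi
  echo "chroot /mnt"
  echo "# after 'exit' from the chroot:"
  echo "umount -R /mnt"
}

# Usage (review the output, then run each line with sudo):
#   chroot_repair_plan /dev/sda3 /dev/sda2            # BIOS, separate /boot
#   chroot_repair_plan /dev/nvme0n1p3 "" /dev/nvme0n1p1   # UEFI, no separate /boot
```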

Boot Targets and Optimisation

Once the initramfs hands control to the real root filesystem, systemd (PID 1) begins activating units in dependency order to reach the default boot target. Understanding how to read and optimise this process reduces boot time and helps identify which services are truly needed at startup.

Identify slow units with systemd-analyze blame

Shows each unit's contribution to boot time in descending order. Focus on anything over 1 second — these are candidates for investigation.

systemd-analyze blame | head -10

Disable services not needed at boot

Services that are enabled but rarely needed (bluetooth, cups, ModemManager) add startup latency on servers. Disable services that are not required for your workload.

sudo systemctl disable bluetooth.service cups.service ModemManager.service

Handle NetworkManager-wait-online on servers

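This service waits for a network connection before proceeding, often for 4+ seconds. It exists to hold back anything ordered after network-online.target, such as NFS mounts. On cloud servers with predictable interfaces and no such dependencies, it is usually safe to disable.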

sudo systemctl disable NetworkManager-wait-online.service

Mask services that must never run

disable prevents auto-start but still allows manual start. mask creates a symlink to /dev/null — the service cannot be started by any means until unmasked.

sudo systemctl mask avahi-daemon.service
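Reading blame output by eye works, but the 1-second cut-off from the first step is easy to apply mechanically. This sketch uses a hypothetical function, slow_units, that converts the times systemd-analyze blame prints into seconds and keeps units at or above a threshold. It handles the Ns, Nms and "Nmin Ns" forms and ignores any other units, such as microseconds or hours.

```shell
# Hypothetical helper: read `systemd-analyze blame` output on stdin and print
# "seconds unit" for every unit at or above a threshold (default 1 second).
slow_units() {
  awk -v t="${1:-1}" '
    {
      secs = 0
      for (i = 1; i < NF; i++) {            # every field except the unit name
        v = $i
        if (v ~ /^[0-9.]+ms$/)      { sub(/ms$/, "", v);  secs += v / 1000 }
        else if (v ~ /^[0-9]+min$/) { sub(/min$/, "", v); secs += v * 60 }
        else if (v ~ /^[0-9.]+s$/)  { sub(/s$/, "", v);   secs += v }
      }
      if (secs >= t) printf "%.3f %s\n", secs, $NF
    }'
}

# Usage:
#   systemd-analyze blame | slow_units 1
```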

Never Edit /boot/grub/grub.cfg Directly — Always Use update-grub

The file /boot/grub/grub.cfg is generated automatically by update-grub (or grub2-mkconfig), and its header even says "DO NOT EDIT THIS FILE". Any manual changes are silently overwritten the next time the config is regenerated. On Debian/Ubuntu, that happens automatically whenever a kernel is installed or removed.

All permanent changes go into /etc/default/grub or a custom script in /etc/grub.d/, followed by regenerating the config. Touch grub.cfg directly only as a temporary emergency fix, and expect the change to be lost at the next regeneration.
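A common mistake is editing /etc/default/grub and then forgetting to regenerate. This sketch catches it by comparing modification times. The function grub_needs_regen is hypothetical. It succeeds when the defaults file, or any script in /etc/grub.d, is newer than grub.cfg. On RHEL/Rocky, pass /boot/grub2/grub.cfg as the first argument. Per-kernel BLS entries are not checked.

```shell
# Hypothetical helper: succeed (exit 0) if the GRUB defaults file or any
# script in the grub.d directory is newer than the generated grub.cfg.
# Arguments (all optional): CFG DEFAULTS SCRIPTS_DIR
grub_needs_regen() {
  local cfg="${1:-/boot/grub/grub.cfg}"
  local defaults="${2:-/etc/default/grub}"
  local scripts="${3:-/etc/grub.d}"
  local f
  [ "$defaults" -nt "$cfg" ] && return 0
  for f in "$scripts"/*; do
    # -e skips the literal pattern left behind when the directory is empty
    [ -e "$f" ] && [ "$f" -nt "$cfg" ] && return 0
  done
  return 1
}

# Usage:
#   grub_needs_regen && echo "run update-grub"
#   grub_needs_regen /boot/grub2/grub.cfg && echo "run grub2-mkconfig -o /boot/grub2/grub.cfg"
```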

Lesson Checklist

I can name all six boot stages in order and explain what each one does and what can fail at each boundary
I know the difference between BIOS/MBR and UEFI/GPT boot, and I can check which mode a running system used with [ -d /sys/firmware/efi ]
I edit /etc/default/grub for permanent changes and always run update-grub / grub2-mkconfig afterwards — never edit grub.cfg directly
I can add systemd.unit=rescue.target or rd.break to kernel parameters at the GRUB menu to enter recovery mode without needing rescue media
I use systemd-analyze blame to identify slow boot services and journalctl -b 0 -p err to find errors from the current boot

Teacher's Note

The most practically valuable technique from this lesson is pressing e at the GRUB menu and appending systemd.unit=rescue.target to the kernel line. You need access to the console, and the GRUB menu must not be password-protected. With those in place, this gets you a root shell (after the root password prompt) even with a broken fstab or failed services. If the root password itself is forgotten, init=/bin/bash skips the login entirely. Knowing this technique means you can recover from most boot failures without rescue media.

Practice Questions

1. A server fails to boot and drops to a GRUB rescue prompt with the message error: no such partition. What has likely happened, and describe the complete recovery procedure using a live USB — including which commands you would run once chrooted into the broken system.

2. You want to reduce boot time on a cloud server. systemd-analyze blame shows NetworkManager-wait-online.service taking 4.1 seconds. Explain what this service does, why it can safely be disabled on most cloud servers, and write the command to disable it.

3. What is the initramfs, why does Linux need it, and what would happen during boot if the initramfs was missing the driver for the server's NVMe storage controller?

Lesson Quiz

1. After editing /etc/default/grub to change GRUB_TIMEOUT, you reboot but the timeout has not changed. What did you forget?

2. A server running UEFI shows SecureBoot enabled. A custom-compiled kernel you built fails to load. What is the most likely reason?

3. A system drops to an emergency shell during boot with the message Failed to mount /data. What is the most likely cause and what is the first command you should run?

Up Next

Lesson 25 — Linux Administration Best Practices

The habits, disciplines, and operational standards that define professional Linux administration at scale