Linux Administration
Network Configuration & Troubleshooting
In this lesson
Network configuration and troubleshooting goes beyond setting an IP address. Real-world server networking involves multiple redundant links, logical segmentation with VLANs, precise socket inspection, and packet-level diagnosis when standard tools cannot pinpoint the problem. This lesson builds the advanced toolkit that closes the gap between "the network is broken" and "the exact packet on the exact port is being dropped by this specific rule."
Advanced ip Command Usage
The ip command suite covers far more than addresses and routes. Its sub-commands for monitoring, policy routing, network namespaces, and statistics give administrators precise, real-time visibility into every aspect of the network stack.
| Sub-command | Purpose |
|---|---|
| ip addr | View and manage IP addresses on interfaces. |
| ip link | Manage network interfaces — rename, set MTU, bring up/down, create virtual devices. |
| ip route | View and manipulate the kernel routing table. Supports multiple routing tables via the table parameter. |
| ip neigh | ARP/NDP neighbour table — IP-to-MAC mappings. Add, delete, or flush entries. |
| ip rule | Policy routing rules — route traffic based on source IP, TOS, or fwmark rather than just destination. |
| ip netns | Network namespaces — isolated network stacks. The foundation of container networking. |
| ip -s link | Show per-interface statistics — RX/TX packets, bytes, errors, dropped, overruns. |
# Monitor network events in real time — interface state changes, address changes
ip monitor all
# Watch interface statistics refresh every second
watch -n 1 'ip -s link show eth0'
# Manage the ARP cache — useful when a remote host has changed its MAC
ip neigh show
ip neigh flush dev eth0 # flush all ARP entries for eth0
sudo ip neigh add 192.168.1.50 lladdr 52:54:00:ab:cd:ef dev eth0 # add static entry
# Change interface MTU (e.g. for jumbo frames on a 10GbE network)
sudo ip link set eth0 mtu 9000
# Rename an interface
sudo ip link set eth0 name wan0
# Create a dummy interface for testing
sudo ip link add dummy0 type dummy
sudo ip addr add 10.99.0.1/24 dev dummy0
sudo ip link set dummy0 up
# ip -s link show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP
RX: bytes packets errors dropped overrun mcast
14285312 98421 0 0 0 0
TX: bytes packets errors dropped carrier collsn
3932160 27841 0 0 0 0
# ip neigh show
192.168.1.1 dev eth0 lladdr 52:54:00:11:22:33 REACHABLE
192.168.1.20 dev eth0 lladdr 52:54:00:44:55:66 STALE
What just happened? ip -s link showed zero errors and zero dropped packets on eth0 — a healthy interface. The ARP table shows the gateway (192.168.1.1) in REACHABLE state, meaning it has recently been confirmed active. The STALE entry for 192.168.1.20 means it has not been confirmed recently but the MAC is still cached — it will be probed on next use. Persistent errors or dropped packets in the statistics are the first indicators of a hardware or driver problem.
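That counter check can be automated across every interface. A minimal sketch that sweeps sysfs and flags any non-zero error or drop counters (pure POSIX shell, no extra tools assumed):

```shell
#!/bin/sh
# Walk /sys/class/net and report interfaces whose error or drop
# counters are non-zero; silence means every interface is clean
for dev in /sys/class/net/*; do
    ifname=${dev##*/}
    rx_err=$(cat "$dev/statistics/rx_errors")
    rx_drop=$(cat "$dev/statistics/rx_dropped")
    tx_err=$(cat "$dev/statistics/tx_errors")
    tx_drop=$(cat "$dev/statistics/tx_dropped")
    if [ $((rx_err + rx_drop + tx_err + tx_drop)) -gt 0 ]; then
        echo "$ifname: rx_err=$rx_err rx_drop=$rx_drop tx_err=$tx_err tx_drop=$tx_drop"
    fi
done
```

Run it from cron or a monitoring agent to catch the "first indicators of a hardware or driver problem" mentioned above before users notice them.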
Bonding and VLANs
Production servers often need more than a single network interface. Bonding (also called teaming or link aggregation) combines multiple physical interfaces into one logical interface for redundancy or increased throughput. VLANs (Virtual LANs) segment a single physical link into multiple logical networks — the switch tags traffic with an 802.1Q VLAN ID, and Linux creates virtual sub-interfaces to handle each tag.
Network Bonding — Link Aggregation
Multiple physical NICs appear as one logical interface. Common modes: active-backup (failover), balance-rr (round-robin), 802.3ad (LACP — requires switch support).
# Netplan bond example — member interfaces must also be declared
network:
  version: 2
  ethernets:
    eth0: {}
    eth1: {}
  bonds:
    bond0:
      interfaces: [eth0, eth1]
      parameters:
        mode: active-backup
        primary: eth0
VLANs — Logical Segmentation
One physical interface carries traffic for multiple networks. Each VLAN creates a sub-interface named eth0.10 (interface.vlan-id). Requires a trunk port on the connected switch.
# Netplan VLAN example — the parent interface must also be declared
network:
  version: 2
  ethernets:
    eth0: {}
  vlans:
    eth0.10:
      id: 10
      link: eth0
      addresses: [10.10.0.5/24]
    eth0.20:
      id: 20
      link: eth0
      addresses: [10.20.0.5/24]
# ── Create a bond manually (temporary) ───────────────────────────
# Load the bonding kernel module
sudo modprobe bonding
# Create the bond interface
sudo ip link add bond0 type bond mode active-backup
# Add physical interfaces as bond members (take them down first)
sudo ip link set eth0 down
sudo ip link set eth1 down
sudo ip link set eth0 master bond0
sudo ip link set eth1 master bond0
# Bring the bond up and assign an IP
sudo ip link set bond0 up
sudo ip addr add 192.168.1.10/24 dev bond0
# Check bond status
cat /proc/net/bonding/bond0
# ── Create a VLAN sub-interface manually (temporary) ─────────────
sudo ip link add link eth0 name eth0.10 type vlan id 10
sudo ip addr add 10.10.0.5/24 dev eth0.10
sudo ip link set eth0.10 up
# View VLAN interfaces
ip link show type vlan
# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v6.5.0
Bonding Mode: fault-tolerance (active-backup)
Primary Slave: eth0 (primary_reselect failure)
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth0
MII Status: up
Speed: 1000 Mbps
Duplex: full

Slave Interface: eth1
MII Status: up
Speed: 1000 Mbps
Duplex: full
What just happened? /proc/net/bonding/bond0 confirmed the bond is healthy — both slaves are up at 1000Mbps full-duplex, and eth0 is the active slave. In active-backup mode, all traffic goes through eth0; if eth0 fails, the kernel automatically switches to eth1 within the MII polling interval of 100ms — transparent to any application using the bond0 interface.
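When checking many hosts, the interesting fields of the bonding status file can be pulled out with a short awk filter. A sketch, assuming the standard layout shown above and a host with an active bond0:

```shell
# Extract the active slave and per-slave link status from
# /proc/net/bonding/bond0 (fields split on ": ")
awk -F': ' '
    /Currently Active Slave/ { print "active:", $2 }
    /^Slave Interface/       { slave = $2 }
    /^MII Status/ && slave   { print slave, "link:", $2; slave = "" }
' /proc/net/bonding/bond0
```

The bond-level "MII Status" line is skipped automatically because it appears before any "Slave Interface" line sets the slave variable.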
Deep Socket Inspection with ss
ss (socket statistics) is the modern replacement for netstat. It reads directly from the kernel's socket structures rather than parsing /proc files, making it faster and more accurate — particularly important when a system has thousands of connections. Its filter syntax allows very precise queries that netstat cannot match.
# All listening TCP sockets with process info
sudo ss -tlnp
# All established TCP connections
sudo ss -tnp state established
# All sockets — TCP + UDP + Unix
sudo ss -anp
# Filter by peer port — our outgoing connections to remote port 443
sudo ss -tnp dst :443
# Filter by local port — which clients are connected to our service on 5432?
sudo ss -tnp src :5432
# Filter by state — all sockets in TIME-WAIT
ss -tan state time-wait
# Count sockets per state — detect connection storms
ss -tan | awk 'NR>1 {state[$1]++} END {for(s in state) print s, state[s]}' | sort -k2 -rn
# Show socket memory usage — helpful for diagnosing buffer exhaustion
ss -tm
# Show detailed TCP internals — retransmits, RTT, congestion window
sudo ss -ti dst 8.8.8.8
# ss -tan | awk 'NR>1 {state[$1]++} END {for(s in state) print s, state[s]}' | sort -k2 -rn
ESTABLISHED 412
TIME-WAIT 18
LISTEN 4
CLOSE-WAIT 3
# sudo ss -ti dst 8.8.8.8
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 0 192.168.1.10:52341 8.8.8.8:443
cubic wscale:7,7 rto:204 rtt:3.218/1.432 ato:40
mss:1460 pmtu:1500 rcvmss:1460 advmss:1460
cwnd:10 bytes_sent:2184 bytes_acked:2184 bytes_received:4096
retrans:0/0 dsack_dups:0 reord_seen:0
What just happened? The TCP internals view from ss -ti revealed the connection's round-trip time (3.2ms), congestion window (10 segments), and — critically — zero retransmits. Non-zero retransmits indicate packet loss on the path to that destination. This level of per-connection detail was previously only available through packet capture tools like tcpdump.
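The per-state counting trick above turns into a crude early-warning check for local port exhaustion. A sketch; the threshold of 20000 is an arbitrary example value to tune per host:

```shell
#!/bin/sh
# Warn when TIME-WAIT sockets approach ephemeral port range size
THRESHOLD=20000                              # example value, tune per host
tw=$(ss -H -tan state time-wait | wc -l)     # -H suppresses the header line
if [ "$tw" -ge "$THRESHOLD" ]; then
    echo "WARNING: $tw TIME-WAIT sockets (threshold $THRESHOLD)"
fi
```

Dropped into cron, this catches a connection storm while there is still time to enable tcp_tw_reuse or add keep-alive at the application layer.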
Packet Capture with tcpdump
When higher-level tools cannot identify a problem, tcpdump captures actual network packets — the ground truth of what is happening on the wire. It is the definitive tool for confirming whether traffic is arriving, what its content looks like, and whether firewall rules or NAT are transforming it unexpectedly.
Capture all traffic to or from a specific host.
host 192.168.1.50
src host 10.0.0.5
dst host 8.8.8.8
Capture traffic on a specific port regardless of direction.
port 443
port 80 or port 443
portrange 8000-9000
Capture only specific protocol traffic.
tcp
udp
icmp
arp
Use and, or, not for compound filters.
host 10.0.0.5 and port 443
tcp and not port 22
# Capture on eth0 — show packet summaries (no DNS resolution, more verbose)
sudo tcpdump -i eth0 -n
# Capture only HTTPS traffic to/from a specific host
sudo tcpdump -i eth0 -n host 192.168.1.50 and port 443
# Capture all ICMP (ping) traffic to debug reachability
sudo tcpdump -i eth0 -n icmp
# Capture and save to a file for analysis in Wireshark
sudo tcpdump -i eth0 -n -w /tmp/capture.pcap
# Replay or read a saved capture file
sudo tcpdump -r /tmp/capture.pcap -n
# Print packet payload in ASCII — see HTTP requests/responses
sudo tcpdump -i eth0 -n -A port 80
# Print first 100 bytes of payload in hex + ASCII
sudo tcpdump -i eth0 -n -X -s 100 port 8080
# Capture only SYN packets — detect connection attempts (useful for DDoS detection)
sudo tcpdump -i eth0 -n 'tcp[tcpflags] & tcp-syn != 0 and tcp[tcpflags] & tcp-ack == 0'
# sudo tcpdump -i eth0 -n host 192.168.1.50 and port 443
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
14:32:01.123456 IP 192.168.1.10.52341 > 192.168.1.50.443: Flags [S], seq 123456789, win 64240
14:32:01.124100 IP 192.168.1.50.443 > 192.168.1.10.52341: Flags [S.], seq 987654321, ack 123456790
14:32:01.124210 IP 192.168.1.10.52341 > 192.168.1.50.443: Flags [.], ack 987654322, win 502
14:32:01.128000 IP 192.168.1.10.52341 > 192.168.1.50.443: Flags [P.], length 517
14:32:01.140000 IP 192.168.1.50.443 > 192.168.1.10.52341: Flags [P.], length 1448
What just happened? The capture showed a complete TCP three-way handshake: [S] SYN from client, [S.] SYN-ACK from server, [.] ACK completing the handshake. Then [P.] PSH-ACK data transfers in both directions — confirming a successfully established TLS connection. If the server had not replied to the SYN, we would know the problem is at the server (firewall dropping, service not listening), not the client.
Systematic Network Troubleshooting Methodology
Network problems in production almost always fall into one of five root cause categories. A disciplined methodology that maps symptoms to categories — and then applies the right tool for each — resolves the majority of incidents without guesswork.
Fig 1 — Network troubleshooting decision tree: each question isolates one layer
# Step 1 — gateway reachable?
ping -c 3 $(ip route | grep default | awk '{print $3}')
# Step 2 — internet IP reachable?
ping -c 3 8.8.8.8
# Step 3 — DNS resolution working?
dig +short google.com
# or bypass system resolver to isolate:
dig @8.8.8.8 +short google.com
# Step 4 — target port open?
nc -zv target.example.com 443
# or:
timeout 3 bash -c 'cat < /dev/null > /dev/tcp/target.example.com/443' \
&& echo "port open" || echo "port closed/filtered"
# Step 5 — if port appears closed, confirm with tcpdump whether packets arrive
sudo tcpdump -i eth0 -n host target.example.com and port 443
# Check iptables rules if packets arrive but port appears closed
sudo iptables -L -n -v | grep -E "443|REJECT|DROP"
Common Network Issues and Their Signatures
Experienced administrators recognise network problems by their characteristic patterns in tool output. The following signatures appear repeatedly in real-world troubleshooting and are worth memorising as diagnostic patterns.
Thousands of TIME-WAIT sockets
Normal for busy HTTP servers — each closed connection enters TIME-WAIT for 60 seconds. It becomes a problem when local ephemeral ports run out. Mitigate with net.ipv4.tcp_tw_reuse=1 in sysctl and keep-alive connections.
Accumulating CLOSE-WAIT sockets
CLOSE-WAIT means the remote side closed the connection but the local application has not called close() on its socket. A growing count means the application is leaking sockets. Restart the service and file a bug.
Non-zero retransmit counters
TCP retransmits confirm packets are being dropped between source and destination. Run mtr target.example.com to identify which hop is losing packets.
SYN arrives but no SYN-ACK leaves
If tcpdump on the server shows SYN arriving but no SYN-ACK going out, either the port is not listening or the local firewall (iptables/nftables) is dropping the packet before it reaches the application.
Climbing RX dropped/overrun counters
The kernel receive buffer fills before the application can read the data — either the application is too slow, the ring buffer is undersized, or interrupt affinity needs tuning. Check with ethtool -S eth0.
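When CLOSE-WAIT sockets accumulate, the next question is which process owns them. A sketch that groups them by the process field of ss -p (output format assumed as in current iproute2, where that field starts with users:):

```shell
# Group CLOSE-WAIT sockets by owning process; highest counts first
sudo ss -tnp state close-wait \
    | awk 'NR>1 { for (i = 1; i <= NF; i++) if ($i ~ /^users:/) print $i }' \
    | sort | uniq -c | sort -rn
```

The process with the largest count is the one to restart, and the one whose bug report should mention a socket leak.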
tcpdump on a Busy Interface Can Itself Cause Packet Drops
Running tcpdump without filters on a high-traffic interface (10Gbps+) can consume significant CPU and create a feedback loop — the capture tool competes for the same CPU as the network stack, potentially increasing the very packet drops you are trying to diagnose. Always apply the most specific filter expression possible (host X and port Y), use -s to limit snapshot length, and prefer writing to a file (-w) over printing to the terminal on busy production servers.
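One way to follow that advice on long-running captures is tcpdump's built-in file rotation, which caps both disk usage and per-packet cost:

```shell
# Rotating capture: -C rotates after ~100 MB, -W keeps at most 5 files
# (ring.pcap0 .. ring.pcap4), -s 96 truncates each packet to headers only
sudo tcpdump -i eth0 -n -s 96 -C 100 -W 5 -w /tmp/ring.pcap \
    'host 192.168.1.50 and port 443'
```

The capture can then run unattended for hours while waiting for an intermittent fault to reproduce, without any risk of filling the disk.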
Lesson Checklist
I can use ip -s link and ip neigh to inspect interface statistics and ARP cache entries, and I know what error/dropped counters indicate
I can create bonds and VLAN sub-interfaces with ip link and configure them persistently with Netplan
I can use ss -ti to inspect TCP internals including retransmits and RTT, and I can count connections per state to detect anomalies
I can write tcpdump filter expressions and interpret TCP flag sequences to confirm whether a connection is completing its handshake
Teacher's Note
The most powerful single troubleshooting technique in this lesson is running tcpdump simultaneously on both ends of a failing connection. If packets appear on the sender's tcpdump but not on the receiver's, the packet is being dropped between them — by a firewall, a router, or the kernel's own netfilter rules. If packets appear on both ends but the connection still fails, the problem is in the application or TLS layer. This two-sided capture technique resolves an entire class of "it's the network" vs "it's the app" disputes in minutes.
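A sketch of the two-sided technique (hostnames, usernames, and interface names are examples; run each capture in its own terminal):

```shell
# Terminal 1: capture on the client side
sudo tcpdump -i eth0 -n -w /tmp/client.pcap host db.example.com and port 5432
# Terminal 2: the same capture on the server side, started over SSH
ssh admin@db.example.com \
    'sudo tcpdump -i eth0 -n -w /tmp/server.pcap port 5432'
# Afterwards, compare the two files in Wireshark or tcpdump -r:
# packets present in client.pcap but missing from server.pcap were
# dropped in transit (firewall, router, or netfilter)
```

Matching packets between the two files is easiest by TCP sequence number, which survives routing unchanged (unlike timestamps, which depend on each host's clock).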
Practice Questions
1. An application server reports it cannot connect to the database at db.internal:5432. The application logs show "Connection refused". Walk through the exact diagnostic commands you would run on both the application server and the database server, explaining what each command's output would tell you.
On the application server: dig db.internal — confirms DNS resolves correctly. nc -zv db.internal 5432 — tests TCP connectivity to the port directly; "Connection refused" means the host is reachable but nothing is accepting on that port. On the DB server: ss -tlnp | grep 5432 — checks whether PostgreSQL is actually listening (and on which address: 127.0.0.1 vs 0.0.0.0). sudo systemctl status postgresql — confirms the service is running. sudo ufw status or sudo iptables -L — checks whether a firewall is blocking port 5432 from the app server's IP.
2. ss -tan | awk shows 4,200 CLOSE-WAIT connections on a web server. Explain what CLOSE-WAIT means in the TCP state machine, why accumulating CLOSE-WAIT connections is a problem, and what is the likely root cause.
3. You need to capture all TCP SYN packets arriving on eth0 destined for port 443 and save them to a file for offline analysis. Write the exact tcpdump command, including appropriate flags for a busy production server, and explain the purpose of each flag.
sudo tcpdump -i eth0 -nn -s 96 -w /tmp/syn443.pcap 'tcp dst port 443 and tcp[tcpflags] & tcp-syn != 0' — -i eth0: listen on eth0. -nn: no DNS/port name resolution (critical on busy servers to avoid slowdown). -s 96: capture only the first 96 bytes (headers only, reduces file size). -w /tmp/syn443.pcap: write raw packets to file for Wireshark analysis. The filter matches TCP packets destined for port 443 with the SYN flag set.
Lesson Quiz
1. A server has two physical NICs configured as a bond in active-backup mode. What happens to the server's network connectivity when the switch port connected to eth0 goes down?
2. A tcpdump capture shows incoming SYN packets on port 8080 but no SYN-ACK responses. The application is confirmed running. What is the most likely cause?
3. Which ss command shows TCP internals — including retransmit count and round-trip time — for the connection to 10.0.0.50?
Up Next
Lesson 28 — SSH and Secure Access
SSH key management, tunnelling, port forwarding, and hardening SSH for production use