Linux Administration Lesson 27 – Network Configuration & Troubleshooting | Dataplexa
Section III — Networking, Security & Storage

Network Configuration & Troubleshooting

In this lesson

Advanced ip commands
Bonding and VLANs
tcpdump
ss in depth
Systematic troubleshooting

Network configuration and troubleshooting goes beyond setting an IP address. Real-world server networking involves multiple redundant links, logical segmentation with VLANs, precise socket inspection, and packet-level diagnosis when standard tools cannot pinpoint the problem. This lesson builds the advanced toolkit that closes the gap between "the network is broken" and "the exact packet on the exact port is being dropped by this specific rule."

Advanced ip Command Usage

The ip command suite covers far more than addresses and routes. Its sub-commands for monitoring, policy routing, network namespaces, and statistics give administrators precise, real-time visibility into every aspect of the network stack.

ip — Sub-Command Reference
Sub-command — Purpose
ip addr — View and manage IP addresses on interfaces.
ip link — Manage network interfaces: rename, set MTU, bring up/down, create virtual devices.
ip route — View and manipulate the kernel routing table. Supports multiple routing tables via the table parameter.
ip neigh — ARP/NDP neighbour table (IP-to-MAC mappings). Add, delete, or flush entries.
ip rule — Policy routing rules: route traffic based on source IP, TOS, or fwmark rather than just destination.
ip netns — Network namespaces: isolated network stacks. The foundation of container networking.
ip -s link — Per-interface statistics: RX/TX packets, bytes, errors, dropped, overruns.
# Monitor network events in real time — interface state changes, address changes
ip monitor all

# Watch interface statistics refresh every second
watch -n 1 'ip -s link show eth0'

# Manage the ARP cache — useful when a remote host has changed its MAC
ip neigh show
ip neigh flush dev eth0          # flush all ARP entries for eth0
sudo ip neigh add 192.168.1.50 lladdr 52:54:00:ab:cd:ef dev eth0  # add static entry

# Change interface MTU (e.g. for jumbo frames on a 10GbE network)
sudo ip link set eth0 mtu 9000

# Rename an interface (it must be down first)
sudo ip link set eth0 down
sudo ip link set eth0 name wan0
sudo ip link set wan0 up

# Create a dummy interface for testing
sudo ip link add dummy0 type dummy
sudo ip addr add 10.99.0.1/24 dev dummy0
sudo ip link set dummy0 up

What just happened? ip -s link showed zero errors and zero dropped packets on eth0 — a healthy interface. The ARP table shows the gateway (192.168.1.1) in REACHABLE state, meaning it has recently been confirmed active. The STALE entry for 192.168.1.20 means it has not been confirmed recently but the MAC is still cached — it will be probed on next use. Persistent errors or dropped packets in the statistics are the first indicators of a hardware or driver problem.
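The ip netns sub-command from the reference table can be exercised safely on any machine: a namespace joined to the host by a veth pair is a self-contained two-host lab. A minimal sketch — the names ns1, veth0/veth1 and the 10.200.0.0/24 range are arbitrary choices for illustration:

```shell
# Create an isolated network namespace and a virtual cable into it
sudo ip netns add ns1
sudo ip link add veth0 type veth peer name veth1   # veth pair = virtual patch cable
sudo ip link set veth1 netns ns1                   # move one end into the namespace

# Address both ends and bring them up
sudo ip addr add 10.200.0.1/24 dev veth0
sudo ip link set veth0 up
sudo ip netns exec ns1 ip addr add 10.200.0.2/24 dev veth1
sudo ip netns exec ns1 ip link set veth1 up
sudo ip netns exec ns1 ip link set lo up

# Ping across the virtual link, then clean up
ping -c 2 10.200.0.2
sudo ip netns del ns1      # deleting the namespace also removes the veth pair
```

Every tool in this lesson — ss, tcpdump, ip route — can be run inside the namespace with ip netns exec ns1, which is exactly how container runtimes inspect their networks.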

Bonding and VLANs

Production servers often need more than a single network interface. Bonding (also called teaming or link aggregation) combines multiple physical interfaces into one logical interface for redundancy or increased throughput. VLANs (Virtual LANs) segment a single physical link into multiple logical networks — the switch tags traffic with an 802.1Q VLAN ID, and Linux creates virtual sub-interfaces to handle each tag.

Network Bonding — Link Aggregation

Multiple physical NICs appear as one logical interface. Common modes: active-backup (failover), balance-rr (round-robin), 802.3ad (LACP — requires switch support).

# Netplan bond example
bonds:
  bond0:
    interfaces: [eth0, eth1]
    parameters:
      mode: active-backup
      primary: eth0

VLANs — Logical Segmentation

One physical interface carries traffic for multiple networks. Each VLAN creates a sub-interface named eth0.10 (interface.vlan-id). Requires a trunk port on the connected switch.

# Netplan VLAN example
vlans:
  eth0.10:
    id: 10
    link: eth0
    addresses: [10.10.0.5/24]
  eth0.20:
    id: 20
    link: eth0
    addresses: [10.20.0.5/24]
# ── Create a bond manually (temporary) ───────────────────────────

# Load the bonding kernel module
sudo modprobe bonding

# Create the bond interface
sudo ip link add bond0 type bond mode active-backup

# Add physical interfaces as bond members (take them down first)
sudo ip link set eth0 down
sudo ip link set eth1 down
sudo ip link set eth0 master bond0
sudo ip link set eth1 master bond0

# Bring the bond up and assign an IP
sudo ip link set bond0 up
sudo ip addr add 192.168.1.10/24 dev bond0

# Check bond status
cat /proc/net/bonding/bond0

# ── Create a VLAN sub-interface manually (temporary) ─────────────

sudo ip link add link eth0 name eth0.10 type vlan id 10
sudo ip addr add 10.10.0.5/24 dev eth0.10
sudo ip link set eth0.10 up

# View VLAN interfaces
ip link show type vlan

What just happened? /proc/net/bonding/bond0 confirmed the bond is healthy — both slaves are up at 1000Mbps full-duplex, and eth0 is the active slave. In active-backup mode, all traffic goes through eth0; if eth0 fails, the kernel automatically switches to eth1 within the MII polling interval of 100ms — transparent to any application using the bond0 interface.
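Failover behaviour can be verified by hand rather than waited for. A sketch, assuming the temporary bond0 created above is in place:

```shell
# Identify the currently active slave
grep "Currently Active Slave" /proc/net/bonding/bond0

# Simulate a link failure on the active interface
sudo ip link set eth0 down
sleep 1      # allow the 100ms MII polling interval to detect the failure

# The bond should now report eth1 as active, with the IP on bond0 unchanged
grep "Currently Active Slave" /proc/net/bonding/bond0
ip addr show bond0

# Restore the link; with "primary eth0" set, traffic fails back to eth0
sudo ip link set eth0 up
```

Running a continuous ping against the bond's IP from another host during this test shows how many packets, if any, are lost during the switchover.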

Deep Socket Inspection with ss

ss (socket statistics) is the modern replacement for netstat. It reads directly from the kernel's socket structures rather than parsing /proc files, making it faster and more accurate — particularly important when a system has thousands of connections. Its filter syntax allows very precise queries that netstat cannot match.

# All listening TCP sockets with process info
sudo ss -tlnp

# All established TCP connections
sudo ss -tnp state established

# All sockets — TCP + UDP + Unix
sudo ss -anp

# Filter by destination (remote) port — which HTTPS servers are we connected to?
sudo ss -tnp dst :443

# Filter by source (local) port — which clients are connected to our port 5432?
sudo ss -tnp src :5432

# Filter by state — all sockets in TIME-WAIT
ss -tan state time-wait

# Count sockets per state — detect connection storms
ss -tan | awk 'NR>1 {state[$1]++} END {for(s in state) print s, state[s]}' | sort -k2 -rn

# Show socket memory usage — helpful for diagnosing buffer exhaustion
ss -tm

# Show detailed TCP internals — retransmits, RTT, congestion window
sudo ss -ti dst 8.8.8.8

What just happened? The TCP internals view from ss -ti revealed the connection's round-trip time (3.2ms), congestion window (10 segments), and — critically — zero retransmits. Non-zero retransmits indicate packet loss on the path to that destination. This level of per-connection detail was previously only available through packet capture tools like tcpdump.

Packet Capture with tcpdump

When higher-level tools cannot identify a problem, tcpdump captures actual network packets — the ground truth of what is happening on the wire. It is the definitive tool for confirming whether traffic is arriving, what its content looks like, and whether firewall rules or NAT are transforming it unexpectedly.

Host filter

Capture all traffic to or from a specific host.

host 192.168.1.50
src host 10.0.0.5
dst host 8.8.8.8

Port filter

Capture traffic on a specific port regardless of direction.

port 443
port 80 or port 443
portrange 8000-9000

Protocol filter

Capture only specific protocol traffic.

tcp
udp
icmp
arp

Combined

Use and, or, not for compound filters.

host 10.0.0.5 and port 443
tcp and not port 22
# Capture on eth0 — show packet summaries without DNS resolution (-n)
sudo tcpdump -i eth0 -n

# Capture only HTTPS traffic to/from a specific host
sudo tcpdump -i eth0 -n host 192.168.1.50 and port 443

# Capture all ICMP (ping) traffic to debug reachability
sudo tcpdump -i eth0 -n icmp

# Capture and save to a file for analysis in Wireshark
sudo tcpdump -i eth0 -n -w /tmp/capture.pcap

# Read back a saved capture file
sudo tcpdump -r /tmp/capture.pcap -n

# Print packet payload in ASCII — see HTTP requests/responses
sudo tcpdump -i eth0 -n -A port 80

# Print packets in hex + ASCII, capturing only the first 100 bytes of each packet
sudo tcpdump -i eth0 -n -X -s 100 port 8080

# Capture only SYN packets — detect connection attempts (useful for DDoS detection)
sudo tcpdump -i eth0 -n 'tcp[tcpflags] & tcp-syn != 0 and tcp[tcpflags] & tcp-ack == 0'

What just happened? The capture showed a complete TCP three-way handshake: [S] SYN from client, [S.] SYN-ACK from server, [.] ACK completing the handshake. Then [P.] PSH-ACK data transfers in both directions — confirming a successfully established TLS connection. If the server had not replied to the SYN, we would know the problem is at the server (firewall dropping, service not listening), not the client.

Systematic Network Troubleshooting Methodology

Network problems in production almost always fall into one of five root cause categories. A disciplined methodology that maps symptoms to categories — and then applies the right tool for each — resolves the majority of incidents without guesswork.

Cannot reach destination?
  1. Can you ping the gateway?
       NO  → Layer 1/2/3 issue — ip link, ip addr, cable
  2. YES → Can you ping 8.8.8.8 (by IP)?
       NO  → Routing / NAT issue — ip route, traceroute
  3. YES → Does DNS resolve the name?
       NO  → DNS issue — resolvectl, dig @8.8.8.8
  4. YES → Is the TCP port open (nc -zv)?
       NO  → Firewall / service down — ss, iptables, tcpdump
       YES → Application-layer issue — check app logs

Fig 1 — Network troubleshooting decision tree: each question isolates one layer

# Step 1 — gateway reachable?
ping -c 3 $(ip route | grep default | awk '{print $3}')

# Step 2 — internet IP reachable?
ping -c 3 8.8.8.8

# Step 3 — DNS resolution working?
dig +short google.com
# or bypass system resolver to isolate:
dig @8.8.8.8 +short google.com

# Step 4 — target port open?
nc -zv target.example.com 443
# or:
timeout 3 bash -c 'cat < /dev/null > /dev/tcp/target.example.com/443' \
  && echo "port open" || echo "port closed/filtered"

# Step 5 — if port appears closed, confirm with tcpdump whether packets arrive
sudo tcpdump -i eth0 -n host target.example.com and port 443

# Check iptables rules if packets arrive but port appears closed
sudo iptables -L -n -v | grep -E "443|REJECT|DROP"

Common Network Issues and Their Signatures

Experienced administrators recognise network problems by their characteristic patterns in tool output. The following signatures appear repeatedly in real-world troubleshooting and are worth memorising as diagnostic patterns.

High TIME-WAIT
Many thousands of TIME-WAIT sockets

Normal for busy HTTP servers — each closed connection enters TIME-WAIT for 60 seconds. Becomes a problem when local port exhaustion occurs. Mitigate with net.ipv4.tcp_tw_reuse=1 in sysctl and keep-alive connections.
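The sysctl mitigation can be inspected and applied at runtime. A sketch — the drop-in filename is an arbitrary choice:

```shell
# Current state: TIME-WAIT reuse setting and the ephemeral port range
sysctl net.ipv4.tcp_tw_reuse
sysctl net.ipv4.ip_local_port_range

# Enable reuse of TIME-WAIT sockets for new outgoing connections (runtime only)
sudo sysctl -w net.ipv4.tcp_tw_reuse=1

# Persist across reboots via a sysctl drop-in file
echo 'net.ipv4.tcp_tw_reuse = 1' | sudo tee /etc/sysctl.d/90-tcp-tuning.conf
```

Note that tcp_tw_reuse only affects connections the host initiates; for an inbound-heavy server, widening ip_local_port_range and enabling keep-alives in the application usually matter more.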

Growing CLOSE-WAIT
Accumulating CLOSE-WAIT connections — application bug

CLOSE-WAIT means the remote side closed the connection but the local application has not called close() on its socket. A growing count means the application has a socket leak. Restart the service and file a bug.
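Before restarting anything, ss can identify which process owns the leaked sockets. A sketch using GNU grep's -P for the extraction:

```shell
# Count CLOSE-WAIT sockets per owning process name
sudo ss -tnp state close-wait \
  | grep -oP 'users:\(\("\K[^"]+' \
  | sort | uniq -c | sort -rn
```

A single process name dominating the count confirms the leak is in that one application rather than system-wide.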

Retransmits in ss -ti
Non-zero retransmit count — packet loss on the path

TCP retransmits confirm packets are being dropped between source and destination. Run mtr target.example.com to identify which hop is losing packets.
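An mtr report-mode run makes the per-hop loss visible. A sketch reusing the target name from above:

```shell
# 100 probes per hop: report mode (-r), wide output (-w), no DNS lookups (-n)
mtr -rwn -c 100 target.example.com

# Reading the Loss% column:
#  - loss that starts at hop N and persists all the way to the destination
#    implicates hop N
#  - loss at one intermediate hop that disappears downstream is usually just
#    that router rate-limiting its own ICMP replies, not real packet loss
```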

SYN no reply in tcpdump
SYN sent, no SYN-ACK received — firewall or service down

If tcpdump on the server shows SYN arriving but no SYN-ACK going out, either the port is not listening or the local firewall (iptables/nftables) is dropping the packet before it reaches the application.
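The two possibilities can be separated with two commands on the server. A sketch for the port-8080 case from the signature above:

```shell
# Is anything actually listening on port 8080?
sudo ss -tlnp 'sport = :8080'

# If it is listening, check whether netfilter is dropping the SYN first
sudo iptables -L INPUT -n -v --line-numbers | grep -E '8080|DROP|REJECT'

# nftables equivalent
sudo nft list ruleset | grep -B2 -A2 8080
```

A listener present plus a matching DROP rule with a rising packet counter (the -v column) is the smoking gun.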

Dropped in ip -s link
Rising dropped packets on an interface — buffer overflow

The kernel receive buffer is full before the application can read the data — either the application is too slow, the ring buffer is undersized, or interrupt affinity needs tuning. Check with ethtool -S eth0.
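The ethtool checks look like this in practice. A sketch — counter names vary by driver, and the 4096 ring size assumes the hardware supports it:

```shell
# NIC-level drop/error counters (names are driver-specific)
ethtool -S eth0 | grep -iE 'drop|miss|err'

# Current ring buffer sizes vs. the hardware maximums
ethtool -g eth0

# Grow the RX ring toward the maximum (assumes the driver allows 4096)
sudo ethtool -G eth0 rx 4096
```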

tcpdump on a Busy Interface Can Itself Cause Packet Drops

Running tcpdump without filters on a high-traffic interface (10Gbps+) can consume significant CPU and create a feedback loop — the capture tool competes for the same CPU as the network stack, potentially increasing the very packet drops you are trying to diagnose. Always apply the most specific filter expression possible (host X and port Y), use -s to limit snapshot length, and prefer writing to a file (-w) over printing to the terminal on busy production servers.
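Put together, a production-safe capture combines all three precautions plus file rotation. A sketch reusing the host/port filter from earlier in the lesson:

```shell
# Tight filter, truncated snapshot, file output, and a rotating ring of files:
# -s 96 keeps headers but little payload; -C 100 starts a new file every
# 100 MB; -W 5 keeps five files, overwriting the oldest
sudo tcpdump -i eth0 -n -s 96 -w /tmp/cap.pcap -C 100 -W 5 \
    host 192.168.1.50 and port 443
```

Note that all options must come before the filter expression — tcpdump treats everything after the first non-option argument as part of the filter.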

Lesson Checklist

I can use ip -s link and ip neigh to inspect interface statistics and ARP cache entries, and I know what error/dropped counters indicate
I understand the difference between bonding (link redundancy/aggregation) and VLANs (logical segmentation), and can configure both using Netplan or ip link
I use ss -ti to inspect TCP internals including retransmits and RTT, and I can count connections per state to detect anomalies
I can write targeted tcpdump filter expressions and interpret TCP flag sequences to confirm whether a connection is completing its handshake
I follow the four-question decision tree (gateway → public IP → DNS → port) to isolate network failures to a specific layer before applying targeted fixes

Teacher's Note

The most powerful single troubleshooting technique in this lesson is running tcpdump simultaneously on both ends of a failing connection. If packets appear on the sender's tcpdump but not on the receiver's, the packet is being dropped between them — by a firewall, a router, or the kernel's own netfilter rules. If packets appear on both ends but the connection still fails, the problem is in the application or TLS layer. This two-sided capture technique resolves an entire class of "it's the network" vs "it's the app" disputes in minutes.
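In practice the two-sided capture is just the same filter run in two terminals. A sketch assuming a hypothetical client at 10.0.0.5 and server at 10.0.0.9:

```shell
# Terminal 1 - on the client: capture its side of the conversation
sudo tcpdump -i eth0 -n host 10.0.0.9 and port 443

# Terminal 2 - on the server: capture the same flow from its side
sudo tcpdump -i eth0 -n host 10.0.0.5 and port 443

# Reproduce the failure, then compare the two captures:
#   packets on client only            -> dropped in transit (firewall/router)
#   packets on both, no SYN-ACK sent  -> server-side firewall or no listener
#   full handshake on both sides      -> application or TLS layer
```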

Practice Questions

1. An application server reports it cannot connect to the database at db.internal:5432. The application logs show "Connection refused". Walk through the exact diagnostic commands you would run on both the application server and the database server, explaining what each command's output would tell you.

2. ss -tan | awk shows 4,200 CLOSE-WAIT connections on a web server. Explain what CLOSE-WAIT means in the TCP state machine, why accumulating CLOSE-WAIT connections is a problem, and what is the likely root cause.

3. You need to capture all TCP SYN packets arriving on eth0 destined for port 443 and save them to a file for offline analysis. Write the exact tcpdump command, including appropriate flags for a busy production server, and explain the purpose of each flag.

Lesson Quiz

1. A server has two physical NICs configured as a bond in active-backup mode. What happens to the server's network connectivity when the switch port connected to eth0 goes down?

2. A tcpdump capture shows incoming SYN packets on port 8080 but no SYN-ACK responses. The application is confirmed running. What is the most likely cause?

3. Which ss command shows TCP internals — including retransmit count and round-trip time — for the connection to 10.0.0.50?

Up Next

Lesson 28 — SSH and Secure Access

SSH key management, tunnelling, port forwarding, and hardening SSH for production use