Jenkins Lesson 34 – Distributed Builds | Dataplexa
Section IV · Lesson 34

Distributed Builds

One build agent is a single point of failure and a performance ceiling. Distributed builds spread your CI workload across multiple agents — so builds run faster, scale with demand, and survive individual machine failures without dropping a build.

This lesson covers

Why distributed builds matter → Adding agents via the UI → Agent labels for workload routing → Connecting SSH and JNLP agents → Provisioning agents with Docker → Managing agent capacity → The patterns teams use at scale

In Lessons 3 and 4 you understood the master/agent model conceptually. In Lesson 23 you saw Kubernetes pods used as ephemeral agents. This lesson brings it all together operationally — how to add static agents to a real Jenkins instance, how to configure them correctly, and how to think about agent capacity when you're running dozens of pipelines a day.

The Analogy

A single Jenkins agent is like a restaurant with one chef. At lunch rush, orders back up, every customer waits, and if the chef calls in sick the whole restaurant closes. Distributed builds are like having a kitchen brigade — multiple chefs, each handling their speciality. When one is busy, work routes to another. When the lunch rush hits, you add a temp. When service ends, the temp goes home.

Why Teams Move to Distributed Builds

Speed through parallelism

Ten builds queued behind a single executor run one after another; spread across five agents, they run five at a time. Parallel execution cuts queue wait roughly in proportion to agent count, up to the point where the bottleneck shifts to the master or the network.
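To make the arithmetic concrete, here is a minimal sketch of how long a queue takes to drain. The 3-minute build time and the one-executor-per-agent setup are assumptions chosen for illustration, and the helper name `drain_minutes` is hypothetical. It ignores scheduling overhead and assumes all builds take the same time.

```shell
# drain_minutes BUILDS MINUTES_PER_BUILD EXECUTORS
# Time to clear a queue when builds run in waves of EXECUTORS at a time.
drain_minutes() {
  waves=$(( ($1 + $3 - 1) / $3 ))   # ceiling of BUILDS / EXECUTORS
  echo $(( waves * $2 ))
}

drain_minutes 10 3 1   # ten builds, one executor
drain_minutes 10 3 5   # ten builds, five executors
```

With these numbers the last developer in the queue waits 30 minutes on one executor and 6 minutes on five.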

Redundancy and resilience

If you have 3 agents with the label linux and one goes offline, the other two keep building. No incident. No queue pile-up. Single agents are a single point of failure — always provision at least two per label in production.

Platform diversity

Some services need Linux. Some need Windows. Some need macOS for iOS builds. Labels route each job to the right platform without anyone hard-coding agent names in Jenkinsfiles.

Workload isolation

Dedicate one agent to production deploys only. Dedicate another to resource-heavy integration test suites. A runaway build on one agent can't starve deploys on another.

Adding a Static Agent via the Jenkins UI

The UI method is the right starting point for teams adding their first few static agents. The same configuration is available via the CLI and JCasC (Jenkins Configuration as Code) for teams at scale.

Jenkins — Manage Jenkins → Nodes → New Node

Node name: agent-linux-03
Description: Linux build agent #3 — 8 CPU, 32GB RAM — eu-west-1
Number of executors: 4
  Set to (CPU count / 2) for build agents. More for test-only agents.
Remote root directory: /var/jenkins-agent
Labels: linux docker eu-west
  Space-separated. Jobs request these labels to target this agent. Add region labels to route specific jobs geographically.

Launch method: Launch agents via SSH
Host: 10.0.1.47
Credentials: agent-linux-ssh-key
Host Key Verification Strategy: Non verifying (skips host key checks; use a known-hosts strategy in production)

Adding and Managing Agents via the CLI

The scenario:

You're a platform engineer provisioning three new Linux build agents for a growing engineering team. Clicking through the UI for each agent is tedious and not reproducible. You need a scripted approach that you can commit to your infrastructure repository and rerun whenever agents are replaced or rebuilt.

Tools used:

  • create-node — Jenkins CLI command that creates a new agent from an XML configuration piped to stdin. The XML format matches what the UI generates — you can get a template by running get-node on an existing agent and editing it.
  • delete-node — removes an agent from Jenkins. Useful for decommissioning old agents cleanly rather than leaving them in the offline state.
  • offline-node / online-node — takes an agent offline for maintenance or brings it back online without deleting it. Use offline-node before patching an agent to prevent new builds from starting while you work on it.
  • disconnect-node — disconnects the agent process without removing the node config. The node remains registered in Jenkins but stops accepting builds until reconnected.
JENKINS_CLI="java -jar jenkins-cli.jar -s http://jenkins-master-01:8080 -auth admin:your-api-token"

# Create a new SSH agent from an XML definition
# The XML is piped directly to the create-node command
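$JENKINS_CLI create-node agent-linux-03 << 'XML'
<slave>
  <name>agent-linux-03</name>
  <description>Linux build agent #3 — eu-west-1</description>
  <numExecutors>4</numExecutors>
  <mode>NORMAL</mode>

  <!-- Working directory on the agent machine -->
  <remoteFS>/var/jenkins-agent</remoteFS>

  <!-- Space-separated labels that jobs use to target this agent -->
  <label>linux docker eu-west</label>

  <!-- Keep the agent online at all times -->
  <retentionStrategy class="hudson.slaves.RetentionStrategy$Always"/>

  <!-- Launch over SSH (requires the SSH Build Agents plugin) -->
  <launcher class="hudson.plugins.sshslaves.SSHLauncher">
    <host>10.0.1.47</host>
    <port>22</port>
    <!-- ID of a credential stored in Jenkins, not the key itself -->
    <credentialsId>agent-linux-ssh-key</credentialsId>
    <launchTimeoutSeconds>60</launchTimeoutSeconds>
    <maxNumRetries>3</maxNumRetries>
    <retryWaitTime>15</retryWaitTime>
  </launcher>
</slave>
XML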

# Verify the node was created
$JENKINS_CLI get-node agent-linux-03 | grep -E "name|numExecutors|label|offline"

# Take an agent offline for maintenance — running builds continue, new builds queue
# The message is shown to anyone who looks at the agent in the UI
$JENKINS_CLI offline-node agent-linux-03 -m "Patching OS — back in 30 minutes"

# Bring it back online after maintenance
$JENKINS_CLI online-node agent-linux-03

# Decommission an old agent completely
$JENKINS_CLI delete-node agent-linux-01-old

Where to practice: Get a template for the XML by running get-node on an existing agent, redirecting to a file: $JENKINS_CLI get-node existing-agent > agent-template.xml. Edit the name, IP, and labels, then pipe it back with create-node new-agent-name < agent-template.xml. Full node management CLI reference at jenkins.io — Using agents.
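For the three-agent scenario above, the template approach can be sketched as a small loop. This is a dry run under stated assumptions: the inline template stands in for real `get-node` output, the names agent-linux-04 and agent-linux-05 and the hosts 10.0.1.48 and 10.0.1.49 are hypothetical, and the script only prints the `create-node` commands instead of running them.

```shell
# Hypothetical inventory: one "node-name host" pair per line.
AGENTS="agent-linux-03 10.0.1.47
agent-linux-04 10.0.1.48
agent-linux-05 10.0.1.49"

WORKDIR="$(mktemp -d)"

# Stand-in for: $JENKINS_CLI get-node existing-agent > agent-template.xml
cat > "$WORKDIR/agent-template.xml" << 'XML'
<slave>
  <name>existing-agent</name>
  <remoteFS>/var/jenkins-agent</remoteFS>
  <numExecutors>4</numExecutors>
  <label>linux docker eu-west</label>
  <launcher class="hudson.plugins.sshslaves.SSHLauncher">
    <host>10.0.1.10</host>
    <port>22</port>
    <credentialsId>agent-linux-ssh-key</credentialsId>
  </launcher>
</slave>
XML

# Rewrite <name> and <host> for each agent, then print the command to run.
printf '%s\n' "$AGENTS" | while read -r name host; do
  sed -e "s|<name>.*</name>|<name>${name}</name>|" \
      -e "s|<host>.*</host>|<host>${host}</host>|" \
      "$WORKDIR/agent-template.xml" > "$WORKDIR/${name}.xml"
  # Dry run: echo the command rather than calling Jenkins.
  echo "\$JENKINS_CLI create-node ${name} < ${WORKDIR}/${name}.xml"
done
```

Once the generated files look right, replace the `echo` with the real `$JENKINS_CLI create-node` call and commit the inventory and template to your infrastructure repository.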

# create-node output:
Node agent-linux-03 created successfully.

# get-node grep output:
  <name>agent-linux-03</name>
  <numExecutors>4</numExecutors>
  <label>linux docker eu-west</label>
  <offline>false</offline>

# offline-node output:
Node agent-linux-03 marked offline: Patching OS — back in 30 minutes
Running builds on agent-linux-03 will complete. New builds will queue.

# online-node output:
Node agent-linux-03 brought online.

# delete-node output:
Node agent-linux-01-old deleted.

What just happened?

  • Agent created from XML — the create-node command accepted the XML definition piped from the heredoc. The XML is the same format Jenkins uses internally in JENKINS_HOME/nodes/agent-linux-03/config.xml. This approach is reproducible — commit the XML to Git and rerun whenever you rebuild the agent fleet.
  • Three labels applied: linux, docker, and eu-west. Jobs can request any combination. A job that specifies agent { label 'linux && eu-west' } will only run on agents that have both labels. This enables geographic routing for latency-sensitive builds.
  • offline-node is non-disruptive — running builds on the agent are allowed to finish. Only new builds are blocked from starting. This is the correct way to take an agent down for maintenance. Never kill the agent process directly mid-build.
  • Credential reference, not credential value — the XML contains credentialsId, not the actual SSH private key. The key stays in Jenkins' encrypted credential store. If you commit this XML to Git, no secrets are exposed.
  • delete-node removes the config — the agent's record in Jenkins is deleted, including its node configuration under JENKINS_HOME/nodes. The actual machine is unaffected.
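The `&&` semantics in a label expression can be illustrated with a small shell check. This is only a model of the matching rule, not how Jenkins implements it; the function name `agent_matches` and the us-east label are hypothetical.

```shell
# agent_matches "AGENT LABELS" REQUIRED_LABEL...
# Succeeds only if the agent carries every required label (logical AND).
agent_matches() {
  agent_labels="$1"; shift
  for want in "$@"; do
    case " $agent_labels " in
      *" $want "*) ;;        # label present, keep checking
      *) return 1 ;;         # a required label is missing
    esac
  done
  return 0
}

agent_matches "linux docker eu-west" linux eu-west && echo "agent-linux-03: eligible"
agent_matches "linux docker us-east" linux eu-west || echo "us-east agent: skipped"
```

An agent with extra labels still matches; only a missing required label rules it out.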

Agent Capacity Planning

The right number of agents is determined by three things: your average build frequency, your average build duration, and how much queue time your team can tolerate. Here's how to think about it:

The back-of-napkin capacity formula

# Required executors =
builds_per_hour × avg_build_duration_hours × peak_factor

# Example: 40 builds/hr × 0.05 hrs (3 min avg) × 1.5 peak = 3 executors
# With 2 executors per agent → 2 agents minimum, 3 for redundancy

The peak factor (1.5–2.0) accounts for the burst at the start of a sprint or just before a release when everyone is pushing simultaneously. Size for the peak, not the average — a queue at peak hours frustrates developers more than idle agents at off-hours.
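The formula can be sketched as two small shell helpers. The function names are hypothetical, and the extra agent for redundancy follows the two-agents-per-label advice rather than anything the formula itself produces.

```shell
# required_executors BUILDS_PER_HOUR AVG_BUILD_HOURS PEAK_FACTOR
# Multiplies the three inputs and rounds up to a whole executor.
required_executors() {
  awk -v b="$1" -v d="$2" -v p="$3" 'BEGIN {
    x = b * d * p
    n = int(x)
    if (x - n > 1e-9) n++    # round up, tolerating float noise
    print n
  }'
}

# agents_needed EXECUTORS EXECUTORS_PER_AGENT
agents_needed() {
  echo $(( ($1 + $2 - 1) / $2 ))   # ceiling division
}

executors=$(required_executors 40 0.05 1.5)
agents=$(agents_needed "$executors" 2)
echo "executors: $executors, agents: $agents minimum, $((agents + 1)) with a spare"
```

Run with the example figures, this prints 3 executors and 2 agents minimum, 3 with a spare. Re-run it with a peak factor of 2.0 to see how much headroom a heavier release week would need.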

📏

Rule of thumb — executors per agent

Set executors to half the CPU count for build-heavy agents (compilation, Docker builds). Set them to equal the CPU count for test-heavy agents (mostly I/O and network). Never exceed the CPU count — you'll just create context-switching overhead.
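As a minimal sketch, the rule of thumb fits in one helper. The name `executors_for` and the floor of one executor on very small machines are assumptions added for illustration.

```shell
# executors_for AGENT_TYPE CPU_COUNT
# AGENT_TYPE is "build" (half the CPUs) or "test" (one executor per CPU).
executors_for() {
  case "$1" in
    build) n=$(( $2 / 2 )) ;;
    test)  n=$2 ;;
    *) echo "unknown agent type: $1" >&2; return 1 ;;
  esac
  [ "$n" -ge 1 ] || n=1      # assumption: never configure zero executors
  echo "$n"
}

executors_for build 8
executors_for test 8
```

For the 8-CPU agent configured earlier, this gives 4 executors as a build agent and 8 as a test agent.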

🔄

Always maintain a minimum of two agents per label

If one goes down for maintenance or fails unexpectedly, builds can continue on the second. One agent per label is a single point of failure regardless of how robust the agent machine is.

📊

Use the Prometheus queue depth metric to right-size

From Lesson 32: default_jenkins_queue_size_value tells you how many builds are waiting right now. If this number is persistently above 3 during working hours, you need more agents. If executor utilisation is below 20%, you have too many.
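The thresholds above can be turned into a simple decision helper. This sketch assumes you have already pulled a handful of queue-depth samples and an executor utilisation percentage from your monitoring; the function name `sizing_advice` is hypothetical, and "persistently" is modelled here as every sample exceeding 3.

```shell
# sizing_advice UTILISATION_PCT QUEUE_SAMPLE...
# Prints "add agents", "remove agents", or "ok".
sizing_advice() {
  util="$1"; shift
  samples="$#"
  persistent=1
  for q in "$@"; do
    [ "$q" -gt 3 ] || persistent=0   # one quiet sample breaks the streak
  done
  if [ "$samples" -gt 0 ] && [ "$persistent" -eq 1 ]; then
    echo "add agents"
  elif [ "$util" -lt 20 ]; then
    echo "remove agents"
  else
    echo "ok"
  fi
}

sizing_advice 70 5 6 4    # queue above 3 in every sample
sizing_advice 10 0 1 0    # idle fleet
sizing_advice 55 0 5 2    # occasional spike only
```

The three calls print "add agents", "remove agents", and "ok" respectively. Sample only during working hours, otherwise overnight idle time will skew the utilisation figure.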

Teacher's Note

The most common distributed builds mistake is having too few agents per label and too many labels. Start with linux, windows, and docker. Add specialised labels only when you have a genuine routing requirement — not just because it's possible.

Practice Questions

1. Which Jenkins CLI command creates a new agent by accepting an XML configuration from stdin?



2. Which CLI command takes an agent offline for maintenance while allowing running builds to complete, rather than abruptly killing the agent?



3. What label expression in a Jenkinsfile's agent block ensures a job only runs on agents that have both the linux and eu-west labels?



Quiz

1. What is the minimum safe number of agents per label in a production Jenkins environment?


2. What is the recommended number of executors for a build-heavy agent with 8 CPUs?


3. In the agent XML definition, what does <credentialsId> contain — and why is this the correct approach?


Up Next · Lesson 35

Scaling Jenkins

Your distributed build fleet is running — now learn how to scale Jenkins itself when build volume grows, teams multiply, and a single master becomes the bottleneck.