Jenkins Course
Distributed Builds
One build agent is a single point of failure and a performance ceiling. Distributed builds spread your CI workload across multiple agents — so builds run faster, scale with demand, and survive individual machine failures without dropping a build.
This lesson covers
Why distributed builds matter → Adding agents via the UI → Agent labels for workload routing → Connecting SSH and JNLP agents → Provisioning agents with Docker → Managing agent capacity → The patterns teams use at scale
In Lessons 3 and 4 you understood the master/agent model conceptually. In Lesson 23 you saw Kubernetes pods used as ephemeral agents. This lesson brings it all together operationally — how to add static agents to a real Jenkins instance, how to configure them correctly, and how to think about agent capacity when you're running dozens of pipelines a day.
The Analogy
A single Jenkins agent is like a restaurant with one chef. At lunch rush, orders back up, every customer waits, and if the chef calls in sick the whole restaurant closes. Distributed builds are like having a kitchen brigade — multiple chefs, each handling their speciality. When one is busy, work routes to another. When the lunch rush hits, you add a temp. When service ends, the temp goes home.
Why Teams Move to Distributed Builds
Speed through parallelism
Ten builds that would wait in line behind one another on a single agent can run five at a time on five agents. Parallel execution cuts developer wait time roughly in proportion to agent count, up to the point where the bottleneck shifts to the master or the network.
Redundancy and resilience
If you have 3 agents with the label linux and one goes offline, the other two keep building. No incident. No queue pile-up. Single agents are a single point of failure — always provision at least two per label in production.
Platform diversity
Some services need Linux. Some need Windows. Some need macOS for iOS builds. Labels route each job to the right platform without anyone hard-coding agent names in Jenkinsfiles.
Workload isolation
Dedicate one agent to production deploys only. Dedicate another to resource-heavy integration test suites. A runaway build on one agent can't starve deploys on another.
Adding a Static Agent via the Jenkins UI
The UI method is the right starting point for teams adding their first few static agents. The same configuration is available via the CLI and JCasC (Jenkins Configuration as Code) for teams at scale.
Labels field: a space-separated list. Jobs request these labels to target this agent. Add region labels to route specific jobs geographically.
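To make the matching rule concrete, here is a minimal sketch of how a job that requires several labels at once lines up against an agent's space-separated label list. The function name agent_has_labels is hypothetical, and the sketch only models "all of these labels"; Jenkins label expressions also support ||, ! and parentheses, which are not handled here.

```shell
# Hypothetical helper, not part of Jenkins: succeeds (exit 0) only when the
# agent's space-separated label list contains every required label.
# $1 = agent labels, remaining arguments = labels the job requires.
agent_has_labels() {
  agent_labels=$1
  shift
  for required in "$@"; do
    # Pad with spaces so "linux" does not match "linux-arm".
    case " $agent_labels " in
      *" $required "*) ;;      # label present, check the next one
      *) return 1 ;;           # one label missing: agent is not eligible
    esac
  done
  return 0
}

# Example: an agent labelled "linux docker eu-west" satisfies linux + eu-west.
if agent_has_labels "linux docker eu-west" linux eu-west; then
  echo "eligible"
fi
```

The space padding is the important detail: without it, a substring match would treat an agent labelled linux-arm as if it carried the linux label.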
Adding and Managing Agents via the CLI
The scenario:
You're a platform engineer provisioning three new Linux build agents for a growing engineering team. Clicking through the UI for each agent is tedious and not reproducible. You need a scripted approach that you can commit to your infrastructure repository and rerun whenever agents are replaced or rebuilt.
Tools used:
- create-node: Jenkins CLI command that creates a new agent from an XML configuration supplied on stdin. The XML format matches what the UI generates. You can get a template by running get-node on an existing agent and editing it.
- delete-node: removes an agent from Jenkins. Useful for decommissioning old agents cleanly rather than leaving them in the offline state.
- offline-node / online-node: takes an agent offline for maintenance or brings it back online without deleting it. Use offline-node before patching an agent to prevent new builds from starting while you work on it.
- disconnect-node: disconnects the agent process without removing the node config. The node remains registered in Jenkins but stops accepting builds until reconnected.
JENKINS_CLI="java -jar jenkins-cli.jar -s http://jenkins-master-01:8080 -auth admin:your-api-token"
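# Create a new SSH agent from an XML definition
# The XML is supplied on stdin to the create-node command via a heredoc
$JENKINS_CLI create-node agent-linux-03 << 'XML'
<slave>
  <name>agent-linux-03</name>
  <description>Linux build agent #3 — eu-west-1</description>
  <numExecutors>4</numExecutors>
  <mode>NORMAL</mode>
  <remoteFS>/var/jenkins-agent</remoteFS>
  <label>linux docker eu-west</label>
  <launcher class="hudson.plugins.sshslaves.SSHLauncher">
    <host>10.0.1.47</host>
    <port>22</port>
    <credentialsId>agent-linux-ssh-key</credentialsId>
    <launchTimeoutSeconds>60</launchTimeoutSeconds>
    <maxNumRetries>3</maxNumRetries>
    <retryWaitTime>15</retryWaitTime>
  </launcher>
</slave>
XML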
# Verify the node was created
$JENKINS_CLI get-node agent-linux-03 | grep -E "name|numExecutors|label|offline"
# Take an agent offline for maintenance: running builds continue, new builds queue
# The message (-m) is shown to anyone who looks at the agent in the UI
$JENKINS_CLI offline-node agent-linux-03 -m "Patching OS — back in 30 minutes"
# Bring it back online after maintenance
$JENKINS_CLI online-node agent-linux-03
# Decommission an old agent completely
$JENKINS_CLI delete-node agent-linux-01-old
Where to practice: Get a template for the XML by running get-node on an existing agent and redirecting the output to a file: $JENKINS_CLI get-node existing-agent > agent-template.xml. Edit the name, IP, and labels, then feed the file back on stdin with $JENKINS_CLI create-node new-agent-name < agent-template.xml. The full node management CLI reference is at jenkins.io, under Using agents.
# create-node output:
Node agent-linux-03 created successfully.
# get-node grep output:
<name>agent-linux-03</name>
<numExecutors>4</numExecutors>
<label>linux docker eu-west</label>
<offline>false</offline>
# offline-node output:
Node agent-linux-03 marked offline: Patching OS — back in 30 minutes
Running builds on agent-linux-03 will complete. New builds will queue.
# online-node output:
Node agent-linux-03 brought online.
# delete-node output:
Node agent-linux-01-old deleted.
What just happened?
- Agent created from XML: the create-node command accepted the XML definition supplied by the heredoc. The XML is the same format Jenkins uses internally in JENKINS_HOME/nodes/agent-linux-03/config.xml. This approach is reproducible: commit the XML to Git and rerun whenever you rebuild the agent fleet.
- Three labels applied: linux, docker, and eu-west. Jobs can request any combination. A job that specifies agent { label 'linux && eu-west' } will only run on agents that have both labels. This enables geographic routing for latency-sensitive builds.
- offline-node is non-disruptive: running builds on the agent are allowed to finish. Only new builds are blocked from starting. This is the correct way to take an agent down for maintenance. Never kill the agent process directly mid-build.
- Credential reference, not credential value: the XML contains credentialsId, not the actual SSH private key. The key stays in Jenkins' encrypted credential store. If you commit this XML to Git, no secrets are exposed.
- delete-node removes the config: the agent's record in Jenkins is deleted. The actual machine is unaffected. JENKINS_HOME no longer stores any trace of the old agent.
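The three-agent scenario could be scripted along the following lines. This is a sketch under assumptions, not a definitive implementation: render_agent_xml and provision_agent are hypothetical helper names, the extra agent names and IP addresses in the comments are made up for illustration, and the XML element names follow the SSH agent definition used in this lesson, so compare them with get-node output from your own Jenkins before relying on them. Values are inserted without XML escaping, and the script defaults to a dry run that only prints the XML.

```shell
# Hypothetical helper: prints an SSH agent definition for create-node.
# $1 = node name, $2 = host/IP, $3 = space-separated labels.
render_agent_xml() {
  name=$1
  host=$2
  labels=$3
  cat <<XML
<slave>
  <name>${name}</name>
  <numExecutors>4</numExecutors>
  <mode>NORMAL</mode>
  <remoteFS>/var/jenkins-agent</remoteFS>
  <label>${labels}</label>
  <launcher class="hudson.plugins.sshslaves.SSHLauncher">
    <host>${host}</host>
    <port>22</port>
    <credentialsId>agent-linux-ssh-key</credentialsId>
  </launcher>
</slave>
XML
}

# Dry run by default: print the XML instead of contacting Jenkins.
# Set DRY_RUN=0 and define JENKINS_CLI (as in the example above) to create nodes.
provision_agent() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    render_agent_xml "$1" "$2" "$3"
  else
    render_agent_xml "$1" "$2" "$3" | $JENKINS_CLI create-node "$1"
  fi
}

# Illustrative inventory (names and addresses are assumptions):
# provision_agent agent-linux-04 10.0.1.48 "linux docker eu-west"
# provision_agent agent-linux-05 10.0.1.49 "linux docker eu-west"
# provision_agent agent-linux-06 10.0.1.50 "linux docker eu-west"
```

Keeping the dry run as the default means the script can be reviewed in a pull request, with its rendered XML inspected, before anyone points it at a live Jenkins instance.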
Agent Capacity Planning
The right number of agents is determined by three things: your average build frequency, your average build duration, and how much queue time your team can tolerate. Here's how to think about it:
The back-of-napkin capacity formula
executors_needed = builds_per_hour × avg_build_duration_hours × peak_factor
# Example: 40 builds/hr × 0.05 hrs (3 min avg) × 1.5 peak = 3 executors
# With 2 executors per agent → 2 agents minimum, 3 for redundancy
The peak factor (1.5–2.0) accounts for the burst at the start of a sprint or just before a release when everyone is pushing simultaneously. Size for the peak, not the average — a queue at peak hours frustrates developers more than idle agents at off-hours.
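The napkin formula is easy to turn into a small script. The sketch below is one way to do it, with hypothetical function names; it takes the average build duration in minutes rather than hours, rounds up to whole executors, and applies a floor of two agents, which is an assumption borrowed from the two-agents-per-label guidance in this lesson rather than part of the formula itself.

```shell
# Hypothetical helper: executors needed at peak, rounded up.
# $1 = builds per hour, $2 = average build duration in minutes, $3 = peak factor.
required_executors() {
  awk -v b="$1" -v m="$2" -v p="$3" 'BEGIN {
    need = b * m / 60 * p          # builds/hr x duration in hours x peak factor
    n = int(need)
    if (n < need) n++              # round up to a whole executor
    print n
  }'
}

# Hypothetical helper: agents needed for a given executor count.
# $1 = executors needed, $2 = executors per agent. Never returns fewer than 2.
required_agents() {
  awk -v e="$1" -v per="$2" 'BEGIN {
    n = int(e / per)
    if (n * per < e) n++           # round up to a whole agent
    if (n < 2) n = 2               # assumption: keep at least two agents per label
    print n
  }'
}

# The example from the text: 40 builds/hr, 3-minute builds, peak factor 1.5.
executors=$(required_executors 40 3 1.5)
echo "executors: $executors, agents: $(required_agents "$executors" 2)"
```

For the example numbers this gives 3 executors and 2 agents, matching the minimum quoted above; adding a third agent for redundancy remains a judgment call the script does not make for you.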
Rule of thumb — executors per agent
Set executors to half the CPU count for build-heavy agents (compilation, Docker builds). Set them to equal the CPU count for test-heavy agents (mostly I/O and network). Never exceed the CPU count — you'll just create context-switching overhead.
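As a quick illustration of this rule of thumb, the sketch below maps an agent's workload type and CPU count to an executor count. The function name executors_for and the two workload keywords (build, test) are assumptions made for this example, as is the minimum of one executor on very small machines.

```shell
# Hypothetical helper: suggested executor count for an agent.
# $1 = workload kind ("build" or "test"), $2 = CPU count.
executors_for() {
  kind=$1
  cpus=$2
  case "$kind" in
    build) n=$((cpus / 2)) ;;   # compilation, Docker builds: half the CPUs
    test)  n=$cpus ;;           # mostly I/O and network: one per CPU
    *) echo "unknown agent kind: $kind" >&2; return 1 ;;
  esac
  if [ "$n" -lt 1 ]; then
    n=1                         # assumption: a usable agent needs at least one executor
  fi
  echo "$n"
}

echo "build-heavy, 8 CPUs: $(executors_for build 8) executors"
echo "test-heavy, 8 CPUs: $(executors_for test 8) executors"
```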
Always maintain a minimum of two agents per label
If one goes down for maintenance or fails unexpectedly, builds can continue on the second. One agent per label is a single point of failure regardless of how robust the agent machine is.
Use the Prometheus queue depth metric to right-size
From Lesson 32: default_jenkins_queue_size_value tells you how many builds are waiting right now. If this number is persistently above 3 during working hours, you need more agents. If executor utilisation is below 20%, you have too many.
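The two thresholds can be expressed as a small decision function. The sketch below assumes you have already collected a handful of queue depth samples and an executor utilisation percentage from your monitoring system; how you query them is left out. The name sizing_advice is hypothetical, and "persistently" is interpreted here as "every sample is above 3", which is one reasonable reading rather than a definition from Jenkins or Prometheus.

```shell
# Hypothetical helper: turns monitoring samples into a sizing hint.
# $1 = executor utilisation in percent (integer),
# remaining arguments = queue depth samples taken during working hours.
sizing_advice() {
  utilisation=$1
  shift
  if [ "$#" -eq 0 ]; then
    echo "no-data"
    return 0
  fi
  persistent=1
  for depth in "$@"; do
    # A single sample at or below 3 means the queue is not persistently deep.
    if [ "$depth" -le 3 ]; then
      persistent=0
    fi
  done
  if [ "$persistent" -eq 1 ]; then
    echo "add-agents"
  elif [ "$utilisation" -lt 20 ]; then
    echo "remove-agents"
  else
    echo "ok"
  fi
}

# Example: utilisation 85%, queue depth sampled four times during the day.
sizing_advice 85 5 6 4 7
```

Checking the queue condition first is deliberate: a deep queue is the more urgent signal, so it wins even if the utilisation figure happens to look low.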
Teacher's Note
The most common distributed builds mistake is having too few agents per label and too many labels. Start with linux, windows, and docker. Add specialised labels only when you have a genuine routing requirement — not just because it's possible.
Practice Questions
1. Which Jenkins CLI command creates a new agent by accepting an XML configuration from stdin?
2. Which CLI command takes an agent offline for maintenance while allowing running builds to complete, rather than abruptly killing the agent?
3. What label expression in a Jenkinsfile's agent block ensures a job only runs on agents that have both the linux and eu-west labels?
Quiz
1. What is the minimum safe number of agents per label in a production Jenkins environment?
2. What is the recommended number of executors for a build-heavy agent with 8 CPUs?
3. In the agent XML definition, what does <credentialsId> contain — and why is this the correct approach?
Up Next · Lesson 35
Scaling Jenkins
Your distributed build fleet is running — now learn how to scale Jenkins itself when build volume grows, teams multiply, and a single master becomes the bottleneck.