Troubleshooting Deadlocks in Jenkins

Jill Thornhill

7 hours ago

Jenkins is, for the most part, stable and reliable. But like any software, it has its moments, and when it does, it’s a real problem. DevOps teams rely heavily on it to build, test and deploy software, so a Jenkins malfunction causes expensive delays. Jenkins is also often used in production to control batch jobs, where outages cause serious problems.

One of the problems that occasionally occurs in Jenkins is a deadlock. This happens when two or more threads each hold a lock that another one needs, and end up in a situation where none of them can move forward. The affected threads hang until the JVM is restarted.

We can visualize it like this:

Fig: Deadlock

Thread 0 holds the lock to L1, and wants the lock to L2. Thread 1 holds the lock to L2, and wants the lock to L1. Neither can ever move forward.

A Jenkins deadlock may cause a wide range of problems, depending on whether it affects an agent or the controller (called the master in older Jenkins versions). Jobs may hang, agents may become unresponsive, and the controller’s console often behaves erratically.

This article looks at causes, diagnostic techniques and possible fixes.

Common Symptoms of a Jenkins Deadlock

In many systems, a deadlock causes a total system hang. In Jenkins, this may not be the case, since other threads continue to do their job. Symptoms that can indicate a deadlock include:

If the deadlock is in an agent:
- A build hangs indefinitely. There’s no time-out, error message, logging or console output.
- The agent’s queue backs up. The agent is online, and may show an idle status, yet it’s not processing new jobs.
If the deadlock is in the controller:
- The GUI behaves erratically. For example, it may skip to the wrong screen while you’re defining a build step.
- The GUI is slow or only partially responsive.
- Hangs occur during startup or shutdown. This is fairly rare.

Understanding the Root Causes of Jenkins Deadlocks

Since Jenkins relies on a high level of multitasking, concurrency issues can happen. To further complicate this, it has a huge number of plugins available, and the facility for developers to write their own. Plugins may also run more than one task concurrently. Jenkins’ extension point architecture allows plugin code to be triggered by events. This opens up a wide field for deadlocks to occur.

For a description of how deadlocks occur, with sample code, see Deadlock Explained.

As you can imagine, there may be any number of reasons why deadlocks can occur in Jenkins. Some of the more common ones include:

- Plugins that hold locks that conflict with the Jenkins core;
- Plugins that conflict with each other;
- Network instability causing agents to repeatedly connect and disconnect;
- Jobs that run on the controller rather than on separate agents;
- System overload, especially where a large number of complex pipelines are used;
- Blocking calls used in shared libraries.

Additionally, within the Jenkins agent, deadlocks have been traced to:

- Concurrently running jobs that acquire lockable resources in conflicting orders;
- Pipelines that wait on an asynchronous callback;
- Complex pipelines, especially those using custom Groovy scripts.

Not every system hang is an actual deadlock. Problems that have been mistakenly identified as deadlocks include:

- Container or operating system hangs;
- Thread pool starvation;
- Blocked I/O;
- Hung remoting channels;
- GC-related latency.

Since deadlocks are simple to diagnose with the right tools, a good first step is to establish whether the problem is, in fact, a deadlock. If not, we’ll need different troubleshooting techniques.

How to Diagnose Deadlocks in Jenkins

The first step with any Jenkins-related issue is to establish whether the problem is in an agent or in the controller. Hung builds usually indicate a problem with an agent, so this is the best place to start. Occasionally, pipeline-related hangs can also indicate a problem with the controller. This is because, although the actual builds run on the agent, pipeline orchestration takes place on the controller.

Rule of thumb: if any of the builds are stuck, start with the agent, and look at the controller only if the agent has no issues. If no builds are hung, start with the controller.

The most important diagnostic artifact here is a thread dump. There are several ways you can capture the thread dump. Alternatively, you can capture a full range of diagnostic artifacts with a single command, using yCrash’s free open-source yc-360 script. This is better because if the problem turns out not to be a deadlock, or if you need to investigate further, you’ll have all the information you need about the JVM and its environment.
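For readers who want to see what a thread dump contains, here is a minimal sketch that dumps the threads of the JVM it runs in, using the standard java.lang.management API. The class name ThreadDumpSketch is hypothetical and is not part of Jenkins or yCrash. For a JVM that is already hung, an external JDK tool such as jstack or jcmd (with its Thread.print command) is the more usual route, since code like this has to be running inside the affected process.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper for illustration; not part of Jenkins or yCrash.
class ThreadDumpSketch {

    /** Returns a textual dump of every live thread in the current JVM. */
    static String dump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Only ask for lock details the JVM says it can provide.
        boolean monitors = mx.isObjectMonitorUsageSupported();
        boolean synchronizers = mx.isSynchronizerUsageSupported();

        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : mx.dumpAllThreads(monitors, synchronizers)) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState());
            if (info.getLockName() != null) {
                out.append(" waiting on ").append(info.getLockName());
            }
            if (info.getLockOwnerName() != null) {
                out.append(" held by \"").append(info.getLockOwnerName()).append('"');
            }
            out.append('\n');
            // ThreadInfo.toString() shortens deep stacks, so print each frame here.
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

The output format here is our own and differs from the jstack layout, so tools that parse standard thread dumps may not accept it as-is.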

You can investigate the thread dump for deadlocks manually, use a thread dump analyzer such as fastThread, or upload the zip file created by yc-360 into the yCrash dashboard via the ‘Upload Bundle’ button at the top right.

Let’s work through an actual example. For this exercise, we took the demo plugin from the Jenkins Tutorial and added some buggy code as follows.

A class DeadLockDemo starts two threads, of classes ThreadA and ThreadB:

			
package com.buggyapp.deadlock;
public class DeadLockDemo {
	
	public static void start() {
		
		System.out.println("App started");
		new ThreadA().start();
		new ThreadB().start();
	}
	
}

ThreadA calls method1(), which is a synchronized method, from CoolObject.

			
package com.buggyapp.deadlock;
public class ThreadA extends Thread {
	@Override	
	public void run() {
		CoolObject.method1();
	}
}

		

ThreadB calls method2(), which is a synchronized method, from HotObject.

			
package com.buggyapp.deadlock;
public class ThreadB extends Thread {
	@Override	
	public void run() {
		HotObject.method2();
	}	
}

		

CoolObject.method1() calls the synchronized method2() from HotObject.

			
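package com.buggyapp.deadlock;
public class CoolObject {
	// static synchronized: the calling thread holds the monitor of CoolObject.class
	public static synchronized void method1() {
		
		try {
			// Sleep for 10 seconds so the other thread has time to take its own lock
			Thread.sleep(10 * 1000);
		} catch (InterruptedException e) {
			// Restore the interrupt flag rather than swallowing the interruption
			Thread.currentThread().interrupt();
		}
		
		// Needs the monitor of HotObject.class, which ThreadB already holds
		HotObject.method2();
	}
}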

		

HotObject.method2() calls the synchronized method1() from CoolObject.

			
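package com.buggyapp.deadlock;
public class HotObject {
	// static synchronized: the calling thread holds the monitor of HotObject.class
	public static synchronized void method2() {
		
		try {
			// Sleep for 10 seconds so the other thread has time to take its own lock
			Thread.sleep(10 * 1000);
		} catch (InterruptedException e) {
			// Restore the interrupt flag rather than swallowing the interruption
			Thread.currentThread().interrupt();
		}
		
		// Needs the monitor of CoolObject.class, which ThreadA already holds
		CoolObject.method1();
	}	
}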

		

This results in a deadlock.

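ThreadA holds the monitor of the CoolObject class (acquired on entering the static synchronized method1()) and is waiting for the monitor of the HotObject class, which it needs to enter method2().
ThreadB holds the monitor of the HotObject class (acquired on entering method2()) and is waiting for the monitor of the CoolObject class, which it needs to enter method1().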
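A JVM can also report this kind of cycle about itself. The sketch below is a self-contained illustration, separate from the demo plugin: it provokes the same two-lock cycle with two daemon threads, then asks the standard ThreadMXBean.findDeadlockedThreads() method which threads are stuck. All class, method and thread names here are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Illustrative sketch only: class, method and thread names are hypothetical.
class DeadlockDetectionSketch {

    /** Starts two daemon threads that take two fresh locks in opposite order. */
    static void provokeDeadlock() {
        Object cool = new Object();
        Object hot = new Object();
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
        startDaemon("demo-A", cool, hot, bothHoldFirstLock);
        startDaemon("demo-B", hot, cool, bothHoldFirstLock);
    }

    private static void startDaemon(String name, Object first, Object second,
                                    CountDownLatch bothHoldFirstLock) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                bothHoldFirstLock.countDown();
                try {
                    // Wait until the other thread also holds its first lock.
                    bothHoldFirstLock.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                synchronized (second) { // never entered: the other thread owns it
                    System.out.println(name + " acquired both locks");
                }
            }
        }, name);
        t.setDaemon(true); // lets the JVM exit even though the threads never finish
        t.start();
    }

    /** Polls the JVM until it reports deadlocked threads or the timeout expires. */
    static ThreadInfo[] findDeadlock(long timeoutMillis) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            long[] ids = mx.findDeadlockedThreads(); // null when no cycle exists
            if (ids != null) {
                return mx.getThreadInfo(ids);
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return new ThreadInfo[0];
    }

    public static void main(String[] args) {
        provokeDeadlock();
        for (ThreadInfo info : findDeadlock(5_000)) {
            System.out.println(info.getThreadName() + " is waiting for " + info.getLockName()
                    + ", held by " + info.getLockOwnerName());
        }
    }
}
```

The sketch uses a latch instead of a 10-second sleep so that the cycle forms quickly and deterministically. The locks are plain objects rather than class monitors, but the circular wait is the same as in the demo plugin.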

We installed our version of the demo plugin in Jenkins and used it to define a build step for a job. This triggered the deadlock, and the controller’s console began behaving erratically.

We then used the yc-360 script to capture diagnostic artifacts and loaded the bundle into the yCrash dashboard.

yCrash’s root cause analyzer immediately reported that the application had a deadlock, as highlighted in the image below. It provided a clickable button to investigate further.

Fig: yCrash Root Cause Analysis Identifies a Deadlock

This confirms that we have a deadlock. If the issue was not a deadlock, yCrash would, in most cases, point us towards the actual cause of the problem.

Clicking the link takes us to the fastThread analysis of the thread dump.

Fig: fastThread Shows Details of Affected Threads

This provides details of the deadlocked threads, including the stack trace of each thread. The stack trace gives the class name and line number within the affected classes. This helps to identify whether the problem lies within Jenkins core classes or within a plugin. Plugin developers can use the stack trace to debug the code.

If we don’t have any diagnostic tools available, the actual thread dump is a text file, and by scrolling through it, we can also find this information, as per the excerpt below. Since thread dumps in Jenkins can be very large, this method is very time-consuming.

			
Found one Java-level deadlock:
=============================
"Thread-13":
  waiting to lock monitor 0x00007f8b681413c0 (object 0x00000000d32d8ea8, a java.lang.Class),
  which is held by "Thread-14"
"Thread-14":
  waiting to lock monitor 0x00007f8b5c254a30 (object 0x00000000d3200300, a java.lang.Class),
  which is held by "Thread-13"
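To avoid scrolling through a large dump by hand, a few lines of code can pull out just this section. The following is a rough sketch, assuming the dump uses the layout shown in the excerpt above: a header line containing "Java-level deadlock", followed by the thread entries, and usually a "Java stack information" line where the stack traces begin. The class name DeadlockSectionFinder is hypothetical, and dump layouts can vary between JVM versions, so treat the markers as assumptions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; the markers are based on the excerpt above.
class DeadlockSectionFinder {

    /**
     * Returns the non-blank lines from each "Java-level deadlock" header up to
     * (but not including) the next "Java stack information" line, or to the end
     * of the input if that line is absent.
     */
    static List<String> extract(List<String> dumpLines) {
        List<String> section = new ArrayList<>();
        boolean inside = false;
        for (String line : dumpLines) {
            if (line.contains("Java-level deadlock")) {
                inside = true;
            } else if (line.startsWith("Java stack information")) {
                inside = false;
            }
            if (inside && !line.isBlank()) {
                section.add(line);
            }
        }
        return section;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java DeadlockSectionFinder.java <thread-dump.txt>");
            return;
        }
        extract(Files.readAllLines(Path.of(args[0]))).forEach(System.out::println);
    }
}
```

An empty result only means that the JVM did not report a lock cycle in that dump. It does not rule out the other kinds of hang listed earlier.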

		

How to Fix Deadlocks in Jenkins Core and Plugins

Let’s look at this from two different points of view. If you’re an administrator, the first will be most useful to you, whereas the second is more useful for plugin developers.

1. The Jenkins Administrator

Your first priority will be to get the system back up and running as fast as possible. If you can, capture a thread dump first, since a restart discards the evidence. If a job has hung but the console is still working, cancel the job. In all cases, restart Jenkins. This is a temporary fix only: the problem may easily recur.

Next, if a job had hung, you need to look closely at that job.

- Does it use lockable resources?
- What was running at the same time?
- Does it have complex Groovy scripting?

Review the job and any that were running alongside it. If the jobs use locking, make sure they all acquire the locks in the same order.

The next step is to look at what plugins have been loaded recently. You can try removing the latest plugins if the problem keeps recurring. Make sure all plugins are up to date, in case fixes have been provided.

Make sure no builds are running on the controller. Provide enough agents to make this unnecessary, and disable the ability to run jobs on the controller, for example by setting its number of executors to 0.

Make sure the Jenkins system as a whole is not overloaded, and has adequate resources. In extreme cases, you may need to create a second Jenkins system to split the workload.

If the problem recurs, make sure you’re using the latest version of Jenkins, as the developers are continually improving the code to use newer and better concurrency models.

2. Jenkins Plugin Developer

Here are a few tips for reducing the likelihood of deadlocks within plugins:

- Where possible, use the Java concurrency API rather than the older technique of synchronized methods. The java.util.concurrent.locks package defines several useful lock types with methods such as tryLock(), which lets us set sensible timeouts on lock acquisition;
- If you need more than one lock, define the order in which they must be acquired, and follow it throughout the plugin;
- Where possible, hold only one lock at a time and avoid nested locks;
- Put timeouts on lock acquisitions;
- Avoid synchronized callbacks into the Jenkins core;
- Never perform blocking I/O while holding a lock;
- Avoid static mutable state.
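Two of these tips, a fixed acquisition order and timeouts on lock acquisition, can be combined as in the sketch below. It uses ReentrantLock.tryLock(long, TimeUnit) from java.util.concurrent.locks. The class OrderedTimedLocking and the lock names CONFIG_LOCK and QUEUE_LOCK are hypothetical and do not correspond to anything in Jenkins.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch only: the class and lock names are hypothetical.
class OrderedTimedLocking {

    // Documented order for every caller: CONFIG_LOCK first, then QUEUE_LOCK.
    static final ReentrantLock CONFIG_LOCK = new ReentrantLock();
    static final ReentrantLock QUEUE_LOCK = new ReentrantLock();

    /**
     * Runs the action while holding both locks. Returns false, without running
     * the action, if either lock cannot be acquired within the timeout or the
     * calling thread is interrupted while waiting.
     */
    static boolean withBothLocks(Runnable action, long timeout, TimeUnit unit) {
        try {
            if (!CONFIG_LOCK.tryLock(timeout, unit)) {
                return false; // give up instead of waiting forever
            }
            try {
                if (!QUEUE_LOCK.tryLock(timeout, unit)) {
                    return false; // the outer finally still releases CONFIG_LOCK
                }
                try {
                    action.run();
                    return true;
                } finally {
                    QUEUE_LOCK.unlock();
                }
            } finally {
                CONFIG_LOCK.unlock();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            return false;
        }
    }

    public static void main(String[] args) {
        boolean ran = withBothLocks(
                () -> System.out.println("holding both locks"), 2, TimeUnit.SECONDS);
        System.out.println("action ran: " + ran);
    }
}
```

Because a failed acquisition returns false rather than blocking, the caller can log the contention, retry later or fail the step, and no thread is left holding one lock while waiting indefinitely for the other.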

Java is also gaining improved support for concurrent tasks through the new Structured Concurrency API, which lets us group related tasks into units of work. It’s still in the preview stage (as of Java 26), but it’s worth looking into for long-term revisions.

Can Deadlocks be Predicted?

Deadlocks are one of the hardest issues to predict, since they tend to be intermittent and only occur when two events coincide. What we can do, however, is identify highly-contended locks, which are often vulnerable to deadlocks.

Monitoring thread health regularly can pick up issues such as:

- A high number of blocked threads over a long period;
- Unhealthy thread patterns.

The easiest way to do this is to use yCrash to continually sample diagnostics and raise the alarm when potential issues are identified.

This article on JVM Micrometrics describes how these two unhealthy trends can be identified.
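As a rough illustration of the first of those signals, the sketch below counts threads in the BLOCKED state and checks whether the count stays at or above a threshold across several samples. It is a simplified stand-in for proper monitoring, not how yCrash works. The class name, the sample count, the interval and the threshold are all hypothetical values chosen for the example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Illustrative sketch only: names, interval and threshold are hypothetical.
class BlockedThreadSampler {

    /**
     * Counts live threads that are BLOCKED, i.e. waiting to enter a synchronized
     * block or method. Threads waiting on java.util.concurrent locks show up as
     * WAITING or TIMED_WAITING instead, so they are not counted here.
     */
    static int countBlocked() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        int blocked = 0;
        for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds())) {
            // A thread may have ended between the two calls, leaving a null entry.
            if (info != null && info.getThreadState() == Thread.State.BLOCKED) {
                blocked++;
            }
        }
        return blocked;
    }

    /** True if every sample shows at least `threshold` blocked threads. */
    static boolean persistentlyBlocked(int samples, long intervalMillis, int threshold) {
        for (int i = 0; i < samples; i++) {
            if (countBlocked() < threshold) {
                return false;
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("Blocked threads now: " + countBlocked());
        System.out.println("Persistently blocked: " + persistentlyBlocked(5, 1_000, 10));
    }
}
```

A single high reading is normal under load. It is the sustained count across samples that suggests a heavily contended lock worth investigating.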

Conclusion

Jenkins is a high-concurrency system, and allows a huge number of third-party plugins to be implemented. It will always, therefore, be vulnerable to deadlocks.

Jenkins deadlocks can cause a build agent to hang, with queues backing up and jobs not completing. They can also manifest more subtly, with the controller’s console behaving unpredictably.

Since several other issues can mimic deadlocks, the first task is to confirm that the problem is, in fact, a deadlock rather than some other kind of hang.

The yCrash root cause analyzer speeds up diagnosis considerably, and by using sampling techniques, it can identify lock contention proactively before the system actually deadlocks.

FAQ

Are deadlocks more common in older Jenkins versions?

Generally, yes. Modern Jenkins releases and plugins have benefited from years of concurrency improvements, bug fixes, and architectural enhancements that have reduced the likelihood of deadlocks.

Can restarting Jenkins fix a deadlock?

A restart usually restores service temporarily, but the underlying cause (such as a plugin bug, lock contention issue, or problematic Pipeline design) should still be investigated.

How can I prevent deadlocks in Jenkins Pipelines?

Avoid nested locks, minimize the time resources are locked, use lockable resources consistently, and keep Pipeline logic as simple as possible.

What is the difference between a Jenkins deadlock and thread pool starvation?

A deadlock occurs when two or more threads are permanently waiting for each other to release resources or locks, so none of them can make progress. Thread pool starvation, by contrast, occurs when all threads in a thread pool are busy or blocked, leaving no free threads to execute new tasks. While both can make an application appear frozen, a deadlock involves a circular dependency between threads, whereas thread pool starvation is a resource exhaustion problem that may resolve itself if some threads eventually complete their work.
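The difference can be seen in a small experiment. In the sketch below, a pool with a single worker runs a task that submits a second task to the same pool and waits for it. The second task cannot start because the only worker is busy, so the wait times out, yet the JVM’s deadlock detector reports nothing, because no lock cycle exists. The class name StarvationSketch, the pool size and the timeouts are hypothetical choices for the example.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustrative sketch only: class name, pool size and timeouts are hypothetical.
class StarvationSketch {

    /**
     * Returns true if the inner task was starved (its result timed out) while
     * the JVM reported no deadlocked threads.
     */
    static boolean starvedButNotDeadlocked() {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            Future<Boolean> outer = pool.submit(() -> {
                // Queued behind the outer task: the only worker is running us.
                Future<String> inner = pool.submit(() -> "done");
                try {
                    inner.get(500, TimeUnit.MILLISECONDS);
                    return false; // not starved
                } catch (TimeoutException starved) {
                    long[] deadlocked =
                            ManagementFactory.getThreadMXBean().findDeadlockedThreads();
                    return deadlocked == null; // null means no lock cycle was found
                }
            });
            return outer.get(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException | TimeoutException e) {
            return false;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("Starved without a deadlock: " + starvedButNotDeadlocked());
    }
}
```

Here the timeout lets the outer task give up, which frees the worker. Without it, the pool would stay stuck even though a thread dump would show no "Java-level deadlock" section, which is why this kind of hang needs different diagnostics.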