
How to Deal with Jenkins Performance Issues

Jenkins, a popular CI/CD automation server, is used for several critical operations in an organization, such as building applications, running automated tests, and deploying to pre-prod and prod environments. If Jenkins is down or slow, engineers’ productivity is severely hampered. Major organizations therefore take extra care to keep it up 24/7.

Jenkins can experience a wide range of performance problems, from slow UI and hanging builds to full outages. These problems can stem from many sources: JVM memory pressure, garbage collection inefficiencies, infrastructure limitations, network bottlenecks, plugin defects, or misconfiguration. In this post, let’s discuss what the most common Jenkins performance problems are, how to isolate their root causes quickly, and how to fix and prevent them.

Why Jenkins Performance Problems Can Be Difficult to Diagnose

Jenkins performance issues are not always easy to track down. The same symptom, a slow UI, a build stuck in the queue, or an unexpected outage, can be caused by very different underlying problems. The issue might lie in the JVM, a plugin, the underlying infrastructure, pipeline design, or communication between the controller and agents. Because so many moving parts work together, identifying the real root cause often requires looking at the system from multiple angles. 

Jenkins Runs on the JVM, And That Matters

At its core, Jenkins is a Java application, which means the JVM has a direct impact on how it behaves under load. Memory usage, garbage collection, thread activity, and JVM configuration can all influence responsiveness and stability. That said, not every Jenkins performance issue starts with the JVM. Effective troubleshooting requires understanding both the health of the JVM and the broader Jenkins environment in which it operates. 

Common Symptoms to Watch For

These are some of the most common symptoms of Jenkins performance issues:

These symptoms can originate from JVM behavior, plugins, infrastructure constraints, pipeline design, or controller-agent communication. Identifying the actual source requires collecting and analyzing the right diagnostic data. If you see even one of these symptoms, it is worth investigating promptly.

Root Causes of Jenkins Performance Issues

Jenkins performance problems can originate from many layers of the system. The 13 root causes below each map to a different component of Jenkins. Understanding which one is under stress is the first step to the right fix.

1. Poor Garbage Collection Behavior

Every GC pause freezes Jenkins. Long pauses (5–20 seconds) directly translate to UI hangs, build queue stalls, and plugin timeouts. Healthy systems target ≥99% GC throughput. Drop below that, and users start feeling the lag.
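As a rough illustration of the 99% target, GC throughput can be read as the share of wall-clock time that is not spent in GC pauses. The sketch below assumes you have already extracted pause durations from a GC log; the function name and the sample values are hypothetical.

```python
def gc_throughput(pause_seconds, window_seconds):
    """Return the percentage of wall-clock time NOT spent in GC pauses."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    paused = sum(pause_seconds)
    return 100.0 * (1.0 - paused / window_seconds)


# Hypothetical sample: pause durations (in seconds) seen during one hour.
pauses = [0.2, 0.4, 12.0, 18.0, 9.4]
throughput = gc_throughput(pauses, window_seconds=3600)
longest = max(pauses)

print(f"GC throughput: {throughput:.2f}%")
print(f"Longest pause: {longest:.1f}s")
print("meets the 99% target" if throughput >= 99.0 else "below the 99% target")
```

Throughput and the longest pause are worth reading together: a system can sit near 99% on average and still freeze for many seconds during a single pause.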

2. Heap Size Misconfiguration

Adjusting the heap size is tricky. If you make it too small, you’ll experience constant garbage-collection cycles and OutOfMemoryErrors. If you make it too large, each garbage collection cycle takes so long that Jenkins becomes slow and unresponsive. As a general rule, if you’re on a 32 GB host, limit the heap to 16 GB. If Jenkins needs more memory than that to work properly, it’s not just a tuning issue; it’s a sign to add more servers.
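The rule of thumb above can be written down as a small helper. This is only a sketch: the text gives a single data point (32 GB host, 16 GB heap), so halving the host RAM for smaller hosts is an assumption made here for illustration, and the function name is hypothetical.

```python
def suggested_max_heap_gb(host_ram_gb, ceiling_gb=16):
    """Suggest a max heap in GB: half the host RAM (assumption), capped at ceiling_gb."""
    if host_ram_gb <= 0:
        raise ValueError("host_ram_gb must be positive")
    return min(host_ram_gb // 2, ceiling_gb)


for ram in (8, 32, 64):
    heap = suggested_max_heap_gb(ram)
    # -Xmx is the standard JVM option for the maximum heap size.
    print(f"{ram} GB host -> -Xmx{heap}g")
```

The cap stays at 16 GB even on larger hosts, which reflects the point above: needing more than that is a signal to scale out rather than to keep growing one heap.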

3. Overly Aggressive JVM Arguments

Copy-pasting JVM flags from blog posts or StackOverflow often backfires. Flags that work well for one workload can actively harm another. Overly restrictive GC flags, for example, can prevent G1GC from adapting to Jenkins’ variable workload, reducing throughput significantly. More flags do not equal better performance. In many cases, it is the opposite.

4. Unbounded Metaspace Growth

Java 8 replaced PermGen with Metaspace. This change introduced a risk: unlike PermGen, Metaspace doesn’t have a default upper limit. Without a set cap, it quietly increases in the background, often unnoticed until it uses all available memory on the host. Setting -XX:MaxMetaspaceSize is necessary; it acts as a safety measure.

5. Plugin-Triggered System.gc() Calls

Some plugin developers call System.gc() directly, forcing a full garbage collection outside the JVM’s natural cycle. In one case, this caused 11-second garbage collection pauses, which exceeded the 10-second high availability failover threshold and led to daily production outages. The solution is to add -XX:+DisableExplicitGC to your JVM arguments; this will prevent the JVM from responding to these calls.
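A quick way to confirm that the two safety flags discussed so far (-XX:MaxMetaspaceSize and -XX:+DisableExplicitGC) are actually present is to scan the JVM argument list. The sketch below is a minimal check; the helper name and the sample argument string are hypothetical, and how you obtain the arguments of your own Jenkins JVM is left out.

```python
# Flag prefixes discussed in this article, with the reason each one matters.
SAFETY_FLAGS = {
    "-XX:MaxMetaspaceSize=": "caps Metaspace growth",
    "-XX:+DisableExplicitGC": "makes the JVM ignore System.gc() calls",
}


def missing_safety_flags(jvm_args):
    """Return (flag, reason) pairs for safety flags absent from jvm_args."""
    missing = []
    for prefix, reason in SAFETY_FLAGS.items():
        if not any(arg.startswith(prefix) for arg in jvm_args):
            missing.append((prefix, reason))
    return missing


# Hypothetical argument string for a Jenkins controller.
args = "-Xmx16g -XX:+UseG1GC -XX:+DisableExplicitGC".split()
for flag, reason in missing_safety_flags(args):
    print(f"missing {flag} ({reason})")
```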

6. Memory Leaks from Plugins or Build History

Heap growth isn’t always noticeable. Sometimes it’s a slow, steady rise caused by builds that aren’t pruned, workspaces that aren’t cleaned, or plugins that keep references they should have released. If these issues aren’t addressed, the gradual buildup will eventually lead to an OutOfMemoryError in Jenkins. The solution involves both configuration (setting build retention policies) and diligence (auditing plugins that perform long-term background tasks).
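One way to spot this kind of slow rise is to track heap usage measured right after garbage collections and fit a straight line through it. The sketch below uses a plain least-squares slope; the sample figures are hypothetical, and a positive slope is only a hint that something is accumulating, not proof of a leak.

```python
def heap_growth_per_hour(samples):
    """Least-squares slope in MB/hour for (hours_elapsed, heap_after_gc_mb) samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_h = sum(h for _, h in samples) / n
    numerator = sum((t - mean_t) * (h - mean_h) for t, h in samples)
    denominator = sum((t - mean_t) ** 2 for t, _ in samples)
    if denominator == 0:
        raise ValueError("samples must span more than one point in time")
    return numerator / denominator


# Hypothetical post-GC heap readings taken once a day (hours, MB).
readings = [(0, 2000), (24, 2240), (48, 2480), (72, 2720)]
slope = heap_growth_per_hour(readings)
print(f"Post-GC heap is growing by about {slope:.1f} MB/hour")
```

Using post-GC readings matters here: heap usage between collections rises and falls all the time, while the floor left after each collection is what creeps up when objects are being retained.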

7. Thread Contention and Deadlocks

Not every Jenkins freeze is due to memory issues. When too many builds run at the same time, plugins compete for shared resources, or a poorly designed plugin holds a lock for too long, threads can end up waiting on one another. Builds get stuck. The user interface stops responding. GC metrics may look normal throughout, making it difficult to identify the root cause.
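When GC metrics look normal, a thread dump is usually the next artifact to check. The sketch below counts thread states in a dump and looks for the deadlock notice; it assumes the text layout produced by the JDK's jstack tool, and the thread names in the sample are made up.

```python
import re
from collections import Counter

# jstack prints one "java.lang.Thread.State: <STATE>" line per thread.
STATE_RE = re.compile(r"java\.lang\.Thread\.State: (\w+)")


def summarize_thread_states(dump_text):
    """Count how many threads are in each state in a thread dump."""
    return Counter(STATE_RE.findall(dump_text))


def mentions_deadlock(dump_text):
    """True if the dump contains a Java-level deadlock notice."""
    return "Java-level deadlock" in dump_text


# Hypothetical excerpt; real dumps also include a stack trace per thread.
SAMPLE = '''
"Executor #1" #101 prio=5
   java.lang.Thread.State: BLOCKED (on object monitor)
"Executor #2" #102 prio=5
   java.lang.Thread.State: BLOCKED (on object monitor)
"Handling GET /queue" #210 prio=5
   java.lang.Thread.State: RUNNABLE
"Timer worker" #55 prio=5
   java.lang.Thread.State: WAITING (parking)
'''

states = summarize_thread_states(SAMPLE)
for state, count in states.most_common():
    print(f"{state}: {count}")
print("deadlock reported:", mentions_deadlock(SAMPLE))
```

A large and growing share of BLOCKED threads across several dumps taken a few seconds apart points to lock contention rather than memory pressure.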

8. Infrastructure Bottlenecks (CPU, Disk, Network)

Performance problems are not always JVM-related. Saturated CPU, slow or overloaded disk (especially during heavy workspace I/O), network latency between the Jenkins controller and agents, and insufficient memory at the OS level can all degrade Jenkins performance significantly. Check infrastructure as a possible root cause before diving into JVM diagnostics.
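A very basic host check can be sketched with the Python standard library. The thresholds below (a load average above one per CPU, a disk more than 90% full) are assumptions chosen for illustration, and the check covers only CPU load and disk space, not disk I/O latency or network conditions.

```python
import os
import shutil


def infra_warnings(load_1m, cpu_count, disk_used_fraction,
                   max_load_per_cpu=1.0, max_disk_fraction=0.9):
    """Return warnings for CPU load and disk usage above the given thresholds."""
    warnings = []
    if load_1m / cpu_count > max_load_per_cpu:
        warnings.append(f"CPU: load {load_1m:.1f} on {cpu_count} CPUs")
    if disk_used_fraction > max_disk_fraction:
        warnings.append(f"Disk: {disk_used_fraction:.0%} used")
    return warnings


if __name__ == "__main__":
    try:
        load = os.getloadavg()[0]          # 1-minute load average (Unix only)
        usage = shutil.disk_usage(".")     # disk holding the current directory
        found = infra_warnings(load, os.cpu_count() or 1, usage.used / usage.total)
        print(found or "no CPU or disk warnings")
    except (AttributeError, OSError):
        print("load average is not available on this platform")
```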

9. Too Many Concurrent Builds on the Controller

When builds run directly on the Jenkins controller, they put heavy pressure on the same JVM that serves the user interface, manages the queue, and coordinates the pipelines. The controller runs short of resources and slows down. Because the controller coordinates everything, when it is slow, everything else is slow too.

10. Excessive Plugin Count

Every plugin that we install consumes memory and can run background tasks, which slows down Jenkins. A large number of plugins, especially ones that we no longer use, makes Jenkins slower still. It is better to keep only the plugins that we actually use.

11. Large Build History and Unmanaged Workspaces

Jenkins keeps a record of every build, which can take up a lot of disk space. If we do not clean up these records regularly, Jenkins slows down. When the controller has to manage a very large number of builds, it can run low on memory and struggle with filesystem access, which slows everything down.
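A retention policy boils down to deciding which builds fall outside a "keep the newest N" window. The sketch below illustrates only that decision; the function name and the default of 20 are hypothetical, and in practice the policy would normally be set through Jenkins' own build retention settings rather than an external script.

```python
def builds_to_discard(build_numbers, keep_last=20):
    """Return, in ascending order, the build numbers outside the newest keep_last."""
    if keep_last < 0:
        raise ValueError("keep_last must not be negative")
    newest_first = sorted(build_numbers, reverse=True)
    return sorted(newest_first[keep_last:])


# Hypothetical job with builds 1..25 and a keep-last-20 policy.
history = list(range(1, 26))
print(builds_to_discard(history, keep_last=20))
```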

12. Inefficient Pipeline Scripts

Sometimes the scripts that we write to run the pipelines are inefficient, for example when they load files into memory or create too many objects. This causes memory spikes, which slow down the build process. This is especially true for scripts written in Groovy, which can create a lot of dynamic class definitions that take up space.

13. JDK Version and GC Algorithm Mismatch

Using an outdated version of the JDK can cause problems because it lacks the latest improvements and bug fixes. The G1GC algorithm is recommended for Jenkins, and it has been improved in newer versions of the JDK. Running a large Jenkins installation on an older JDK can lead to poor garbage collection behavior, which can be fixed by moving to a newer JDK version.

How to Diagnose Jenkins Performance Issues

Jenkins performance issues are rarely straightforward. A slow UI, hanging build, or unexpected outage can stem from the JVM, a plugin, the underlying infrastructure, or a combination of several factors. That’s why finding the real root cause often requires looking at multiple diagnostic artifacts rather than relying on a single metric. yCrash simplifies this process by automatically collecting GC logs, thread dumps, heap dumps, and system-level metrics from a live Jenkins environment. It then correlates the data and surfaces a root cause analysis report, helping teams move from symptoms to answers much faster. 

Using yCrash to Diagnose Jenkins Performance Problems

yCrash captures the complete diagnostic picture in one step, without requiring you to manually collect and analyze individual artifacts. Here is how to use it:

Step 1: Install and configure yCrash

Deploy the yCrash agent on your Jenkins host. It can be attached to a running JVM without a restart, which is important in production environments where you cannot afford downtime just to enable diagnostics.

Step 2: Enable M3 mode for continuous monitoring

When Jenkins is exhibiting performance symptoms such as a slow UI, hanging builds, or high CPU, use the Micro-metrics Monitoring (M3) mode in the yc-360 script. Unlike on-demand capture, M3 mode continuously collects lightweight JVM and system metrics to proactively detect early signs of performance degradation, forecasting problems like OutOfMemoryErrors before they impact users. This collects:

Step 3: Review the automated root cause analysis

yCrash helps you correlate all collected data and produces a prioritized report identifying which layer of the system is causing the problem, whether it is GC pressure, thread contention, a memory leak, or infrastructure saturation. This eliminates the guesswork of manually correlating multiple diagnostic artifacts and dramatically reduces the time to root cause.

Fig: yCrash report surfacing errors, repeating patterns, time gaps, and PII leakage observations from app logs 

Step 4: Address flagged issues

The report showcases specific, actionable findings, for example, a plugin causing excessive GC, a thread pool that is saturated, or a build history accumulation causing heap growth. Each finding comes with context to help you understand the scope of the problem before making changes.

Now that you have all the information required to address the issue, you are one step closer to putting the performance issue to bed.

How to Prevent Performance Problems in Jenkins

Before you upgrade to a new release of Jenkins or install a new Jenkins plugin in the production environment, you might be studying the following key metrics in your performance lab:

These are wonderful metrics that highlight the performance characteristics of the new release. However, several performance problems build up slowly over time. For example, in most applications an OutOfMemoryError surfaces only after the application has run for more than a week, and in the performance lab we don’t run such long endurance tests.

The above-mentioned metrics are reactive indicators; they don’t reveal problems that are silently lurking in the environment. We recommend studying the below-mentioned Micro-metrics along with the above reactive indicators in the performance lab before certifying the release. These Micro-Metrics are good at predicting/forecasting performance problems even when those problems are still at a very small scale.

The yCrash tool reports these Micro-Metrics, which can unearth several performance problems well in advance, before they surface in production. You can find the details on how to source and study these Micro-Metrics through yCrash here.

Real-World Jenkins Performance Case Studies

CloudBees, the company behind Jenkins, doesn’t just recommend JVM diagnostics; they practice it themselves. Their Sr. DevOps Engineer has publicly documented using the yCrash suite of tools, including GCeasy, to diagnose and resolve Jenkins performance issues at Fortune 100 enterprises. Here is what CloudBees’ Sr. DevOps Engineer says about leveraging these tools to deal with Jenkins performance issues:

Fortune 100 Bank: From 92% to 99% Throughput

A few JVM flags were enough to slow down an entire Jenkins setup. Users faced slow logins, with GC pauses exceeding 20 seconds and ~42,000 GC cycles in 72 hours. GC log analysis using GCeasy revealed that overly restrictive JVM arguments were limiting G1GC efficiency. Once those flags were removed, throughput improved to 99%, and GC cycles dropped to ~2,800.
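To put these figures in perspective, the arithmetic can be worked through directly. The sketch below assumes the throughput figures and both cycle counts refer to a 72-hour window; the source states the window only for the ~42,000 figure, so the rest is an assumption.

```python
def cycles_per_hour(cycles, window_hours):
    """Average number of GC cycles per hour over the window."""
    return cycles / window_hours


def hours_paused(throughput_percent, window_hours):
    """Hours of the window spent in GC pauses at the given throughput."""
    return window_hours * (1 - throughput_percent / 100)


WINDOW_HOURS = 72  # assumed to apply to every figure below

print(f"Before: {cycles_per_hour(42_000, WINDOW_HOURS):.0f} GC cycles/hour, "
      f"{hours_paused(92, WINDOW_HOURS):.2f} hours paused")
print(f"After:  {cycles_per_hour(2_800, WINDOW_HOURS):.0f} GC cycles/hour, "
      f"{hours_paused(99, WINDOW_HOURS):.2f} hours paused")
```

Under that assumption, 92% throughput means nearly six of the 72 hours were spent paused in GC, against well under one hour at 99%.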

Shipping Company: Daily HA Failovers Eliminated

Long GC pauses were triggering failovers and outages every day. GC pauses regularly exceeded the 10-second HA threshold, causing repeated disruptions. GCeasy analysis highlighted inefficient JVM arguments and plugin-triggered explicit GC calls as the root cause. After removing the arguments and disabling explicit GC, throughput reached 99%, and max pause dropped to ~660ms.

Conclusion 

Jenkins performance problems can originate in many places, including the JVM, the application layer, plugins, and the underlying infrastructure. They can show up as slow logins, hanging builds, or even unexpected outages. Identifying the true root cause often requires careful investigation.

The teams that fix these problems quickly are the ones that collect the information when things go wrong. Effective troubleshooting starts with collecting the right diagnostic data across the JVM, Jenkins application, plugins, and infrastructure layers. Tools such as yCrash help bring these signals together, reducing the time required to isolate the true root cause. The earlier issues are detected, the easier and less costly they are to resolve.

We should also monitor JVM micro-metrics during performance testing and before production releases. These leading indicators can reveal potential issues long before they become visible through traditional metrics such as CPU utilization, memory usage, or response time. Detecting problems early is almost always faster, less disruptive, and less expensive than troubleshooting an outage in production.
