Explaining OutOfMemoryError on Overhead Limit Exceeded

There are several reasons for an application to fail with an OutOfMemoryError. Most variants of OutOfMemoryError carry meaningful messages such as “Java heap space” or “Requested array size exceeds VM limit.” However, the message “GC overhead limit exceeded” usually puzzles even experienced developers.

The JVM throws an OutOfMemoryError with this message when an application spends over 98% of its time collecting garbage while recovering less than 2% of the heap. Usually, the root cause is that the garbage collector reclaims objects at a lower rate than the application creates them, yet the JVM still manages to wade through.
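
For reference, HotSpot exposes this policy through a handful of flags; the values below are the documented defaults, and turning the check off only changes which error we eventually get, not the underlying problem:

-XX:+UseGCOverheadLimit
-XX:GCTimeLimit=98
-XX:GCHeapFreeLimit=2

The first flag enables the check (it is on by default), and the other two set the thresholds: the error is raised when more than 98% of the time goes to garbage collection while less than 2% of the heap is recovered.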

In this article, we’ll dig deeper into why this might occur and how we can address the problem.

Reproducing Overhead Limit Exceeded Error

One of the main reasons for exceeding the overhead limit is a gap between the object creation rate and the collection rate. This type of OutOfMemoryError is usually a subtle and hard-to-reproduce problem in a production environment.

We might also have trouble reproducing this problem for educational purposes. The main difficulty is that we need a delicate balance between mutators and collectors. If the creation rate is significantly higher, we immediately hit the heap boundary and get a plain “Java heap space” error. At the same time, if our garbage collector keeps up with the garbage, throughput drops, but the application manages to slog through.

Another thing to consider is the type of garbage collector we’re using. ParallelGC is the more reliable choice for producing “OutOfMemoryError: GC overhead limit exceeded.” Because ParallelGC is a stop-the-world collector, it is easier to drag down throughput and grind the application to a halt, and harder to shoot straight past the heap boundary.

Let’s consider the following example:

public static void main(String[] args) {
    List<Integer> list = new ArrayList<>();
    int i = 0;
    while (true) {
        list.add(i);
        ++i;
    }
}

This code produces “OutOfMemoryError: Java heap space.” The reason is that although we add a single element at a time, under the hood ArrayList allocates increasingly longer backing arrays. The capacity grows by roughly half of its current value every time we hit the limit:

private Object[] grow(int minCapacity) {
    int oldCapacity = elementData.length;
    if (oldCapacity > 0 || elementData != DEFAULTCAPACITY_EMPTY_ELEMENTDATA) {
        int newCapacity = ArraysSupport.newLength(oldCapacity,
          minCapacity - oldCapacity, /* minimum growth */
          oldCapacity >> 1           /* preferred growth */);
        return elementData = Arrays.copyOf(elementData, newCapacity);
    } else {
        return elementData = new Object[Math.max(DEFAULT_CAPACITY, minCapacity)];
    }
}
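
To get a feel for those leaps, here is a tiny sketch that replays only the “preferred growth” part of the formula above, ignoring the corner cases that newLength handles (the starting value of 10 is the ArrayList default capacity):

int capacity = 10;                      // DEFAULT_CAPACITY
for (int step = 0; step < 6; step++) {
    System.out.print(capacity + " ");   // prints: 10 15 22 33 49 73
    capacity += capacity >> 1;          // oldCapacity + (oldCapacity >> 1)
}

Each reallocation also keeps the old and the new array alive at the same time while Arrays.copyOf runs, which makes the final, failing allocation even larger in practice.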

Thus, the JVM abruptly stops as it cannot allocate enough space for the next, much larger array. Getting “OutOfMemoryError: GC overhead limit exceeded” with an ArrayList is still possible but requires more setup and testing. It’s easier if we use a LinkedList:

public static void main(String[] args) {
    List<Integer> list = new LinkedList<>();
    int i = 0;
    while (true) {
        list.add(i);
        ++i;
    }
}

An ArrayList creates an internal array to hold the data. When it runs out of space, the default behavior is to grow that array by about half its current size, claiming more memory. This way, we approach the heap limit in ever-increasing leaps and fail with “OutOfMemoryError: Java heap space” because the next allocation simply doesn’t fit.

With a LinkedList, on the other hand, we allocate a small Node on each iteration and approach the heap’s limit steadily. Close to that limit, the garbage collector starts thrashing, and we get “OutOfMemoryError: GC overhead limit exceeded.”

To get the OutOfMemoryError faster, we can also decrease the size of the heap. Our VM options might look like this:

-Xmx100m -XX:+UseParallelGC
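
Assuming the LinkedList snippet above is saved, with its imports, in a file called, say, Main.java (the name is just a placeholder), a single-file run on a recent JDK reproduces the failure within seconds; -Xlog:gc is optional but lets us watch the back-to-back full collections before the error shows up:

java -Xmx100m -XX:+UseParallelGC -Xlog:gc Main.java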

Root Causes

In real life, a system, for example a retail website, might perform well on regular days but struggle on weekends when the number of customers increases. Thus, request spikes might overwhelm the garbage collector and cause this problem.

At the same time, we can cause this by implementing finalizers. Additional logic in finalizers prolongs objects’ lifetime and requires at least two garbage collection cycles before their memory is reclaimed. However, as mentioned previously, both scenarios need a very tight balance.
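
As a minimal sketch of what such a finalizer might look like, the class below is made up purely for illustration; the artificial sleep just exaggerates what any non-trivial cleanup logic does to the single finalizer thread:

class Report {
    private final byte[] payload = new byte[4096]; // some per-object state

    @Override
    @SuppressWarnings({"deprecation", "removal"})
    protected void finalize() throws Throwable {
        try {
            // Simulated cleanup work: the object stays reachable through the
            // finalizer queue until this completes, so it needs at least two
            // GC cycles before its memory is actually reclaimed.
            Thread.sleep(1);
        } finally {
            super.finalize();
        }
    }
}

Allocating such objects in a loop keeps the finalizer thread permanently behind, which is exactly the production-collection imbalance described above.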

Apart from the message in the error logs, we can recognize this problem by a particular garbage collection pattern. GCeasy helps us analyze garbage collector logs, and we might see the following picture:

Fig. 1: GC overhead limit exceeded collection pattern

We can notice a specific pattern of repeated garbage collection cycles: although the collector runs frequently, it cannot reclaim space and keeps retrying. Spotting this pattern is the first step in identifying the issue.

1. Plain Old Memory Leak

Our examples used a simple memory leak to replicate this behavior. However, as mentioned previously, this requires a careful setup: we have to tune the object creation rate, the garbage collector has to behave in a certain way, and we have to approach the heap limit steadily.

We can analyze a heap dump to get more information about the heap’s state at the moment of failure. The VM option -XX:+HeapDumpOnOutOfMemoryError creates a dump on OutOfMemoryError, providing us with valuable information. It’s always good practice to configure automatic heap dumps, as they help with troubleshooting and cost us very little.
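
As an illustration, the relevant options might look like this (the dump path is arbitrary, and the resulting file can be as large as the heap itself):

-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/heapdump.hprof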

In the case of a memory leak, we can analyze the heap dump using HeapHero. We might see a bunch of objects in the heap:

Fig. 2: Heap structure with a memory leak

Careful analysis of these objects usually helps to find the problem. After identifying the culprit object, we can track the creation-reclamation process and find the bug in our system. Luckily for us, memory leaks are generally more straightforward to reproduce.

2. Overridden Finalizers

Another possible reason for exceeding the overhead limit is a problem with finalizers. In this case, the issue arises not from objects taking up space in the heap, but from the gap between production and collection rates.

HeapHero helps us with this as well. We just need to check a different section:

Fig. 3: Unreachable objects

If we have many objects waiting in the finalization queue, something in our application is preventing the garbage collector from doing its job quickly. Overridden finalizers might be the culprit.

Unlike a memory leak, here we see many unreachable objects waiting for collection. Thus, we don’t have a memory leak per se; a slower reclamation rate keeps the objects in memory. We can think of this situation as a traffic jam: cars are moving, but very slowly.

Often, just looking through the code and the implementation of these objects helps us identify the issue. Additionally, linters and static code analysis tools can flag the problem. IDEs can also draw our attention to it, as the finalize() method has been deprecated since Java 9.
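
Since Java 9, java.lang.ref.Cleaner is the recommended replacement. A rough sketch of migrating the hypothetical Report class from the earlier finalizer example might look like this:

import java.lang.ref.Cleaner;

class Report implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();

    // The cleanup action must not capture the Report instance itself,
    // otherwise the object would stay reachable and never be collected.
    private static final class State implements Runnable {
        @Override
        public void run() {
            // release native resources, close handles, etc.
        }
    }

    private final Cleaner.Cleanable cleanable = CLEANER.register(this, new State());

    @Override
    public void close() {
        cleanable.clean(); // deterministic cleanup; the Cleaner is only a safety net
    }
}

Unlike finalize(), the cleanup action runs on the Cleaner’s own thread and does not resurrect the object, so it does not add an extra garbage collection cycle to reclamation.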

3. Slow Threads

Another reason for such a problem might be slow garbage collection threads. Sometimes the JVM picks the threads randomly, finalizers end up running on a lower-priority thread, and we spend fewer CPU cycles on garbage collection.
To identify such a problem, we need additional tools, such as fastThread. This way, we can see the state and the number of threads working in our application:

Fig. 4: Finalizer thread

However, we don’t have a built-in way to create a thread dump on OutOfMemoryError. Luckily, we can use the yCrash 360° tool to monitor the health of the application throughout its lifespan:

Fig. 5: State of a finalizer thread

Technically, we can combine the yCrash 360° tool with -XX:OnOutOfMemoryError, but while the application is dying, it is sometimes challenging to get meaningful information out of it.
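
For example, the hook might invoke a thread-dump command against the dying process; %p is substituted with the process id by the JVM, and jcmd must be available on the path:

-XX:OnOutOfMemoryError="jcmd %p Thread.print"

Whether this succeeds depends on how starved the machine already is, which is why continuous monitoring tends to be more reliable than last-moment hooks.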

Conclusion

There are several types of OutOfMemoryError, and each of them points to a different problem in our application. Thus, we need the right tools to identify the root cause, and each type requires its own approach and solution.

yCrash provides various tools to help us identify and resolve the problems we might face with application performance and memory management. Good practices, benchmarking, and monitoring help us avoid hard-to-debug issues, missed SLAs, and it-worked-in-dev situations.

3 thoughts on “Explaining OutOfMemoryError on Overhead Limit Exceeded”


  1. Hmm. Those are a lot of interesting thoughts. FWIW, I’ll say that when I discuss this GC overhead limit exceeded (gcole) error, I refer to it more simply as being essentially a warning that the heap might soon become full.

    As for the reason why, which is a focus here, I’ll say also that I’m only ever helping people who run java via multi-threaded application servers. As such, there can be far more “reasons” for the varying rising and falling of objects in the heap. It could be about load, or code, or configuration, and so on.

    I share this because other readers running apps on such app servers might want to think more broadly about the reasons for (and indeed the meaning of) this oom gcole error.

    It also brings up another interesting consideration: sometimes the resources we read may well be written from the perspective of a single running Java app, perhaps just to keep things simple, or perhaps because that’s the intended focus; but sometimes it’s just because no consideration is given to how different things might be in a multi-threaded application server. And that might be something worth keeping in mind when considering possible solutions.

    I’m not saying this to demean or deprecate what’s offered here. Indeed, there are so many great resources in this blog, the tools, the training, and so on. I only mean to contribute to the knowledge folks can take from here.

    1. Thank you for your feedback, Carehart! We genuinely appreciate your valuable contributions to the discussion. Your perspective on the GC overhead limit exceeded (gcole) error adds depth to our understanding. Looking forward to more interactions with you!

  2. Hello carehart!

    Indeed, there might be multiple reasons for this error. We used a simple application here only to show the basic cause of `Overhead Limit Exceeded`, and as you correctly mentioned, there might be other reasons and combinations of reasons. The problem with them is that they aren’t reproducible in the test environment. This was a necessary simplification for the purpose of explanation.

    Thanks for the comment! It adds more context to the issue and helps readers gain more insight into the GC’s inner workings.
