Recently we confronted an interesting ‘java.lang.OutOfMemoryError: Metaspace’ problem in a microservice application. The application would run smoothly for the first few hours and then start throwing java.lang.OutOfMemoryError: Metaspace. In this post, let me share the steps we pursued to troubleshoot this problem.
Different types of OutOfMemoryError
JVM memory has the following regions:
a. Young Generation
b. Old Generation
c. Metaspace
d. Others region
When you encounter ‘java.lang.OutOfMemoryError: Metaspace’, it indicates that the Metaspace region in the JVM memory is getting saturated. Metaspace is the region where the metadata required to execute your application is stored. In a nutshell, it contains class definitions, method definitions, and other metadata of your application. To learn more about what gets stored in each of the JVM memory regions, you may refer to this video clip.
Video: To see the visual walk-through of this post, click below:
Note: There are 9 different types of java.lang.OutOfMemoryError. You can learn about them here. ‘java.lang.OutOfMemoryError: Metaspace’ is one of them, though not a common one.
Diagnose java.lang.OutOfMemoryError: Metaspace
The best place to start debugging ‘java.lang.OutOfMemoryError‘ is the garbage collection log. If you haven’t enabled garbage collection logging for your application, you may consider enabling it by passing the JVM arguments mentioned here. Enabling garbage collection logging doesn’t add noticeable overhead to your application, so it’s recommended on all production JVM instances. To see the great benefits of garbage collection logs, refer to this post.
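For reference, below are the commonly used GC logging flags; the exact syntax depends on your Java version, and the log file path is just a placeholder:

```
# Java 8 and earlier
-XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/path/to/gc.log

# Java 9 and later (unified logging)
-Xlog:gc*:file=/path/to/gc.log:time,uptime,level,tags
```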
We uploaded the garbage collection log of this troubled microservice application to GCeasy, a GC log analysis tool. Here is the GC log analysis report generated by the tool. Below is the heap usage graph reported by the tool.
Fig: Heap usage graph reported by GCeasy
I would like to highlight a few observations in this graph:
a. A red triangle in the graph indicates the occurrence of a Full Garbage Collection event. When a Full garbage collection event runs, it pauses your entire application and tries to free up memory from all the regions (Young, Old, Metaspace). You can see Full Garbage Collection events running consecutively from 12:30am.
b. Even though the maximum heap size is 2.5GB, Full Garbage Collection events were triggered consecutively even when heap usage was only at 10% (i.e. 250MB) of that maximum. Typically, Full garbage collection events are triggered consecutively only when heap usage grows close to its maximum size. To understand why this happened here, please review the next point, #c.
c. Below is the Metaspace region’s memory consumption graph from the report:
Fig: Metaspace usage graph reported by GCeasy
You can notice the Metaspace region’s memory consumption growing and dropping in a saw-tooth pattern until 12:30am. After 12:30am, the Metaspace region’s memory consumption doesn’t drop at all, even though Full GCs are running consecutively. It indicates that garbage collection events aren’t able to free up the Metaspace, which clearly points to a memory leak in the Metaspace region.
Root cause of java.lang.OutOfMemoryError: Metaspace
Now we have confirmed that a memory leak is happening in the Metaspace region. The next logical step is to inspect the Metaspace region and understand what occupies it. Here is a post that summarizes 5 different approaches to study the contents of the Metaspace region. We went with the heap dump analysis approach.
We used the yCrash tool to capture and analyze the heap dump. The tool instantly pointed out the problem: it reported a thread that was experiencing OutOfMemoryError. Apparently, this thread hit the OutOfMemoryError when invoking a 3rd party library. Due to a bug, this 3rd party library was creating new class definitions for every new request. The application was running on an older version of this library, and the bug had already been fixed in the latest version. Once the 3rd party library was upgraded to the latest version, the problem was resolved.
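To make the failure mode concrete, here is a minimal, hypothetical sketch of the same pattern (this is not the actual 3rd party library; all class and variable names are made up for illustration): the same class gets defined again and again under fresh ClassLoaders that stay reachable, so their class metadata can never be unloaded, and Metaspace eventually fills up. Running it with a small limit such as -XX:MaxMetaspaceSize=32m should reproduce ‘java.lang.OutOfMemoryError: Metaspace’ quickly.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class MetaspaceLeakDemo {

    /** A trivial class whose compiled bytecode gets re-defined again and again. */
    public static class Payload { }

    /** A loader that exposes defineClass(), which is protected in ClassLoader. */
    static final class ThrowawayLoader extends ClassLoader {
        ThrowawayLoader() {
            super(MetaspaceLeakDemo.class.getClassLoader());
        }
        Class<?> define(String name, byte[] bytes) {
            return defineClass(name, bytes, 0, bytes.length);
        }
    }

    // Keeping every loader reachable is the "leak": as long as a ClassLoader is
    // reachable, the class metadata it created cannot be unloaded from Metaspace.
    private static final List<ClassLoader> LEAKED_LOADERS = new ArrayList<>();

    public static void main(String[] args) throws Exception {
        byte[] payloadBytes = readClassBytes(Payload.class);
        for (long request = 0; ; request++) {          // each iteration == one "request"
            ThrowawayLoader loader = new ThrowawayLoader();
            loader.define(Payload.class.getName(), payloadBytes);
            LEAKED_LOADERS.add(loader);
            if (request % 10_000 == 0) {
                System.out.println("Defined " + request + " duplicate class definitions");
            }
        }
    }

    /** Reads the compiled bytecode of an already-loaded class back from the classpath. */
    private static byte[] readClassBytes(Class<?> clazz) throws Exception {
        String resource = clazz.getName().replace('.', '/') + ".class";
        try (InputStream in = clazz.getClassLoader().getResourceAsStream(resource)) {
            return in.readAllBytes();                   // requires Java 9 or later
        }
    }
}
```

The saw-tooth pattern seen before 12:30am corresponds to the healthy case: once nothing references a ClassLoader any more, the classes it defined become eligible for unloading during a Full GC and Metaspace usage drops back down. When the loaders stay reachable, as in this sketch, the metadata can only accumulate.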
Folks, while you offered a solution for that one case of a bug in a 3rd party library, it seems very much worth mentioning that someone could get this error simply because their JVM startup args include a MaxMetaspaceSize value and their Metaspace usage reaches that value.
Of course, the right answer is indeed to find WHY it’s reaching that limit, but some may ALSO find that they (or someone else) set that max, and to an inappropriately low value. I’ve seen it happen, especially when folks (or tools) just blindly carry the old MaxPermSize (for Java 7 and earlier) over as the same value for the new MaxMetaspaceSize (since Java 8).
What they often don’t realize is that another change (when Java 8 switched to Metaspace) is that Metaspace memory comes from available SYSTEM memory rather than the heap or any other JVM memory space. So if NO MaxMetaspaceSize is set, the JVM can use up to the available SYSTEM memory for Metaspace.
Again, I’m not saying that setting a higher MaxMetaspaceSize (or removing it) is always the RIGHT solution to this error; my point is simply that an article on this error should at least acknowledge and discuss how any current MaxMetaspaceSize setting should be considered.
Hope that’s helpful. Thanks for the many resources you guys create regarding the JVM.
@carehart – I agree with your assessment. This pattern/problem will also surface when the MaxMetaspaceSize value is under-configured. It should have been acknowledged in the post. Good point.
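For anyone who wants to verify whether a limit is actually in effect, the standard JDK tooling can report the configured MaxMetaspaceSize; in the snippet below, <pid> is a placeholder for the Java process id:

```
# value in effect on a running JVM
jinfo -flag MaxMetaspaceSize <pid>

# value resolved at JVM startup (an extremely large default effectively means no explicit limit)
java -XX:+PrintFlagsFinal -version | grep MaxMetaspaceSize
```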
Carehart is right:
Metaspace is *not* a part of the heap (not counted in -Xmx). Even its underlying allocation mechanism is different (on Linux, Metaspace uses malloc() whereas the Java heap uses mmap()).
@sonyantony – I agree. Metaspace is not part of the heap. In this video clip I have attempted to explain it: https://www.youtube.com/watch?v=uJLOlCuOR4k
Ram:
At around 7.07 in your embedded video, ‘allocated space’ is shown as 1,230 and before and after GC are 652.71484.
What are these values?
If 1230 has already been allocated, why is 652.7 starting a GC?
Is 1230 the value for -XX:MaxMetaspaceSize?
@Sonyantony – I must admit you have not only a sharp mind, but also sharp eyes. I don’t know who you are, but hats off to you. There are two things going on here:
a. What is printed in the GC log is the ‘Reserved’ space of the Metaspace, and that is what gets reported as ‘Allocated’ space in the GCeasy reports. So what’s reported as ‘Allocated’ space is not the space actually allocated to the application by the OS.
b. Within the Metaspace region there are two sub-regions: ‘class space’ and ‘non-class space’. Refer to https://poonamparhar.github.io/understanding-metaspace-gc-logs/. In this case only the class space was getting filled up.
These two reasons account for the disconnect you are pointing out.
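By the way, if you want to see that class space vs non-class space breakdown directly on a running JVM instead of inferring it from the GC log, recent JDKs (10 and later, if I remember correctly) expose it through jcmd; <pid> below is a placeholder:

```
# detailed Metaspace breakdown, including class space and non-class space usage
jcmd <pid> VM.metaspace
```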
Hi, thanks for your article. Could you please share the name of the 3rd party library that caused the Metaspace/classloader leak? And maybe the issue reference and the problem & fix versions? Thanks!