Recently we confronted an interesting java.lang.OutOfMemoryError: Metaspace problem in a microservice application. The application would run smoothly for the first few hours, and then it would start to throw java.lang.OutOfMemoryError: Metaspace. In this post, let me share the steps we pursued to troubleshoot this problem.
Different Types of OutOfMemoryError
JVM memory has the following regions:
- Young generation
- Old generation
- Metaspace
- Others region
When you encounter java.lang.OutOfMemoryError: Metaspace, it indicates that the Metaspace region in the JVM memory is getting saturated. Metaspace is the region that stores the metadata required to execute your application. In a nutshell, it contains class definitions, method definitions, and other metadata of your application. To learn more about what gets stored in each of the JVM memory regions, you may refer to this video clip: JVM Memory – Learn Easily.
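To make the Metaspace region a little more concrete, here is a minimal sketch that reads its current usage from inside a running JVM through the standard java.lang.management API. It assumes a HotSpot-based JVM, where the memory pool is named "Metaspace"; other JVMs may not expose a pool with that name. The class name MetaspaceUsage is made up for this illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper class, not part of the application discussed in this post.
class MetaspaceUsage {

    // Returns the memory pool named "Metaspace", or null if this JVM does not expose one.
    // Assumption: the pool name "Metaspace" is what HotSpot-based JVMs report.
    static MemoryPoolMXBean findMetaspacePool() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                return pool;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        MemoryPoolMXBean pool = findMetaspacePool();
        if (pool == null) {
            System.out.println("No memory pool named Metaspace on this JVM");
            return;
        }
        MemoryUsage usage = pool.getUsage();
        // getMax() returns -1 when the maximum size is undefined.
        System.out.printf("Metaspace used=%d KB, committed=%d KB, max=%d bytes%n",
                usage.getUsed() / 1024, usage.getCommitted() / 1024, usage.getMax());
    }
}
```

Logging these numbers periodically is one lightweight way to watch the region grow before the error shows up.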
Note: There are 9 different types of java.lang.OutOfMemoryError. You can learn about them in the post “Flavors of OutOfMemoryErrors.” java.lang.OutOfMemoryError: Metaspace is one of them, but not a common one.
Diagnose java.lang.OutOfMemoryError: Metaspace
The best place to start debugging java.lang.OutOfMemoryError is the garbage collection log. If you haven’t enabled the garbage collection log for your application, consider enabling it by passing the JVM arguments mentioned here. Enabling the garbage collection log doesn’t add noticeable overhead to your application, so it’s recommended to enable it on all production JVM instances. You can learn more about the benefits of a garbage collection log here.
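If you are not sure whether a given JVM instance was started with GC logging, a rough check like the one below can help. It is only a heuristic sketch: it looks for the -Xlog:gc prefix (unified logging, Java 9 and later) and the -Xloggc: prefix (Java 8 and earlier) among the JVM's startup arguments, and it will miss other valid spellings. The class and method names are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper class for illustration only.
class GcLogCheck {

    // Heuristic: recognizes two common ways of turning on GC logging.
    static boolean isGcLoggingFlag(String arg) {
        return arg.startsWith("-Xlog:gc")    // unified logging, Java 9 and later
            || arg.startsWith("-Xloggc:");   // Java 8 and earlier
    }

    // True if any of the given JVM arguments looks like a GC logging flag.
    static boolean gcLoggingEnabled(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            if (isGcLoggingFlag(arg)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself, not to the main method.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("GC logging appears enabled: " + gcLoggingEnabled(jvmArgs));
    }
}
```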
We uploaded the garbage collection log of this troubled microservice application to the GCeasy GC log analysis tool. Here is the GC log analysis report generated by the tool. Below is the heap usage graph reported by the tool.
Heap usage graph reported by GCeasy
I would like to highlight a few observations from this graph:
- The red triangles in the graph indicate Full Garbage Collection events. When a Full Garbage Collection event runs, it pauses your entire application and tries to free up memory from all the regions (Young, Old, Metaspace). You can see Full Garbage Collection events running consecutively from 12:30 am.
- Even though the maximum heap size is 2.5GB, Full Garbage Collection events were triggered consecutively when heap usage was only at 10% (i.e., 250MB) of its maximum size. Typically, Full Garbage Collection events are triggered consecutively only when heap usage grows close to its maximum size. To understand why this happens, please review the next point.
- Below is the Metaspace region’s memory consumption graph from the report:
Metaspace usage graph reported by GCeasy
You can see the Metaspace region’s memory consumption growing and dropping in a saw-tooth pattern until 12:30 am. After 12:30 am, the Metaspace region’s memory consumption doesn’t drop at all, even though Full GCs are running consecutively. This indicates that garbage collection events aren’t able to free up Metaspace, which points to a memory leak in the Metaspace region.
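The reasoning above can be written down as a small heuristic: if Metaspace usage measured right after each of the last few Full GCs never drops, the region is probably leaking. The sketch below is only an illustration of that rule, not how GCeasy detects the pattern. The class name, the window parameter, and the sample values in the usage note are hypothetical.

```java
// Hypothetical helper class that encodes the "no drop after Full GC" rule.
class MetaspaceLeakHeuristic {

    // usedAfterFullGc: Metaspace bytes in use right after each Full GC, oldest first.
    // window: how many of the most recent Full GCs to inspect.
    // Returns true when none of the inspected Full GCs ended below its predecessor,
    // i.e. the saw-tooth pattern has disappeared.
    static boolean looksLikeLeak(long[] usedAfterFullGc, int window) {
        if (window < 2 || usedAfterFullGc.length < window) {
            return false; // not enough data to judge
        }
        int start = usedAfterFullGc.length - window;
        for (int i = start + 1; i < usedAfterFullGc.length; i++) {
            if (usedAfterFullGc[i] < usedAfterFullGc[i - 1]) {
                return false; // usage dropped, so GC is still reclaiming Metaspace
            }
        }
        return true;
    }
}
```

With made-up samples, a saw-tooth series such as {100, 60, 110, 65, 105} is not flagged for a window of 3, while a series that stops dropping, such as {100, 60, 120, 120, 121, 125}, is flagged for a window of 4.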
Root Cause of java.lang.OutOfMemoryError: Metaspace
Now that we have confirmed a memory leak is happening in the Metaspace region, the next logical step is to inspect the region and understand what occupies it. There are 5 different approaches to studying the contents of the Metaspace region. We went for the heap dump analysis approach.
We used the yCrash tool to capture the heap dump and analyze it. The tool instantly pointed out the problem: it reported a thread that was experiencing OutOfMemoryError when invoking a 3rd party library. Due to a bug, this 3rd party library was creating new class definitions for every new request. The application was running on an older version of the library, and the bug had already been fixed in the latest version. Once the 3rd party library was upgraded to the latest version, the problem was resolved.
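To illustrate how "a new class definition for every request" can fill up Metaspace, here is a minimal sketch. It does not reproduce the actual library bug, whose mechanism isn't described here. Instead it uses one well-known way to end up with a fresh class on each call: asking java.lang.reflect.Proxy for a proxy class under a brand-new class loader every time. The class name, the handleRequest method, and the retained list are all hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; it mimics the symptom, not the real library.
class ClassPerRequestLeak {

    // Holding on to each proxy keeps its class and class loader reachable,
    // so the class metadata in Metaspace can never be unloaded.
    static final List<Object> retained = new ArrayList<>();

    static Runnable handleRequest() {
        // A brand-new loader per call defeats Proxy's per-loader class cache,
        // so every call defines a new proxy class.
        ClassLoader freshLoader =
                new ClassLoader(ClassPerRequestLeak.class.getClassLoader()) {};
        InvocationHandler noOp = (proxy, method, methodArgs) -> null;
        Runnable r = (Runnable) Proxy.newProxyInstance(
                freshLoader, new Class<?>[] {Runnable.class}, noOp);
        retained.add(r);
        return r;
    }

    public static void main(String[] args) {
        int before = ManagementFactory.getClassLoadingMXBean().getLoadedClassCount();
        for (int i = 0; i < 1000; i++) {
            handleRequest();
        }
        int after = ManagementFactory.getClassLoadingMXBean().getLoadedClassCount();
        System.out.println("Loaded classes before=" + before + ", after=" + after);
    }
}
```

Class metadata is released only when its class loader becomes unreachable, which is why code of this shape keeps Metaspace growing no matter how many Full GCs run.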