I have heard a few of my developer friends say: “Garbage Collection is automatic, so I do not have to worry about it.” The first part is true: garbage collection is automatic on all modern platforms (Java, .NET, Golang, Python…). But the second part, “I don’t have to worry about it”, is debatable. Here is my case for the importance of Garbage Collection:
1. Unpleasant customer experience
When a garbage collector runs, it typically pauses the entire application, for at least part of the cycle, to mark the objects that are in use and sweep away the objects that no longer have active references. During this pause, all customer transactions that are in flight are stalled (i.e., frozen). Depending on the GC algorithm and the memory settings you configure, pause times can run anywhere from a few milliseconds to a few minutes. Frequent pauses make the application stutter, judder, or halt, and they leave your customers with an unpleasant experience.
2. Millions of dollars wasted
Here is a white paper we published, explaining how enterprises are wasting millions of dollars due to garbage collection. In a nutshell, modern applications create thousands or even millions of objects. These objects must be continuously examined to determine whether they still have active references or are ready for garbage collection. Once objects are garbage collected, the memory becomes fragmented, and fragmented memory must be compacted. All these activities consume *enormous compute cycles*, and those compute cycles translate to millions of dollars in spending. If Garbage collection performance is optimized, it can result in several millions of dollars in cost savings.
3. Low risk, high impact performance improvements
By optimizing Garbage collection performance, you improve not only the Garbage collection pause time but also the application’s overall response time. We recently helped tune the garbage collection performance of one of the world’s largest automobile companies. Just by modifying the garbage collection settings, without refactoring a single line of code, we improved their application’s overall response time significantly. The table below summarizes the response time we measured with each Garbage Collection setting change we made:
|Iteration|Avg Response Time (secs)|Transactions > 25 sec (%)|
|---|---|---|
|GC settings iteration #2|1.36|0.12|
|GC settings iteration #3|1.7|0.11|
|GC settings iteration #4|1.48|0.08|
|GC settings iteration #5|2.045|0.14|
|GC settings iteration #6|1.087|0.24|
|GC settings iteration #7|1.03|0.14|
|GC settings iteration #8|0.95|0.31|
When we started the GC tuning exercise, this automobile application’s overall response time was 1.88 seconds. As we tried different Garbage Collection settings, by iteration #8 we had brought the overall response time down to 0.95 seconds, i.e., roughly a 49.5% improvement. Similarly, the percentage of transactions taking more than 25 seconds dropped from 0.7% to 0.31%, i.e., roughly a 56% improvement. This is a significant improvement to achieve without modifying a single line of code.
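For readers who want to check the arithmetic, the improvement figures are relative reductions from the starting value. A small sketch, where the function name `percent_improvement` is just an illustrative choice:

```python
def percent_improvement(before, after):
    """Relative reduction from `before` to `after`, as a percentage."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100


# Figures quoted in the text above.
response_time_gain = percent_improvement(1.88, 0.95)  # avg response time (secs)
slow_tx_gain = percent_improvement(0.7, 0.31)         # transactions > 25 sec (%)

if __name__ == "__main__":
    print(f"response time: {response_time_gain:.1f}%")
    print(f"slow transactions: {slow_tx_gain:.1f}%")
```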
Most other forms of response time improvement require infrastructure changes, architectural changes, or code-level changes, all of which are expensive. Even if you embark on those costly changes, there is no guarantee that the application’s response time will improve.
4. Predictive Monitoring
Garbage Collection logs expose vital predictive micrometrics. These metrics can be used to forecast an application’s availability and performance characteristics. One of the micrometrics exposed in Garbage Collection logs is ‘GC Throughput’ (to read more about other micrometrics, refer to this article). What is GC Throughput? If your application’s GC throughput is 98%, it means your application is spending 98% of its time processing customer activity and the remaining 2% of its time on GC activity. When an application suffers from a memory problem, GC throughput starts to degrade several minutes before the problem becomes visible. Troubleshooting tools like yCrash monitor ‘GC throughput’ to predict and forecast memory problems before they surface in the production environment.
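As a rough sketch of the GC throughput definition above, the snippet below computes the metric from a list of pause durations and flags a sustained drop. The 95% threshold, the three-sample window, and the function names are assumptions made for illustration; they are not how yCrash or any other tool actually works.

```python
def gc_throughput(total_runtime_secs, gc_pause_secs):
    """Percentage of runtime spent on application work rather than GC pauses."""
    if total_runtime_secs <= 0:
        raise ValueError("total_runtime_secs must be positive")
    total_pause = sum(gc_pause_secs)
    return (total_runtime_secs - total_pause) / total_runtime_secs * 100


def is_degrading(throughput_samples, threshold_percent=95.0, window=3):
    """True when the last `window` samples all fall below the threshold.

    Both the threshold and the window size are hypothetical defaults.
    """
    recent = throughput_samples[-window:]
    return len(recent) == window and all(s < threshold_percent for s in recent)


if __name__ == "__main__":
    # 100 seconds of runtime with 2 seconds of total GC pause time.
    print(gc_throughput(100, [0.5, 1.0, 0.5]))
    print(is_degrading([98.0, 94.0, 93.0, 90.0]))
```

In practice the pause durations would come from parsing the GC log for each measurement interval, and the alerting rule would be tuned to the application.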
5. Capacity Planning
When you are doing capacity planning for your application, you need to understand your application’s demand for memory, CPU, network, and storage. One of the best ways to study the demand for memory is to analyze garbage collection behaviour. From that analysis you can determine the average object creation rate (example: 150 MB/sec) and the average object reclamation rate. Using these sorts of micrometrics, you can do effective capacity planning for your application.
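One way these rates can be derived, sketched under the assumption that each GC event in the log gives a timestamp plus the heap size before and after the collection: memory allocated between two collections is the heap size before the later one minus the heap size after the earlier one. The `GcEvent` type and the sample numbers below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GcEvent:
    timestamp_secs: float   # when the collection happened
    heap_before_mb: float   # heap usage just before the collection
    heap_after_mb: float    # heap usage just after the collection


def creation_and_reclamation_rates(events):
    """Return (creation MB/sec, reclamation MB/sec) across the given events."""
    if len(events) < 2:
        raise ValueError("need at least two GC events")
    elapsed = events[-1].timestamp_secs - events[0].timestamp_secs
    if elapsed <= 0:
        raise ValueError("events must be in increasing time order")
    # Growth between the end of one collection and the start of the next.
    allocated = sum(
        cur.heap_before_mb - prev.heap_after_mb
        for prev, cur in zip(events, events[1:])
    )
    # Memory freed by the collections that fall inside the measured window.
    reclaimed = sum(e.heap_before_mb - e.heap_after_mb for e in events[1:])
    return allocated / elapsed, reclaimed / elapsed


if __name__ == "__main__":
    # Made-up sample: a collection every 4 seconds, each freeing 600 MB.
    sample = [GcEvent(0, 900, 300), GcEvent(4, 900, 300), GcEvent(8, 900, 300)]
    creation, reclamation = creation_and_reclamation_rates(sample)
    print(f"creation {creation} MB/sec, reclamation {reclamation} MB/sec")
```

If the creation rate stays above the reclamation rate over a long window, heap usage is trending upward, which is a useful signal when sizing memory.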
Friends, in this post I have done my best to make the case for garbage collection analysis. I hope you and your team benefit from these highly insightful garbage collection metrics.