Java Memory Leak Troubleshooting: How We Lost 3 Days, and Fixed It in Hours

This guide outlines the installation and configuration of yCrash, a tool for managing Java application memory issues. The author recounts a crisis where improper setup led to a three-day debugging ordeal before using yCrash effectively. Key lessons emphasize the importance of proper setup, continuous monitoring, and utilizing all available data artifacts for effective troubleshooting.

Building Connections: yCrash at Westpac Sydney

yCrash emphasizes the importance of human connections alongside technology. Senior Software Engineer Unni Mana visited Westpac in Sydney, fostering team relationships. His meeting included discussions on yCrash tools, enhancing collaboration. Positive feedback highlighted potential automation benefits. The visit strengthened ties, making it a personal rather than formal encounter for Unni.

JBOSS Shunning, Unloading Class SUN.REFLECT

A major shipping/logistics company faced issues with their Java 6 JBoss cluster on RedHat Linux: instances frequently dropped out of the cluster because of long pauses during Full Garbage Collection. As a result, users had to sign out and log in again repeatedly. Increasing the Permanent Generation space resolved the issue and restored normal functionality.

Java Batch Optimization

A major North American shipping monopoly experienced lengthy batch job processing times after migrating to Java, which affected operations in China and online transactions in the US. Engaged to resolve these issues, I reduced the processing time from 16:04 hours to 3:14 hours within two months, even while accommodating new requirements and increased traffic.

WORKDAY – GUEST LECTURE

Workday, a leader in cloud ERP for Finance, HR, and planning, hosted a guest lecture by architect Ram Lakshmanan on February 21, 2019, at their Pleasanton, CA headquarters. The two-hour presentation on JVM performance engineering and troubleshooting was well-received by Workday's SRE and Performance engineers. Photographs from the event are included.

TD Bank using GCeasy

TD Bank, a leading Canadian bank, utilizes GCeasy, FastThread, and HeapHero products. Recently, Ram Lakshmanan, our architect, delivered a training session for their performance engineers in Toronto, focusing on a critical system review of a core application. The engineers showed enthusiasm and engaged actively during the training.

CloudBees GC performance optimized

CloudBees, the leader in Jenkins CI/CD pipelines, has improved its garbage collection performance using GCeasy. Ryan Smith, a Sr. DevOps Engineer at CloudBees, shares insights from his experience optimizing garbage collection performance and explains why these improvements matter for the industry.

Uber Optimizes Garbage Collection Performance

Uber is facing significant performance issues due to increased traffic, including memory bottlenecks such as long garbage collection pauses and memory leaks. Their engineering team shared insights on optimization strategies, employing tools like GCeasy to address the problems. Notably, through extensive manual analysis, they identified a misconfigured thread that was causing excessive object creation, which highlights the complexity of managing numerous threads.

DSquare Trading App Addressed GC Pauses

Dsquare is a specialized FX trading boutique that excels in short-term trading, applying algorithmic models to the foreign exchange market, where daily volumes exceed $3 trillion. Jad Sarmo discusses the development of their high-performance, low-latency Java trading application, covering the challenges and optimizations involved, including the use of GCeasy to tune garbage collection performance.
