APM gives the news, yCrash gives the answers
APM tools such as AppDynamics, New Relic, and Dynatrace are excellent monitoring tools. They monitor your application and report that CPU spiked by x%, memory degraded by y%, or response time shot up by z seconds. But they don't answer the follow-up questions: Why did the CPU spike? Why did memory degrade? Why did response time shoot up? To troubleshoot the problem, your operations engineer has to capture the GC log, thread dump, heap dump, netstat, vmstat, and several more artifacts, and then analyze them to identify what is degrading performance or availability. yCrash automates both the artifact capture and the analysis, and reports the exact line of code that is causing the problem.
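To make one of these artifacts concrete, here is a minimal Java sketch that captures a thread dump from inside a running JVM using the standard `java.lang.management` API. It is only an illustration of what such an artifact contains, not how yCrash gathers its data; the class and method names (`ThreadDumpSketch`, `captureThreadDump`) are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

final class ThreadDumpSketch {

    // Returns a textual snapshot (name, state, stack frames) of every live thread in this JVM.
    static String captureThreadDump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder dump = new StringBuilder();
        // false, false: skip locked-monitor and synchronizer details to keep the capture cheap
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            dump.append('"').append(info.getThreadName()).append("\" ")
                .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                dump.append("    at ").append(frame).append('\n');
            }
            dump.append('\n');
        }
        return dump.toString();
    }

    public static void main(String[] args) {
        System.out.print(captureThreadDump());
    }
}
```

A single dump like this is only a snapshot; troubleshooting usually means correlating it with the other artifacts listed above.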
APM agents run within your application's JVM. The yCrash agent runs only when it is triggered; at all other times it is dormant. Even when it runs, it runs on the host and not within the JVM.
APM agents claim to add less than 3% overhead, but that figure is far from reality. With the exception of the heap dump, none of the data captured by yCrash adds any measurable overhead. A heap dump, however, is an intrusive operation that pauses your application, so it should be captured only when necessary.
APM reports macrometrics such as CPU utilization, memory utilization, response time, and component-level response time.
yCrash reports and analyzes micrometrics such as GC throughput, object creation rate, object promotion rate, GC latency, thread states, thread group size, and the list of open file descriptors. For more details on this topic, you can refer to this article.
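As a rough illustration of two of these micrometrics, the Java sketch below approximates GC throughput (the percentage of elapsed time not spent in garbage collection) and counts threads by state, using only the standard `java.lang.management` API. This is an assumption-laden sketch, not how yCrash computes its numbers: the class and method names are hypothetical, and the collection time reported by the JVM is itself approximate and may include concurrent work for some collectors.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.EnumMap;
import java.util.Map;

final class MicrometricsSketch {

    // GC throughput: percentage of elapsed time NOT spent in garbage collection.
    static double gcThroughputPercent(long gcTimeMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 100.0;
        }
        double pct = 100.0 * (uptimeMillis - gcTimeMillis) / uptimeMillis;
        return Math.max(0.0, pct); // clamp in case the reported GC time exceeds uptime
    }

    // Sums the approximate accumulated collection time across all collectors.
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Counts live threads by state (RUNNABLE, BLOCKED, WAITING, ...).
    static Map<Thread.State, Integer> threadStateCounts() {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(false, false)) {
            counts.merge(info.getThreadState(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.println("GC throughput %: " + gcThroughputPercent(totalGcTimeMillis(), uptime));
        System.out.println("Thread states: " + threadStateCounts());
    }
}
```

For example, 50 ms of GC time over 1,000 ms of uptime corresponds to a GC throughput of 95%.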
The proof of the pudding is in the eating
Recently, a major telecom company had an outage due to a memory leak in its application. Its APM solution generated alerts stating that memory usage was spiking. The company used yCrash to diagnose the problem, and yCrash reported the exact objects that were leaking memory. It turned out that the leak was caused by the AppDynamics agent itself, which was running within the application. The screenshot below shows yCrash identifying the AppDynamics agent as the cause of the memory leak.
Fig: Screenshot showing yCrash reporting the AppDynamics agent causing the memory leak.