Monitoring != Root cause analysis

APM gives the news, yCrash gives the answer

The industry has seen cutting-edge Application Performance Monitoring (APM) tools (e.g., AppDynamics, NewRelic, Dynatrace) and infrastructure monitoring tools (e.g., Nagios, Ngmon). These monitoring tools are great at detecting the symptoms of a problem: they can tell you that CPU spiked by x%, memory degraded by y%, or response time shot up by z seconds. But they don't answer the questions that matter next: Why did the CPU spike? Why did memory degrade? Why did response time increase?

To answer these questions, you need to capture garbage collection logs, thread dumps, heap dumps, netstat output, and several more artifacts from your application. As the next step, you need to analyze these cryptic, tedious dump files to identify the root cause of the problem. yCrash, the root cause analysis tool, does this for you automatically: it captures and analyzes these artifacts and identifies the root cause of the problem instantly.
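To give a feel for the raw data behind one of these artifacts, here is a minimal sketch that takes a thread snapshot and counts threads by state using the JDK's standard `ThreadMXBean`. It is only an illustration: the class name `ThreadStateCount` and its `snapshot` method are made up for this example, the code runs inside the JVM, and it is not a description of how yCrash collects or analyzes thread dumps.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of yCrash.
class ThreadStateCount {

    // Takes a snapshot of all live threads and counts them per thread state.
    static Map<Thread.State, Integer> snapshot() {
        // false, false: skip locked monitor / synchronizer details to keep the call cheap
        ThreadInfo[] infos = ManagementFactory.getThreadMXBean().dumpAllThreads(false, false);
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (ThreadInfo info : infos) {
            if (info == null) {
                continue; // defensive: skip entries for threads that are no longer alive
            }
            counts.merge(info.getThreadState(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints one line per state that has at least one thread
        snapshot().forEach((state, n) -> System.out.println(state + ": " + n));
    }
}
```

A large and growing count of BLOCKED or WAITING threads is the kind of signal that points toward a cause rather than a symptom, but interpreting it still requires looking at the stack traces in the full dump.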

Non-intrusive

APM agents run within your application’s JVM. They intercept every single call, adding significant overhead to your application. Even though APM agents claim to add less than 3% overhead, that claim is far from reality.

The yCrash agent runs on the device and NOT WITHIN THE JVM. It analyzes only the data that is already generated by the application. Thus yCrash doesn’t add any observable overhead to your application.

Micrometrics

APM tools report macrometrics such as CPU utilization, memory utilization, response time, and component-level response time.

yCrash reports and analyzes micrometrics such as GC throughput, object creation rate, object promotion rate, GC latency, thread states, thread group size, list of open file descriptors, … For more details on the micrometrics, refer to this article. Using these micrometrics, yCrash can predict and forecast outages before they happen.
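As a rough illustration of one such micrometric, GC throughput is commonly defined as the percentage of time the application spends doing useful work rather than garbage collection. The sketch below approximates it from the JDK's standard `GarbageCollectorMXBean` and `RuntimeMXBean`. The class name `GcThroughput` and its methods are hypothetical, the code runs inside the JVM, and it is an assumption-laden approximation, not how yCrash derives the metric: `getCollectionTime()` reports approximate accumulated collection time, which for concurrent collectors is not the same as pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper for illustration only; not part of yCrash.
class GcThroughput {

    // Percentage of elapsed time NOT spent in garbage collection.
    static double throughputPercent(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 100.0; // nothing measured yet, so no time has been lost to GC
        }
        return 100.0 * (uptimeMillis - gcMillis) / uptimeMillis;
    }

    // Sums the approximate accumulated collection time across all collectors.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // returns -1 if the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.printf("GC throughput: %.2f%%%n", throughputPercent(totalGcMillis(), uptime));
    }
}
```

For example, 50 ms of accumulated GC time over 1,000 ms of uptime gives a throughput of 95%. Parsing the GC log, as an external tool would, gives a more precise figure because the log records each pause individually.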

The proof of the pudding is in the eating

Recently a major telecom company had an outage due to a memory leak in its application. Their APM solution generated alerts stating that memory was spiking. The telecom company used yCrash to diagnose the problem, and yCrash reported the exact objects that were causing the memory leak. It turned out that the leak was caused by the AppDynamics agent itself, which was running within the application. The screenshot below shows yCrash reporting the AppDynamics agent that is causing the memory leak.

Fig: Screenshot showing yCrash reporting the AppDynamics agent causing the memory leak.
