yCrash Buddy – Your AI Troubleshooting Assistant

yCrash Buddy is an enterprise-grade AI troubleshooting assistant that has the potential to supercharge your team’s troubleshooting and performance optimization capabilities by as much as 10x. This post highlights the key capabilities of yCrash Buddy.

How yCrash Buddy Can Help You Today

1. Analyze & Interpret Incident Reports: The yCrash tool already analyzes production dumps and logs and generates incident reports. However, these incident reports can be tricky to interpret for engineers who don’t have prior runtime (e.g. JVM, .NET, Python…) or application-specific knowledge. Now your organization’s engineers can ask yCrash Buddy questions about the report in plain English. yCrash Buddy interprets the report, answers the specific questions asked by the engineer, and can also recommend more actionable solutions to the problem. Examples:

  • Why is the report mentioning that the following stack trace is causing a CPU spike? Can you explain what this stack means?

    at com.buggyapp.util.setConnectingFlight(ItinerarySegmentProcessor.java:380)
    at com.buggyapp.util.processTripType0(ItinerarySegmentProcessor.java:366)
    at com.buggyapp.util.processItineraryByTripType(ItinerarySegmentProcessor.java:254)
    at com.buggyapp.util.templateMethod(ItinerarySegmentProcessor.java:399)
    at com.buggyapp.calls.gds.InvoiceGeneratedFacade.readTicketImage(InvoiceGeneratedFacade.java:252)

  • Why are 1419 threads in TIMED_WAITING state? Is it a serious problem? 
  • How can we reduce the ‘long GC pauses’ pointed out in the report?

Fig: yCrash Buddy in Incident Report Page
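
To make the TIMED_WAITING question above more concrete, here is a minimal Python sketch of how thread states could be tallied from a raw thread dump. It is only an illustration, not how yCrash or yCrash Buddy works internally. It assumes the HotSpot-style dump format, in which each thread is followed by a `java.lang.Thread.State:` line; the function name and the sample excerpt are made up for this example.

```python
import re
from collections import Counter

# Matches the state line printed for each thread in a HotSpot-style dump,
# e.g. "   java.lang.Thread.State: TIMED_WAITING (parking)".
STATE_LINE = re.compile(r"java\.lang\.Thread\.State:\s+([A-Z_]+)")


def count_thread_states(dump_text: str) -> Counter:
    """Return how many threads are in each state in the given dump text."""
    return Counter(STATE_LINE.findall(dump_text))


# Made-up excerpt for illustration; a real dump contains many more threads.
SAMPLE_DUMP = '''
"pool-1-thread-1" #12 prio=5 os_prio=0 tid=0x01 nid=0x1a waiting on condition
   java.lang.Thread.State: TIMED_WAITING (parking)
"pool-1-thread-2" #13 prio=5 os_prio=0 tid=0x02 nid=0x1b waiting on condition
   java.lang.Thread.State: TIMED_WAITING (sleeping)
"main" #1 prio=5 os_prio=0 tid=0x03 nid=0x01 runnable
   java.lang.Thread.State: RUNNABLE
'''

if __name__ == "__main__":
    # Print the states from most to least frequent.
    for state, count in count_thread_states(SAMPLE_DUMP).most_common():
        print(f"{state}: {count}")
```

A raw count like this only tells you how many threads are waiting. Whether a large number of TIMED_WAITING threads is a problem depends on what those threads are waiting for, which is the kind of interpretation the questions above ask for.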

2. Analyze Dump Files: Engineers can upload troubleshooting artifacts (application log, GC log, thread dump, heap dump, netstat, vmstat, dmesg, top, …) directly to yCrash Buddy. Buddy invokes the yCrash server’s REST API to process these dumps and renders the result back to the engineer.

Fig: Analyze Dumps through yCrash Buddy
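
For readers who want a feel for what “invoking a REST API with a dump” looks like, here is a rough Python sketch that prepares an HTTP POST carrying an artifact as the request body. Everything product-specific in it is an assumption: the endpoint URL, the `X-Api-Key` header name, and the function name are hypothetical placeholders, not the documented yCrash API. Consult the yCrash API documentation for the real endpoint and parameters.

```python
import urllib.request

# HYPOTHETICAL endpoint, used only for illustration. It is NOT the real
# yCrash REST API URL; take the actual one from the yCrash documentation.
HYPOTHETICAL_ENDPOINT = "https://ycrash.example.com/hypothetical-analyze-endpoint"


def build_upload_request(artifact: bytes, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST request carrying one artifact."""
    return urllib.request.Request(
        HYPOTHETICAL_ENDPOINT,
        data=artifact,      # raw dump or log content as the request body
        method="POST",
        headers={
            "Content-Type": "text/plain",
            "X-Api-Key": api_key,  # hypothetical header name
        },
    )


if __name__ == "__main__":
    request = build_upload_request(b"sample GC log content", "my-api-key")
    # The request is only constructed here, so nothing leaves the machine.
    print(request.get_method(), request.full_url)
```

Sending the request would be a call to `urllib.request.urlopen(request)`. It is left out here because the endpoint above is a placeholder.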

3. Troubleshooting/Performance Optimization Guidance: Engineers can ask yCrash Buddy general questions about troubleshooting and performance optimization. yCrash Buddy draws on yCrash’s knowledge base to give more meaningful, pointed solutions and recommendations. For example, you can ask Buddy questions like:

  • My application is suffering from poor response time, what might cause it?
  • CPU is spiking up to 100% on my Kubernetes pod. What should I do?
  • Why is our application continuously pausing? Can you please advise what to do?

What’s Next for yCrash Buddy (Roadmap)

Here are the major features planned for yCrash Buddy. Each one moves yCrash Buddy closer to being a fully autonomous performance and reliability assistant, one that continuously learns, acts, and optimizes your systems.

1. Autonomous Troubleshooting of Production Problems: yCrash Buddy will be able to listen to the alerts generated by your monitoring tools and take appropriate action. For example, if a CPU spike alert comes from a container, yCrash Buddy will perform the following actions:

a. Capture the required diagnostic artifacts (Application Log, Thread Dump, Heap Dump…) from that particular container

b. Restart the container to minimize the customer impact

c. Analyze the artifacts & identify the root cause of the problem

d. Propose a precise fix for the problem and share it with the engineering team
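
The four steps above can be sketched as a small ordered pipeline. This is only an illustration of the sequence described in this post, not yCrash Buddy’s implementation: all function and class names are hypothetical, and each step merely records what a real implementation would do.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Incident:
    """Hypothetical record of one alert and the actions taken for it."""
    container: str
    alert: str
    actions: List[str] = field(default_factory=list)


# Each step is a stub that only records itself; real steps would call
# the container runtime, the analysis engine, and a notification channel.
def capture_artifacts(incident: Incident) -> None:
    incident.actions.append("capture artifacts")


def restart_container(incident: Incident) -> None:
    incident.actions.append("restart container")


def analyze_artifacts(incident: Incident) -> None:
    incident.actions.append("analyze artifacts")


def share_fix(incident: Incident) -> None:
    incident.actions.append("share fix")


# Order matters: artifacts are captured BEFORE the restart (steps a to d).
PIPELINE: List[Callable[[Incident], None]] = [
    capture_artifacts,
    restart_container,
    analyze_artifacts,
    share_fix,
]


def handle_alert(container: str, alert: str) -> Incident:
    """Run every pipeline step, in order, for a single alert."""
    incident = Incident(container, alert)
    for step in PIPELINE:
        step(incident)
    return incident


if __name__ == "__main__":
    print(handle_alert("ABC", "cpu_spike").actions)
```

The ordering is the important design point: thread dumps and heap dumps describe in-memory state, which is lost once the container restarts, so capture has to come first.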

2. Intelligent Execution of Engineer Instructions: SREs will be able to give high-level instructions to yCrash Buddy, and it will act on their behalf. For example, SREs can give instructions like these:

  • Check out the latest code from the ‘3.01’ branch in GitHub and deploy it to the ‘ABC’ application.
  • There is a CPU spike in the ‘ABC’ container. Please attend to it.
  • Engineer ‘XYZ’ has asked for the kernel parameter ‘kernel.core_pipe_limit’ value in the ‘ABC’ container. Please send him an email with this information.
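
As an illustration of what the kernel parameter instruction involves at the lowest level: on Linux, a sysctl name such as `kernel.core_pipe_limit` corresponds to a file under `/proc/sys`, with the dots replaced by slashes. The sketch below shows that lookup in Python. The helper names are made up for this example, and it would need to run inside the container in question. It says nothing about how yCrash Buddy will actually carry out such an instruction.

```python
from pathlib import Path
from typing import Optional


def sysctl_path(name: str) -> Path:
    """Map a dotted sysctl name to its file under /proc/sys."""
    # Holds for simple names like kernel.core_pipe_limit; names whose
    # components themselves contain dots would need special handling.
    return Path("/proc/sys") / name.replace(".", "/")


def read_sysctl(name: str) -> Optional[str]:
    """Return the parameter's current value, or None if it cannot be read."""
    try:
        return sysctl_path(name).read_text().strip()
    except OSError:
        # Not on Linux, parameter missing, or insufficient permissions.
        return None


if __name__ == "__main__":
    name = "kernel.core_pipe_limit"
    print(sysctl_path(name).as_posix(), "=", read_sysctl(name))
```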

3. Autonomous Performance Testing & Optimization: yCrash Buddy will be able to conduct performance tests automatically, based on a simple instruction from the performance engineer such as: ‘Run a performance test against the 3.01 branch and share the results’. yCrash Buddy will then perform the following tasks on behalf of the performance engineer:

a. Check out the latest code from the ‘3.01’ branch in GitHub and deploy it to the servers

b. Run the load test using the performance testing tool

c. Compile the test results and identify performance bottlenecks in the application

d. Present potential solutions to the engineering team
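
To give a sense of what “compiling results and identifying bottlenecks” can mean in practice, here is a minimal Python sketch that computes the 95th-percentile response time per endpoint and flags those that exceed a budget. The endpoint names, sample latencies, and the 500 ms budget are invented for illustration, and the nearest-rank percentile is just one common choice. This is not a description of yCrash Buddy’s actual analysis.

```python
import math
from typing import Dict, List


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def find_bottlenecks(latencies_ms: Dict[str, List[float]],
                     p95_budget_ms: float) -> Dict[str, float]:
    """Return the endpoints whose p95 latency exceeds the given budget."""
    p95s = {name: percentile(vals, 95) for name, vals in latencies_ms.items()}
    return {name: p95 for name, p95 in p95s.items() if p95 > p95_budget_ms}


# Made-up load test samples (milliseconds) for two hypothetical endpoints.
SAMPLE_RESULTS = {
    "/search": [120, 135, 110, 140, 125],
    "/checkout": [300, 950, 320, 310, 1200],
}

if __name__ == "__main__":
    for endpoint, p95 in find_bottlenecks(SAMPLE_RESULTS, 500).items():
        print(f"{endpoint}: p95 = {p95} ms exceeds the 500 ms budget")
```

Percentiles are used here rather than averages because a few very slow requests, like the two outliers in the `/checkout` samples, can hide behind a reasonable-looking mean.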

How to Enable yCrash Buddy in Your Enterprise?

yCrash Buddy is available in both our Cloud Edition and our On-Prem Enterprise Edition. To enable yCrash Buddy on your On-Prem Enterprise Edition, do the following:

1. Upgrade to Latest yCrash Edition: yCrash Buddy is available only in the latest Enterprise Edition, so please upgrade.

2. Purchase yCrash Buddy License: You need to purchase a yCrash Buddy license. After purchasing it, replace the old license file with the new one and restart the yCrash server.

3. Connectivity to your LLM Provider: yCrash Buddy needs connectivity to an LLM from a major provider (such as Gemini, OpenAI, or Llama). You can use any enterprise-approved LLM and configure its connectivity details in the yCrash server.

Want to try yCrash Buddy? Contact support@tier1app.com to activate it for your environment.
