Chaos Engineering - Simulating CPU spike

In this series of chaos engineering articles, let’s discuss how to simulate a CPU spike that drives consumption to 100% on a host (or container). A thread that enters a tight infinite loop, one that never sleeps or blocks, keeps a CPU core fully busy, and enough such threads will saturate every core. Here is a sample program from the open-source BuggyApp application that causes the CPU to spike.

public class CPUSpikeDemo {

  public static void start() {
    new CPUSpikerThread().start();
    new CPUSpikerThread().start();
    new CPUSpikerThread().start();
    new CPUSpikerThread().start();
    new CPUSpikerThread().start();
    new CPUSpikerThread().start();
    System.out.println("6 threads launched!");
  }
}

public class CPUSpikerThread extends Thread {

  @Override
  public void run() {
    while (true) {
      // Just looping infinitely: the empty body never sleeps or blocks
    }
  }
}

The ‘CPUSpikeDemo’ class in the above Java program launches 6 threads of type ‘CPUSpikerThread’. The ‘CPUSpikerThread’ class contains a ‘while (true)’ loop with an empty body, so any thread that runs it loops forever without ever sleeping or blocking. Since 6 threads execute this code, all 6 spin at the same time, and CPU consumption on the machine skyrockets when the program is executed.
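For an experiment that should end on its own, the same busy-loop idea can be given a deadline. The following is a minimal sketch and not part of BuggyApp: the class name ‘BoundedCpuSpike’, the method ‘spike’, and its parameters are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class BoundedCpuSpike {

    /**
     * Starts threadCount busy-looping threads that stop after durationMillis,
     * waits for them, and returns how many of them finished.
     * (Hypothetical helper, not part of BuggyApp.)
     */
    public static int spike(int threadCount, long durationMillis) {
        final long deadline = System.nanoTime() + durationMillis * 1_000_000L;
        List<Thread> spinners = new ArrayList<>();

        for (int i = 0; i < threadCount; i++) {
            Thread t = new Thread(() -> {
                // Same idea as CPUSpikerThread: no sleep, no blocking,
                // but the loop exits once the deadline has passed.
                while (System.nanoTime() - deadline < 0) {
                    // spin
                }
            }, "bounded-cpu-spiker-" + i);
            t.setDaemon(true); // do not keep the JVM alive on our account
            t.start();
            spinners.add(t);
        }

        int finished = 0;
        for (Thread t : spinners) {
            try {
                t.join();
                finished++;
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop waiting
                Thread.currentThread().interrupt();
                break;
            }
        }
        return finished;
    }

    public static void main(String[] args) {
        // One spinner per available core, for 10 seconds
        int cores = Runtime.getRuntime().availableProcessors();
        int finished = spike(cores, 10_000);
        System.out.println(finished + " spinner threads finished");
    }
}
```

Calling ‘BoundedCpuSpike.spike(6, 60_000)’ would mirror the 6 threads above for one minute instead of running them forever.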

We launched the above BuggyApp program on a ‘t3a.medium’ EC2 instance, which has 2 vCPUs. Below is the output from the UNIX performance monitoring tool ‘top’. Notice the overall CPU usage reaching 100%.

Fig: Top tool showing CPU consumption spiking up to 100%

How to diagnose CPU spike?

As highlighted in this article, you can use a manual approach to do root cause analysis:

a. Capture a thread dump from the application
b. Capture the ‘top -H -p {PID}’ output
c. Marry #a and #b to identify the root cause of the CPU spike problem
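The marrying step above usually comes down to one conversion: ‘top -H’ reports each thread’s ID in decimal, while HotSpot thread dumps commonly print the same ID in hex as ‘nid=0x…’. The sketch below shows that lookup. The class ‘ThreadDumpCorrelator’ and its methods are hypothetical names, the sample dump line is made up, and accepting a decimal ‘nid’ as well is an assumption for JVMs that print it that way.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class ThreadDumpCorrelator {

    /** Converts a decimal thread ID from 'top -H' to the hex form used in 'nid=0x...'. */
    public static String toNid(long osThreadId) {
        return "0x" + Long.toHexString(osThreadId);
    }

    /** Returns the thread dump lines whose nid matches the given OS thread ID. */
    public static List<String> findThread(List<String> threadDumpLines, long osThreadId) {
        // Hex nid is the common form; decimal is also accepted (assumption).
        // \b keeps nid=0x1b03 from matching nid=0x1b035.
        Pattern nid = Pattern.compile(
                "\\bnid=(" + toNid(osThreadId) + "|" + osThreadId + ")\\b",
                Pattern.CASE_INSENSITIVE);

        List<String> matches = new ArrayList<>();
        for (String line : threadDumpLines) {
            if (nid.matcher(line).find()) {
                matches.add(line);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Made-up sample data: two header lines as they might appear in a thread dump
        List<String> dump = Arrays.asList(
                "\"Thread-0\" #12 prio=5 os_prio=0 tid=0x00007f3a1c0e1000 nid=0x1b03 runnable",
                "\"main\" #1 prio=5 os_prio=0 tid=0x00007f3a1c00a000 nid=0x1af0 waiting on condition");

        // Suppose 'top -H' showed thread 6915 at the top of the CPU column
        for (String line : findThread(dump, 6915)) {
            System.out.println(line);
        }
    }
}
```

Once the matching thread is found in the dump, its stack trace shows the code that is burning the CPU.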

On the other hand, you can use an automated root cause analysis tool like yCrash, which automatically captures application-level data (thread dump, heap dump, Garbage Collection log) and system-level data (netstat, vmstat, iostat, top, top -H, dmesg, …), and marries these two datasets to generate an instant root cause analysis report. Below is the report generated by the yCrash tool when the above sample program is executed:

Fig: yCrash tool pointing out the lines of code causing the CPU spike

From the report, you can observe that yCrash points out that 6 threads are causing the CPU to spike. The ‘CPU | Memory’ section of the report shows the CPU consumption of each thread (which is > 30%); this is consistent with 6 spinning threads sharing the instance’s 2 CPUs, roughly 200% / 6 ≈ 33% each. The tool also points out the exact line of code, i.e., com.buggyapp.cpuspike.CPUSpikerThread.run(CPUSpikerThread.java:12), that is causing the infinite loop. Equipped with this information, one can easily go ahead and fix the problematic code.

Malyadri says:

August 6, 2021 at 3:35 pm

Is the CPU spike created by yCrash’s predefined code or by actual application code? If it’s predefined code, then it’s not an actual chaos simulation. Please confirm. Thanks.

1. tier1appteam says:
  
  September 22, 2021 at 2:07 pm
  
  Hello @Malyadri!
  
  This CPU spike is generated by our buggyApp code.
