Overhead added by collecting thread dumps

A thread dump is a snapshot of all the threads running in a Java process. It’s a vital artifact for troubleshooting production problems such as CPU spikes, application unresponsiveness, poor response times, hung threads, and high memory consumption. To facilitate troubleshooting, we have seen enterprises capture thread dumps on a periodic basis (every 5 minutes or every 2 minutes). We were curious to learn how much overhead this periodic capture adds, so we set out to conduct the case study below.

Environment

For our study we chose the open source Spring Boot Pet Clinic application. Pet Clinic is a poster child application that was developed to demonstrate the features of the Spring Boot framework.

We ran this application on OpenJDK 11. We deployed it on an Amazon AWS t2.medium EC2 instance, which has 16GB RAM and 2 CPUs. The test was orchestrated using the Apache JMeter stress testing tool. We used AWS CloudWatch to measure CPU and memory utilization. In a nutshell, here are the tools and technologies we used to conduct this case study:

  • OpenJDK 11
  • AWS EC2
  • AWS CloudWatch
  • Apache JMeter

Test Scenario

In this environment, we conducted 3 tests:

  1. Baseline Test – In this scenario we ran the Pet Clinic application under JMeter load for 20 minutes with 200 concurrent users, without capturing any thread dumps.
  2. Thread dumps every 5 minutes Test – In this scenario we ran the Pet Clinic application using the same JMeter script for 20 minutes with 200 concurrent users. This time, however, we captured a thread dump from the Pet Clinic application every 5 minutes.
  3. Thread dumps every 2 minutes Test – In this scenario we ran the Pet Clinic application using the same JMeter script for 20 minutes with 200 concurrent users. This time, however, we captured a thread dump from the Pet Clinic application every 2 minutes.

Note: If you don’t know how to capture a thread dump, see “How to capture thread dumps? 8 options” for more details.
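
For readers who want to experiment with periodic capture themselves, here is a minimal sketch of one possible approach. It is not the setup used in this study: it captures dumps from inside the JVM through the standard ThreadMXBean API, whereas the tests above captured dumps of the Pet Clinic process with one of the options from the linked article. The class name PeriodicThreadDump, the method names, the output file naming, and the short demo interval are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: captures in-process thread dumps at a fixed interval.
public class PeriodicThreadDump {

    // Builds a plain-text dump of every live thread in the current JVM.
    static String buildDump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // false, false: skip monitor/synchronizer details to keep the sketch simple
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    // Writes one dump to a timestamped file in dir and returns how long
    // building plus writing the dump took, in milliseconds.
    static long writeDump(Path dir) {
        long start = System.nanoTime();
        Path file = dir.resolve("threaddump-" + System.currentTimeMillis() + ".txt");
        try {
            Files.write(file, buildDump().getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    // Schedules a dump every `interval` units; a failing dump stops later runs.
    static ScheduledExecutorService schedule(Path dir, long interval, TimeUnit unit) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(
            () -> System.out.println("dump written in " + writeDump(dir) + " ms"),
            interval, interval, unit);
        return scheduler;
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("dumps");
        // Demo values only; the study used 5-minute and 2-minute intervals.
        ScheduledExecutorService scheduler = schedule(dir, 200, TimeUnit.MILLISECONDS);
        Thread.sleep(1000);
        scheduler.shutdown();
        System.out.println("dumps written to " + dir);
    }
}
```

To mirror the test scenarios, the call would become schedule(dir, 5, TimeUnit.MINUTES) or schedule(dir, 2, TimeUnit.MINUTES). Logging the time each dump takes also gives a rough view of the cost at the moment of capture, not just the average over a whole run.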

Test Results

We captured average CPU and memory utilization from AWS CloudWatch, and average response time and throughput from JMeter. The data collected from all three test scenarios is summarized in the table below.

| Data collected | Baseline test | Every 5 minutes test | Every 2 minutes test |
|---|---|---|---|
| Avg CPU Usage | 8.35% | 10.40% | 7.92% |
| Avg Memory Usage | 20.80% | 19.90% | 19.60% |
| Avg Response Time | 3901 ms | 3888 ms | 3770 ms |
| Avg Throughput | 24.4/sec | 25.8/sec | 24.8/sec |

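As you can see, the differences between the three tests are small and do not grow with dump frequency: the every-2-minutes test actually recorded the lowest average CPU usage and the lowest average response time. There is no noticeable difference in CPU and memory consumption, and likewise no noticeable difference in average response time or transaction throughput.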
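
To make the comparison concrete, the sketch below computes how far each test deviates from the baseline, using only the figures reported in the results table. The class name OverheadDelta and its percentChange helper are hypothetical names introduced for this illustration.

```java
// Hypothetical helper: compares each test scenario against the baseline.
public class OverheadDelta {

    // Percentage change of a measured value relative to the baseline value.
    static double percentChange(double baseline, double measured) {
        return (measured - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        String[] metrics = {
            "Avg CPU usage (%)", "Avg memory usage (%)",
            "Avg response time (ms)", "Avg throughput (/sec)"
        };
        // Columns: baseline, every 5 minutes, every 2 minutes (from the results table)
        double[][] values = {
            {8.35, 10.40, 7.92},
            {20.80, 19.90, 19.60},
            {3901, 3888, 3770},
            {24.4, 25.8, 24.8}
        };
        for (int i = 0; i < metrics.length; i++) {
            double base = values[i][0];
            System.out.printf("%-24s 5 min: %+.2f (%+.1f%%)   2 min: %+.2f (%+.1f%%)%n",
                metrics[i],
                values[i][1] - base, percentChange(base, values[i][1]),
                values[i][2] - base, percentChange(base, values[i][2]));
        }
    }
}
```

Each line prints the absolute difference followed by the relative change. Because average CPU usage is low in every scenario, a difference of about two percentage points looks large in relative terms, so it helps to read both numbers together.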

Conclusion

Based on our study, we can conclude that there is no noticeable overhead in capturing thread dumps at 5-minute or 2-minute intervals.

2 thoughts on “Overhead added by collecting thread dumps”

  1. Thanks for all this, Mahesh. If I may try to add to the value of what you’re sharing, I’d suggest that this would be still more useful if only you added a couple more things.

    First, while you conclude that the impact is negligible *on average* over the entire 20 min test, you don’t reflect what the impact is *during the specific time the thread dumps are taking place*. Even if it may be rather modest from some observation you made, it would be important to confirm that with data.

    Second, you didn’t indicate the throughput of your tests. You indicate 200 users: are they making 1 request per second? In aggregate or each? Again, it would seem that such factors would be important in judging how typical your conclusion of “no noticeable overhead” (or impact) would be.

    Third, following on both the above, it would be helpful if in such a study you shared your JMeter tests. That way others could both try to replicate the results AND be able to answer such questions as I’ve raised.

    Finally, it would help both readers and those who may try to replicate your results to know WHICH of the 8 approaches you used to do the thread dumps, especially in such an automated test. (And where was the dump stored? I hope not redirected to nul/bit bucket. That would skew the results unfairly.)

    Again, please don’t hear any of these as complaints. I appreciate the work, the goal, and the write-up. In fact, I’m trying to help advance the cause of taking thread dumps for diagnostic purposes, and of knowing the relative “cost” of doing so.

    Thanks.

    1. Hello Crehat!

      Thanks for your comment. I will try to answer your questions.

      #1. The 20-minute window also includes the times when thread dumps were captured. The test ran continuously for 20 minutes. In the 2nd scenario, for example, I captured a thread dump every 5 minutes while the test was running.

      #2. In this test 200 requests were sent per second.

      #3. Good idea. Unfortunately I don’t have it right now. I will need to rerun the entire test to publish the JMeter results as well.

      #4. The link shows 8 different options for capturing a thread dump, and you can use any of them. I used option #1, and you can provide a file path where you want the thread dump file stored on your system.

      Thank you so much for your feedback. Feel free to reply if my answers are not clear or haven’t answered your questions.
