Troubleshoot Thread Starvation affecting ServiceNow MID Server

Is your ServiceNow MID Server slowing down or going silent without an obvious cause? 🤔 It might be suffering from thread starvation, which quietly degrades its performance. In this post, we simulate thread starvation on a MID Server and show how to diagnose it, so that you can troubleshoot JVM thread starvation issues and keep your applications resilient and efficient.

Java Thread Starvation: Impact and Analysis

Java thread starvation can significantly affect the performance and behavior of an application:

  1. Decreased Throughput: Thread starvation can lead to decreased throughput as essential tasks are delayed or prevented from executing due to the lack of available resources. This can result in slower response times and degraded overall performance.
  2. Increased Response Time: Threads that are starved of resources may experience increased response times as they wait for access to shared resources or CPU time. This can lead to delays in processing user requests or executing critical tasks, impacting the user experience.
  3. Deadlocks and Livelocks: Thread starvation often appears alongside deadlock or livelock scenarios, because all three stem from contention for resources or improper synchronization. In these situations threads are unable to make progress, which can lead to application hangs or freezes that require manual intervention to resolve.
  4. Resource Exhaustion: In severe cases, thread starvation can lead to resource exhaustion, such as CPU or memory exhaustion, as threads compete for limited resources without proper coordination or scheduling. This can result in system instability or crashes.
  5. Priority Inversion: Thread starvation can also show up as priority inversion, where a lower-priority thread holds a resource that a higher-priority thread needs and so prevents the higher-priority thread from executing. This leads to unexpected behavior and performance degradation.

Simulating thread starvation in MID Server

The Java program given below simulates thread starvation on any machine/container in which it’s launched:

public class ThreadStarvationSimulation {

    private static final Object sharedResource = new Object();
  
    public static void main(String[] args) {

        Thread highPriorityThread1 = new HighPriorityThread();
        Thread highPriorityThread2 = new HighPriorityThread();
        highPriorityThread1.setPriority(Thread.MAX_PRIORITY);
        highPriorityThread2.setPriority(Thread.MAX_PRIORITY);
      
        
        Thread lowPriorityThread = new Thread(() -> {
            while(true){
                synchronized(sharedResource){
                    System.out.println("Low priority thread is running.");
                    
                    try {
                        Thread.sleep(10);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }  
           
                Thread.yield();
            }
        });
        lowPriorityThread.setPriority(Thread.MIN_PRIORITY);
      
        // Start all threads
        highPriorityThread1.start();
        highPriorityThread2.start();
        lowPriorityThread.start();
    }
  
    private static class HighPriorityThread extends Thread {
        @Override
        public void run(){
            while(true){
                synchronized(sharedResource){
                    System.out.println("High priority thread is running.");
                    performBusyWork(); // Hold the lock for a long time
                }
               
                try {
                    Thread.sleep(1);
                } catch (InterruptedException e){
                    Thread.currentThread().interrupt();
                }
            }
        }

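        // Burns CPU for roughly 50 ms without sleeping, so the caller keeps the lock busy
        private void performBusyWork(){
            long endTime = System.currentTimeMillis() + 50;
            while (System.currentTimeMillis() < endTime) {
                // busy-wait: intentionally empty
            }
        }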
    }
}

The ‘ThreadStarvationSimulation’ class is an illustration of thread starvation, where a lower-priority thread has trouble completing its work because higher-priority threads monopolize a shared lock and the CPU time that goes with it. The example sets up a competition between threads for a critical section of code that is protected by a shared lock.

To simulate this, the class defines a static ‘sharedResource’ object used for synchronization. This object acts as a mutual exclusion lock, which ensures that only one thread can execute a block of code within a synchronized section at any given time.

Within the main method, the program creates two threads with high priority and one thread with low priority. These threads will attempt to access a synchronized block that requires the ‘sharedResource’ lock.

The high-priority threads (‘highPriorityThread1’ and ‘highPriorityThread2’) are instances of ‘HighPriorityThread’, a nested class extending Thread. Upon running, they enter an infinite loop, repeatedly attempting to enter a synchronized block guarded by ‘sharedResource’. Inside this block, they simulate a long-running task by busy-waiting for about 50 milliseconds without producing any useful output. After exiting the synchronized block, they sleep for just one millisecond before trying to re-enter it, creating a tight loop of lock acquisition and release.

Meanwhile, the low-priority thread is also trying to enter a synchronized block on the same lock. When it does manage to acquire the lock, it prints a message and simulates some work by sleeping for ten milliseconds while still holding the lock. After releasing the lock, it voluntarily gives up its scheduled CPU time by invoking Thread.yield(). yield() hints to the thread scheduler that the current thread is willing to let other threads execute, but in this code it’s somewhat redundant, as the low-priority thread’s access to the lock and the CPU is already minimal due to the aggressive behavior of the high-priority threads.

The result of this setup is that the low-priority thread rarely gets a chance to run. The high-priority threads are almost always either holding the lock or ready to take it again, and Java’s intrinsic locks make no fairness guarantee about which waiting thread acquires the lock next. Thread priorities are only hints to the JVM and the underlying operating system, so the exact behavior varies by platform. In this example the hints are exaggerated to illustrate thread starvation, where lower-priority threads may see significant delays or may almost never get a chance to execute because they are overshadowed by higher-priority threads.
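
As a point of contrast, here is a minimal sketch (not part of the original simulation) in which the same mix of two high-priority threads and one low-priority thread competes for a java.util.concurrent.locks.ReentrantLock created in fair mode instead of a synchronized block. A fair lock favors the longest-waiting thread, so the sketch helps show how much of the starvation comes from lock acquisition order rather than from priorities alone. The class name FairLockContrast, the method run, and its parameters are hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical contrast to the simulation: same thread mix, but a fair lock.
class FairLockContrast {

    // Shares out totalAcquisitions lock acquisitions among two MAX_PRIORITY
    // threads (index 0 and 1) and one MIN_PRIORITY thread (index 2).
    // Returns how many acquisitions each thread obtained.
    static int[] run(int totalAcquisitions, long holdMillis) {
        final ReentrantLock lock = new ReentrantLock(true); // true selects the fair policy
        final int[] counts = new int[3];                    // guarded by lock
        final int[] remaining = { totalAcquisitions };      // guarded by lock
        final int[] priorities = { Thread.MAX_PRIORITY, Thread.MAX_PRIORITY, Thread.MIN_PRIORITY };

        Thread[] workers = new Thread[priorities.length];
        for (int i = 0; i < workers.length; i++) {
            final int id = i;
            workers[i] = new Thread(() -> {
                boolean more = true;
                while (more) {
                    lock.lock();
                    try {
                        if (remaining[0] > 0) {
                            remaining[0]--;
                            counts[id]++;
                            busyWork(holdMillis); // hold the lock, as the simulation does
                        } else {
                            more = false;         // nothing left to hand out
                        }
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[i].setPriority(priorities[i]);
        }
        for (Thread worker : workers) {
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            }
        }
        return counts;
    }

    // Spins for roughly the given number of milliseconds.
    private static void busyWork(long millis) {
        long end = System.nanoTime() + millis * 1_000_000L;
        while (System.nanoTime() < end) {
            // busy-wait: intentionally empty
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run(30, 5)));
    }
}
```

On a typical run the three counts come out roughly equal, although the exact numbers depend on the JVM and the operating system scheduler. Fair locks usually cost some throughput, so treat this as a diagnostic contrast rather than a recommended fix.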

Thread Starvation In ServiceNow MID Server

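Now let’s try to simulate this thread starvation in the ServiceNow MID Server environment. After compiling the above program, let’s create a JAR (Java Archive) file with the following command. The wildcard makes sure the class file generated for the nested ‘HighPriorityThread’ class is packaged as well:

jar cf ThreadStarvationSimulation.jar ThreadStarvationSimulation*.class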

Once the JAR file is created, let’s upload and run this program in the ServiceNow MID Server as documented in the MID Server setup guide. This guide provides a detailed walkthrough on how to run a custom Java application in the ServiceNow MID Server infrastructure. It walks through the following steps:

  1. Creating a ServiceNow application
  2. Installing MID Server on an AWS EC2 instance
  3. Configuring MID Server
  4. Installing the Java application within MID Server
  5. Running the Java application from MID Server

We strongly encourage you to check out the guide if you are not sure how to run custom Java applications in the ServiceNow MID Server infrastructure.

yCrash’s thread starvation diagnosis in ServiceNow

yCrash is an advanced monitoring tool specifically designed to identify performance bottlenecks and offer actionable recommendations within the ServiceNow environment. ServiceNow organizations rely on yCrash extensively for diagnosing and resolving performance issues.

When encountering thread starvation scenarios on ServiceNow’s MID Server, yCrash actively monitors the micro-metrics of the environment. It swiftly detects instances of thread starvation and generates detailed reports on the dashboard, providing valuable insights into the impact on system performance. These insights enable ServiceNow administrators to take proactive measures to mitigate thread starvation and optimize system efficiency.

Fig: Total Threads count view

Understanding the total thread count in your system can provide insights into potential thread starvation. By monitoring the thread count, you can see how many threads are active at a given time. If the count is consistently high, this could indicate that threads may be competing for resources, potentially causing lower-priority threads to be starved. Keep in mind that having a high number of threads doesn’t necessarily mean your application will run faster; beyond a certain point, it could lead to more context switching, greater overhead, and reduced overall throughput.
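
For a rough equivalent of this view without a monitoring tool, the standard java.lang.management API exposes the same kind of counters. The following is a minimal sketch under the assumption that it runs inside the JVM you want to observe; it is not how yCrash collects its data, and ThreadCountProbe with its methods is a hypothetical name.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical helper that samples thread counters of the current JVM.
class ThreadCountProbe {

    // Number of live threads (daemon and non-daemon) in this JVM.
    static int liveThreadCount() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    // One-line summary of the main thread counters.
    static String snapshot() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        return String.format("live=%d peak=%d daemon=%d totalStarted=%d",
                threads.getThreadCount(),
                threads.getPeakThreadCount(),
                threads.getDaemonThreadCount(),
                threads.getTotalStartedThreadCount());
    }

    public static void main(String[] args) throws InterruptedException {
        // Sample a few times; a count that stays high is the signal worth investigating.
        for (int i = 0; i < 3; i++) {
            System.out.println(snapshot());
            Thread.sleep(1000);
        }
    }
}
```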

For the above thread starvation simulation, yCrash reported the following transitive dependency graph of BLOCKED threads:

Fig: Thread starvation view

The graph shows that ‘Thread-0’ is blocking ‘Thread-1’ and ‘Thread-2’. When you click on a thread name in the graph, you can see that thread’s stack trace.
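
The same blocked-by relationships can be read from inside a JVM with the standard java.lang.management API. The following is a minimal sketch, not a description of how yCrash works internally: BlockedThreadReport, blockedEdges, and demo are hypothetical names, and the code only sees threads of the JVM it runs in.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical helper that lists which thread is blocking which.
class BlockedThreadReport {

    // One line per BLOCKED thread: "<blocked> is blocked by <owner> on <lock>".
    // The owner can print as null if the lock was released during the snapshot.
    static List<String> blockedEdges() {
        List<String> edges = new ArrayList<>();
        ThreadInfo[] infos = ManagementFactory.getThreadMXBean().dumpAllThreads(false, false);
        for (ThreadInfo info : infos) {
            if (info != null && info.getThreadState() == Thread.State.BLOCKED) {
                edges.add(info.getThreadName() + " is blocked by "
                        + info.getLockOwnerName() + " on " + info.getLockName());
            }
        }
        return edges;
    }

    // Small self-contained scenario: one thread holds a monitor, another waits for it.
    static List<String> demo() {
        final Object lock = new Object();
        final CountDownLatch held = new CountDownLatch(1);
        final CountDownLatch release = new CountDownLatch(1);

        Thread holder = new Thread(() -> {
            synchronized (lock) {
                held.countDown();      // signal that the monitor is now owned
                awaitQuietly(release); // keep the monitor until the snapshot is taken
            }
        }, "demo-holder");
        Thread waiter = new Thread(() -> {
            synchronized (lock) {
                // nothing to do; reaching this point means the monitor was acquired
            }
        }, "demo-waiter");

        holder.start();
        awaitQuietly(held);
        waiter.start();
        while (waiter.getState() != Thread.State.BLOCKED) {
            Thread.yield(); // wait until the waiter is parked on the monitor
        }
        List<String> edges = blockedEdges();
        release.countDown(); // let both threads finish
        return edges;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        for (String edge : demo()) {
            System.out.println(edge);
        }
    }
}
```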

Fig: Thread-0 high priority thread holding the lock

The stack trace of ‘Thread-0’ reveals a thread actively holding a lock while performing CPU-intensive work. This may cause thread starvation by denying other threads access to critical resources or CPU time, particularly if the other threads have lower priorities.
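
One common way to shorten the time a lock is held, sketched here under the assumption that the expensive part of the work does not need the shared state, is to do that work before entering the synchronized block and hold the lock only to publish the result. NarrowCriticalSection and its methods are hypothetical and are not part of the simulation.

```java
// Hypothetical example contrasting a wide and a narrow critical section.
class NarrowCriticalSection {

    private final Object sharedResource = new Object();
    private long total; // guarded by sharedResource

    // Pattern seen in the stack trace: expensive work runs while the lock is held,
    // so every other thread stays BLOCKED for the whole computation.
    void addHoldingLock(int n) {
        synchronized (sharedResource) {
            total += expensiveSum(n);
        }
    }

    // Alternative: compute first, then hold the lock only for the short update.
    void addNarrow(int n) {
        long partial = expensiveSum(n);
        synchronized (sharedResource) {
            total += partial;
        }
    }

    long total() {
        synchronized (sharedResource) {
            return total;
        }
    }

    // Stand-in for CPU-intensive work: sums 1..n with a loop.
    private static long expensiveSum(int n) {
        long sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        NarrowCriticalSection section = new NarrowCriticalSection();
        section.addHoldingLock(100);
        section.addNarrow(100);
        System.out.println(section.total());
    }
}
```

Both methods produce the same total; the difference is only in how long other threads are kept waiting for the lock.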

Fig: BLOCKED Thread-1 and its stack trace

The stack trace for ‘Thread-1’ shows that it is in a BLOCKED state, waiting for an object monitor to enter the synchronized block or method. This indicates that it’s attempting to acquire a lock that ‘Thread-0’ currently holds. The presence of this ‘BLOCKED’ status can be a sign of thread starvation, particularly if this thread remains blocked for an extended period while another thread holds the lock and does not relinquish control, preventing this thread from progressing with its work.

Fig: BLOCKED Thread-2’s stack trace

The stack trace of ‘Thread-2’ indicates it’s in a BLOCKED state, similar to ‘Thread-1’. It’s waiting to acquire the same lock, on the object indicated by the address <0x00001000000dfcd0>, that is held by ‘Thread-0’. This repeated BLOCKED status on the same object monitor shows that multiple threads are competing for the same resource. If these threads continue to be blocked because one thread is holding the lock without timely release, the situation can lead to thread starvation, where ‘Thread-2’ and possibly others are unable to make progress on their respective tasks.

To see the complete yCrash report for this simulation, you may click here.

Conclusion

In summary, yCrash’s analysis provides insights into various aspects of thread-related issues within the application. These insights include identifying total thread counts, detecting threads with identical stack traces, and highlighting instances of blocking threads. Understanding these patterns and occurrences aids in diagnosing and addressing thread-related performance issues, ultimately leading to improved system efficiency and responsiveness. Additionally, the integration with ServiceNow streamlines incident reporting and resolution processes, enhancing overall IT service management efficiency. This comprehensive approach to thread analysis contributes to a more resilient and efficient IT environment, particularly in large applications integrated with ServiceNow. If you want to diagnose performance problems in your ServiceNow deployment using yCrash, you may register here.
