Hi All,

Jay SenSharma

Thread dump analysis is the most important step in diagnosing slow server responsiveness, a Hang Server Situation, or sometimes a crash caused by stuck threads. Here we look at some basic but important features of the WebLogic thread model and the tasks these threads perform. We will also cover the common terminology used when analyzing a thread dump or a Hang Server Situation.

What Is a Hang Server Situation and What Are Its Symptoms?

1). If a server responds much more slowly than the expected response time, it may be moving toward a total Hang Server Situation.

2). If a server does not respond to client requests at all, it is a Complete Server Hang scenario. Sometimes a completely stuck server may also lead to a server crash.

Roles & Responsibilities of WebLogic Threads

WebLogic threads can be broadly divided into 2 main categories.
1). WebLogic Execute Threads: These threads are responsible for processing client/user requests. Their number decides how many tasks a server can perform in parallel. By default the WLS Server has 15 Execute Threads in development mode and 25 Execute Threads in production mode. Increasing the Execute Thread count does not necessarily increase performance; in some cases it may even degrade it.
2). WebLogic Socket Reader Threads: These threads listen for incoming client requests; basically, they deal with the network traffic. They are a percentage of the Execute Threads. By default 33% of the Execute Threads work as Socket Readers, and this ratio can be tuned according to the requirement.
Example: if we have 15 Execute Threads in development mode, around 33% of them (about 5 threads) will work as Socket Reader Threads and the rest will process client requests.
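
The arithmetic in the example above can be sketched as follows. This is only an illustration: socketReaders is a hypothetical helper (not a WebLogic API), and it assumes the percentage is applied to the total Execute Thread count and rounded to the nearest whole thread. How WebLogic itself rounds the value is not covered here.

```java
public class SocketReaderSplit {
    // Hypothetical helper: how many Execute Threads would act as Socket Readers,
    // ASSUMING the percentage is applied to the total and rounded to nearest.
    static int socketReaders(int executeThreads, int percent) {
        return Math.round(executeThreads * percent / 100.0f);
    }

    public static void main(String[] args) {
        int total = 15;                          // development-mode default from the text
        int readers = socketReaders(total, 33);  // 33% default ratio from the text
        System.out.println("Socket Reader Threads: " + readers);
        System.out.println("Threads left for requests: " + (total - readers));
    }
}
```

Under the same assumption, the production-mode default of 25 Execute Threads would leave roughly 8 Socket Readers.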

What are Execute Queues?

An Execute Queue is a group of Execute Threads that takes care of designated Servlets / RMI objects / EJBs / JSPs, etc. Up to WLS 8.1 the Execute Queue information was available as entries in the “config.xml” file. From WLS 9.x onwards the WebLogic threading model changed with the introduction of WorkManagers, so we won't see Execute Queue details by default in WLS 9.x and later. If we still want to use the WLS 8.1 style of threading model in WLS 9.x or later versions of WebLogic, we can use the following JAVA_OPTION: -Dweblogic.Use81StyleExecuteQueues=true

Possible Causes of Server Hang?

There may be many reasons behind slow server responsiveness or a hang scenario…

Cause 1). If the free heap memory is very low, the threads will NOT be able to create the required objects in the Java heap.

Cause 2). Insufficient number of threads. Sometimes the load (number of user requests) on the server suddenly increases, and if the MaxThreadCount is not set to a suitable value the server will not be able to process that many requests.

Cause 3). If Garbage Collection takes too long, the JVM spends its time cleaning up garbage data instead of letting the threads process client requests, or the threads end up waiting for free memory to create objects in the heap.

Cause 4). Sometimes Java code optimization also causes a temporary hang scenario, because code optimization is a somewhat heavy process, although it is useful for better performance.

Cause 5). Many remote JDBC lookups can sometimes cause a hang scenario.

Cause 6). Inaccurate JSP compilation settings. (It is recommended to always precompile the JSPs before deploying them to production environments, and PageCheckSeconds must be set correctly so that JSP compilation does not happen very frequently.)

Cause 7). Application code deadlock, JDBC driver deadlock, or vendor API bug: threads wait indefinitely to gain locks on objects that other threads never release. Example scenario:
1).  Thread_A has gained the lock on object Obj_A.
2).  Thread_B has gained the lock on object Obj_B.
3).  After performing some operations on Obj_A, Thread_A tries to get the lock on Obj_B (which is currently locked by Thread_B).
4).  Similarly, after performing some operations on Obj_B, Thread_B tries to gain the lock on Obj_A (which is already locked by Thread_A).

In the above scenario both threads are waiting for each other to release a lock, but neither of them releases the lock it already holds.

Cause 8). If the number of available file descriptors is too low (insufficient resources), we might also face slow server response or Server Hang situations.
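
The deadlock described in Cause 7 can be reproduced and detected inside a single JVM. The following is a minimal sketch, not WebLogic code: the class and method names are made up for illustration, the two threads are daemon threads so the JVM can still exit, and detection uses the standard java.lang.management.ThreadMXBean, which reports only deadlocks on object monitors (synchronized blocks) with the call used here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class DeadlockSketch {
    static final Object OBJ_A = new Object();
    static final Object OBJ_B = new Object();

    // Starts a daemon thread that locks 'first', waits until the other thread
    // holds its own lock, and then blocks forever trying to lock 'second'.
    static void start(String name, Object first, Object second, CountDownLatch bothLocked) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                bothLocked.countDown();
                try {
                    bothLocked.await();
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (second) {
                    // Not reached once both threads hold their first lock.
                }
            }
        }, name);
        t.setDaemon(true); // lets the JVM exit despite the deadlock
        t.start();
    }

    // Creates the deadlock, then returns how many threads the JVM reports as deadlocked.
    static int createAndDetect() {
        CountDownLatch bothLocked = new CountDownLatch(2);
        start("Thread_A", OBJ_A, OBJ_B, bothLocked);
        start("Thread_B", OBJ_B, OBJ_A, bothLocked);

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = null;
        for (int i = 0; i < 50 && ids == null; i++) { // poll for up to about 5 seconds
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            ids = mx.findMonitorDeadlockedThreads(); // null while no deadlock exists
        }
        if (ids == null) {
            return 0;
        }
        for (ThreadInfo info : mx.getThreadInfo(ids)) {
            System.out.println(info.getThreadName() + " waits for " + info.getLockName()
                    + " held by " + info.getLockOwnerName());
        }
        return ids.length;
    }

    public static void main(String[] args) {
        System.out.println("Deadlocked threads: " + createAndDetect());
    }
}
```

A thread dump taken from a JVM in this state normally flags the same pair of threads, which is why deadlocks are among the easier hang causes to confirm from a dump.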

What Will the Server Log Look Like in Case of Hang Scenarios or Slow Responsiveness?

By default, WebLogic Server declares a thread as a STUCK thread after 600 seconds (10 minutes). The stuck thread occurrence is recorded in the server logs, where you can see an entry like the following…

<Warning> <WebLogicServer> <BEA-000337> <ExecuteThread: '7' for queue: 'weblogic.kernel.Default' has been busy for "630" seconds working on the request "weblogic.ejb20.internal.JMSMessagePoller@d64412", which is more than the configured time (StuckThreadMaxTime) of "600" seconds.>
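
When scanning large server logs it can help to pull the thread name, the busy time and the configured limit out of such entries. The sketch below is written only against the sample BEA-000337 line shown above; the class name and the regular expression are illustrative assumptions, and other WebLogic versions may word the message differently, so the pattern may need adjusting.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StuckThreadLogParser {
    // The sample entry from the text, used here as demo input.
    static final String SAMPLE =
        "<Warning> <WebLogicServer> <BEA-000337> <ExecuteThread: '7' for queue: "
        + "'weblogic.kernel.Default' has been busy for \"630\" seconds working on the request "
        + "\"weblogic.ejb20.internal.JMSMessagePoller@d64412\", which is more than the configured "
        + "time (StuckThreadMaxTime) of \"600\" seconds.>";

    // ASSUMPTION: pattern derived from the sample entry only.
    private static final Pattern STUCK = Pattern.compile(
        "<BEA-000337> <(ExecuteThread: '\\d+' for queue: '[^']+') has been busy for \"(\\d+)\" seconds"
        + ".*?\\(StuckThreadMaxTime\\) of \"(\\d+)\" seconds");

    // Returns {threadName, busySeconds, configuredSeconds}, or null if the line does not match.
    static String[] parse(String logLine) {
        Matcher m = STUCK.matcher(logLine);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] fields = parse(SAMPLE);
        System.out.println(fields[0] + " busy for " + fields[1]
                + "s (StuckThreadMaxTime " + fields[2] + "s)");
    }
}
```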

Does the Above Kind of Entry in the Server Log Mean WebLogic Is Hung?

The above message in the server log is just an indication that some threads are taking a long time to process some requests, which may lead to a Hang Server Situation and, of course, to slow responsiveness of the WebLogic Server. But it does not necessarily mean that WebLogic cannot recover these threads.

WebLogic is able to declare a STUCK thread as UNSTUCK. WebLogic periodically checks for stuck threads based on the settings of “Stuck Thread Max Time” and “Stuck Thread Timer Interval” (see http://e-docs.bea.com/wls/docs103/ConsoleHelp/taskhelp/tuning/TuningExecuteThreads.html).

WebLogic only reports the thread as stuck. If the thread then makes progress (maybe it seemed to be stuck but was actually running a long transaction), WebLogic declares it as unstuck.
Often the application requirement is that a thread may legitimately take more than 600 seconds to process a client request. In that case we can change the “StuckThreadMaxTime” interval. For example, if we know that our application has some long running JDBC queries which may take up to 900 seconds, then we can increase the StuckThreadMaxTime accordingly.

What First Aid Steps Are Required to Collect Debug Data?

Debugging-1). Try to ping the WebLogic Server 5-6 times using the “weblogic.Admin PING” utility to see how quickly the response comes back.

java weblogic.Admin -url t3://StuckThreadHostName:9001  -username weblogic -password weblogic  PING

Debugging-2). As soon as you get time, check the verbose GC logs (Garbage Collection logs, if you have already enabled Garbage Collection logging by applying the following JAVA_OPTIONS).

set JAVA_OPTIONS=%JAVA_OPTIONS%  -Xloggc:/opt/logs/gc.log -XX:+PrintGCDetails -Xmx1024m -Xms1024m
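
If GC logging was not enabled before the problem started, the JVM's cumulative GC counters can still give a first impression. The sketch below uses the standard java.lang.management API and reads the counters of the JVM it runs in. To inspect a WebLogic Server this way the code would have to run inside that server or read the same MXBeans over a remote JMX connection, which is not shown here. The class name is illustrative.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcTimeSketch {
    // Sums the cumulative collection time (in ms) reported by all collectors.
    // Collectors reporting -1 (value undefined) are skipped.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
        long uptime = Math.max(1, ManagementFactory.getRuntimeMXBean().getUptime());
        System.out.println("Approximate share of uptime spent in GC: "
                + (100.0 * totalGcMillis() / uptime) + "%");
    }
}
```

A high share of uptime spent in GC points toward Cause 3 above, while the GC log remains the better source for the timing of individual collections.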

Debugging-3). Collect at least 4-5 thread dumps at intervals of 9-10 seconds to see the activities of the threads. For the various ways of collecting thread dumps, see: http://middlewaremagic.com/weblogic/?p=823

Debugging-4). Check whether the load (number of client requests) is/was abnormally high on the server.

Debugging-5). Check whether the JMS subsystem connectivity or the database connectivity has been lost somewhere. You may find strange entries in the server logs such as DataSource is Disabled / Network Adapter could not establish the connection to the database / JMS messaging system “PeerGoneException” / Tuxedo/Jolt connectivity errors, etc.
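
The repeated thread dumps from Debugging-3 can also be taken programmatically. The following is a minimal sketch using the standard ThreadMXBean. It dumps the threads of the JVM it runs in, so it only illustrates the sampling idea. The class name, the output format, and the defaults of 5 dumps at 10 second intervals are illustrative choices based on the numbers in the text, and the output is simpler than a real JVM thread dump.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadDumpSampler {
    // Takes one snapshot of all live threads in the current JVM as text.
    static String dump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Only request lock details if this JVM supports reporting them.
        ThreadInfo[] infos = mx.dumpAllThreads(
                mx.isObjectMonitorUsageSupported(), mx.isSynchronizerUsageSupported());
        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : infos) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("\tat ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        int count = args.length > 0 ? Integer.parseInt(args[0]) : 5;            // 4-5 dumps
        long intervalMs = args.length > 1 ? Long.parseLong(args[1]) : 10_000L;  // 9-10 seconds
        for (int i = 1; i <= count; i++) {
            System.out.println("===== Thread dump " + i + " of " + count + " =====");
            System.out.print(dump());
            if (i < count) {
                Thread.sleep(intervalMs);
            }
        }
    }
}
```

Comparing consecutive dumps shows which threads stay in the same stack frames across samples; those are the candidates for stuck threads.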

Thanks

Jay SenSharma
