Almost every 24×7 production environment needs to keep monitoring server health and thread activity. A very common requirement is to collect thread dumps as soon as a [STUCK] thread appears on any WebLogic Server instance. Most of the time we end up doing a post-mortem analysis: by the time the stuck-thread issue has occurred or the WebLogic Server has hung, nobody has collected the thread dumps needed to investigate. Support hears this excuse all the time.
While working in Middleware Support we have faced the same problem many times: customers could not collect thread dumps while the issue was occurring, which delayed both the resolution and the root-cause analysis.
To avoid this kind of problem we developed an automatic WLST script with the following features:
Features Of This Script:
- Ready To Use: The script is ready to use. You do not need to edit anything in the WLST script except the e-mail address in the mailx command (line 32 of the original script).
- Flexibility: You only need to change the values in the “domains.properties” file, for example how many thread dumps to collect when the issue occurs.
- E-Mail Alert: The administrator learns about the issue immediately through an e-mail alert.
- Thread Dumps In Mail: The complete thread dumps are sent to the administrator by e-mail, so there is no need to worry about collecting them.
- Independent Script: This WLST script can run on its own without any cron-job utility provided by the operating system (although it can be combined with a cron job as well), which gives administrators more flexibility.
Steps to Create an Email Alert For Stuck Threads With Thread Dumps
Step-1) Create a directory somewhere in your file system, for example “C:\WLST”.
Step-2) Write a properties file “domains.properties” inside “C:\WLST” like the following:
server.url=t3://localhost:8004
admin.username=weblogic
admin.password=weblogic
monitoring.server.name=MS-3

# ExecuteThread_Vs_HoggerThreadRatio is the limit for ExecuteThreadTotalCount / HoggingThreadCount
ExecuteThread_Vs_HoggerThreadRatio=2

# Number of times the RATIO has to be checked
checkTimes_Number=3

# TIME INTERVAL between the RATIO checks (60000 milliseconds = 60 seconds)
checkInterval_in_Milliseconds=60000

# Number of times the Thread Dump has to be taken
threadDumpTimes_Number=5

# TIME INTERVAL between thread dumps (10000 milliseconds = 10 seconds)
threadDumpInterval_in_Milliseconds=10000

# Number of times to send thread dumps in mail in case of a stuck/hogging thread issue
sendEmail_ThreadDump_Counter=2
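The settings above are plain key=value pairs with # comment lines. As a rough illustration of how such a file is read, here is a minimal sketch in plain Python (outside WLST, where the script itself uses java.util.Properties). The function name parse_properties and the inlined SAMPLE text are assumptions made only for this example.

```python
def parse_properties(text):
    """Parse key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that contain no '='
            props[key.strip()] = value.strip()
    return props


# Hypothetical excerpt of domains.properties, inlined for the example
SAMPLE = """
server.url=t3://localhost:8004
monitoring.server.name=MS-3
# Number of times the RATIO has to be checked
checkTimes_Number=3
checkInterval_in_Milliseconds=60000
"""

settings = parse_properties(SAMPLE)
check_times = int(settings["checkTimes_Number"])
# The intervals are stored in milliseconds; convert for readability
check_interval_seconds = int(settings["checkInterval_in_Milliseconds"]) / 1000.0
```

All values come back as strings, which is why the WLST script wraps them in int() before comparing or sleeping.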
Step-3) Create a WLST script named, for example, “Alert_StuckThread_ThreadDumps.py” inside “C:\WLST” with contents like the following:
#############################################################################
#
# @author Copyright (c) 2010 - 2011 by Middleware Magic, All Rights Reserved.
#
#############################################################################

from java.io import FileInputStream
import java.lang
import os

propInputStream = FileInputStream("domains.properties")
configProps = Properties()
configProps.load(propInputStream)

adminUrl = configProps.get("server.url")
adminUser = configProps.get("admin.username")
adminPassword = configProps.get("admin.password")
monitoringServerName = configProps.get("monitoring.server.name")
executeThread_Vs_HoggerThreadRatio = configProps.get("ExecuteThread_Vs_HoggerThreadRatio")
checkTimes_Number = configProps.get("checkTimes_Number")
checkInterval_in_Milliseconds = configProps.get("checkInterval_in_Milliseconds")
threadDumpTimes_Number = configProps.get("threadDumpTimes_Number")
threadDumpInterval_in_Milliseconds = configProps.get("threadDumpInterval_in_Milliseconds")
sendEmail_ThreadDump_Counter = configProps.get("sendEmail_ThreadDump_Counter")

i = 0
y = int(checkTimes_Number)

############# This method would send the Alert Email with Thread Dump #################
def sendMailThreadDump():
    # Change abcd@company.com to the administrator's e-mail address
    os.system('/bin/mailx -s "ALERT: CHECK Thread Dumps as Hogger Thread Count Exceeded the Limit !!! " abcd@company.com < All_ThreadDump.txt')
    print '********* ALERT MAIL HAS BEEN SENT ***********'
    print ''

############# This method is checking the Hogger Threads Ratio #################
def alertHoggerThreads(executeTTC , hoggerTC):
    print 'Execute Threads : ', executeTTC
    print 'Hogger Thread Count : ', hoggerTC
    print 'executeThread_Vs_HoggerThreadRatio :', executeThread_Vs_HoggerThreadRatio
    # Skip the division when there are no hogging threads (avoids ZeroDivisionError)
    if hoggerTC != 0:
        ratio=(executeTTC/hoggerTC)
        print 'Ratio : ' , ratio
        print ''
        if (int(ratio) <= int(executeThread_Vs_HoggerThreadRatio)):
            print ' !!!! ALERT !!!! Stuck Threads are on its way.....'
            print ''
            message = 'ExecuteThreads Count= ' + str(executeTTC) + ' HoggingThreads= '+ str(hoggerTC) +' ExecuteThreads/HoggingThreads Ratio= '+ str(ratio)
            cmd = "echo " + message +" > rw_file"
            os.system(cmd)
            genrateThreadDump()
        else:
            print '++++++++++++++++++++++++++++++++++++'
            print 'Everything is working fine till now'
            print '++++++++++++++++++++++++++++++++++++'
    else:
        print '++++++++++++++++++++++++++++++++++++'
        print 'Everything is working fine till now'
        print '++++++++++++++++++++++++++++++++++++'

############# This method is Taking the Thread Dumps #################
def genrateThreadDump():
    b = int(sendEmail_ThreadDump_Counter)
    a = 0
    p = 0
    q = int(threadDumpTimes_Number)
    serverConfig()
    cd ('Servers/'+ monitoringServerName)
    while (p < q):
        if a < b:
            print 'Taking Thread Dump : ', p
            threadDump()
            # The file name is hard-coded for the sample server MS-3
            cmd = "cat Thread_Dump_MS-3.txt >> All_ThreadDump.txt"
            os.system(cmd)
            print 'Thread Dump Collected : ', p ,' now Sleeping for ', int(threadDumpInterval_in_Milliseconds) , ' Milliseconds ...'
            print ''
            Thread.sleep(int(threadDumpInterval_in_Milliseconds))
            b = b - 1
            p = p + 1
        else:
            # b has reached zero: stop here, otherwise p never advances and the loop never ends
            break
    sendMailThreadDump()
    cmd = "rm -f All_ThreadDump.txt"
    os.system(cmd)
    serverRuntime()

connect(adminUser,adminPassword,adminUrl)
serverRuntime()
cd('ThreadPoolRuntime/ThreadPoolRuntime')
while (i < y):
    executeTTC=cmo.getExecuteThreadTotalCount();
    hoggerTC=cmo.getHoggingThreadCount();
    alertHoggerThreads(executeTTC , hoggerTC)
    print 'Sleeping for ', int(checkInterval_in_Milliseconds) , ' ...'
    print ''
    Thread.sleep(int(checkInterval_in_Milliseconds))
    i = i + 1
Step-4) Open a command prompt and run “setWLSEnv.cmd” or “setWLSEnv.sh” to set the CLASSPATH and PATH variables. It is a good idea to run echo %CLASSPATH% (Windows) or echo $CLASSPATH (UNIX) to check that the CLASSPATH is set properly. If the CLASSPATH is still empty after running “setWLSEnv.sh”, please refer to the note in Step3) of the following post: http://middlewaremagic.com/weblogic/?page_id=1492
Step-5) Now run the WLST Script in the same command prompt using the following command:
java weblogic.WLST Alert_StuckThread_ThreadDumps.py
You will see results like the following in the command prompt:
$ java weblogic.WLST Alert_StuckThread_ThreadDumps.py

Initializing WebLogic Scripting Tool (WLST) ...

Welcome to WebLogic Server Administration Scripting Shell

Type help() for help on available commands

Connecting to t3://localhost:8004 with userid weblogic ...
Successfully connected to managed Server 'MS-3' that belongs to domain 'Domain_8001'.

Warning: An insecure protocol was used to connect to the
server. To ensure on-the-wire security, the SSL port or
Admin port should be used instead.

Location changed to serverRuntime tree. This is a read-only tree with ServerRuntimeMBean as the root.
For more help, use help(serverRuntime)

Execute Threads : 5
Hogger Thread Count : 2
executeThread_Vs_HoggerThreadRatio : 2
Ratio : 2

 !!!! ALERT !!!! Stuck Threads are on its way.....

Taking Thread Dump : 0
Thread dump for the running server: MS-3

"[STANDBY] ExecuteThread: '4' for queue: 'weblogic.kernel.Default (self-tuning)'" waiting for lock weblogic.work.ExecuteThread@1e2ba602 WAITING
        java.lang.Object.wait(Native Method)
        java.lang.Object.wait(Object.java:485)
        weblogic.work.ExecuteThread.waitForRequest(ExecuteThread.java:157)
        weblogic.work.ExecuteThread.run(ExecuteThread.java:178)

"[STANDBY] ExecuteThread: '3' for queue: 'weblogic.kernel.Default (self-tuning)'" waiting for lock weblogic.work.ExecuteThread@439fdcc7 WAITING
        java.lang.Object.wait(Native Method)
        java.lang.Object.wait(Object.java:485)
        weblogic.work.ExecuteThread.waitForRequest(ExecuteThread.java:157)
        weblogic.work.ExecuteThread.run(ExecuteThread.java:178)

"DynamicListenThread[Default[2]]" RUNNABLE native
        java.net.PlainSocketImpl.socketAccept(Native Method)
        java.net.PlainSocketImpl.accept(PlainSocketImpl.java:390)
        java.net.ServerSocket.implAccept(ServerSocket.java:453)
        java.net.ServerSocket.accept(ServerSocket.java:421)
        weblogic.socket.WeblogicServerSocket.accept(WeblogicServerSocket.java:38)
        weblogic.server.channels.DynamicListenThread$SocketAccepter.accept(DynamicListenThread.java:523)
        weblogic.server.channels.DynamicListenThread$SocketAccepter.access$200(DynamicListenThread.java:415)
        weblogic.server.channels.DynamicListenThread.run(DynamicListenThread.java:166)
        java.lang.Thread.run(Thread.java:619)

"[STUCK] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)'" RUNNABLE
        sun.misc.FloatingDecimal.doubleValue(FloatingDecimal.java:1531)
        java.lang.Double.parseDouble(Double.java:510)
        jsp_servlet.__index._jspService(__index.java:71)
        weblogic.servlet.jsp.JspBase.service(JspBase.java:34)
        weblogic.servlet.internal.StubSecurityHelper$ServletServiceAction.run(StubSecurityHelper.java:227)
        weblogic.servlet.internal.StubSecurityHelper.invokeServlet(StubSecurityHelper.java:125)
        weblogic.servlet.internal.ServletStubImpl.execute(ServletStubImpl.java:292)
        weblogic.servlet.internal.ServletStubImpl.execute(ServletStubImpl.java:175)
        weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.run(WebAppServletContext.java:3498)
        weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:321)
        weblogic.security.service.SecurityManager.runAs(Unknown Source)
        weblogic.servlet.internal.WebAppServletContext.securedExecute(WebAppServletContext.java:2180)
        weblogic.servlet.internal.WebAppServletContext.execute(WebAppServletContext.java:2086)
        weblogic.servlet.internal.ServletRequestImpl.run(ServletRequestImpl.java:1406)
        weblogic.work.ExecuteThread.execute(ExecuteThread.java:201)
        weblogic.work.ExecuteThread.run(ExecuteThread.java:173)
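The thread names in the dump carry a label such as [STANDBY] or [STUCK]. Once a dump has arrived by mail, a quick way to summarise it is to count those labels. The following is only a sketch in plain Python, assuming the dump uses the header format shown in the output above; count_thread_labels and SAMPLE_DUMP are names made up for this example.

```python
import re

# Matches execute-thread header lines such as:
#   "[STUCK] ExecuteThread: '1' for queue: '...'" RUNNABLE
LABEL_PATTERN = re.compile(r'^"\[(\w+)\] ExecuteThread:', re.MULTILINE)


def count_thread_labels(dump_text):
    """Return a dict mapping each thread label (STUCK, STANDBY, ...) to its count."""
    counts = {}
    for label in LABEL_PATTERN.findall(dump_text):
        counts[label] = counts.get(label, 0) + 1
    return counts


# Shortened excerpt of the thread dump shown above
SAMPLE_DUMP = '''"[STANDBY] ExecuteThread: '4' for queue: 'weblogic.kernel.Default (self-tuning)'" waiting for lock weblogic.work.ExecuteThread@1e2ba602 WAITING
        java.lang.Object.wait(Native Method)
"[STANDBY] ExecuteThread: '3' for queue: 'weblogic.kernel.Default (self-tuning)'" waiting for lock weblogic.work.ExecuteThread@439fdcc7 WAITING
"DynamicListenThread[Default[2]]" RUNNABLE native
"[STUCK] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)'" RUNNABLE
        sun.misc.FloatingDecimal.doubleValue(FloatingDecimal.java:1531)
'''

labels = count_thread_labels(SAMPLE_DUMP)
```

Threads without a bracketed label, such as the DynamicListenThread above, are deliberately ignored by this pattern.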
NOTE: This script uses mailx, which is not available on a Windows box. Please check that mailx is configured properly; otherwise the script will run without errors but the mail will not be sent.
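A quick way to check whether mailx is present at all before relying on the alert is sketched below. It is plain Python 3 (shutil.which is not available in the Jython that ships with WLST), and mail_command_available is a name invented for this example. Finding the command on the PATH does not prove that mail delivery works, so a test mail is still worthwhile.

```python
import shutil


def mail_command_available(command="mailx"):
    """Return True if the given mail command is found on the PATH."""
    return shutil.which(command) is not None


# True on a box where mailx is installed, False otherwise (for example on Windows)
has_mailx = mail_command_available()
```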
Regards,
Ravish Mody
February 24th, 2011 on 7:12 pm
Hi Ravish,
I am getting the error below while executing this script.
Warning: An insecure protocol was used to connect to the
server. To ensure on-the-wire security, the SSL port or
Admin port should be used instead.
Location changed to serverRuntime tree. This is a read-only tree with ServerRuntimeMBean as the root.
For more help, use help(serverRuntime)
Execute Threads : 6
Hogger Thread Count : 0
executeThread_Vs_HoggerThreadRatio : 2
Problem invoking WLST - Traceback (innermost last):
File "D:\Research\WLST_For_TreadDumps\Alert_StuckThread_ThreadDumps.py", line 85, in ?
File "D:\Research\WLST_For_TreadDumps\Alert_StuckThread_ThreadDumps.py", line 41, in alertHoggerThreads
ZeroDivisionError: integer division or modulo
Please let me know what is causing this error.
thanks,
Praveen
February 24th, 2011 on 7:53 pm
Hi Praveen,
The error occurred because the Hogger Thread Count was “0”, which we had not checked for. The script divides by that count (hoggerTC) to compute the ratio, i.e. ratio=(executeTTC/hoggerTC), and uses the ratio to decide whether a mail has to be sent. Because the denominator was zero, you got the “ZeroDivisionError: integer division or modulo” error.
In short you found a bug in our script 😉 , which has been fixed now.
Thank you, Praveen, for helping us by finding a bug in the above script.
We Share 20 Bonus Magic Points with you for suggesting the great enhancement. Thank you once again for your keen observation and sharing enhancements.
Keep Sharing… 🙂
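The guard described in this reply can be written as a small function. This is a minimal sketch in plain Python rather than the exact code of the article: should_alert and its parameter names are made up for the example, and the comparison mirrors the script's integer ratio and <= check.

```python
def should_alert(execute_total, hogging_count, ratio_limit):
    """Return True when ExecuteThreads / HoggingThreads is at or below the limit."""
    if hogging_count == 0:
        # No hogging threads: nothing to divide by and nothing to alert on
        return False
    ratio = execute_total // hogging_count  # integer division, as in the script
    return ratio <= ratio_limit
```

With the values from the failing run (6 execute threads, 0 hogging threads) the function simply returns False instead of raising ZeroDivisionError, while the sample run of the article (5 execute threads, 2 hogging threads, limit 2) still triggers the alert.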
December 3rd, 2015 on 8:19 am
Hi Ravish,
While executing the script we are facing an issue where “t3” appears twice in the URL.
TD_intervalMillis 1000 xxxxx, xxxxxx
surl 't3://190.43.106.28:7001'
Connecting to t3://'t3://190.43.106.28:7001' with userid xxxxx ...
Traceback (innermost last):
File “”, line 1, in ?
main
File “”, line 22, in connect
File “”, line 648, in raiseWLSTException
WLSTException: Error occured while performing connect : The url specified is malformed. Please correct it.
Use dumpStack() to view the full stacktrace
December 3rd, 2015 on 8:28 am
Hello Venki,
It looks like you might be prepending another “t3://” protocol to the “adminUrl”. Can you please print the values of the variables in your WLST script before executing the following line:
connect(adminUser,adminPassword,adminUrl)
Regards
Jay SenSharma
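The output in the question shows the URL value wrapped in quotes and then prefixed with a second “t3://”. One way to make a script tolerant of both is to clean the value before passing it to connect(). This is only a sketch; normalize_t3_url is a hypothetical helper, not part of WLST.

```python
def normalize_t3_url(value, default_scheme="t3"):
    """Strip stray quotes/whitespace and add a scheme only if none is present."""
    cleaned = value.strip().strip("'\"")
    if "://" in cleaned:
        return cleaned
    return default_scheme + "://" + cleaned


fixed = normalize_t3_url("'t3://190.43.106.28:7001'")
```

The same helper leaves a t3s:// URL untouched and turns a bare host:port into a t3:// URL.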
February 26th, 2011 on 1:11 pm
Ravish,
Rather than depending on the ratio, why are we not counting the stuck threads directly? With JMX we can use something like this.
ThreadPoolRuntimeMBean threadPoolRuntimeMBean = serverRuntimeBean.getThreadPoolRuntime();
Integer stuckThreadCount = 0;
ExecuteThread executeThreadArray[] = threadPoolRuntimeMBean.getExecuteThreads();
if(executeThreadArray != null){
    boolean isStuck = false;
    for(ExecuteThread executeThread : executeThreadArray){
        if(executeThread.isStuck()){
            stuckThreadCount++;
            Date currentRequestStartTime = new Date(executeThread.getCurrentRequestStartTime());
            logger.error("Name : " + executeThread.getName() + ", CurrentRequest : " + executeThread.getCurrentRequest()
                + ", CurrentRequestStartTime : " + currentRequestStartTime +
                ", ApplicationName : " + executeThread.getApplicationName());
            isStuck = true;
        }
    }
    if(isStuck){
        JVMRuntimeMBean jvmRuntimeMBean = serverRuntimeBean.getJVMRuntime();
        logger.error(jvmRuntimeMBean.getThreadStackDump());
    }
}
I have pasted the code snippet that I use for monitoring in my tool. I have not tested whether the same logic works with “py”. I will do so and update.
Thanks
Sumit
February 26th, 2011 on 6:24 pm
Hi Sumit,
Thank you for sharing the JMX code snippet to find out the Stuck Threads. I will try to implement it.
The WLST script we posted in the above article gives us the details proactively: it can raise an alarm even before an actual STUCK thread scenario occurs. The only thing we need to keep in mind is to choose the correct ratio of Hogger Threads vs Execute Threads.
So the WLST script will WARN us that something may be wrong and should be checked.
Thanks once again for sharing the JMX way of doing this. Magic Team shares 30 Bonus Magic Points with you 🙂
Keep Sharing 🙂
Thanks
Jay SenSharma
February 26th, 2011 on 7:33 pm
Jay,
This gives us advance information only when the hogger thread count is not zero. But when threads are stuck it may be possible that the hogger thread count is 0, and then we may never be able to collect the actual thread dump and alert.
Thanks
Sumit
February 26th, 2011 on 9:26 pm
Hi Sumit,
I have never seen HoggerThread=0 at the time of a STUCK thread situation. Can you simulate this? Let us know if you have a test case for it.
Keep Posting 🙂
Thanks
Jay SenSharma
February 26th, 2011 on 10:01 pm
Jay,
I remember one scenario which I tried a few months back. I made the log server unreachable. Initially there were hogger threads, but after a few hours all the threads were in stuck mode and there was no hogger thread. At that time WebLogic was using around 400 threads in total.
But in this scenario the above code would already have collected the dump. I will think about how to simulate a case where HTC=0 and there are still stuck threads.
Thanks for the bonus points, but they are not reflected in the total points 🙂
Thanks
Sumit
February 26th, 2011 on 11:19 pm
Hi Sumit,
A thread cannot be declared STUCK until it has passed through the hogging state. “Hogger” is actually a state: a thread is said to be hogging when it keeps performing the same task for a long time, and such threads are the major suspects for becoming STUCK threads later. In other words, hogger threads become the STUCK threads.
It is like a child who cannot become old without passing through youth. In the same way, a thread cannot be declared a stuck thread directly without first being in the hogger state.
So the above WLST script gives you the option of being alerted even before the actual stuck-thread situation occurs.
But if you want to be alerted only when the threads have become STUCK, I would consider that a “post-mortem analysis”.
So you can choose between
1). Getting an alert before things go out of control.
OR
2). Getting an alert after things go out of control.
Keep Posting 🙂
Thanks
Jay SenSharma
May 12th, 2011 on 12:26 pm
Hi,
Thanks for the script. I tried it and it worked fine for admin servers. The script checks the hogging threads of the admin server and then takes a thread dump of either the admin server or a managed server, depending on what we mention in the script. But I want to check the hogging threads of a managed server directly, then check the ratio and go on to take thread dumps if hogging threads are observed.
Please let me know if there is any way to do this. Thanks.
May 21st, 2011 on 11:52 am
Hi Jay,
You can also send an alert mail from a Windows system by downloading the open-source tool “sendEmail” mentioned below.
http://caspian.dotconf.net/menu/Software/SendEmail/
You only need to know the IP address of your Exchange mail server.
To use the same script on a Windows system, modify line 32 of the “Alert_StuckThread_ThreadDumps.py” Python script.
Example:-
192.168.1.10 --> Exchange Mail server IP Address.
D:\sendmail-vinoth\sendEmail.exe -f Weblogic.monitoring@test.com -t vinothbabu.s@test.com -s 192.168.1.10 -u Weblogic Server JDBC Statistics Report < D:\sendmail-vinoth\All_ThreadDump.txt
Regards,
S.vinoth babu
May 21st, 2011 on 2:57 pm
Hi S.vinoth babu,
Thank you for sharing this great information; now people with a Windows box can also use the same scripts. However, I was not able to test it at my end because my Windows version is not yet supported by SendEmail.
It is still worth trying this software on the supported versions, hence the Magic Team would love to share 30 Magic Points with you for sharing such great information.
Keep Posting 🙂
Regards,
Ravish Mody
May 30th, 2011 on 4:26 am
Hello Ravish,Jay
The above code snippet monitors the admin server for hogging threads, and if any are present it takes a thread dump of the managed server and forwards it by mail.
What if the admin server is fine and hogging threads are observed on a managed server?
Thanks
Sathya
May 30th, 2011 on 6:19 am
Hello Ravish,Jay..
In my free time I have made a few changes to the above code to monitor all nodes in a domain.
Script
===================================
from java.io import FileInputStream
import java.lang
import os
import smtplib

propInputStream = FileInputStream("/home/cgnr3dev/scripts/domain.properties")
configProps = Properties()
configProps.load(propInputStream)

adminUrl = configProps.get("admin.url")
adminUser = configProps.get("admin.username")
adminPassword = configProps.get("admin.password")
monitoringServerName = configProps.get("monitoring.server.name")
executeThread_Vs_HoggerThreadRatio = configProps.get("ExecuteThread_Vs_HoggerThreadRatio")
checkTimes_Number = configProps.get("checkTimes_Number")
checkInterval_in_Milliseconds = configProps.get("checkInterval_in_Milliseconds")
threadDumpTimes_Number = configProps.get("threadDumpTimes_Number")
threadDumpInterval_in_Milliseconds = configProps.get("threadDumpInterval_in_Milliseconds")
sendEmail_ThreadDump_Counter = configProps.get("sendEmail_ThreadDump_Counter")
totalServersToMonitor = configProps.get("total.number.of.servers")

y = int(checkTimes_Number)

############# This method is checking the Hogger Threads Ratio #################
def alertHoggerThreads(executeTTC , hoggerTC, serverName):
    print 'Execute Threads : ', executeTTC
    print 'Hogger Thread Count : ', hoggerTC
    print 'executeThread_Vs_HoggerThreadRatio :', executeThread_Vs_HoggerThreadRatio
    if hoggerTC != 0:
        ratio=(executeTTC/hoggerTC)
        print 'Ratio : ' , ratio
        print ''
        if (int(ratio) rw_file"
            os.system(cmd)
            genrateThreadDump(serverName)
        else:
            print '++++++++++++++++++++++++++++++++++++'
            print 'Everything is working fine till now'
            print '++++++++++++++++++++++++++++++++++++'
    else:
        print '++++++++++++++++++++++++++++++++++++'
        print 'Everything is working fine till now'
        print '++++++++++++++++++++++++++++++++++++'

############# This method is Taking the Thread Dumps #################
def genrateThreadDump(serverName):
    b = int(sendEmail_ThreadDump_Counter)
    a = 0
    p = 0
    q = int(threadDumpTimes_Number)
    serverConfig()
    cd ('Servers/'+ serverName)
    while (p < q):
        if a > All_ThreadDump.txt"
            os.system(cmd)
            print 'Thread Dump Collected : ', p ,' now Sleeping for ', int(threadDumpInterval_in_Milliseconds) , ' Seconds ...'
            print ''
            Thread.sleep(int(checkInterval_in_Milliseconds))
            b = b - 1
            p = p + 1
    cmd = "rm -f All_ThreadDump.txt"
    os.system(cmd)
    serverRuntime()

#connect(adminUser,adminPassword,adminUrl)
#serverRuntime()
#cd('ThreadPoolRuntime/ThreadPoolRuntime')
totalServers = int(totalServersToMonitor)
i=1
while (i > serverState_file"
    #os.system(cmd)
    i = i + 1
=======================================
domain.properties
admin.username=CNETPREAdmin
admin.password=XXXXXX
admin.url=t3s://169.185.246.55:31071
ExecuteThread_Vs_HoggerThreadRatio=2
total.number.of.servers=3
checkTimes_Number=25
sendEmail_ThreadDump_Counter=2
threadDumpTimes_Number=5
checkInterval_in_Milliseconds=10000
threadDumpInterval_in_Milliseconds=10000
server.name.1=CNETPREServer
server.url.1=t3s://169.185.246.55:31071
host.name.1=cgnrapp1d
server.name.2=CNETPRENode1
server.url.2=t3://169.185.246.55:31074
host.name.2=cgnrapp1d
server.name.3=CNETPRENode2
server.url.3=t3://169.185.246.55:31078
host.name.3=cgnrapp1d
check.interval=0
connectionPool.alert.limit=12
==============================================
Thanks
Sathya
September 20th, 2011 on 10:11 am
Hi Ravish,
I’m getting the following error. Can you please comment?
Error
——
===== END OF THREAD DUMP ===============
The Thread Dump for server jmsFarmManagedServer1
has been successfully written to Thread_Dump_jmsFarmManagedServer1.txt
cat: Thread_Dump_jmsManagedServer1.txt: No such file or directory
Thread Dump Collected : 1 now Sleeping for 2000 Seconds …
Null message body; hope that’s ok
********* ALERT MAIL HAS BEEN SENT ***********
Sleeping for 2000 …
At the end when i look at “Thread_Dump_jmsFarmManagedServer1.txt”, it contains only one thread dump…
Any suggestions as to where i’m missing….
September 20th, 2011 on 10:17 am
To be precise, it is not able to cat the “Thread_Dump_jmsFarmManagedServer1.txt” saying “No such file or directory” and because of which it cannot write to “All_ThreadDump.txt”.
At the end there is no content in “All_ThreadDump.txt”, so it says “Null message body; hope that’s ok”…
September 20th, 2011 on 10:37 am
Hi Lakshmi Srikanth,
It looks like you have given the wrong file name to “cat”, as can be seen from the output below
======================
has been successfully written to Thread_Dump_jmsFarmManagedServer1.txt
cat: Thread_Dump_jmsManagedServer1.txt: No such file or directory
======================
The thread dump is written to “Thread_Dump_jmsFarmManagedServer1.txt” but you are trying to cat “Thread_Dump_jmsManagedServer1.txt”, which is a different file (“Farm” is missing from the file name), hence the “No such file or directory” error. Just correct the file name and it should work fine.
Regards,
Ravish Mody
September 20th, 2011 on 11:25 am
Good catch Ravish..
I fixed that part but still it creates only one thread dump in “Thread_Dump_jmsFarmManagedServer1.txt” and doesn’t even mail the thread dump.
In the above logic we have “All_ThreadDump.txt” which doesn’t even get created, there by no mail is being sent out.
September 20th, 2011 on 11:34 am
Below given is the output.
===== END OF THREAD DUMP ===============
The Thread Dump for server jmsFarmManagedServer1
has been successfully written to Thread_Dump_jmsFarmManagedServer1.txt
********* ALERT MAIL HAS BEEN SENT ***********
Thread Dump Collected : 1 now Sleeping for 2000 Seconds …
Sleeping for 2000 …
September 20th, 2011 on 3:55 pm
Hi Lakshmi Srikanth,
I believe all the thread dumps are getting created, but I think your “mailx” might not be working. As for “All_ThreadDump.txt” not being there: we delete the file to save disk space once the given number of thread dumps has been collected. To see whether all the thread dumps are getting generated, you can comment out lines 81 and 82 in the above code (the “rm -f All_ThreadDump.txt” command and its os.system call); this way we can find out exactly where the problem is.
Regards,
Ravish Mody
September 20th, 2011 on 8:17 pm
Yes, when i commented out those lines it did work. But still the mail doesn’t go out.
When I use the mailx command directly from my shell it works, but not from the Python command prompt. It gives an exit status of zero but no mail.
September 20th, 2011 on 8:55 pm
Hi Lakshmi Srikanth,
To narrow down the issue answer the below questions
– On which OS are you running this WLST script?
– Have you made any changes to the above script? If yes, then send us the script and the properties files at contact@middlewaremagic.com
– Are you running this script with the same permission by which just the mailx command worked from the shell prompt?
– Is the Email-ID given correctly ?
Regards,
Ravish Mody
September 20th, 2011 on 11:55 pm
It worked Ravish.. there was a typo in the e-mail address..
Thanks for your help…
You guys are Excellent…
Good site for WebLogic Admins…
January 26th, 2012 on 6:18 am
Hi,
We get the following error.
Can any one help.
-bash-3.2$ java weblogic.WLST Alert_StuckThread_ThreadDumps.py
Initializing WebLogic Scripting Tool (WLST) …
Welcome to WebLogic Server Administration Scripting Shell
Type help() for help on available commands
Problem invoking WLST - Traceback (innermost last):
(no code object) at line 0
File "/eas/home/oracle/sbatra/Alert_StuckThread_ThreadDumps.py", line 89
b = b - 1
^
SyntaxError: inconsistent dedent
Regards,
Moshe.S
January 28th, 2012 on 5:05 pm
Hi Moshe,
In WLST scripts the alignment (indentation) of control statements is very important. Indentation means the spaces before a statement; the “if” block and the “for” loop start and end based on it. So make sure that the statement b = b - 1 has the right indentation, i.e. that the following two statements are inside the if a<b : block
b = b - 1
p = p + 1
Keep Posting 🙂
Thanks
Jay SenSharma
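The effect of a misaligned line can be reproduced outside WLST. The sketch below is plain Python 3 (which reports the problem as an IndentationError, a kind of SyntaxError, instead of Jython's “inconsistent dedent” wording). GOOD, BAD and compiles are names made up for this illustration, and the snippets are only compiled, never executed.

```python
# Both statements are indented consistently inside the 'if' block
GOOD = (
    "p, q, a, b = 0, 2, 0, 2\n"
    "while p < q:\n"
    "    if a < b:\n"
    "        b = b - 1\n"
    "        p = p + 1\n"
)

# 'b = b - 1' sits at a depth that matches no enclosing block
BAD = (
    "p, q, a, b = 0, 2, 0, 2\n"
    "while p < q:\n"
    "    if a < b:\n"
    "        p = p + 1\n"
    "      b = b - 1\n"
)


def compiles(source):
    """Return True if the source compiles, False on a syntax/indentation error."""
    try:
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False


good_ok = compiles(GOOD)
bad_ok = compiles(BAD)
```

Mixing tabs and spaces when copying a script from a web page is a common way to end up with the second form.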
January 27th, 2012 on 4:59 pm
Hi,
Being a WLST newbie… Can I ask a stupid question?
The domains.properties script is for connecting to the Admin Server or just one of the Managed Servers?
I have two production domains, each containing two managed servers and an admin server. Would I have to write a script for each managed server?
Thanks,
January 28th, 2012 on 4:55 pm
Hi muffin_top,
You can write only a single WLST Script to monitor all your Domains and Managed Servers using a For Loop inside the WLST Script and by mentioning the following kind of entries inside your domains.properties file:
server.url.1=t3://domainOneManagedServerHost:8004
admin.username.1=weblogic
admin.password.1=weblogic
monitoring.server.name.1=ServerOne
server.url.2=t3://domainTwoManagedServerHost:8005
admin.username.2=weblogic
admin.password.2=weblogic
monitoring.server.name.2=ServerTwo
server.url.3=t3://domain_ThreeManagedServerHost:8006
admin.username.3=weblogic
admin.password.3=weblogic
monitoring.server.name.3=ServerThree
server.url.4=t3://domain_FourManagedServerHost:8007
admin.username.4=weblogic
admin.password.4=weblogic
monitoring.server.name.4=ServerFour
The for loop example you can see in the following link: http://middlewaremagic.com/weblogic/?p=4956
Keep Posting 🙂
Thanks
Jay SenSharma
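To illustrate how numbered entries like these can be walked in a loop, here is a minimal sketch in plain Python. It is not the code from the linked post: collect_servers and the sample_props dictionary are assumptions for this example, and in WLST the values would come from the loaded properties object instead.

```python
def collect_servers(props):
    """Gather server.url.N / admin.username.N / ... entries until an index is missing."""
    servers = []
    index = 1
    while ("server.url.%d" % index) in props:
        servers.append({
            "url": props["server.url.%d" % index],
            "username": props.get("admin.username.%d" % index),
            "password": props.get("admin.password.%d" % index),
            "name": props.get("monitoring.server.name.%d" % index),
        })
        index += 1
    return servers


# Hypothetical excerpt of the numbered entries shown above
sample_props = {
    "server.url.1": "t3://domainOneManagedServerHost:8004",
    "admin.username.1": "weblogic",
    "admin.password.1": "weblogic",
    "monitoring.server.name.1": "ServerOne",
    "server.url.2": "t3://domainTwoManagedServerHost:8005",
    "admin.username.2": "weblogic",
    "admin.password.2": "weblogic",
    "monitoring.server.name.2": "ServerTwo",
}

servers = collect_servers(sample_props)
```

Each dictionary in the resulting list then holds what one connect() call and one ratio check would need for that server.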
February 7th, 2012 on 6:52 pm
Hi,
What’s the best way to have this script running all the time?
Would you have a cronjob running the command – java weblogic.WLST Alert_StuckThread_ThreadDumps.py?
Thanks,
February 8th, 2012 on 2:43 am
Hi muffin_top,
It is not good practice to run the WLST script continuously because it will consume a lot of server resources. Running the WLST script from a cron job at a reasonable interval is the better practice.
Keep Posting 🙂
Thanks
Jay SenSharma
February 8th, 2012 on 6:15 pm
Thanks for the info on using the For loop… Just need to get head around it now. Will post back with results… in a few weeks :-)!
February 9th, 2012 on 2:33 pm
Hi,
While using a for loop to iterate through the three server instances in the domain, the script fails on this line:
cmd = "cat Thread_Dump_MS-3.txt >> All_ThreadDump.txt"
It’s trying to cat the file ‘Thread_Dump_MS-3.txt’ that does not exist. How do I pass in the servername to the cat command, i.e. “cat Thread_Dump_servername”. I can use the variable ‘monitoringServerName’ but cannot get it to work.
Thanks,
February 9th, 2012 on 5:24 pm
Hi muffin_top,
To create a file name with respect to your server name you can try replacing the line
with
Regards,
Ravish Mody
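One possible way to build the cat command from the server name is sketched below. It assumes the dump file is named Thread_Dump_<server name>.txt, as the threadDump() output quoted elsewhere in this discussion suggests; append_dump_command is a hypothetical helper and only uses string concatenation, so the same expression can be pasted into a WLST script.

```python
def append_dump_command(server_name, combined_file="All_ThreadDump.txt"):
    """Build the shell command that appends one server's dump to the combined file."""
    dump_file = "Thread_Dump_" + server_name + ".txt"
    return "cat " + dump_file + " >> " + combined_file


cmd = append_dump_command("MS-3")
```

Inside the script this would be called with monitoringServerName (or the loop variable holding the current server name) in place of the literal "MS-3".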
February 9th, 2012 on 6:12 pm
Yep, just managed to work that out before seeing your post.
I have the script working great now. It checks the admin server and both managed servers in turn using a while loop. Had trouble using a for loop.
Now my problem is that some managed servers can have more than 20 execute threads, which causes the script to generate a thread dump when it is not needed. Is this to do with the WebLogic tuning settings?
I can post script and config file if it will help anyone?
Thanks,
March 14th, 2012 on 12:36 pm
Hi Jay / Ravish ,
How are the thread states NEW, RUNNABLE, BLOCKED / WAITING FOR MONITOR ENTRY, WAITING, TIMED WAITING, SLEEPING / WAITING ON CONDITION and TERMINATED different from STANDBY and STUCK [ExecuteThread]?
A thread state can only be NEW, RUNNABLE, BLOCKED / WAITING FOR MONITOR ENTRY, WAITING, TIMED WAITING, SLEEPING / WAITING ON CONDITION or TERMINATED. If so, what are STANDBY and STUCK [ExecuteThread] called? Please clarify.
-Kiran
April 10th, 2012 on 6:18 pm
Hi,
I still cannot work out how to not generate an alert when I have a high number of execute threads.
For instance, one of my managed servers has 17 TTC and 4 HTC. This would generate an alert using this script but I know this is not going to cause a stuck thread.
Help!
Thanks,
Craig
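To see how the ratio limit decides when an alert fires, it can help to compute, for a given number of execute threads, the smallest hogging-thread count that would trigger one. This is a rough sketch that assumes the integer division and the <= comparison used by the script in the article; min_hoggers_to_alert is a name invented for this example.

```python
def min_hoggers_to_alert(execute_total, ratio_limit):
    """Smallest hogging count h with execute_total // h <= ratio_limit, or None."""
    for hogging_count in range(1, execute_total + 1):
        if execute_total // hogging_count <= ratio_limit:
            return hogging_count
    return None


# 17 execute threads and 4 hogging threads give an integer ratio of 4
ratio_17_4 = 17 // 4
# With the sample limit of 2, at least 6 hogging threads are needed to alert
needed_with_limit_2 = min_hoggers_to_alert(17, 2)
# With a limit of 4, the 17/4 case already alerts
needed_with_limit_4 = min_hoggers_to_alert(17, 4)
```

Under these assumptions, lowering ExecuteThread_Vs_HoggerThreadRatio makes the alert less sensitive on servers with many execute threads.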
November 27th, 2012 on 5:49 pm
Hi,
Is it possible to adapt the script to get the STUCK threads of a managed server? The script only works with threads in the AdminServer.
November 28th, 2012 on 1:55 pm
Probably by connecting to the managed server instead of the admin server.
December 12th, 2012 on 12:18 am
I enhanced the main function using the original code posted by Ravish.
This connects to the admin server, gets the list of managed servers, and for each instance in that domain (including the admin server) collects the ThreadPoolRuntime details. The rest is the same: compare the ratio and send the e-mail accordingly.
Please let me know if you have any issues.
from java.io import FileInputStream
import java.lang
import os
propInputStream = FileInputStream("domains.properties")
configProps = Properties()
configProps.load(propInputStream)
adminUrl = configProps.get("server.url")
adminUser = configProps.get("admin.username")
adminPassword = configProps.get("admin.password")
#monitoringServerName = configProps.get("monitoring.server.name")
executeThread_Vs_HoggerThreadRatio = configProps.get("ExecuteThread_Vs_HoggerThreadRatio")
checkTimes_Number = configProps.get("checkTimes_Number")
checkInterval_in_Milliseconds = configProps.get("checkInterval_in_Milliseconds")
threadDumpTimes_Number = configProps.get("threadDumpTimes_Number")
threadDumpInterval_in_Milliseconds = configProps.get("threadDumpInterval_in_Milliseconds")
sendEmail_ThreadDump_Counter = configProps.get("sendEmail_ThreadDump_Counter")
i = 0
y = int(checkTimes_Number)
############# This method would send the Alert Email with Thread Dump #################
def sendMailThreadDump():
    os.system('/usr/bin/mailx -s "ALERT: CHECK Thread Dumps as Hogger Thread Count Exceeded the Limit !!!" me@mail.com < All_ThreadDump.txt')
    print '********* ALERT MAIL HAS BEEN SENT ***********'
    print ''
############# This method is checking the Hogger Threads Ratio #################
def alertHoggerThreads(executeTTC , hoggerTC):
    print 'Execute Threads : ', executeTTC
    print 'Hogger Thread Count : ', hoggerTC
    print 'executeThread_Vs_HoggerThreadRatio :', executeThread_Vs_HoggerThreadRatio
    if hoggerTC != 0:
        ratio=(executeTTC/hoggerTC)
        print 'Ratio : ' , ratio
        print ''
        if (int(ratio) rw_file”
            os.system(cmd)
            genrateThreadDump()
        else:
            print '++++++++++++++++++++++++++++++++++++'
            print 'Everything is working fine till now'
            print '++++++++++++++++++++++++++++++++++++'
    else:
        print '++++++++++++++++++++++++++++++++++++'
        print 'Everything is working fine till now'
        print '++++++++++++++++++++++++++++++++++++'
############# This method is Taking the Thread Dumps #################
def genrateThreadDump():
    b = int(sendEmail_ThreadDump_Counter)
    a = 0
    p = 0
    q = int(threadDumpTimes_Number)
    serverConfig()
    print monitoringServerName
    cd('Servers/' + monitoringServerName)
    while (p < q):
        if a > All_ThreadDump.txt”
            os.system(cmd)
            print 'Thread Dump Collected : ', p, ' now Sleeping for ', int(threadDumpInterval_in_Milliseconds), ' Seconds ...'
            print ''
            Thread.sleep(int(checkInterval_in_Milliseconds))
        b = b - 1
        p = p + 1
    sendMailThreadDump()
    cmd = "rm -f All_ThreadDump.txt"
    os.system(cmd)
    serverRuntime()
connect(adminUser,adminPassword,adminUrl)
servers = cmo.getServers()
domainRuntime()
for monitorserver in servers:
    i = 0
    monitoringServerName = monitorserver.getName()
    cd('/ServerRuntimes/' + monitoringServerName)
    serverName = cmo.getName()
    #print serverName
    cd('ThreadPoolRuntime/ThreadPoolRuntime')
    while (i < 3):
        print monitoringServerName
        CompleteTRC = cmo.getCompletedRequestCount()
        ExecuteThreadIdleCount = cmo.getExecuteThreadIdleCount()
        print 'ExecuteThreadIdleCount:', ExecuteThreadIdleCount
        print 'CompletedRequestCount:', CompleteTRC
        executeTTC = cmo.getExecuteThreadTotalCount()
        hoggerTC = cmo.getHoggingThreadCount()
        alertHoggerThreads(executeTTC , hoggerTC)
        print 'Sleeping for ', int(checkInterval_in_Milliseconds) , ' ...'
        print ''
        Thread.sleep(int(checkInterval_in_Milliseconds))
        i = i + 1
    cd('../../../..')
December 14th, 2012 on 10:20 pm
Ravish,
Can you please explain on what basis you chose 2 as the execute thread count vs. hogging thread count ratio?
February 23rd, 2016 on 12:20 am
Hi All,
With the script below I am able to connect to the Admin server, but I need to connect to a managed server. Please correct me if I have made any mistake in the script.
Python script:
#############################################################################
#
# @author Copyright (c) 2010 – 2011 by Middleware Magic, All Rights Reserved.
#
#############################################################################
from java.io import FileInputStream
import java.lang
import os
propInputStream = FileInputStream("domains.properties")
configProps = Properties()
configProps.load(propInputStream)
adminUrl = configProps.get("server.url")
adminUser = configProps.get("admin.username")
adminPassword = configProps.get("admin.password")
monitoringServerName = configProps.get("monitoring.server.name")
executeThread_Vs_HoggerThreadRatio = configProps.get("ExecuteThread_Vs_HoggerThreadRatio")
checkTimes_Number = configProps.get("checkTimes_Number")
checkInterval_in_Milliseconds = configProps.get("checkInterval_in_Milliseconds")
threadDumpTimes_Number = configProps.get("threadDumpTimes_Number")
threadDumpInterval_in_Milliseconds = configProps.get("threadDumpInterval_in_Milliseconds")
sendEmail_ThreadDump_Counter = configProps.get("sendEmail_ThreadDump_Counter")
i = 0
y = int(checkTimes_Number)
############# This method would send the Alert Email with Thread Dump #################
def sendMailThreadDump():
    os.system('/bin/mailx -s "ALERT: CHECK Thread Dumps as Hogger Thread Count Exceeded the Limit !!!" abcd@company.com < All_ThreadDump.txt')
    print '********* ALERT MAIL HAS BEEN SENT ***********'
    print ''
############# This method is checking the Hogger Threads Ratio #################
def alertHoggerThreads(executeTTC , hoggerTC):
    print 'Execute Threads : ', executeTTC
    print 'Hogger Thread Count : ', hoggerTC
    print 'executeThread_Vs_HoggerThreadRatio :', executeThread_Vs_HoggerThreadRatio
    if hoggerTC != 0:
        ratio=(executeTTC/hoggerTC)
        print 'Ratio : ' , ratio
        print ''
        if (int(ratio) rw_file”
            os.system(cmd)
            genrateThreadDump()
        else:
            print '++++++++++++++++++++++++++++++++++++'
            print 'Everything is working fine till now'
            print '++++++++++++++++++++++++++++++++++++'
    else:
        print '++++++++++++++++++++++++++++++++++++'
        print 'Everything is working fine till now'
        print '++++++++++++++++++++++++++++++++++++'
############# This method is Taking the Thread Dumps #################
def genrateThreadDump():
    b = int(sendEmail_ThreadDump_Counter)
    a = 0
    p = 0
    q = int(threadDumpTimes_Number)
    serverConfig()
    cd('Servers/' + monitoringServerName)
    while (p < q):
        if a > All_ThreadDump.txt”
            os.system(cmd)
            print 'Thread Dump Collected : ', p, ' now Sleeping for ', int(threadDumpInterval_in_Milliseconds), ' Seconds ...'
            print ''
            Thread.sleep(int(checkInterval_in_Milliseconds))
        b = b - 1
        p = p + 1
    sendMailThreadDump()
    cmd = "rm -f All_ThreadDump.txt"
    os.system(cmd)
    serverRuntime()
connect(adminUser,adminPassword,adminUrl)
serverRuntime()
cd('ThreadPoolRuntime/ThreadPoolRuntime')
while (i < y):
    executeTTC = cmo.getExecuteThreadTotalCount()
    hoggerTC = cmo.getHoggingThreadCount()
    alertHoggerThreads(executeTTC , hoggerTC)
    print 'Sleeping for ', int(checkInterval_in_Milliseconds) , ' ...'
    print ''
    Thread.sleep(int(checkInterval_in_Milliseconds))
    i = i + 1
domain.properties
server.url=t3://ussumstsoaapp04:7004
admin.username=weblogic
admin.password=webAdm1n
monitoring.server.name=osbapp2_sitms2
# This ExecuteThread_Vs_HoggerThreadRatio represents the division of ExecuteThread/HoggerThreadRatio
ExecuteThread_Vs_HoggerThreadRatio=2
# Number of times the RATIO has to be checked
checkTimes_Number=3
# TIME INTERVAL between number of times the RATIO has to be checked (60000 milliseconds = 60 seconds)
checkInterval_in_Milliseconds=60000
# Number of times the Thread Dump has to be taken
threadDumpTimes_Number=3
# TIME INTERVAL between two consecutive thread dumps (10000 milliseconds = 10 seconds)
threadDumpInterval_in_Milliseconds=10000
# Number of times to send thread dumps in mail in case of stuck/hogging thread issue
sendEmail_ThreadDump_Counter=2
February 23rd, 2016 on 1:42 am
[celapp@appmanprod TDA]$ ./TDA.sh
rm: cannot remove `/opt/temp/TDA/TDA.txt’: No such file or directory
Initializing WebLogic Scripting Tool (WLST) …
Welcome to WebLogic Server Administration Scripting Shell
Type help() for help on available commands
Connecting to t3://ussumstsoaapp04:7003 with userid weblogic …
Successfully connected to managed Server ‘osbapp1_sitms1’ that belongs to domain ‘sitosb_domain’.
Warning: An insecure protocol was used to connect to the
server. To ensure on-the-wire security, the SSL port or
Admin port should be used instead.
Location changed to serverRuntime tree. This is a read-only tree with ServerRuntimeMBean as the root.
For more help, use help(serverRuntime)
Execute Threads : 63
Hogger Thread Count : 38
executeThread_Vs_HoggerThreadRatio : 2
Ratio : 1
!!!! ALERT !!!! Stuck Threads are on its way…..
Taking Thread Dump : 0
Problem invoking WLST – Traceback (innermost last):
File “/opt/temp/TDA/TDA.py”, line 92, in ?
File “/opt/temp/TDA/TDA.py”, line 51, in alertHoggerThreads
File “/opt/temp/TDA/TDA.py”, line 72, in genrateThreadDump
File “”, line 800, in threadDump
java.lang.NullPointerException
at java.io.DataOutputStream.writeBytes(DataOutputStream.java:257)
at weblogic.management.scripting.InformationHandler.threadDump(InformationHandler.java:1508)
at weblogic.management.scripting.WLScriptContext.threadDump(WLScriptContext.java:607)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at org.python.core.PyReflectedFunction.__call__(PyReflectedFunction.java:160)
at org.python.core.PyMethod.__call__(PyMethod.java:96)
at org.python.core.PyObject.__call__(PyObject.java:248)
at org.python.core.PyObject.invoke(PyObject.java:2016)
at org.python.pycode._pyx19.threadDump$53(:800)
at org.python.pycode._pyx19.call_function()
at org.python.core.PyTableCode.call(PyTableCode.java:208)
at org.python.core.PyTableCode.call(PyTableCode.java:404)
at org.python.core.PyTableCode.call(PyTableCode.java:253)
at org.python.core.PyFunction.__call__(PyFunction.java:169)
at org.python.pycode._pyx18.genrateThreadDump$3(/opt/temp/TDA/TDA.py:72)
at org.python.pycode._pyx18.call_function(/opt/temp/TDA/TDA.py)
at org.python.core.PyTableCode.call(PyTableCode.java:208)
at org.python.core.PyTableCode.call(PyTableCode.java:256)
at org.python.core.PyFunction.__call__(PyFunction.java:169)
at org.python.pycode._pyx18.alertHoggerThreads$2(/opt/temp/TDA/TDA.py:51)
at org.python.pycode._pyx18.call_function(/opt/temp/TDA/TDA.py)
at org.python.core.PyTableCode.call(PyTableCode.java:208)
at org.python.core.PyTableCode.call(PyTableCode.java:279)
at org.python.core.PyFunction.__call__(PyFunction.java:175)
at org.python.pycode._pyx18.f$0(/opt/temp/TDA/TDA.py:92)
at org.python.pycode._pyx18.call_function(/opt/temp/TDA/TDA.py)
at org.python.core.PyTableCode.call(PyTableCode.java:208)
at org.python.core.PyCode.call(PyCode.java:14)
at org.python.core.Py.runCode(Py.java:1135)
at org.python.util.PythonInterpreter.execfile(PythonInterpreter.java:167)
at weblogic.management.scripting.WLST.main(WLST.java:106)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at weblogic.WLST.main(WLST.java:29)
java.lang.NullPointerException: java.lang.NullPointerException
February 23rd, 2016 on 12:09 pm
Hello nukala1239,
It looks like the “domain.properties” file or its path is not specified at line 92 of your “TDA.py”, or maybe the “domain.properties” file does not exist in the same directory as the “TDA.py” script.
It is better to hardcode the “domain.properties” file name with its absolute path.
Example:
/opt/temp/TDA/domain.properties
Regards
Jay SenSharma
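As an alternative to hardcoding, the path can be derived from the script's own location so that it does not depend on the working directory. The following is a minimal sketch in plain Python under stated assumptions: `resolve_properties_path` and `load_properties` are hypothetical helper names, and the simple key=value parser only stands in for the `Properties` class that the WLST script uses.

```python
import os


def resolve_properties_path(script_path, file_name="domain.properties"):
    """Build an absolute path to file_name located next to the given script."""
    script_dir = os.path.dirname(os.path.abspath(script_path))
    return os.path.join(script_dir, file_name)


def load_properties(path):
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    props = {}
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            # Split on the first '=' only, so values such as URLs stay intact.
            key, _, value = line.partition("=")
            props[key.strip()] = value.strip()
    return props


# Hypothetical usage: the script path is passed in explicitly.
print(resolve_properties_path("/opt/temp/TDA/TDA.py"))
```

Opening a missing file this way fails immediately with a clear error that names the path, which makes a wrong location easier to spot than a failure later in the run.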
February 23rd, 2016 on 11:27 pm
Dear Jay,
Thanks for the response. I tried the above but am still getting the same issue. I am able to connect to the WebLogic URL and capture the counts, but I get an error when I try to download the thread dump.
Kindly do the needful.
February 23rd, 2016 on 11:39 pm
Hello nukala1239,
Please check whether you have write permission for the file “/opt/temp/TDA/TDA.txt”, as I see the following error, which might lead to the NullPointerException that you are getting:
./TDA.sh
rm: cannot remove `/opt/temp/TDA/TDA.txt’: No such file or directory
Regards
Jay SenSharma
February 23rd, 2016 on 11:50 pm
Hello nukala1239,
The “NullPointerException” occurs while writing bytes to the file, so you need to make sure that the file to which the thread dump is going to be written has sufficient write permission:
++++++++++++++++++
File “”, line 800, in threadDump
java.lang.NullPointerException
at java.io.DataOutputStream.writeBytes(DataOutputStream.java:257)
at weblogic.management.scripting.InformationHandler.threadDump(InformationHandler.java:1508)
++++++++++++++++++
Regards
Jay SenSharma
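A quick way to rule the permission theory in or out is to test the target path from the same account that runs the script. This is a minimal sketch in plain Python; `can_write_dump` is a hypothetical helper name, and the path in the usage line is the one from the output above.

```python
import os


def can_write_dump(path):
    """Return True if this process can create or overwrite the file at path."""
    if os.path.exists(path):
        # Existing file: it must be writable by the current user.
        return os.access(path, os.W_OK)
    parent = os.path.dirname(os.path.abspath(path))
    # Missing file: its directory must exist and be writable and searchable.
    return os.path.isdir(parent) and os.access(parent, os.W_OK | os.X_OK)


print(can_write_dump("/opt/temp/TDA/TDA.txt"))
```

If this reports that the path is writable, the cause of the exception probably lies elsewhere.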
February 24th, 2016 on 12:16 am
Dear Jay,
Thanks for your response. I have given sufficient permissions, using chmod, to the TDA.txt file where the thread dump is going to be written, but I am still getting the same issue. Kindly do the needful.
March 4th, 2016 on 7:02 am
I am planning to use this code in a Windows environment. mailx does not work there, so what other options do I have for sending the alert email on Windows? Any suggestions?
March 4th, 2016 on 11:05 am
Hello Pgasri,
I am not a Windows expert, but I guess an alternative such as “bmail” can be used on Windows. Bmail is a free, lean command-line SMTP mail sender.
http://retired.beyondlogic.org/solutions/cmdlinemail/cmdlinemail.htm
I have never actually used “Bmail”, but from reading the article it looks like it should work. Sorry for not being of much help on this.
Regards
Jay SenSharma
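Another option that avoids an external mail program is to send the mail from Python itself through an SMTP server. The sketch below is written for standard Python and is untested inside WLST; whether the Jython bundled with a given WLS release ships these modules under the same names is an assumption. `build_alert_message` and `send_alert` are hypothetical names, and the addresses and SMTP host are placeholders.

```python
import smtplib
from email.mime.text import MIMEText


def build_alert_message(sender, recipient, dump_text):
    """Wrap the collected thread dump text in a plain-text mail message."""
    msg = MIMEText(dump_text)
    msg["Subject"] = "ALERT: CHECK Thread Dumps as Hogger Thread Count Exceeded the Limit !!!"
    msg["From"] = sender
    msg["To"] = recipient
    return msg


def send_alert(msg, smtp_host, smtp_port=25):
    """Send the message through the given SMTP server (placeholder host)."""
    server = smtplib.SMTP(smtp_host, smtp_port)
    try:
        server.sendmail(msg["From"], [msg["To"]], msg.as_string())
    finally:
        server.quit()


# Build a message only; nothing is sent until send_alert() is called.
message = build_alert_message("wlst@example.com", "admin@example.com", "sample dump")
print(message["Subject"])
```

Calling `send_alert(message, "smtp.example.com")` would then take the place of the `mailx` command line, provided an SMTP server is reachable from the machine running the script.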
February 1st, 2017 on 11:24 pm
I find this extremely helpful. I am now facing stuck thread issues tied to a specific oacore_server. When I run the shell script, it seems to display results only for the admin server, even if I put the managed server name in the monitoring.server.name field. Should I be able to see the stuck threads on individual managed servers with this script?
February 2nd, 2017 on 12:52 am
Hi Jgabay,
The article is a little bit old. We tested it on an older version of WLS (I guess WLS 10/11). Can you please let us know which version of WLS you are using?
I think that at the WLST tree level there might not be much difference (for the “/Servers/” path), so it should work fine in later versions of WLS as well (even for individual servers).
Can you check at your end whether you are changing the value of the “monitoring.server.name=MS-3” property in the “domains.properties” file correctly?
Or maybe you can try changing the “Alert_StuckThread_ThreadDumps.py” script directly with a hardcoded value to see if it works:
Also, can you try checking the following manually in your WLST command prompt to see whether it is working?
Please replace “YOUR_SERVER_NAME” with your own server name.
Regards
Jay SenSharma
May 31st, 2018 on 7:24 pm
We are not getting an alert.
Location changed to serverRuntime tree. This is a read-only tree with ServerRuntimeMBean as the root.
For more help, use help(serverRuntime)
Execute Threads : 22
Hogger Thread Count : 0
executeThread_Vs_HoggerThreadRatio : 2
++++++++++++++++++++++++++++++++++++
Everything is working fine till now
++++++++++++++++++++++++++++++++++++
Sleeping for 7000 …
Execute Threads : 22
Hogger Thread Count : 0
executeThread_Vs_HoggerThreadRatio : 2
++++++++++++++++++++++++++++++++++++
Everything is working fine till now
++++++++++++++++++++++++++++++++++++
Sleeping for 7000 …
abcsvl10_test1
June 7th, 2018 on 6:55 pm
Hi Ravish,
Is it possible to capture the username of the user who is causing the stuck threads and send mail notifications with the usernames?
Thanks,
Rakesh S