Exporting the stuck thread count from WebLogic
Author: Mark Otting. Category: Oracle.
This blog expands on an article by Frank Munz on stuck thread handling. I advise reading his post first, especially if you are having trouble dealing with stuck threads.
What prompted me to write this bit was a question I was asked by one of our customers: “So stuck threads are a signal that some backend system is misbehaving. How can we monitor the number of stuck threads in WebLogic Server?”
Searching online, the first few pages of Google results are full of administrators asking the same question. Some of the solutions offered:
- The WLDF can trigger on stuck threads, so you can have it send you a mail or an SNMP trap, but it’s very limited in functionality.
- Stuck threads can be hogging threads as well, and in WLST the total count of hogging threads is an attribute of a server runtime. That’s easily found and manipulated in a WLST script, but it’s not a trustworthy indicator: a hogging thread is not necessarily stuck, so the count can be higher than the number of stuck threads.
- Use jstack, then count the number of “[STUCK]” instances.
Only that last one can be used to build historic data, but even then it triggers on any stuck thread; you have to do a lot of string parsing to count only the threads you are interested in.
Strangely enough, nobody offered the Oracle Enterprise Manager. OEM can monitor stuck threads and saves historic data. But if OEM has a Stuck Threads metric, that means there’s also an MBean. And there is, but it takes some digging through the runtime MBean tree. Browsing with WLST, every deployment gets a work manager, and if no specific work manager is added, a copy of the default work manager configuration is used. That in turn has a StuckThreadCount attribute. Here’s a script that sums these counts per server:
'''
So the basic tree structure we're walking looks like this:

ServerRuntimes
|--< Server (all servers)
     |--< Application runtimes (all deployments)
          |--< Work manager runtimes (all work managers)

We could probably enumerate all threads and see which are stuck, but this way
the output is much easier to manage. As you can see in this tree, for example,
you know which deployment has stuck threads.
'''

## Prevent printing output to the screen
redirect('/dev/null','false')

## Insert your own password here
connect("weblogic", "<your password>", "t3://localhost:7001")
domainRuntime()

servers = ls('/ServerRuntimes','true','c')

# We'll store all results in here, using the server name for a key
result=dict()

for server in servers:
    deployments = ls('/ServerRuntimes/' + server + '/ApplicationRuntimes','true','c')
    result[server] = 0
    for deployment in deployments:
        ## If you are only interested in a single deployment, run that check here, like
        ## if deployment == "MyApplication":
        ## Could be that there are multiple workmanagers, I'm not sure, so let's iterate over them
        wms = ls('/ServerRuntimes/' + server + '/ApplicationRuntimes/' \
                 + deployment + '/WorkManagerRuntimes','true','c')
        for wm in wms:
            cd('/ServerRuntimes/' + server + '/ApplicationRuntimes/' \
               + deployment + '/WorkManagerRuntimes/' + wm)
            result[server] = result[server] + get('StuckThreadCount')

## Reenable printing output
redirect('/dev/null','true')

## Print all server names and the number of stuck threads we counted per server
## Format for Nagios output etc. from here
for key in result:
    print(key + " has " + str(result[key]) + " stuck threads.")
Using the StuckThreadForFree application from Frank Munz’ page to generate some stuck threads, I can now count them:
[oracle@machine]$ wlst getstuckthreads.wlst
Backoffice-0 has 0 stuck threads.
Backoffice-1 has 3 stuck threads.
Selfservice has 0 stuck threads.
AdminServer has 0 stuck threads.
It works! But the code is a hassle. From WebLogic 12c onwards, WebLogic exposes an aggregate StuckThreadCount attribute on each server's ThreadPoolRuntime MBean. Changing the script to use this attribute makes for much cleaner code:
## Prevent printing output to the screen
redirect('/dev/null','false')

## Insert your own password here
connect("weblogic", "<your password>", "t3://localhost:7001")
domainRuntime()

servers = ls('/ServerRuntimes','true','c')

# One aggregate count per server, using the server name for a key
result=dict()

for server in servers:
    cd('/ServerRuntimes/' + server + '/ThreadPoolRuntime/ThreadPoolRuntime')
    result[server] = get('StuckThreadCount')

## Reenable printing output
redirect('/dev/null','true')

for key in result:
    print(key + " has " + str(result[key]) + " stuck threads.")
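Both scripts end by printing one line per server in the form "<server> has <n> stuck threads." As a rough sketch of the Nagios formatting hinted at in the first script, the plain Python below (run outside WLST, on the captured output) parses those lines and builds a plugin-style status line. The exit codes follow the usual Nagios plugin convention; the function names and the warn and crit thresholds are assumptions made up for this example.

```python
import re

# Shape of the lines printed by the scripts: "<server> has <n> stuck threads."
LINE_PATTERN = re.compile(r"^(?P<server>.+) has (?P<count>\d+) stuck threads\.$")

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def parse_counts(output):
    """Turn the script output into a {server name: stuck thread count} dict."""
    counts = {}
    for line in output.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if match:
            counts[match.group("server")] = int(match.group("count"))
    return counts


def nagios_status(counts, warn=1, crit=5):
    """Return (exit code, status line); warn and crit are example thresholds."""
    if not counts:
        return UNKNOWN, "UNKNOWN - no stuck thread counts found"
    worst = max(counts.values())
    if worst >= crit:
        code, label = CRITICAL, "CRITICAL"
    elif worst >= warn:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    total = sum(counts.values())
    # Performance data: one label=value pair per server, sorted by name.
    perfdata = " ".join("%s=%d" % (name, counts[name]) for name in sorted(counts))
    return code, "%s - %d stuck threads | %s" % (label, total, perfdata)
```

A wrapper would print the status line and exit with the returned code. The thresholds compare against the worst single server rather than the total, on the assumption that one badly stuck server matters more than a few stuck threads spread across the domain.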
There, that should do it. Of course, you can adapt these scripts in many ways, for example to save the results and inspect the behaviour of those threads over time.
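One way to keep such a history, sketched in plain Python 3 and meant to run outside WLST, is to append a timestamped row per server to a CSV file on every run. The file layout (timestamp, server, count) and the function names are assumptions for this illustration, not something WebLogic provides.

```python
import csv
import time


def append_counts(path, counts, timestamp=None):
    """Append one row per server: timestamp, server name, stuck thread count."""
    if timestamp is None:
        timestamp = time.strftime("%Y-%m-%dT%H:%M:%S")
    with open(path, "a", newline="") as handle:
        writer = csv.writer(handle)
        for server in sorted(counts):
            writer.writerow([timestamp, server, counts[server]])


def read_history(path, server):
    """Return (timestamp, count) pairs for one server, in file order."""
    with open(path, newline="") as handle:
        return [
            (row[0], int(row[2]))
            for row in csv.reader(handle)
            if len(row) == 3 and row[1] == server
        ]
```

Calling append_counts from a scheduled job with a dict such as {"Backoffice-1": 3} builds the history; read_history then gives the series for a single server, ready for plotting or for spotting a count that never drops back to zero.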
Hi Mark,
Is there a way to count Stuck Thread statistics in WebLogic 11g? Is there a code snippet available that can be used to do that?
Thanks