Graphical Thread Dumps 10
I am surprised by the high number of Java developers I meet that do not know what a Java Thread Dump is or how to generate one. I find it a very powerful tool, and it is always available as part of the JVM. I haven’t played much with Java 5 yet, but it comes with jstack, a new tool that makes it easier to generate thread dumps.
Earlier this year, I was working on a load test for for a well-known airline. We were tunning the environment all we could, monitoring and profiling to know where to focus our optimization efforts. The solution involved a fairly high stack: Apache httpd, WebSphere, FuegoBPM, Tibco messaging, Oracle RAC.
The system was holding load pretty well up to a certain point in which it immediatly halted and stopped processing new requests. Every time we run the load testing scripts we experienced the same symptoms. Not even the official testers –with allegedly powerful testing and monitoring tools– were able to identify the cause of the problem.
So, I decided to get a few Thread Dumps of WebSphere’s JVM. On Unix, you do ”kill -3 <pid>” and the dump goes to WebSphere’s native_stdout.log. We
inspected the dumps but couldn’t identify dead-locks or any other
obvious anomaly, although the answer was right before our eyes.
Since the thread dump was daunting, I decided to spend a bit of my time on doing some fun work: Iwrote a short Ruby script to create a graphical representation of thedump, showing the locks each thread was holding, and the locks eachthread was waiting on. The heavy work of actually drawing and layingout the graph was left to GraphViz’s dot.Once the script was usable, I generated the graph for the abovementioned dump. To our delight, the graph immediately exposed theproblem. See for yourself.Every thread was waiting on a lock that was held by thread_Servlet.Engine.Transports 1405. What was this thread doing?Here’s the stack, taken from the thread dump: … “Servlet.Engine.Transports : 1405” daemon prio=5 tid=0x020cee40 nid=0x515ea runnable [8648f000..864919c0] at java.io.UnixFileSystem.getBooleanAttributes0(Native Method) at java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:221) at java.io.File.exists(File.java:680) at sun.misc.URLClassPath$FileLoader.getResource(URLClassPath.java:887) at sun.misc.URLClassPath.getResource(URLClassPath.java:157) at java.net.URLClassLoader$1.run(URLClassLoader.java:191) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:187) at com.ibm.ws.bootstrap.ExtClassLoader.findClass(ExtClassLoader.java:79) at java.lang.ClassLoader.loadClass(ClassLoader.java:289) - locked <0xb4c1be08> (a com.ibm.ws.bootstrap.ExtClassLoader) at java.lang.ClassLoader.loadClass(ClassLoader.java:235) …accessing the filesystem! After sharing the findings with the rest ofthe team, one of the admins revealed that the WebSphere installationwas on a mounted NFS drive… WebSphere’s JVM was trying to reloadsome .jar files, but it choked NFS under heavy load.Unfortunately, moving WebSphere to a non-NFS was not trivial (youknow… a big company, with procedures, standards, bureaucracy), andsince the managers were already happy with the results of the loadtest, we never had a chance to run it all again. I am still wonderinghow much load it would have standed without NFS (andwhy was WebSphere reading those .jars so often to begin with?).So, we found the bottle-neck thanks to a thread dump, and a picture is worth a thousand lines of thread dumps_ :-)Here is the quick and dirty script: tdg.rb
Trackbacks
Use the following link to trackback from your own site:
http://neuroning.com/trackbacks?article_id=7
-
Excellent Stuff F..!
-
Hi! This post is a "little" old ... i know. Anyway helped me a lot. Thanks dude for this magic script! I did a backup at: http://gist.github.com/158178 Ok? Thanks again!
-
Really great and useful piece of code. Thought of writing a tool that did this, but then found yours. Thank you so much..
-
how to generate the graph though
-
Hi I like this idea of graphing the thread dump. I have downloaded your ruby script and downloaded ruby 191. Getting an syntax error at line 38 and 40 c:/graphwiz/tdg.rb:38: syntax error, unexpected ':', expecting keyword_then or ' ;' or '\n' c:/graphwiz/tdg.rb:40: syntax error, unexpected keyword_end, expecting $end Can you please describe complete procedure to get the graph. What I understand is first run the nagtive_stdout.log against the ruby scritp to produce a *.dot file. then run this file in the graph wizard.
-
I've just found this site theas400.com, could I just borrow the technical information from the site or do I need special permission?. I am doing a school project. Abigail
-
it's hard to generate that graph..marketing company
-
have downloaded your ruby script and downloaded ruby 191 too but no syntax errors so far.
-
Yep, it's very hard to identify the cause of the problem.
-
at my end works fine, do you still get that problem?