Graphical Thread Dumps 10

Posted by F 24/11/2005 at 23h00

I am surprised by the high number of Java developers I meet that do not know what a Java Thread Dump is or how to generate one. I find it a very powerful tool, and it is always available as part of the JVM. I haven’t played much with Java 5 yet, but it comes with jstack, a new tool that makes it easier to generate thread dumps.

Earlier this year, I was working on a load test for for a well-known airline. We were tunning the environment all we could, monitoring and profiling to know where to focus our optimization efforts. The solution involved a fairly high stack: Apache httpd, WebSphere, FuegoBPM, Tibco messaging, Oracle RAC.

The system was holding load pretty well up to a certain point in which it immediatly halted and stopped processing new requests. Every time we run the load testing scripts we experienced the same symptoms. Not even the official testers –with allegedly powerful testing and monitoring tools– were able to identify the cause of the problem.

So, I decided to get a few Thread Dumps of WebSphere’s JVM. On Unix, you do ”kill -3 <pid>” and the dump goes to WebSphere’s native_stdout.log. We inspected the dumps but couldn’t identify dead-locks or any other obvious anomaly, although the answer was right before our eyes.

Since the thread dump was daunting, I decided to spend a bit of my time on doing some fun work: Iwrote a short Ruby script to create a graphical representation of thedump, showing the locks each thread was holding, and the locks eachthread was waiting on. The heavy work of actually drawing and layingout the graph was left to GraphViz’s dot.Once the script was usable, I generated the graph for the abovementioned dump. To our delight, the graph immediately exposed theproblem. See for yourself.Every thread was waiting on a lock that was held by thread_Servlet.Engine.Transports 1405. What was this thread doing?Here’s the stack, taken from the thread dump: … “Servlet.Engine.Transports : 1405” daemon prio=5 tid=0x020cee40 nid=0x515ea runnable [8648f000..864919c0] at java.io.UnixFileSystem.getBooleanAttributes0(Native Method) at java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:221) at java.io.File.exists(File.java:680) at sun.misc.URLClassPath$FileLoader.getResource(URLClassPath.java:887) at sun.misc.URLClassPath.getResource(URLClassPath.java:157) at java.net.URLClassLoader$1.run(URLClassLoader.java:191) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:187) at com.ibm.ws.bootstrap.ExtClassLoader.findClass(ExtClassLoader.java:79) at java.lang.ClassLoader.loadClass(ClassLoader.java:289) - locked <0xb4c1be08> (a com.ibm.ws.bootstrap.ExtClassLoader) at java.lang.ClassLoader.loadClass(ClassLoader.java:235) …accessing the filesystem! After sharing the findings with the rest ofthe team, one of the admins revealed that the WebSphere installationwas on a mounted NFS drive… WebSphere’s JVM was trying to reloadsome .jar files, but it choked NFS under heavy load.Unfortunately, moving WebSphere to a non-NFS was not trivial (youknow… a big company, with procedures, standards, bureaucracy), andsince the managers were already happy with the results of the loadtest, we never had a chance to run it all again. I am still wonderinghow much load it would have standed without NFS (andwhy was WebSphere reading those .jars so often to begin with?).So, we found the bottle-neck thanks to a thread dump, and a picture is worth a thousand lines of thread dumps_ :-)Here is the quick and dirty script: tdg.rb

Trackbacks

Use the following link to trackback from your own site:
http://neuroning.com/trackbacks?article_id=7

  1. Wiz 18/11/2005 at 22h47
    Excellent Stuff F..!
  2. Roger Leite 29/07/2009 at 14h06
    Hi! This post is a "little" old ... i know. Anyway helped me a lot. Thanks dude for this magic script! I did a backup at: http://gist.github.com/158178 Ok? Thanks again!
  3. Angelo 04/12/2009 at 14h36
    Really great and useful piece of code. Thought of writing a tool that did this, but then found yours. Thank you so much..
  4. damanpant@gmail.com 09/02/2010 at 12h13
    how to generate the graph though
  5. Asha 30/07/2010 at 09h52
    Hi I like this idea of graphing the thread dump. I have downloaded your ruby script and downloaded ruby 191. Getting an syntax error at line 38 and 40 c:/graphwiz/tdg.rb:38: syntax error, unexpected ':', expecting keyword_then or ' ;' or '\n' c:/graphwiz/tdg.rb:40: syntax error, unexpected keyword_end, expecting $end Can you please describe complete procedure to get the graph. What I understand is first run the nagtive_stdout.log against the ruby scritp to produce a *.dot file. then run this file in the graph wizard.
  6. AbigailHI 21/09/2010 at 21h24
    I've just found this site theas400.com, could I just borrow the technical information from the site or do I need special permission?. I am doing a school project. Abigail
  7. craig 21/10/2010 at 18h46
    it's hard to generate that graph..marketing company
  8. watches 25/10/2010 at 10h28
    have downloaded your ruby script and downloaded ruby 191 too but no syntax errors so far.
  9. diecast cars 26/10/2010 at 06h55
    Yep, it's very hard to identify the cause of the problem.
  10. supplements 17/11/2010 at 17h48
    at my end works fine, do you still get that problem?