I am surprised by the high number of Java developers I meet that do not know what a Java Thread Dump is or how to generate one. I find it a very powerful tool, and it is always available as part of the JVM. I haven’t played much with Java 5 yet, but it comes with jstack, a new tool that makes it easier to generate thread dumps.
Earlier this year, I was working on a load test for for a well-known airline. We were tunning the environment all we could, monitoring and profiling to know where to focus our optimization efforts. The solution involved a fairly high stack: Apache httpd, WebSphere, FuegoBPM, Tibco messaging, Oracle RAC.
The system was holding load pretty well up to a certain point in which it immediatly halted and stopped processing new requests. Every time we run the load testing scripts we experienced the same symptoms. Not even the official testers –with allegedly powerful testing and monitoring tools– were able to identify the cause of the problem.
So, I decided to get a few Thread Dumps of WebSphere’s JVM. On Unix, you do ”
kill -3 <pid>” and the dump goes to WebSphere’s
inspected the dumps but couldn’t identify dead-locks or any other
obvious anomaly, although the answer was right before our eyes.
Since the thread dump was daunting, I decided to spend a bit of my time on doing some fun work: Iwrote a short Ruby script to create a graphical representation of thedump, showing the locks each thread was holding, and the locks eachthread was waiting on. The heavy work of actually drawing and layingout the graph was left to GraphViz’s
dot.Once the script was usable, I generated the graph for the abovementioned dump. To our delight, the graph immediately exposed theproblem. See for yourself.Every thread was waiting on a lock that was held by thread_Servlet.Engine.Transports 1405. What was this thread doing?Here’s the stack, taken from the thread dump: … “Servlet.Engine.Transports : 1405” daemon prio=5 tid=0x020cee40 nid=0x515ea runnable [8648f000..864919c0] at java.io.UnixFileSystem.getBooleanAttributes0(Native Method) at java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:221) at java.io.File.exists(File.java:680) at sun.misc.URLClassPath$FileLoader.getResource(URLClassPath.java:887) at sun.misc.URLClassPath.getResource(URLClassPath.java:157) at java.net.URLClassLoader$1.run(URLClassLoader.java:191) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:187) at com.ibm.ws.bootstrap.ExtClassLoader.findClass(ExtClassLoader.java:79) at java.lang.ClassLoader.loadClass(ClassLoader.java:289) - locked <0xb4c1be08> (a com.ibm.ws.bootstrap.ExtClassLoader) at java.lang.ClassLoader.loadClass(ClassLoader.java:235) …accessing the filesystem! After sharing the findings with the rest ofthe team, one of the admins revealed that the WebSphere installationwas on a mounted NFS drive… WebSphere’s JVM was trying to reloadsome .jar files, but it choked NFS under heavy load.Unfortunately, moving WebSphere to a non-NFS was not trivial (youknow… a big company, with procedures, standards, bureaucracy), andsince the managers were already happy with the results of the loadtest, we never had a chance to run it all again. I am still wonderinghow much load it would have standed without NFS (andwhy was WebSphere reading those .jars so often to begin with?).So, we found the bottle-neck thanks to a thread dump, and a picture is worth a thousand lines of thread dumps_ :-)Here is the quick and dirty script: tdg.rb
Use the following link to trackback from your own site: