Variable Severity Logging 1

Posted by F Sun, 24 Sep 2006 23:21:00 GMT

Most software products and logging frameworks produce entries in their log files with different severity levels (say, from Debug to Severe). Then, you can configure which level of messages you want the application to log.

At development time, you may want all messages to be logged. But once in production, you normally configure things so that only Warning and higher messages get logged (for performance, space and other reasons).

The problem is that when a Severe message arises you may need more detailed (Debug level) information in order to understand the actual cause of the problem. So, a common practice is to increase the logging level at that point, and wait for the problem to happen again –not very effective.

This is one of the problems that motivated the guys behind LogBag. The idea they propose is nice and simple:

Make the system write only Warning (and higher) messages to the log, but when such a message is logged, it should also include some lower-severity messages that occurred right before and after this one.

It’s one of those good ideas that might seem obvious, but for some reason nobody thought (or did anything) about before.

To implement this, the logging system could keep a buffer of the last N messages generated by the application, and when a Warning (or above) message comes in, the whole buffer is flushed to disk. This should not be too hard to implement, since most logging frameworks already use some kind of buffering, in a producer-consumer pattern, to improve concurrency and performance.

From the LogBag site:

You keep on putting things in a Log bag. Then whether the entire log bag gets written or not depends on the highest severity level in the entire bag. So either you write everything or you write nothing.

I think it’s a nice idea. I bet the technical support guys would like it too :-).

Graphical Thread Dumps 5

Posted by F Fri, 25 Nov 2005 04:00:00 GMT

I am surprised by the high number of Java developers I meet that do not know what a Java Thread Dump is or how to generate one. I find it a very powerful tool, and it is always available as part of the JVM. I haven’t played much with Java 5 yet, but it comes with jstack, a new tool that makes it easier to generate thread dumps.

Earlier this year, I was working on a load test for for a well-known airline. We were tunning the environment all we could, monitoring and profiling to know where to focus our optimization efforts. The solution involved a fairly high stack: Apache httpd, WebSphere, FuegoBPM, Tibco messaging, Oracle RAC.

The system was holding load pretty well up to a certain point in which it immediatly halted and stopped processing new requests. Every time we run the load testing scripts we experienced the same symptoms. Not even the official testers –with allegedly powerful testing and monitoring tools– were able to identify the cause of the problem.

So, I decided to get a few Thread Dumps of WebSphere’s JVM. On Unix, you do ”kill -3 <pid>” and the dump goes to WebSphere’s native_stdout.log. We inspected the dumps but couldn’t identify dead-locks or any other obvious anomaly, although the answer was right before our eyes.

Since the thread dump was daunting, I decided to spend a bit of my time on doing some fun work: I wrote a short Ruby script to create a graphical representation of the dump, showing the locks each thread was holding, and the locks each thread was waiting on. The heavy work of actually drawing and laying out the graph was left to GraphViz’s dot.

Once the script was usable, I generated the graph for the above mentioned dump. To our delight, the graph immediately exposed the problem. See for yourself.

Every thread was waiting on a lock that was held by thread Servlet.Engine.Transports 1405. What was this thread doing? Here’s the stack, taken from the thread dump:

...
"Servlet.Engine.Transports : 1405" daemon prio=5 tid=0x020cee40
                      nid=0x515ea runnable [8648f000..864919c0]
  at java.io.UnixFileSystem.getBooleanAttributes0(Native Method)
  at java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:221)
  at java.io.File.exists(File.java:680)
  at sun.misc.URLClassPath$FileLoader.getResource(URLClassPath.java:887)
  at sun.misc.URLClassPath.getResource(URLClassPath.java:157)
  at java.net.URLClassLoader$1.run(URLClassLoader.java:191)
  at java.security.AccessController.doPrivileged(Native Method)
  at java.net.URLClassLoader.findClass(URLClassLoader.java:187)
  at com.ibm.ws.bootstrap.ExtClassLoader.findClass(ExtClassLoader.java:79)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:289)
  - locked <0xb4c1be08> (a com.ibm.ws.bootstrap.ExtClassLoader)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:235)
...

accessing the filesystem! After sharing the findings with the rest of the team, one of the admins revealed that the WebSphere installation was on a mounted NFS drive… WebSphere’s JVM was trying to reload some .jar files, but it choked NFS under heavy load.

Unfortunately, moving WebSphere to a non-NFS was not trivial (you know… a big company, with procedures, standards, bureaucracy), and since the managers were already happy with the results of the load test, we never had a chance to run it all again. I am still wondering how much load it would have standed without NFS (and why was WebSphere reading those .jars so often to begin with?).

So, we found the bottle-neck thanks to a thread dump, and a picture is worth a thousand lines of thread dumps :-)

Here is the quick and dirty script: tdg.rb

Micro and Blind Optimizations 1

Posted by F Fri, 04 Nov 2005 03:16:00 GMT

Yesterday, a good friend of mine and ex-coworker contacted to me to share his frustration.

(he hates to be called “Polino”, so I won’t.. doh!)

He finished a software solution for a customer, and now an expert is reviewing his Java code.

The expert code reviewer insists on small performance optimizations, but he is way off target. He wants to micro-optimize, and to do it blindly.

For example, he reported that the following code was doing “inefficient String concatenations”:

String myString = "Some text here "+
                  "Some text there "+
                  "Some more... ";

And that this was an “inefficient way of creating Longs”:

 myList.add(new Long(1));

These examples are probably well optimized by modern Java compilers. But even if they weren’t, they probably don’t affect much to the performance of the system as a whole.

This expert may ignore that the Pareto principle often applies to software optimization problems: about 80% of the resources (including execution time) are consumed by about 20% of the code.

You need to find those sections of code that have the most impact in the overall performance of the system.

One could argue that the two examples above are not in that top list (unless the code is wrapped by several nested loops). But, in any case, you cannot base your optimization efforts only on arguments and intuition: you need real data.

Tools exist to find that 20% of the code: Profilers. A good profiler can present very useful information about memory consumption, execution times of each function/method, blocking threads, deadlocks, and more.

There are tons of profilers for the Java platform. And many are open-source, so you have no excuse: see here and there.

So, don’t do blind optimizations, and leave micro-optimizations for last. Collect real data and then concentrate your efforts where it really matters.

And, even more important, define upfront what metrics you will use to measure performance, and agree on which results will be acceptable for the system. Otherwise data, you will never know when to stop optimizing.

Related:

“Premature optimization is the root of all evil” – CarHoare (inventor of the QuickSort algorithm)