Software Development – Java – Open Source

Sun’s Troubleshooting Guide for Java

Monday, September 17th, 2007 at 12:31 pm

Sun’s Troubleshooting Guide for Java provides a nice overview of the tools available to analyze and monitor Java processes. I would guess that a lot of Java developers don’t know them.

I recently had to identify a performance problem on one of our production servers. The application appeared to hang every few seconds and then continue normal operation. Using jstat I quickly discovered that a lot of objects were being created only to be garbage collected right afterwards. This was the output of jstat:

S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT    LGCC                 GCC
0,00   0,00 100,00 100,00  89,75  10546  726,162  1192 12922,944 13649,106 Allocation Failure   unknown GCCause
0,00   0,00 100,00 100,00  89,44  10546  726,162  1192 12922,944 13649,106 Allocation Failure   unknown GCCause
0,00   0,00  21,93 100,00  89,47  10546  726,162  1192 12934,808 13660,970 unknown GCCause      No GC
0,00   0,00  32,41 100,00  89,48  10546  726,162  1192 12934,808 13660,970 unknown GCCause      No GC
0,00   0,00  42,44 100,00  89,48  10546  726,162  1192 12934,808 13660,970 unknown GCCause      No GC
S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT    LGCC                 GCC
0,00   0,00  56,79 100,00  89,48  10546  726,162  1192 12934,808 13660,970 unknown GCCause      No GC
0,00   0,00  68,61 100,00  89,48  10546  726,162  1192 12934,808 13660,970 unknown GCCause      No GC
0,00   0,00  78,02 100,00  89,48  10546  726,162  1192 12934,808 13660,970 unknown GCCause      No GC
0,00   0,00  88,74 100,00  89,48  10546  726,162  1192 12934,808 13660,970 unknown GCCause      No GC
0,00   0,00 100,00 100,00  89,48  10546  726,162  1193 12934,808 13660,970 Allocation Failure   unknown GCCause
S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT    LGCC                 GCC
0,00   0,00 100,00 100,00  89,48  10546  726,162  1193 12934,808 13660,970 Allocation Failure   unknown GCCause
0,00   0,00 100,00 100,00  89,48  10546  726,162  1193 12934,808 13660,970 Allocation Failure   unknown GCCause
0,00   0,00 100,00 100,00  89,48  10546  726,162  1193 12934,808 13660,970 Allocation Failure   unknown GCCause
0,00   0,00 100,00 100,00  89,48  10546  726,162  1193 12934,808 13660,970 Allocation Failure   unknown GCCause
0,00   0,00 100,00 100,00  89,48  10546  726,162  1193 12934,808 13660,970 Allocation Failure   unknown GCCause
S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT    LGCC                 GCC
0,00   0,00 100,00 100,00  89,48  10546  726,162  1193 12934,808 13660,970 Allocation Failure   unknown GCCause
0,00   0,00 100,00 100,00  89,48  10546  726,162  1193 12934,808 13660,970 Allocation Failure   unknown GCCause
0,00   0,00 100,00 100,00  89,48  10546  726,162  1193 12934,808 13660,970 Allocation Failure   unknown GCCause
0,00   0,00 100,00 100,00  89,48  10546  726,162  1193 12934,808 13660,970 Allocation Failure   unknown GCCause
0,00   0,00 100,00 100,00  89,48  10546  726,162  1193 12934,808 13660,970 Allocation Failure   unknown GCCause
S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT    LGCC                 GCC
0,00   0,00 100,00 100,00  89,48  10546  726,162  1193 12934,808 13660,970 Allocation Failure   unknown GCCause
0,00   0,00  24,36 100,00  89,51  10546  726,162  1193 12946,347 13672,509 unknown GCCause      No GC

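The “E” column shows the percentage of Eden space in use. While this value sits at 100%, the application does not respond until the garbage collection has finished. The FGCT column (cumulative full GC time in seconds) grows by roughly 12 seconds with each of these collections, which accounts for the hangs.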
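To spot these stalls without reading the table by eye, the rows can be parsed mechanically. The following is only a minimal sketch: the class and method names are made up for illustration, and it assumes the column order shown above (S0, S1, E, O, P, YGC, YGCT, FGC, FGCT, GCT) and that numbers may be printed with a decimal comma, as on this server.

```java
final class JstatGcCause {
    // Column positions, assuming the layout of the output shown above.
    static final int EDEN = 2, OLD = 3, FGC = 7, FGCT = 8;

    /** Parses a number printed with either ',' or '.' as decimal separator. */
    static double num(String s) {
        return Double.parseDouble(s.replace(',', '.'));
    }

    /** Returns the first ten numeric columns, or null for header and blank lines. */
    static double[] parseRow(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length < 10 || !Character.isDigit(tokens[0].charAt(0))) {
            return null;
        }
        double[] cols = new double[10];
        for (int i = 0; i < 10; i++) {
            cols[i] = num(tokens[i]);
        }
        return cols;
    }

    /** True while Eden is completely full, i.e. the application is waiting for GC. */
    static boolean edenFull(double[] row) {
        return row[EDEN] >= 100.0;
    }

    /** Seconds of full GC time accumulated between two samples. */
    static double fullGcSecondsBetween(double[] before, double[] after) {
        return after[FGCT] - before[FGCT];
    }

    public static void main(String[] args) {
        // Two consecutive rows copied from the output above.
        double[] stalled = parseRow("0,00   0,00 100,00 100,00  89,44  10546  726,162  1192 12922,944 13649,106 Allocation Failure   unknown GCCause");
        double[] resumed = parseRow("0,00   0,00  21,93 100,00  89,47  10546  726,162  1192 12934,808 13660,970 unknown GCCause      No GC");
        System.out.println("Eden full before: " + edenFull(stalled));
        System.out.println("Eden full after:  " + edenFull(resumed));
        System.out.println("Full GC seconds:  " + fullGcSecondsBetween(stalled, resumed));
    }
}
```

Only the first ten tokens are read because the two cause columns at the end contain spaces themselves.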

Using jstack I was able to find out what the server was doing while inflating the heap.
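If attaching jstack from outside is not an option, a comparable thread dump can be produced from inside the JVM through the standard java.lang.management API. This is a rough sketch rather than a jstack replacement; the class name is hypothetical, and how you trigger it (a servlet, a JMX operation, a signal handler) is left open.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

final class InProcessThreadDump {
    /** Returns name, state and full stack trace of every live thread. */
    static String dump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // false, false: skip monitor and synchronizer details to keep the call cheap.
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            if (info == null) {
                continue; // defensive: skip entries for threads that just died
            }
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

Taking a few dumps a second or two apart while Eden is filling up usually shows which code path keeps allocating.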

All this was done on a server running in production without using any additional profilers!

Another nice feature is the -XX:+HeapDumpOnOutOfMemoryError option mentioned in section 3.3.3.4 of the guide. I always set this option for production servers. This way you always have a memory dump available to analyze, even if the support team only restarted the application without taking a heap dump.
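Whether the option is actually active on a given server can be checked, and switched on, at runtime. The sketch below assumes a HotSpot-based JVM on Java 7 or later, since it relies on com.sun.management.HotSpotDiagnosticMXBean, which is not part of the portable Java API; the class name HeapDumpFlag is made up for this example.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

final class HeapDumpFlag {
    static final String FLAG = "HeapDumpOnOutOfMemoryError";

    // HotSpot-specific bean; on other JVMs this lookup is expected to fail.
    static HotSpotDiagnosticMXBean bean() {
        return ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
    }

    /** True if the JVM will write a heap dump on OutOfMemoryError. */
    static boolean isEnabled() {
        return Boolean.parseBoolean(bean().getVMOption(FLAG).getValue());
    }

    /** Turns the option on for the running JVM (the flag is manageable at runtime). */
    static void enable() {
        bean().setVMOption(FLAG, "true");
    }

    public static void main(String[] args) {
        System.out.println(FLAG + " before: " + isEnabled());
        enable();
        System.out.println(FLAG + " after:  " + isEnabled());
    }
}
```

Setting the option on the command line remains the simpler choice; the runtime check is mainly useful to verify that a start script really passes it.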

Or take jmap, for example. It not only gives you some heap statistics but can also generate a heap dump while the application is running.
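Basic per-pool heap numbers are also available from inside the application via the standard memory pool beans. This is just a sketch with a hypothetical class name; pool names differ between garbage collectors and JVM versions, so it does not assume any particular ones.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

final class HeapPoolStats {
    /** Maps each valid memory pool name to the number of bytes currently used. */
    static Map<String, Long> usedBytesByPool() {
        Map<String, Long> result = new LinkedHashMap<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // getUsage() returns null for pools that are no longer valid
            }
            result.put(pool.getName(), usage.getUsed());
        }
        return result;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Long> entry : usedBytesByPool().entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue() + " bytes used");
        }
    }
}
```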

So make sure you know about the tools available as they might become useful if your application ever gets into trouble ;)


4 Responses to “Sun’s Troubleshooting Guide for Java”

  1. William Louth Says:
    September 17th, 2007 at 8:53 pm

    “All this was done on a server running in production without using any additional profilers!”

    I think you will find that most respectable companies do not use profilers on production servers and instead use Java performance management solutions that are optimized to automate most performance monitoring and problem diagnostics activities, including ad hoc investigations such as the one above.

    What is important to bear in mind is that in any reasonably sized installation developers do not have access to servers, and that operations and application management support staff are looking to build a knowledge management solution around diagnostic images while handling incidents, to ensure problem management is actually performed, and performed efficiently.

    regards,

    William

  2. christoph Says:
    September 17th, 2007 at 9:56 pm

    Well, maybe the installations I develop for are too small. Do you have any recommendations for products that do automatic performance monitoring and problem diagnostics? Sounds very interesting.

  3. Hari Says:
    September 18th, 2007 at 4:15 am

    thanks, quite useful and good timing of this blog :)

  4. James Says:
    September 21st, 2007 at 2:55 am

    Try Wily (www.wilytech.com) for monitoring production JVMs. They helped write JSR-163, the spec for byte-code instrumentation (BCI). Their stuff collects a ton of metrics in production with very low overhead.
