Java VM: reproducible SIGSEGV on both 1.6.0_17 and 1.6.0_18, how to report?

SyntaxT3rr0r picture SyntaxT3rr0r · Feb 19, 2010 · Viewed 23.7k times · Source

EDIT: This reproducible SIGSEGV happens on a Linux machine with more than one proc and more than 2GB of mem, so Java is defaulting to the -server mode. Interestingly enough if I force "-client" there's no crash anymore... (I'm still not too sure what to do with my reproducible SIGSEGV but it's interesting nonetheless).

First note that this is a bit related but not identical to the following because in our case it's only a SIGSEGV that happens, and we can reliably trigger it:

JVM OutOfMemory error "death spiral" (not memory leak)

It's related because it happens when we feed our app with a "deluge of data": data are coming from text files and then number-crunched (yes, financial number crunching in Java).

I can reliably trigger a JVM to SIGSEGV using only valid Java code.

NOTE: I can invariably crash both JVM 1.6.0_17 adn JVM 1.6.0_18 and this question is not about how to workaround this issue (for example playing with VM parameters may fix the issue but I'm not after that, I want to know what to do with this always-reproducable SIGSEGV).

I've got a workaround which simply consists in using Java 1.5 when launching our app (while still using Java 1.6 to run IntelliJ IDEA, etc. on the same machine, simultaneously), but my question is if this should be reported or not and, if it should, how to report it knowing that the log itself contains proprietary information (the full hs_err_..._log).

Hardware error can be ruled out for:

  • this is happening on a workstation that regularly reaches months of uptime (I only reboot it when critical security patches affecting my trimmed down and hardened Debian Linux are issued, which really doesn't happen often) and on which applications never crash (making it very unlikely that it's an hardware issue on that machine [more below])

  • same application works perfectly on that same machine under a JVM 1.5 under the same load (this is how I'm testing the app: I simply launch it under a 1.5 VM)

  • same application works perfectly fine on more than one hundreds clients machine under the same (gigantic) load (never crashed once on Windows + JVM 1.5 or 1.6 and never crashed once on OS X + JVM 1.5 or 1.6 [a crash would mean an instant phone call from the client])

  • other application on that same machine and same 1.6.0_17 or 1.6.0_18 JVM never crash (for example I've got two instances of IntelliJ IDEA running as two different users on that same machine and they don't crash)

  • machine is tested with memtest "regularly" (before installing a new OS, which last happened when I installed Debian Lenny, not that long ago)

Here's the reproducible-on-demand SIGSEGV:

... $uname -a
Linux saturn 2.6.26-2-686 #1 SMP Wed Nov 4 20:45:37 UTC 2009 i686 GNU/Linux
... $ export /home/wizard/jdk1.6.0_17/bin:$PATH
... $ java -version
java version "1.6.0_17"
Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
Java HotSpot(TM) Server VM (build 14.3-b01, mixed mode)

Launch the app, feed it a "deluge of data", wait a few seconds...

Then, invariably, for 1.6.0_17:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0xb76d0080, pid=30793, tid=2514328464
#
# JRE version: 6.0_17-b04
# Java VM: Java HotSpot(TM) Server VM (14.3-b01 mixed mode linux-x86 )
# Problematic frame:
# V  [libjvm.so+0x4bc080]
#
# An error report file with more information is saved as:
# /home/wizard/hs_err_pid30793.log
#
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp

(note that the line '[libjvm.so+0x4bc080]' is consistent for 1.6.0_17 at every SIGSEGV)

or for 1.6.0_18:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0xb77468f0, pid=722, tid=2514516880
#
# JRE version: 6.0_18-b07
# Java VM: Java HotSpot(TM) Server VM (16.0-b13 mixed mode linux-x86 )
# Problematic frame:
# V  [libjvm.so+0x4d88f0]
#
# An error report file with more information is saved as:
# /home/wizard/hs_err_pid722.log
#
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp
#
Aborted

(note that the line "[libjvm.so+0x4d88f0]" is consistent for 1.6.0_18 at every SIGSEGV)

The problem is that the log file contains proprietary information that cannot be shared.

Reproducing a "tiny test case" that reproduce the issue ain't realistic either: it's similar to the issue linked above, this only happens when a "deluge of data" is feeded to the app.

Note that the exact same application, on exactly the same hardware, with exactly the same JVM but another version of Linux (I had Debian Etch previously) did NOT trigger that SIGSEGV once.

But this doesn't mean the JVM isn't at fault: it could still be a JVM issue.

Should I report this and how? (keeping in mind that writing a "reproducible tiny test case" is delusional and that the log contains proprietary information that shouldn't be leaked). Should I just edit the log and send it?

What's the procedure to report such reproducible SIGSEGV when your log contains proprietary information and when a test case reproducing the issue ain't realistically doable?

Did any of you have success opening such a bug and then see it solved in a subsequent Java release?

Do you think it's good "for the Java community" to report such an issue or I just shouldn't bother because it's not important?

Answer

glenti picture glenti · Feb 25, 2010

I got similar problem upgrading to JDK 1.6_18 and it seems solved using the following options:

-server
-Xms256m
-Xmx748m
-XX:MaxPermSize=128m

-verbose:gc
-XX:+PrintGCTimeStamps
-Xloggc:/tmp/gc.log
-XX:+PrintHeapAtGC
-XX:+PrintGCDetails
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath="/tmp"

-XX:+UseParallelGC
-XX:-UseGCOverheadLimit

# Following options just to remote monitoring with jconsole, useful to see JVM behaviour at runtime
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=12345
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
-Djava.rmi.server.hostname=MyHost

I still didn't double check (it is a production environment), but I think the error was due to two reasons:

1) Wrong setting about heap and/or Permanent space (I think JDK 1.6 needs more space in heap and permanent than previous JVM versions) caused an OutOfMemoryError, but

2) in the wrong original setting somebody wrote

-XX:+HeapDumpOnOutOfMemoryError="/tmp"

and not

-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath="/tmp"

so probably JVM was not able to write the heapdump and we got SIGSEGV only (previous versions wrote heap dump in the working directory).

Check -server -XX:+UseParallelGC -XX:-UseGCOverheadLimit options too. I think playing with VM parameters is not a workaround, but the right approach also because garbage collector (and not only) changed between 1.5 and 1.6.