Runtime.exec causes duplicate JVM to hang indefinitely until killed (Solaris 10)

We are running a J2EE application on WebLogic server 9.2 MP2 with a jrockit 64-bit JVM (27.3.1) on Solaris 10.

We use Runtime.exec to call an executable named jfmerge, which creates PDF documents.

We have found that in Solaris, when runtime.exec is called, a duplicate JVM is temporarily spawned to kick off the jfmerge process. While this is inefficient (our JVM is 5 GB, thus the duplicated shell JVM is also 5 GB), the major problem lies in the fact that when there is heavy load on this functionality (PDF generation) in our application, sometimes the duplicated JVM never exits.

When the JVM hangs, the servers suffer badly (extreme application slowness and terminated user sessions) because the entire 5 GB process image of the duplicate JVM gets written to disk swap.

Whenever a duplicate JVM process hangs, we see the following stuck thread, which stays stuck until the process is manually killed:

"[STUCK] ExecuteThread: '17' for queue: 'weblogic.kernel.Default (self-tuning)'" id=3463 idx=0x158 tid=3460 prio=1 alive, in native, daemon
    at jrockit/io/FileNativeIO.readBytesPinned(Ljava/io/FileDescriptor;[BII)I(Native Method)
    at jrockit/io/FileNativeIO.readBytes(
    at java/io/FileInputStream.readBytes([BII)I(
    at java/io/
    at java/lang/UNIXProcess$
    at java/io/BufferedInputStream.fill(
    at java/io/
    ^-- Holding lock: java/io/BufferedInputStream@0xfffffffec6510470[thin lock]
    at gov/v3/common/formgeneration/sessionbean/FormsBean.getProcessStatus(
    at gov/v3/common/formgeneration/sessionbean/FormsBean.createPDF(
    at gov/v3/common/formgeneration/sessionbean/FormsBean.getTemplateDetails(
    at gov/v3/common/formgeneration/sessionbean/FormsBean.generateSinglePDF(
    at gov/v3/common/formgeneration/sessionbean/FormsBean.generatePDF(
    at gov/v3/common/formgeneration/sessionbean/FormsBean.endorseDocument(
    at gov/v3/common/formgeneration/sessionbean/Forms_qaco28_EOImpl.endorseDocument(
    at gov/v3/delegates/common/FormsAndNoticesDelegate.endorseDocument(
    at gov/v3/actions/common/EndorseDocumentAction.executeRequest(
    at gov/v3/fwk/controller/struts/action/V3CommonDispatchAction.dispatchToExecuteMethod(
    at gov/v3/fwk/controller/struts/action/V3CommonDispatchAction.executeBaseAction(
    at gov/v3/fwk/controller/struts/action/V3BaseDispatchAction.execute(
    at org/apache/struts/action/RequestProcessor.processActionPerform(
    at gov/v3/fwk/controller/struts/requestprocessor/V3TilesRequestProcessor.processActionPerform(
    at org/apache/struts/action/RequestProcessor.process(
    at org/apache/struts/action/ActionServlet.process(
    at org/apache/struts/action/ActionServlet.doGet(
    at gov/v3/fwk/controller/struts/servlet/V3ControllerServlet.doGet(
    at javax/servlet/http/HttpServlet.service(
    at javax/servlet/http/HttpServlet.service(
    at weblogic/servlet/internal/StubSecurityHelper$
    at weblogic/servlet/internal/StubSecurityHelper.invokeServlet(
    at weblogic/servlet/internal/ServletStubImpl.execute(
    at weblogic/servlet/internal/ServletStubImpl.execute(
    at weblogic/servlet/internal/WebAppServletContext$
    at weblogic/security/acl/internal/AuthenticatedSubject.doAs(
    at weblogic/security/service/SecurityManager.runAs(
    at weblogic/servlet/internal/WebAppServletContext.securedExecute(
    at weblogic/servlet/internal/WebAppServletContext.execute(
    at weblogic/servlet/internal/
    at weblogic/work/ExecuteThread.execute(
    at weblogic/work/
    at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
    -- end of trace

We would like to do a couple of things:

1.) Prevent the spawning of a duplicate JVM, as we do not need any of its functions when executing the simple jfmerge executable, and it creates massive overhead.

2.) In the short term, at least prevent this duplicate JVM from hanging indefinitely.


This answer is late, but we had the same problem, and for us the cause was the way Solaris manages memory.

When an application server is using a lot of memory (10 GB in my case) and we want to run something as simple as "ls", the new process needs 10 GB of its own to start.

Solaris requires that extra 10 GB of virtual memory to be available on the server before the fork can succeed. Linux relies on a feature known as “copy-on-write” and, by default, does not reserve the full amount up front, which reduces the overhead of forking a new process. The background is explained in the following excerpt:

Historical Background and Problem Description

Traditionally, Unix has had only one way to create a new process: using a fork() system call, often followed by an exec() system call. The fork() call makes a copy of the entire parent process' address space, and exec() turns that copy into a new process.

(Note: In the Solaris OS, the term swap space is used to describe a combination of physical memory and disk swap space configured for the system. However, with other Unix systems this term may mean swap space on disk, also known as backing store. To avoid any confusion, I'll use the term Virtual Memory (VM) to mean physical memory plus disk swap space.)

Generally, the fork/exec method has worked quite well. However, it has disadvantages in some cases, such as running out of memory without a good reason and poor fork performance.

Out of Memory: For a large-memory process, the fork() system call can fail due to an inadequate amount of VM, because fork() requires twice the amount of the parent memory. This can happen even when fork() is immediately followed by an exec() call that would release most of that extra memory. When this happens, the application will usually terminate.

For example, suppose a 64-bit application is consuming 6 gigabytes (Gbytes) of VM at the moment, and it needs to create a subprocess to run the ls(1) command. The parent process issues a fork() call that will succeed only if there is another 6 Gbytes of VM available at the moment. If the system doesn't have that much VM available (which is a frequent situation), fork() will fail with ENOMEM. Obviously, the ls(1) command doesn't need anywhere near 6 Gbytes of memory to run, but fork() doesn't know that.
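The fork/exec pattern described above, including the ENOMEM check, can be sketched in C roughly as follows. This is only an illustration: the helper name run_with_fork is made up for this example, and whether fork() actually fails depends on how much virtual memory the parent holds and how the OS accounts for it.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from the original post): run argv[0] using the
 * traditional fork()/exec() pair. Returns the child's exit status, or -1 if
 * the child could not be created or did not exit normally. */
int run_with_fork(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0) {
        /* For a very large parent, this is where ENOMEM would show up. */
        if (errno == ENOMEM)
            fprintf(stderr, "fork: not enough virtual memory\n");
        else
            fprintf(stderr, "fork: %s\n", strerror(errno));
        return -1;
    }
    if (pid == 0) {
        /* Child: replace the copied process image with the command. */
        execvp(argv[0], argv);
        _exit(127);              /* reached only if execvp() failed */
    }

    /* Parent: wait for the child, retrying if interrupted by a signal. */
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller would pass a NULL-terminated argument vector such as { "ls", "-la", NULL } and check the return value for -1.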

Not only applications, but also Sun's own tools can suffer from the same problem. For example, the following Sun RFE (request for enhancement) has been filed for dbx: "4748951 dbx shell should use posix_spawn() for non-builtin commands rather than fork(2)".

RFE 4748951 came about when a customer's utility invoked dbx to read a huge core file using a script that also needed to run a cut(1) command from within dbx. They got a cannot fork - try again error message causing dbx to abort. An investigation revealed that dbx used fork/exec to execute that tiny cut(1) command and ran out of VM during the fork() call.

The Solaris Java Virtual Machine (JVM) is also suffering from the same problem currently, as described in this Sun RFE: "5049299 Use posix_spawn, not fork, on S10 to avoid swap exhaustion".
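Both RFEs name posix_spawn() as the alternative to fork(). A minimal sketch of launching a command that way is shown here. The helper name run_with_spawn is made up for this example, and it uses the PATH-searching variant posix_spawnp(). Whether this avoids the extra memory reservation depends on how the platform implements posix_spawn, so treat it as an illustration rather than a guaranteed fix.

```c
#include <errno.h>
#include <spawn.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper (not from the original post): run argv[0] through
 * posix_spawnp() instead of fork()/exec(). Returns the child's exit status,
 * or -1 if the child could not be started or did not exit normally. */
int run_with_spawn(char *const argv[])
{
    pid_t pid;

    /* posix_spawnp() returns an error number directly; it does not set errno. */
    int rc = posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ);
    if (rc != 0) {
        fprintf(stderr, "posix_spawnp: %s\n", strerror(rc));
        return -1;
    }

    /* Wait for the child, retrying if interrupted by a signal. */
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The two NULL arguments are the file-actions and spawn-attributes parameters; leaving them NULL means the child inherits the parent's open descriptors and default attributes.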

So you have three options.

1.- Call Runtime.exec earlier, while the JVM's memory footprint is still small.

2.- Set up inter-process communication with a separate Java server process and run the Runtime.exec call there.

3.- Write a JNI class that calls the C system() function. I chose this option, and it works perfectly.

Here is my sample code.

Java code.

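public class CallOS {
    static {
        // Load the native library. The name "CallOS" is an assumption;
        // the original post omitted this call.
        System.loadLibrary("CallOS");
    }

    // Implemented in C; runs cmd and returns the status from system().
    public native int exec(java.lang.String cmd);

    public static void main(String[] args) {
        int returnValue = 0;
        returnValue = new CallOS().exec("ls -la");
        System.out.println("- " + returnValue);
    }
}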

C header code. This is generated with javah -jni CallOS

/* DO NOT EDIT THIS FILE - it is machine generated */
#include <jni.h>
/* Header for class CallOS */

#ifndef _Included_CallOS
#define _Included_CallOS
#ifdef __cplusplus
extern "C" {
#endif
/*
 * Class:     CallOS
 * Method:    exec
 * Signature: (Ljava/lang/String;)I
 */
JNIEXPORT jint JNICALL Java_CallOS_exec
  (JNIEnv *, jobject, jstring);

#ifdef __cplusplus
}
#endif
#endif

C code.

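#include "CallOS.h"
#include <stdlib.h>

JNIEXPORT jint JNICALL Java_CallOS_exec
  (JNIEnv *env, jobject obj, jstring cmd)
{
   jint  retval;
   const char *str;

   /* Convert the Java string to a C string (modified UTF-8). */
   str = (*env)->GetStringUTFChars(env, cmd, NULL);
   if (str == NULL) return -1;  /* conversion failed; an exception is pending */

   /* Run the command through the shell and keep its status. */
   retval = system(str);

   (*env)->ReleaseStringUTFChars(env, cmd, str);
   return retval;
}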

I hope this helps.