What is the difference between Java 6 and 7 that would cause a performance issue?

Asked by medv4380 · Feb 28, 2013 · Viewed 11.7k times

My general experience with Java 7 tells me that it is faster than Java 6. However, I've run into enough information that makes me believe that this is not always the case.

The first bit of information comes from the Minecraft Snooper data found here. My intention was to look at that data to determine the effects of the different switches used to launch Minecraft. For example, I wanted to know whether using -Xmx4096m had a negative or positive effect on performance. Before I could get there I looked at the different versions of Java being used. The data covers everything from 1.5 to a developer using 1.8. In general, as the Java version increases you see an increase in FPS performance. Throughout the different versions of 1.6 you even see this gradual upward trend. I honestly wasn't expecting to see so many different versions of Java still in the wild, but I guess people don't run the updates like they should.

Somewhere around the later versions of 1.6 you get the highest peaks. 1.7 performs about 10 FPS below the later versions of 1.6 on average, but still higher than the early versions of 1.6. On a sample from my own system it's almost impossible to see the difference, but when looking at the broader sample it's clear.

To control for the possibility that someone might have found a magic switch for Java, I controlled by only looking at the data where no switches were being passed. That way I'd have a reasonable baseline before I started looking at the different flags.

I dismissed most of what I was seeing, as it could be some magic Java 6 setup that someone's just not sharing with me.

Now I've been working on another project that requires me to pass an array wrapped in an InputStream to be processed by another API. Initially I used a ByteArrayInputStream because it works out of the box. When I looked at its code I noticed that every method was synchronized. Since this was unnecessary for this project, I rewrote one with the synchronization stripped out. I then decided that I wanted to know what the general cost of synchronization was for me in this situation.
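
As a quick sanity check of that observation, here is a minimal sketch (the class name SyncCheck is just a placeholder) that uses reflection to list which ByteArrayInputStream methods the running JDK actually declares as synchronized:

import java.io.ByteArrayInputStream;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public class SyncCheck {
    public static void main(String[] args) {
        // Print every ByteArrayInputStream method declared with the
        // synchronized modifier in the JDK you are running on.
        for (Method m : ByteArrayInputStream.class.getDeclaredMethods()) {
            if (Modifier.isSynchronized(m.getModifiers())) {
                System.out.println(m + " is synchronized");
            }
        }
    }
}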

I mocked up a simple test just to see. I timed everything with System.nanoTime() and used Java 1.6.0_20 x86, 1.7.0-b147 AMD64, and 1.7.0_15 AMD64, also trying the -server flag. I expected the AMD64 versions to outperform based on architecture alone and to have any Java 7 advantages. I also looked at the 25th, 50th, and 75th percentiles (blue, red, green). However, 1.6 with no -server beat the pants off of every other configuration (graph).

So my question is: what is in the 1.6 -server option that is impacting performance, and that is also defaulted to on in 1.7?

I know most of the speed enhancement in 1.7 came from defaulting on some of the more radical performance options from 1.6, but one of them is causing a performance difference. I just don't know which ones to look at.

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ByteInputStream extends InputStream {

public static void main(String args[]) throws IOException {
    String song = "This is the song that never ends";
    byte[] data = song.getBytes();
    byte[] read = new byte[data.length];
    ByteArrayInputStream bais = new ByteArrayInputStream(data);
    ByteInputStream bis = new ByteInputStream(data);

    long startTime, endTime;

    for (int i = 0; i < 10; i++) {
        /*code for ByteInputStream*/
        /*
        startTime = System.nanoTime();
        for (int ctr = 0; ctr < 1000; ctr++) {
            bis.mark(0);
            bis.read(read);
            bis.reset();
        }
        endTime = System.nanoTime(); 

        System.out.println(endTime - startTime); 
        */

        /*code for ByteArrayInputStream*/
        startTime = System.nanoTime();
        for (int ctr = 0; ctr < 1000; ctr++) {
            bais.mark(0);
            bais.read(read);
            bais.reset();
        }
        endTime = System.nanoTime();

        System.out.println(endTime - startTime);
    }

}

private final byte[] array;
private int pos;
private int min;
private int max;
private int mark;

public ByteInputStream(byte[] array) {
    this(array, 0, array.length);
}

public ByteInputStream(byte[] array, int offset, int length) {
    min = offset;
    max = offset + length;
    this.array = array;
    pos = offset;
}

@Override
public int available() {
    return max - pos;
}

@Override
public boolean markSupported() {
    return true;
}

@Override
public void mark(int limit) {
    mark = pos;
}

@Override
public void reset() {
    pos = mark;
}

@Override
public long skip(long n) {
    // Clamp to the end of the buffer and return the number of bytes
    // actually skipped, as the InputStream contract requires.
    long skipped = Math.min(n, (long) (max - pos));
    if (skipped < 0) {
        skipped = 0;
    }
    pos += skipped;
    return skipped;
}

@Override
public int read() throws IOException {
    if (pos >= max) {
        return -1;
    }
    return array[pos++] & 0xFF;
}

@Override
public int read(byte b[], int off, int len) {
    if (pos >= max) {
        return -1;
    }
    if (pos + len > max) {
        len = max - pos;
    }
    if (len <= 0) {
        return 0;
    }
    System.arraycopy(array, pos, b, off, len);
    pos += len;
    return len;
}

@Override
public void close() throws IOException {
}

}// end class

Answer

Answered by Andrew Alcock · Mar 1, 2013

I think, as the others are saying, that your tests are too short to see the core issues: the graph is showing nanoTime values, which implies the core section being measured completes in 0.0001 to 0.0006 s.

Discussion

The key difference between -server and -client is that -server expects the JVM to be around for a long time and therefore expends effort early on for better long-term results, while -client aims for fast startup times and good-enough performance.
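
To confirm which HotSpot variant a given run actually used, here is a minimal sketch (the class name VmInfo is just a placeholder) reading the standard system properties:

public class VmInfo {
    public static void main(String[] args) {
        // Prints e.g. "Java HotSpot(TM) 64-Bit Server VM" or "... Client VM",
        // which tells you whether -server or -client was in effect.
        System.out.println(System.getProperty("java.vm.name"));
        System.out.println(System.getProperty("java.version"));
    }
}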

In particular, with -server, HotSpot runs with more optimizations, and these take more CPU to execute. In other words, with -server, you may be seeing the cost of the optimizer outweighing any gains from the optimization.
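
As a rough illustration of that cost, here is a minimal sketch (the class name JitCost is just a placeholder) using the standard CompilationMXBean to report how much cumulative time the JIT compiler itself has spent:

import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

public class JitCost {
    public static void main(String[] args) {
        // Report the cumulative time the JIT compiler has spent so far;
        // calling this at the end of a benchmark run gives a feel for
        // how much work the optimizer did.
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit != null && jit.isCompilationTimeMonitoringSupported()) {
            System.out.println(jit.getName() + " compilation time (ms): "
                    + jit.getTotalCompilationTime());
        } else {
            System.out.println("Compilation time monitoring not supported");
        }
    }
}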

See Real differences between "java -server" and "java -client"?

Alternatively, you may be seeing the effects of tiered compilation where, in Java 7, HotSpot doesn't kick in so fast. With only 1000 iterations, the full optimization of your code won't happen until later, and the benefits will therefore be smaller.
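
If you want to check which options each run was actually launched with (for example whether -XX:+TieredCompilation was passed explicitly), here is a minimal sketch (the class name JvmArgs is just a placeholder) using the RuntimeMXBean; note that options defaulted on internally by the JVM will not appear here:

import java.lang.management.ManagementFactory;

public class JvmArgs {
    public static void main(String[] args) {
        // Print the flags this JVM was explicitly launched with.
        for (String arg : ManagementFactory.getRuntimeMXBean().getInputArguments()) {
            System.out.println(arg);
        }
    }
}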

You might get some insight if you run java with the -Xprof option: the JVM will dump data about the time spent in various methods, both interpreted and compiled. It should give an idea of what was compiled, and the ratio of (CPU) time before HotSpot kicked in.

However, to get a true picture, you really need to run this for much longer (minutes, not milliseconds) to allow Java and the OS to warm up. It would be even better to loop the test in main (so you have a loop containing your instrumented test loop) so that you can ignore the warm-up.
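
Here is a minimal sketch of that suggestion (the iteration counts are placeholders and the class name WarmedUpBenchmark is just a placeholder): an outer loop repeats the instrumented inner loop, and the early iterations are discarded as warm-up:

import java.io.ByteArrayInputStream;

public class WarmedUpBenchmark {
    static final int OUTER = 50;      // placeholder: total outer iterations
    static final int WARMUP = 20;     // placeholder: iterations to discard as warm-up
    static final int INNER = 1000000; // placeholder: inner iterations per measurement

    public static void main(String[] args) {
        byte[] data = "This is the song that never ends".getBytes();
        byte[] read = new byte[data.length];
        ByteArrayInputStream bais = new ByteArrayInputStream(data);

        for (int outer = 0; outer < OUTER; outer++) {
            long start = System.nanoTime();
            for (int i = 0; i < INNER; i++) {
                bais.mark(0);
                bais.read(read, 0, read.length);
                bais.reset();
            }
            long elapsed = System.nanoTime() - start;
            // Only report timings once the JVM has had time to warm up.
            if (outer >= WARMUP) {
                System.out.println(elapsed);
            }
        }
    }
}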

EDIT: Changed seconds to minutes to ensure that HotSpot, the JVM and the OS are properly 'warmed up'.