Slow transfers in Jetty with chunked transfer encoding at certain buffer size

Sven · Jan 27, 2012

I'm investigating a performance problem with Jetty 6.1.26. Jetty appears to use Transfer-Encoding: chunked, and depending on the buffer size used, this can be very slow when transferring locally.

I've created a small Jetty test application with a single servlet that demonstrates the issue.

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.OutputStream;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.mortbay.jetty.Server;
import org.mortbay.jetty.nio.SelectChannelConnector;
import org.mortbay.jetty.servlet.Context;

public class TestServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
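        // 64KB is used both as the response buffer size and as the size of
        // each write. No setContentLength() call is made, so HTTP/1.1
        // clients receive a chunked response.
        final int bufferSize = 65536;
        resp.setBufferSize(bufferSize);
        OutputStream outStream = resp.getOutputStream();

        FileInputStream stream = null;
        try {
            // 128MB test file, read from the working directory
            stream = new FileInputStream(new File("test.data"));
            int bytesRead;
            byte[] buffer = new byte[bufferSize];
            // Copy the file to the response, flushing after every write
            while( (bytesRead = stream.read(buffer, 0, bufferSize)) > 0 ) {
                outStream.write(buffer, 0, bytesRead);
                outStream.flush();
            }
        } finally {
            // Close the output even if closing the input fails
            try {
                if( stream != null )
                    stream.close();
            } finally {
                outStream.close();
            }
        }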
    }

    public static void main(String[] args) throws Exception {
        Server server = new Server();
        SelectChannelConnector ret = new SelectChannelConnector();
        ret.setLowResourceMaxIdleTime(10000);
        ret.setAcceptQueueSize(128);
        ret.setResolveNames(false);
        ret.setUseDirectBuffers(false);
        ret.setHost("0.0.0.0");
        ret.setPort(8080);
        server.addConnector(ret);
        Context context = new Context();
        context.setDisplayName("WebAppsContext");
        context.setContextPath("/");
        server.addHandler(context);
        context.addServlet(TestServlet.class, "/test");
        server.start();
    }

}

In my experiment, I'm using a 128MB test file that the servlet returns to the client, which connects using localhost. Downloading this data using a simple test client written in Java (using URLConnection) takes 3.8 seconds, which is very slow (yes, it's 33MB/s, which doesn't sound slow, except that this is purely local and the input file was cached; it should be much faster).
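The Java test client itself is not shown. A minimal sketch of what such a client might look like is below, assuming the servlet above is listening on http://localhost:8080/test; the class and method names are hypothetical, and the 64KB read buffer is an arbitrary choice. It drains the response, reports throughput, and prints the Transfer-Encoding header so you can confirm whether the response was chunked. It uses try-with-resources, so it needs Java 7 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical timing client; the default URL and buffer size are assumptions.
class DownloadTimer {

    // Reads the stream to the end and returns the number of bytes consumed.
    static long drain(InputStream in, int readBufferSize) throws IOException {
        byte[] buffer = new byte[readBufferSize];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        String target = args.length > 0 ? args[0] : "http://localhost:8080/test";
        URLConnection conn = new URL(target).openConnection();
        long start = System.nanoTime();
        long bytes;
        try (InputStream in = conn.getInputStream()) {
            bytes = drain(in, 65536);
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("%d bytes in %.2f s (%.1f MB/s)%n",
                bytes, seconds, bytes / (1024.0 * 1024.0) / seconds);
        // Prints "chunked" for a chunked response, "null" if the header is absent
        System.out.println("Transfer-Encoding: " + conn.getHeaderField("Transfer-Encoding"));
    }
}
```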

Now here's where it gets strange. If I download the data with wget, which is an HTTP/1.0 client and therefore doesn't support chunked transfer encoding, it only takes 0.1 seconds. That's a much better figure.
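For reference, HTTP/1.1 chunked transfer coding (RFC 2616, section 3.6.1) frames each chunk as its size in hexadecimal, a CRLF, the data, and another CRLF, and ends the body with a zero-length chunk. The sketch below shows only that wire format; it is an illustration, not Jetty's generator, and it omits chunk extensions and trailers. The class name is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Illustrative HTTP/1.1 chunk framing; not Jetty's implementation.
class ChunkedFraming {

    // Frames data[off, off+len) as one chunk: hex size, CRLF, data, CRLF.
    static byte[] encodeChunk(byte[] data, int off, int len) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(len + 16);
        byte[] header = (Integer.toHexString(len) + "\r\n").getBytes(StandardCharsets.US_ASCII);
        out.write(header, 0, header.length);
        out.write(data, off, len);
        out.write('\r');
        out.write('\n');
        return out.toByteArray();
    }

    // Terminates the body: a zero-length last chunk and an empty trailer.
    static byte[] lastChunk() {
        return "0\r\n\r\n".getBytes(StandardCharsets.US_ASCII);
    }
}
```

For a 64KB chunk the framing adds just 9 bytes (the header "10000" plus CRLF, and the trailing CRLF), so the extra bytes on the wire alone cannot account for the slowdown.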

Now when I change bufferSize to 4096, the Java client takes 0.3 seconds.

If I remove the call to resp.setBufferSize entirely (which appears to use a 24KB chunk size), the Java client now takes 7.1 seconds, and wget is suddenly equally slow!

Please note I'm not in any way an expert with Jetty. I stumbled across this problem while diagnosing a performance problem in Hadoop 0.20.203.0 with reduce task shuffling, which transfers files using Jetty in a manner much like the reduced sample code, with a 64KB buffer size.

The problem reproduces both on our Linux (Debian) servers and on my Windows machine, and with both Java 1.6 and 1.7, so it appears to depend solely on Jetty.

Does anyone have any idea what could be causing this, and if there's something I can do about it?

Answer

Sven · Jan 31, 2012

I believe I have found the answer myself by looking through the Jetty source code. It's a complex interplay of the response buffer size, the size of the buffer passed to outStream.write, and (in some situations) whether outStream.flush is called. The root cause is how Jetty copies the data you write into its internal response buffer, and when and how that buffer is then flushed.

If the write buffer is the same size as the response buffer (I think a multiple also works), or smaller with outStream.flush called after each write, performance is fine: each write call is then flushed straight to the output. However, when the write buffer is larger than the response buffer and not a multiple of it, the flushes seem to be handled oddly, resulting in extra flushes and poor performance.

In the case of chunked transfer encoding, there's an extra kink in the cable. For all but the first chunk, Jetty reserves 12 bytes of the response buffer to contain the chunk size. This means that in my original example with a 64KB write and response buffer, the actual amount of data that fit in the response buffer was only 65524 bytes, so again, parts of the write buffer were spilling into multiple flushes. Looking at a captured network trace of this scenario, I see that the first chunk is 64KB, but all subsequent chunks are 65524 bytes. In this case, outStream.flush makes no difference.
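The effect of the 12-byte reservation can be reproduced with a simplified model. The class below is hypothetical and is not Jetty's code: it assumes the first chunk may use the whole response buffer, every later chunk loses 12 bytes, a chunk is emitted whenever the buffer fills, and explicit flush() calls are ignored (consistent with the observation that flush makes no difference in this case).

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of the chunk sizes, NOT Jetty's implementation.
class ChunkModel {

    static final int RESERVED = 12; // figure taken from the answer above

    // Returns the size of each chunk emitted when totalBytes are written in
    // writeSize pieces into a response buffer of bufferSize bytes.
    static List<Integer> chunkSizes(long totalBytes, int writeSize, int bufferSize) {
        if (bufferSize <= RESERVED || writeSize <= 0) {
            throw new IllegalArgumentException("invalid sizes");
        }
        List<Integer> chunks = new ArrayList<>();
        int buffered = 0;
        boolean first = true;
        long remaining = totalBytes;
        while (remaining > 0) {
            int toWrite = (int) Math.min(writeSize, remaining);
            remaining -= toWrite;
            // Copy this write into the buffer, emitting a chunk each time it fills
            while (toWrite > 0) {
                int capacity = first ? bufferSize : bufferSize - RESERVED;
                int n = Math.min(toWrite, capacity - buffered);
                buffered += n;
                toWrite -= n;
                if (buffered == capacity) {
                    chunks.add(buffered);
                    buffered = 0;
                    first = false;
                }
            }
        }
        if (buffered > 0) {
            chunks.add(buffered); // final partial chunk, sent at close()
        }
        return chunks;
    }
}
```

Feeding it four 64KB writes into a 64KB buffer yields chunks of 65536, 65524, 65524 and 65524 bytes plus a 36-byte remainder at close, matching the pattern in the network trace; every write after the first ends up split across two chunks.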

When using a 4KB write buffer, I saw fast transfers only when outStream.flush was called. It turns out that resp.setBufferSize can only increase the buffer size; since the default is 24KB, resp.setBufferSize(4096) is a no-op. I was therefore writing 4KB pieces of data, which fit in the 24KB buffer even with the 12 reserved bytes, and each was then sent as a 4KB chunk by the outStream.flush call. When the call to flush is removed, Jetty lets the buffer fill up, and because 24KB is an exact multiple of 4KB, 12 bytes again spill into the next chunk.
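The arithmetic for the 4KB case, using the 24KB default buffer and the 12-byte reservation described above:

```latex
\begin{aligned}
\text{with flush: } 4096 + 12 &= 4108 \le 24576 &&\Rightarrow \text{each write fits and goes out as one 4KB chunk} \\
\text{without flush: } 24576 &= 6 \times 4096 &&\Rightarrow \text{the first chunk holds exactly six writes} \\
24576 - 12 &= 24564 = 5 \times 4096 + 4084 &&\Rightarrow \text{later chunks hold five writes plus 4084 bytes} \\
4096 - 4084 &= 12 &&\Rightarrow \text{12 bytes of the sixth write spill into the next chunk}
\end{aligned}
```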

In conclusion

It seems that to get good performance with Jetty, you must either:

  • Call setContentLength (so that chunked transfer encoding is not used) and use a write buffer that's the same size as the response buffer.
  • When using chunked transfer encoding, use a write buffer that's at least 12 bytes smaller than the response buffer size, and call flush after each write.
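The two recommendations can be captured in a small copy helper. This is a sketch under assumptions: the class and method names are hypothetical, and the 12-byte constant is this answer's reading of the Jetty 6.1.26 source rather than a documented Jetty setting. In a servlet you would pass resp.getOutputStream() and resp.getBufferSize(), and for the non-chunked case call resp.setContentLength(...) with the file length before writing. The helper fills its buffer completely before each write, because a single InputStream.read may return fewer bytes than requested, which would otherwise break the size alignment.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper following the two recommendations above.
class JettyFriendlyCopy {

    // Assumed per-chunk reservation from this answer's reading of the
    // Jetty 6.1.26 source; not a documented, configurable Jetty value.
    static final int RESERVED_CHUNK_BYTES = 12;

    // Full response buffer when a content length is set (no chunking),
    // or the buffer minus the reservation when the response is chunked.
    static int writeSizeFor(int responseBufferSize, boolean chunked) {
        int size = chunked ? responseBufferSize - RESERVED_CHUNK_BYTES : responseBufferSize;
        if (size <= 0) {
            throw new IllegalArgumentException("response buffer too small: " + responseBufferSize);
        }
        return size;
    }

    // Reads until buf is full or the stream ends; returns the byte count.
    static int fill(InputStream in, byte[] buf) throws IOException {
        int filled = 0;
        while (filled < buf.length) {
            int n = in.read(buf, filled, buf.length - filled);
            if (n == -1) {
                break;
            }
            filled += n;
        }
        return filled;
    }

    // Copies in to out in writes of the recommended size; in the chunked
    // case flush() follows every write, as the second recommendation says.
    static long copy(InputStream in, OutputStream out,
                     int responseBufferSize, boolean chunked) throws IOException {
        byte[] buffer = new byte[writeSizeFor(responseBufferSize, chunked)];
        long total = 0;
        int n;
        while ((n = fill(in, buffer)) > 0) {
            out.write(buffer, 0, n);
            if (chunked) {
                out.flush();
            }
            total += n;
        }
        return total;
    }
}
```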

Note that even the "slow" scenario is fast enough that you'll likely only see the difference on the local host or over a very fast (1Gbps or more) network connection.

I guess I should file issue reports against Hadoop and/or Jetty for this.