Netty slower than Tomcat

voidmain · Oct 9, 2012 · Viewed 14.8k times

We just finished building a server to store data to disk and fronted it with Netty. During load testing we saw Netty scale to about 8,000 messages per second. Given our systems, this looked really low. As a benchmark, we wrote a Tomcat front-end and ran the same load tests. With these tests we were getting roughly 25,000 messages per second.

Here are the specs for our load testing machine:

  • MacBook Pro, quad core
  • 16GB of RAM
  • Java 1.6

Here is the load test setup for Netty:

  • 10 threads
  • 100,000 messages per thread
  • Netty server code (pretty standard) - our Netty pipeline on the server is two handlers: a FrameDecoder and a SimpleChannelHandler that handles the request and response.
  • Client side JIO (plain blocking Java sockets) using Commons Pool to pool and reuse connections; the pool was sized the same as the number of threads (a sketch of this setup follows the list)
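
The client code isn't shown here, so as a reference for the setup above, this is a minimal sketch (not the original code) of a blocking-socket client pooled with Commons Pool 1.x. The class name, host, and port are illustrative.

import java.net.InetSocketAddress;
import java.net.Socket;

import org.apache.commons.pool.BasePoolableObjectFactory;
import org.apache.commons.pool.impl.GenericObjectPool;

public class PooledSocketClient {
  private final GenericObjectPool pool;

  public PooledSocketClient(final String host, final int port, int poolSize) {
    pool = new GenericObjectPool(new BasePoolableObjectFactory() {
      @Override
      public Object makeObject() throws Exception {
        // Create and connect a plain blocking socket for the pool
        Socket socket = new Socket();
        socket.setTcpNoDelay(true);
        socket.connect(new InetSocketAddress(host, port));
        return socket;
      }

      @Override
      public void destroyObject(Object obj) throws Exception {
        ((Socket) obj).close();
      }
    });
    pool.setMaxActive(poolSize); // sized the same as the number of load-test threads
  }

  public Socket checkOut() throws Exception {
    return (Socket) pool.borrowObject();
  }

  public void checkIn(Socket socket) throws Exception {
    pool.returnObject(socket);
  }
}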

Here is the load test setup for Tomcat:

  • 10 threads
  • 100,000 messages per thread
  • Tomcat 7.0.16 with default configuration using a Servlet to call the server code
  • Client side using URLConnection without any pooling (a sketch follows the list)
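
The Tomcat-side client isn't shown either; a minimal sketch of an unpooled URLConnection call might look like the following (the /store endpoint and port are hypothetical).

import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class UrlConnectionClient {
  public void send(byte[] message) throws Exception {
    URL url = new URL("http://localhost:8080/store"); // hypothetical endpoint
    HttpURLConnection connection = (HttpURLConnection) url.openConnection();
    connection.setDoOutput(true);
    connection.setRequestMethod("POST");

    // Write the message body
    OutputStream out = connection.getOutputStream();
    out.write(message);
    out.close();

    // Read (and discard) the response so the connection can be reused
    InputStream in = connection.getInputStream();
    byte[] buffer = new byte[512];
    while (in.read(buffer) != -1) {
      // drain
    }
    in.close();
  }
}

One relevant detail: HttpURLConnection keeps persistent (keep-alive) connections behind the scenes by default, so even this unpooled client is not paying a TCP handshake per request.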

My main question is why such a huge difference in performance? Is there something obvious with respect to Netty that can get it to run faster than Tomcat?

Edit: Here is the main Netty server code:

NioServerSocketChannelFactory factory = new NioServerSocketChannelFactory();
ServerBootstrap server = new ServerBootstrap(factory);
server.setPipelineFactory(new ChannelPipelineFactory() {
  public ChannelPipeline getPipeline() {
    RequestDecoder decoder = injector.getInstance(RequestDecoder.class);
    ContentStoreChannelHandler handler = injector.getInstance(ContentStoreChannelHandler.class);
    return Channels.pipeline(decoder, handler);
  }
});

server.setOption("child.tcpNoDelay", true);
server.setOption("child.keepAlive", true);
Channel channel = server.bind(new InetSocketAddress(port));
allChannels.add(channel);

Our handlers look like this:

public class RequestDecoder extends FrameDecoder {
  @Override
  protected ChannelBuffer decode(ChannelHandlerContext ctx, Channel channel, ChannelBuffer buffer) {
    // Wait until the 4-byte length prefix has arrived
    if (buffer.readableBytes() < 4) {
      return null;
    }

    buffer.markReaderIndex();
    int length = buffer.readInt();
    // Wait until the full frame is available; otherwise rewind and try again later
    if (buffer.readableBytes() < length) {
      buffer.resetReaderIndex();
      return null;
    }

    return buffer;
  }
}

public class ContentStoreChannelHandler extends SimpleChannelHandler {
  private final RequestHandler handler;

  @Inject
  public ContentStoreChannelHandler(RequestHandler handler) {
    this.handler = handler;
  }

  @Override
  public void messageReceived(ChannelHandlerContext ctx, MessageEvent e) {
    ChannelBuffer in = (ChannelBuffer) e.getMessage();
    in.readerIndex(4);

    ChannelBuffer out = ChannelBuffers.dynamicBuffer(512);
    out.writerIndex(8); // Skip the length and status code

    boolean success = handler.handle(new ChannelBufferInputStream(in), new ChannelBufferOutputStream(out), new NettyErrorStream(out));
    if (success) {
      out.setInt(0, out.writerIndex() - 8); // length
      out.setInt(4, 0); // Status
    }

    Channels.write(e.getChannel(), out, e.getRemoteAddress());
  }

  @Override
  public void exceptionCaught(ChannelHandlerContext ctx, ExceptionEvent e) {
    Throwable throwable = e.getCause();
    ChannelBuffer out = ChannelBuffers.dynamicBuffer(8);
    out.writeInt(0); // Length
    out.writeInt(Errors.generalException.getCode()); // status

    Channels.write(ctx, e.getFuture(), out);
  }

  @Override
  public void channelOpen(ChannelHandlerContext ctx, ChannelStateEvent e) {
    NettyContentStoreServer.allChannels.add(e.getChannel());
  }
}

UPDATE:

I've managed to get my Netty solution to within 4,000/second of Tomcat. A few weeks back I was testing a client-side PING in my connection pool as a safeguard against idle sockets, but I forgot to remove that code before I started load testing. This code effectively PINGed the server every time a Socket was checked out from the pool (using Commons Pool). I commented that code out and I'm now getting 21,000/second with Netty and 25,000/second with Tomcat.
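
The PING code itself isn't posted; with Commons Pool, a safeguard like that typically takes the shape sketched below (a hedged guess, not the original code): a validateObject() that performs a PING, combined with testOnBorrow, runs once on every checkout.

import java.net.Socket;

import org.apache.commons.pool.BasePoolableObjectFactory;
import org.apache.commons.pool.impl.GenericObjectPool;

public class PingOnCheckoutExample {
  static GenericObjectPool newPingingPool() {
    GenericObjectPool pool = new GenericObjectPool(new BasePoolableObjectFactory() {
      @Override
      public Object makeObject() throws Exception {
        return new Socket("localhost", 9090); // hypothetical host/port
      }

      @Override
      public boolean validateObject(Object obj) {
        return ping((Socket) obj); // hypothetical PING round trip to the server
      }
    });
    pool.setTestOnBorrow(true); // runs validateObject() on every borrowObject()
    return pool;
  }

  private static boolean ping(Socket socket) {
    // hypothetical: write a PING frame and wait for the reply
    return true;
  }
}

A check like this adds a full request/response round trip to every message sent, which is consistent with the large gap seen before the code was removed.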

Although this is great news on the Netty side, I'm still getting 4,000/second less with Netty than with Tomcat. I can post my client side code (which I thought I had ruled out, but apparently not) if anyone is interested in seeing it.

Answer

Ben-Hur Langoni Junior · Sep 20, 2016

The messageReceived method runs on a Netty worker thread, which is probably being blocked by RequestHandler#handle if it is busy doing I/O work. You could try adding an OrderedMemoryAwareThreadPoolExecutor (recommended) to the channel pipeline, via an ExecutionHandler, so the handler work is executed off the worker threads. Alternatively, dispatch your handler work to a separate ThreadPoolExecutor, keeping a reference to the socket channel so the response can be written back to the client later. For example:

@Override
public void messageReceived(ChannelHandlerContext ctx, final MessageEvent e) {
    // Hand the blocking work off to a separate executor so the Netty worker
    // thread is freed immediately
    executor.submit(new Runnable() {
        @Override
        public void run() {
            processHandlerAndRespond(e);
        }
    });
}

private void processHandlerAndRespond(MessageEvent e) {

    ChannelBuffer in = (ChannelBuffer) e.getMessage();
    in.readerIndex(4);
    ChannelBuffer out = ChannelBuffers.dynamicBuffer(512);
    out.writerIndex(8); // Skip the length and status code
    boolean success = handler.handle(new ChannelBufferInputStream(in), new ChannelBufferOutputStream(out), new NettyErrorStream(out));
    if (success) {
        out.setInt(0, out.writerIndex() - 8); // length
        out.setInt(4, 0); // Status
    }
    Channels.write(e.getChannel(), out, e.getRemoteAddress());
}