We just finished building a server to store data to disk and fronted it with Netty. During load testing we were seeing Netty scaling to about 8,000 messages per second. Given our systems, this looked really low. For a benchmark, we wrote a Tomcat front-end and run the same load tests. With these tests we were getting roughly 25,000 messages per second.
Here are the specs for our load testing machine:
Here is the load test setup for Netty:
Here is the load test setup for Tomcat:
My main question is why such a huge different in performance? Is there something obvious with respect to Netty that can get it to run faster than Tomcat?
Edit: Here is the main Netty server code:
NioServerSocketChannelFactory factory = new NioServerSocketChannelFactory();
ServerBootstrap server = new ServerBootstrap(factory);
server.setPipelineFactory(new ChannelPipelineFactory() {
public ChannelPipeline getPipeline() {
RequestDecoder decoder = injector.getInstance(RequestDecoder.class);
ContentStoreChannelHandler handler = injector.getInstance(ContentStoreChannelHandler.class);
return Channels.pipeline(decoder, handler);
}
});
server.setOption("child.tcpNoDelay", true);
server.setOption("child.keepAlive", true);
Channel channel = server.bind(new InetSocketAddress(port));
allChannels.add(channel);
Our handlers look like this:
public class RequestDecoder extends FrameDecoder {
@Override
protected ChannelBuffer decode(ChannelHandlerContext ctx, Channel channel, ChannelBuffer buffer) {
if (buffer.readableBytes() < 4) {
return null;
}
buffer.markReaderIndex();
int length = buffer.readInt();
if (buffer.readableBytes() < length) {
buffer.resetReaderIndex();
return null;
}
return buffer;
}
}
public class ContentStoreChannelHandler extends SimpleChannelHandler {
private final RequestHandler handler;
@Inject
public ContentStoreChannelHandler(RequestHandler handler) {
this.handler = handler;
}
@Override
public void messageReceived(ChannelHandlerContext ctx, MessageEvent e) {
ChannelBuffer in = (ChannelBuffer) e.getMessage();
in.readerIndex(4);
ChannelBuffer out = ChannelBuffers.dynamicBuffer(512);
out.writerIndex(8); // Skip the length and status code
boolean success = handler.handle(new ChannelBufferInputStream(in), new ChannelBufferOutputStream(out), new NettyErrorStream(out));
if (success) {
out.setInt(0, out.writerIndex() - 8); // length
out.setInt(4, 0); // Status
}
Channels.write(e.getChannel(), out, e.getRemoteAddress());
}
@Override
public void exceptionCaught(ChannelHandlerContext ctx, ExceptionEvent e) {
Throwable throwable = e.getCause();
ChannelBuffer out = ChannelBuffers.dynamicBuffer(8);
out.writeInt(0); // Length
out.writeInt(Errors.generalException.getCode()); // status
Channels.write(ctx, e.getFuture(), out);
}
@Override
public void channelOpen(ChannelHandlerContext ctx, ChannelStateEvent e) {
NettyContentStoreServer.allChannels.add(e.getChannel());
}
}
UPDATE:
I've managed to get my Netty solution to within 4,000/second. A few weeks back I was testing a client side PING in my connection pool as a safe guard against idle sockets but I forgot to remove that code before I started load testing. This code effectively PINGed the server every time a Socket was checked out from the pool (using Commons Pool). I commented that code out and I'm now getting 21,000/second with Netty and 25,000/second with Tomcat.
Although, this is great news on the Netty side, I'm still getting 4,000/second less with Netty than Tomcat. I can post my client side (which I thought I had ruled out but apparently not) if anyone is interested in seeing that.
The method messageReceived
is executed using a worker thread that is possibly getting blocked by RequestHandler#handle
which may be busy doing some I/O work.
You could try adding into the channel pipeline an OrderdMemoryAwareThreadPoolExecutor
(recommended) for executing the handlers or alternatively, try dispatching your handler work to a new ThreadPoolExecutor and passing a reference to the socket channel for later writing the response back to client. Ex.:
@Override
public void messageReceived(ChannelHandlerContext ctx, MessageEvent e) {
executor.submit(new Runnable() {
processHandlerAndRespond(e);
});
}
private void processHandlerAndRespond(MessageEvent e) {
ChannelBuffer in = (ChannelBuffer) e.getMessage();
in.readerIndex(4);
ChannelBuffer out = ChannelBuffers.dynamicBuffer(512);
out.writerIndex(8); // Skip the length and status code
boolean success = handler.handle(new ChannelBufferInputStream(in), new ChannelBufferOutputStream(out), new NettyErrorStream(out));
if (success) {
out.setInt(0, out.writerIndex() - 8); // length
out.setInt(4, 0); // Status
}
Channels.write(e.getChannel(), out, e.getRemoteAddress());
}