How do Jetty and other containers leverage NIO while sticking to the Servlet specification?

rationalrevolt · Aug 8, 2014 · Viewed 11.6k times

I'm new to NIO, and I am trying to figure out how Jetty leverages NIO.

My understanding of how traditional servlet containers that use blocking IO service a request is as follows (a minimal example follows the list):

  1. A request arrives
  2. A thread is allocated to process the request and the servlet method (doGet etc) is invoked
  3. Servlet method is handed an InputStream and OutputStream
  4. The servlet method reads from the InputStream and writes to the OutputStream
  5. The InputStream and OutputStream are basically tied to the respective streams of the underlying Socket

What is different when an NIO connector is used? My guess is along the following lines:

  1. A request arrives
  2. Jetty uses an NIO connector and buffers the entire request asynchronously
  3. Once the request has been read completely, wrap the buffer in an InputStream
  4. Create an empty response buffer (wrapped in an OutputStream)
  5. Allocate a thread and invoke the servlet method (doGet etc), handing it the wrapper streams from above
  6. The servlet method writes to the wrapped (buffered) response stream and returns
  7. Jetty uses NIO to write the response buffer contents to the underlying SocketChannel

From the Jetty documentation, I found the following:

SelectChannelConnector - This connector uses efficient NIO buffers with a non-blocking threading model. Jetty uses Direct NIO buffers, and allocates threads only to connections with requests. Synchronization simulates blocking for the servlet API, and any unflushed content at the end of request handling is written asynchronously.

I'm not sure I understand what "Synchronization simulates blocking for the servlet API" means.
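(For context, such a connector is typically enabled programmatically along the following lines. This is a sketch against the Jetty 7/8-era embedded API, where SelectChannelConnector lives in org.eclipse.jetty.server.nio; the port and handler wiring are just placeholders.)

    import org.eclipse.jetty.server.Server;
    import org.eclipse.jetty.server.nio.SelectChannelConnector;

    public class EmbeddedJettyNio {
        public static void main(String[] args) throws Exception {
            Server server = new Server();

            // NIO-based connector: a small set of selector threads services many connections
            SelectChannelConnector connector = new SelectChannelConnector();
            connector.setPort(8080);

            server.addConnector(connector);
            // server.setHandler(...); // register servlets/handlers here
            server.start();
            server.join();
        }
    }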

Answer

gregw · Aug 8, 2014

You don't have it exactly correct. When Jetty uses an NIO connector (and Jetty 9 only supports NIO), it works as follows:

  1. In the idle state, a few threads (1-4, depending on the number of cores) call the selector, looking for IO activity (a stripped-down selector loop is sketched after this list). This has been scaled to over 1,000,000 connections on Jetty.
  2. When the selector sees IO activity, it calls a handle method on the connection, which either:

    • wakes up whoever was blocked, if something else has already registered that it is blocked waiting for IO on this connection, or
    • dispatches a thread to handle the connection otherwise.
  3. If a thread is dispatched, it will attempt to read the connection and parse it. What happens next depends on whether the connection is http, spdy, http2 or websocket.

    • for http, if the request headers are complete, the thread goes on to handle the request (eventually this gets to the servlet) without waiting for any content.
    • for http2/spdy another dispatch is required, but see the discussion about Eat-What-You-Kill strategy on the list: http://dev.eclipse.org/mhonarc/lists/jetty-dev/msg02166.html
    • for websocket the message handling is called.
  4. Once a thread is dispatched to a servlet, the servlet IO looks blocking to it, but beneath the level of HttpInputStream and HttpOutputStream all the IO is async with callbacks. The blocking API uses a special blocking callback to achieve blocking (a sketch of this idea follows the list). This means that if the servlet chooses to use async IO, it is just bypassing the blocking callback and using the async API more or less directly.

  5. A servlet can suspend using request.startAsync (see the example below), in which case the dispatched thread is returned to the thread pool, but the associated connection is not marked as interested in IO. Async IO can still be performed, but an AsyncContext event is needed either to reallocate a thread or to re-enroll the connection for IO activity once the async cycle is complete.
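To make steps 1 and 2 concrete, here is a stripped-down selector-plus-dispatch loop in plain java.nio. This is not Jetty's own selector management code, just a sketch of the pattern; the port, class name and thread pool are arbitrary choices, and error handling is omitted:

    import java.net.InetSocketAddress;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.Selector;
    import java.nio.channels.ServerSocketChannel;
    import java.nio.channels.SocketChannel;
    import java.util.Iterator;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class SelectorLoopSketch {
        public static void main(String[] args) throws Exception {
            ExecutorService pool = Executors.newCachedThreadPool(); // dispatch target

            Selector selector = Selector.open();
            ServerSocketChannel acceptor = ServerSocketChannel.open();
            acceptor.bind(new InetSocketAddress(8080));
            acceptor.configureBlocking(false);
            acceptor.register(selector, SelectionKey.OP_ACCEPT);

            while (true) {
                selector.select();                       // one thread watches many connections
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (key.isAcceptable()) {
                        SocketChannel ch = acceptor.accept();
                        ch.configureBlocking(false);
                        ch.register(selector, SelectionKey.OP_READ);
                    } else if (key.isReadable()) {
                        key.interestOps(0);              // stop selecting until handled
                        pool.execute(() -> handle(key)); // dispatch a thread for this connection
                    }
                }
            }
        }

        private static void handle(SelectionKey key) {
            // read, parse and handle the request here, then restore interest in OP_READ
        }
    }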
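The "blocking simulated over async" idea from step 4 can be pictured as a callback that a thread parks on until the asynchronous IO completes. The sketch below is a simplified illustration of that idea, not Jetty's real blocking-callback classes:

    import java.io.IOException;
    import java.util.concurrent.CountDownLatch;

    // Minimal "blocking callback": the async IO layer reports success or failure
    // via callback, and a thread that wants blocking semantics waits for that callback.
    public class BlockingCallbackSketch {
        private final CountDownLatch latch = new CountDownLatch(1);
        private volatile Throwable failure;

        // invoked by the async IO layer when the operation completes
        public void succeeded() {
            latch.countDown();
        }

        // invoked by the async IO layer when the operation fails
        public void failed(Throwable cause) {
            failure = cause;
            latch.countDown();
        }

        // called by the "blocking" stream code on the servlet thread
        public void block() throws IOException {
            try {
                latch.await(); // park the servlet thread until the async IO calls back
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IOException(e);
            }
            if (failure != null) {
                throw new IOException(failure);
            }
        }
    }

A blocking write then amounts to: start the async write with this callback, then call block(); a servlet using async IO simply skips block() and supplies its own callback.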
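Step 5 relies on the standard Servlet 3.0 async API. A minimal suspend/complete cycle looks like the sketch below; the URL pattern, timeout and timer are just for illustration of "work done elsewhere":

    import java.io.IOException;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;
    import javax.servlet.AsyncContext;
    import javax.servlet.annotation.WebServlet;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    @WebServlet(urlPatterns = "/slow", asyncSupported = true)
    public class SuspendingServlet extends HttpServlet {
        private final ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();

        @Override
        protected void doGet(HttpServletRequest req, HttpServletResponse resp) {
            // Suspend: the dispatched container thread is returned to the pool
            AsyncContext async = req.startAsync();
            async.setTimeout(10_000);

            // Later, some other event (here a timer) produces the response
            scheduler.schedule(() -> {
                try {
                    async.getResponse().getWriter().println("done");
                } catch (IOException e) {
                    // ignored in this sketch
                }
                async.complete(); // finish the response / re-enroll the connection
            }, 1, TimeUnit.SECONDS);
        }
    }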

This view is slightly complicated by http2 and spdy, which are multiplexed, so they can involve an extra dispatch.

Any HTTP framework that does not dispatch can go really, really fast in benchmark code, but when faced with a real application that can do silly things like blocking on databases, file systems, REST services, etc., then the lack of dispatch just means that one connection can hold up all the other connections on the system.

For some more info on how Jetty handles async and dispatch, see: