C++ Socket Server - Unable to saturate CPU

Alex Black · Aug 5, 2009

I've developed a mini HTTP server in C++ using boost::asio. I'm now load testing it with multiple clients, but I've been unable to get close to saturating the CPU. I'm testing on an Amazon EC2 instance and seeing about 50% usage on one CPU, 20% on another, and the remaining two idle (according to htop).

Details:

  • The server fires up one thread per core
  • Requests are received, parsed, processed, and responses are written out
  • The requests are for data, which is read out of memory (read-only for this test)
  • I'm 'loading' the server using two machines, each running a Java application with 25 threads sending requests
  • I'm seeing about 230 requests/sec throughput (these are application requests, each composed of many HTTP requests)
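
As an illustration of the one-thread-per-core setup described in the list above, here is a minimal sketch using only the standard library. The helper name run_one_thread_per_core is hypothetical, and the callback stands in for the real work; in the actual server each worker would presumably call boost::asio::io_service::run(), which is an assumption since the question does not show its code.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: runs work(worker_index) on one thread per core,
// waits for all workers to finish, and returns how many were started.
unsigned run_one_thread_per_core(const std::function<void(unsigned)>& work) {
    assert(static_cast<bool>(work));  // an empty callback would throw when called

    // hardware_concurrency() may return 0 when the core count is unknown,
    // so fall back to a single worker in that case.
    const unsigned n = std::max(1u, std::thread::hardware_concurrency());

    std::vector<std::thread> workers;
    workers.reserve(n);
    for (unsigned i = 0; i < n; ++i) {
        // Assumption: in the asker's server this is where each thread
        // would enter the asio event loop instead of calling work(i).
        workers.emplace_back([&work, i] { work(i); });
    }
    for (auto& t : workers) {
        t.join();
    }
    return n;
}
```

Starting the threads is the easy part. Whether they actually run in parallel depends on what each one blocks on, which is why htop can show idle cores even with one thread per core.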

So, what should I look at to improve this result? Given the CPU is mostly idle, I'd like to leverage that additional capacity to get a higher throughput, say 800 requests/sec or whatever.
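
A rough back-of-the-envelope check may help here. Assuming each of the 2 × 25 = 50 client threads sends one application request at a time and waits for the response (the question does not say this explicitly), throughput is bounded by the number of clients divided by the time per request. The function names below are made up for illustration.

```cpp
#include <cassert>
#include <cmath>

// Mean time per application request implied by a closed loop of `clients`
// synchronous requesters that together achieve `throughput` requests/sec.
double implied_latency_sec(int clients, double throughput) {
    assert(clients > 0 && throughput > 0.0);
    return clients / throughput;
}

// Number of synchronous requesters needed to reach `target` requests/sec,
// assuming the time per request stays the same under the higher load.
int clients_needed(double target, double latency_sec) {
    assert(target > 0.0 && latency_sec > 0.0);
    return static_cast<int>(std::ceil(target * latency_sec));
}
```

Under that assumption, implied_latency_sec(50, 230.0) comes out at roughly 0.217 s per application request, and clients_needed(800.0, 0.217...) gives 174. In other words, with only 50 synchronous client threads the load generator itself could be what caps throughput, regardless of how idle the server's CPUs are.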

Ideas I've had:

  • The requests are very small and often fulfilled in a few ms; I could modify the client to send/compose bigger requests (perhaps using batching)
  • I could modify the HTTP server to use the Select design pattern. Is this appropriate here?
  • I could do some profiling to try to understand what the bottleneck(s) are

Answer

cmeerw · Aug 6, 2009

boost::asio is not as thread-friendly as you would hope - there is a big lock around the epoll code in boost/asio/detail/epoll_reactor.hpp which means that only one thread can call into the kernel's epoll syscall at a time. And for very small requests this makes all the difference (meaning you will only see roughly single-threaded performance).

Note that this is a limitation of how boost::asio uses the Linux kernel facilities, not necessarily the Linux kernel itself. The epoll syscall does support multiple threads when using edge-triggered events, but getting it right (without excessive locking) can be quite tricky.
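
To make the point about multi-threaded epoll concrete, here is a minimal Linux-only sketch in which several threads call epoll_wait() on one shared epoll instance with no user-space lock around it. It uses EPOLLET together with EPOLLONESHOT so that only one thread handles a given descriptor at a time. That combination is one common approach and not necessarily what is meant above; the function name and the pipe-based setup are purely illustrative.

```cpp
#include <sys/epoll.h>
#include <fcntl.h>
#include <unistd.h>
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical demo: `threads` workers share one epoll instance and together
// read `total_bytes` bytes from a pipe. Returns bytes read, or -1 on setup failure.
long drain_pipe_with_shared_epoll(int threads, long total_bytes) {
    assert(threads > 0 && total_bytes >= 0);
    int fds[2];
    if (pipe(fds) != 0) return -1;
    // Edge-triggered handlers must read until EAGAIN, so the read end is non-blocking.
    fcntl(fds[0], F_SETFL, fcntl(fds[0], F_GETFL) | O_NONBLOCK);

    const int epfd = epoll_create1(0);
    epoll_event ev{};
    ev.events = EPOLLIN | EPOLLET | EPOLLONESHOT;
    ev.data.fd = fds[0];
    if (epfd < 0 || epoll_ctl(epfd, EPOLL_CTL_ADD, fds[0], &ev) != 0) {
        if (epfd >= 0) close(epfd);
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    std::atomic<long> received{0};
    auto worker = [&] {
        while (received.load() < total_bytes) {
            epoll_event got{};
            // All workers wait on the same epoll fd; the 10 ms timeout only
            // exists so idle workers notice when the demo is finished.
            if (epoll_wait(epfd, &got, 1, 10) <= 0) continue;
            char buf[256];
            for (;;) {
                const ssize_t r = read(got.data.fd, buf, sizeof buf);
                if (r <= 0) break;  // EAGAIN (drained), EOF, or error
                received += r;
            }
            // EPOLLONESHOT disabled the fd after delivery; re-arm it so that
            // whichever worker is free can pick up the next event.
            epoll_event rearm{};
            rearm.events = EPOLLIN | EPOLLET | EPOLLONESHOT;
            rearm.data.fd = got.data.fd;
            epoll_ctl(epfd, EPOLL_CTL_MOD, got.data.fd, &rearm);
        }
    };

    std::vector<std::thread> pool;
    for (int i = 0; i < threads; ++i) pool.emplace_back(worker);
    for (long i = 0; i < total_bytes; ++i) {
        while (write(fds[1], "x", 1) != 1) { /* retry, e.g. after EINTR */ }
    }
    for (auto& t : pool) t.join();

    close(epfd);
    close(fds[0]);
    close(fds[1]);
    return received.load();
}
```

For example, drain_pipe_with_shared_epoll(4, 1000) should return 1000. The tricky part alluded to above shows up as soon as per-connection state is involved: the drain-then-re-arm order matters, and anything shared between handlers still needs its own synchronization.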

BTW, I have been doing some work in this area (combining a fully-multithreaded edge-triggered epoll event loop with user-scheduled threads/fibers) and made some code available under the nginetd project.