I'm building an extremely high-performance piece of enterprise software that will receive, handle, and respond to over 50,000 TCP requests per second. The load will be spread over a number of Amazon EC2 servers, but I'd like each individual server to handle as many requests per second as possible (I'm shooting for 5,000/sec). I'm most likely going to be using the m1.xlarge instance type running Amazon Linux.
I'm building this software in C++ with Boost ASIO, and I'm trying to figure out the most efficient way of architecting the socket handling. Of the examples (http://www.boost.org/doc/libs/1_53_0/doc/html/boost_asio/examples.html), I'm leaning toward emulating "HTTP Server 2", since we'll have multiple vCPUs to employ.
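As I understand it, the core of that design is one io_service per vCPU, with newly accepted connections dealt out round-robin. Roughly this sketch (the class and names are my own restatement, not copied from the example):

```cpp
#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <boost/shared_ptr.hpp>
#include <boost/thread.hpp>
#include <vector>

// One io_service per vCPU; each runs in its own thread, and each new
// connection is assigned to the next io_service in round-robin order.
class io_service_pool {
public:
    explicit io_service_pool(std::size_t pool_size) : next_(0) {
        for (std::size_t i = 0; i < pool_size; ++i) {
            boost::shared_ptr<boost::asio::io_service> io(new boost::asio::io_service);
            // The work object keeps each io_service's run() from returning
            // while there are no pending handlers.
            boost::shared_ptr<boost::asio::io_service::work> work(
                new boost::asio::io_service::work(*io));
            io_services_.push_back(io);
            work_.push_back(work);
        }
    }

    // Blocks: spawns one thread per io_service and waits for them all.
    void run() {
        std::vector<boost::shared_ptr<boost::thread> > threads;
        for (std::size_t i = 0; i < io_services_.size(); ++i)
            threads.push_back(boost::shared_ptr<boost::thread>(new boost::thread(
                boost::bind(&boost::asio::io_service::run, io_services_[i]))));
        for (std::size_t i = 0; i < threads.size(); ++i)
            threads[i]->join();
    }

    // The acceptor calls this to pick the io_service for the next socket.
    boost::asio::io_service& get_io_service() {
        boost::asio::io_service& io = *io_services_[next_];
        next_ = (next_ + 1) % io_services_.size();
        return io;
    }

private:
    std::vector<boost::shared_ptr<boost::asio::io_service> > io_services_;
    std::vector<boost::shared_ptr<boost::asio::io_service::work> > work_;
    std::size_t next_;
};
```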
Could someone describe the pros/cons of each HTTP server example there? Given that I'll be dealing with this many connections, I'd also appreciate any additional insight, whether on Boost sockets or on high-throughput EC2 configuration.
Thanks so much!
Some suggestions:
You didn't mention what your server is going to be doing. Is it going to be accepting and closing 50K new connections per second, or just servicing messages (requests) arriving on established TCP connections? Since I don't know, my advice will have to be a little generic.
Read the C10K problem: http://www.kegel.com/c10k.html
Invest in using epoll as the socket notification solution instead of ASIO. epoll is not hard.
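To give a feel for what that involves, here is a minimal level-triggered epoll loop (Linux-only sketch; error handling is trimmed, the port and buffer sizes are arbitrary, and the echo at the end is a placeholder for your real request handling):

```cpp
#include <sys/epoll.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstring>

int main() {
    // Plain blocking listener socket on an arbitrary port.
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    sockaddr_in addr;
    std::memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = INADDR_ANY;
    addr.sin_port = htons(8080);
    bind(listener, (sockaddr*)&addr, sizeof(addr));
    listen(listener, SOMAXCONN);

    // Register the listener with epoll for readability (new connections).
    int epfd = epoll_create1(0);
    epoll_event ev;
    ev.events = EPOLLIN;
    ev.data.fd = listener;
    epoll_ctl(epfd, EPOLL_CTL_ADD, listener, &ev);

    epoll_event events[1024];
    char buf[4096];
    for (;;) {
        int n = epoll_wait(epfd, events, 1024, -1);
        for (int i = 0; i < n; ++i) {
            int fd = events[i].data.fd;
            if (fd == listener) {
                // New connection: make it non-blocking and watch for reads.
                int client = accept(listener, NULL, NULL);
                fcntl(client, F_SETFL, fcntl(client, F_GETFL, 0) | O_NONBLOCK);
                ev.events = EPOLLIN;
                ev.data.fd = client;
                epoll_ctl(epfd, EPOLL_CTL_ADD, client, &ev);
            } else {
                // Readable client: parse the request and respond here.
                ssize_t r = read(fd, buf, sizeof(buf));
                if (r <= 0) { close(fd); continue; }  // peer closed or error
                write(fd, buf, r);                    // placeholder: echo back
            }
        }
    }
}
```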
Consider using a fixed number of threads (2-8). Either load-balance the socket connections across these threads, or use a work pool of threads to service request messages parsed off the socket thread. Design for multiple threads, but start with just one thread, and resolve all performance issues at that stage. Once you get the single-threaded solution working well and performance is at its peak, then consider increasing the thread count so that multiple operations can be processed while other threads are blocked.
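As a sketch of the work-pool variant (the class name and queue design are my own, assuming C++11; not taken from any particular library): socket threads push parsed requests in, and N worker threads pop and service them.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class WorkPool {
public:
    explicit WorkPool(std::size_t n) : stop_(false) {
        for (std::size_t i = 0; i < n; ++i)
            workers_.emplace_back([this] {
                for (;;) {
                    std::function<void()> job;
                    {
                        std::unique_lock<std::mutex> lock(mu_);
                        cv_.wait(lock, [this] { return stop_ || !jobs_.empty(); });
                        if (stop_ && jobs_.empty()) return;  // drained, shut down
                        job = std::move(jobs_.front());
                        jobs_.pop();
                    }
                    job();  // service one request message outside the lock
                }
            });
    }

    // Called by the socket thread(s) with a parsed request to service.
    void post(std::function<void()> job) {
        { std::lock_guard<std::mutex> lock(mu_); jobs_.push(std::move(job)); }
        cv_.notify_one();
    }

    ~WorkPool() {
        { std::lock_guard<std::mutex> lock(mu_); stop_ = true; }
        cv_.notify_all();
        for (std::size_t i = 0; i < workers_.size(); ++i) workers_[i].join();
    }

private:
    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> jobs_;
    std::mutex mu_;
    std::condition_variable cv_;
    bool stop_;
};
```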
Chances are very high that your server's performance issues will be outside of the socket design. Continuously benchmark, and run tools such as Valgrind to understand where the code is spending most of its time; chances are it's where you least expect it. For example, on my server I found that the majority of the time was spent allocating and freeing memory for little temp buffers. I would never have guessed that. I then changed the server design to allocate memory up front, use stack memory, etc., such that handling a request never required the code to allocate memory. Performance easily doubled when I made that change.
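To illustrate the allocate-up-front idea, here is a minimal sketch of a preallocated buffer pool (the class name, buffer size, and pool depth are all my own arbitrary choices). All memory is carved out of one big allocation at startup, so the request path never touches the heap allocator:

```cpp
#include <cstddef>
#include <vector>

class BufferPool {
public:
    BufferPool(std::size_t count, std::size_t size)
        : storage_(count * size), size_(size) {
        // Pre-carve every slot onto the free list; no allocation after this.
        for (std::size_t i = 0; i < count; ++i)
            free_.push_back(&storage_[i * size]);
    }

    char* acquire() {                  // returns 0 if the pool is exhausted
        if (free_.empty()) return 0;
        char* buf = free_.back();
        free_.pop_back();
        return buf;
    }

    void release(char* buf) { free_.push_back(buf); }
    std::size_t buffer_size() const { return size_; }

private:
    std::vector<char> storage_;        // one big up-front allocation
    std::vector<char*> free_;          // LIFO free list of buffer slots
    std::size_t size_;
};

// Usage: BufferPool pool(1024, 4096);
//        char* buf = pool.acquire(); /* ... handle request ... */ pool.release(buf);
```

As written the pool is single-threaded; in a multi-threaded server you would guard it with a lock or, better, keep one pool per thread.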