I have a Java application running on a Sun 1.6 32-bit VM / Solaris 10 (x86) / Nehalem 8-core machine (2 threads per core).
A specific use case in the application is to respond to an external message. In my performance test environment, when I prepare and send the response on the same thread that receives the external input, I see about a 50 µs advantage over handing the message off to a separate thread to send the response. I use a ThreadPoolExecutor with a SynchronousQueue to do the handoff.
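For reference, the handoff looks roughly like this. This is a minimal sketch, not my actual code: the Message/Response types, pool sizes and method names are placeholders.

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ResponderHandoff {
    // Handoff pool: a SynchronousQueue has no capacity, so each submitted
    // task is handed directly to a waiting worker thread rather than queued.
    private final ThreadPoolExecutor executor = new ThreadPoolExecutor(
            4, 8,                    // core/max pool sizes (placeholder values)
            60L, TimeUnit.SECONDS,   // keep-alive for idle non-core threads
            new SynchronousQueue<Runnable>());

    // Called on the thread that receives the external message.
    void onMessage(final Message message) {
        executor.execute(new Runnable() {
            public void run() {
                sendResponse(prepareResponse(message));
            }
        });
    }

    // Application-specific work, stubbed out here.
    Response prepareResponse(Message m) { return new Response(); }
    void sendResponse(Response r)       { /* write to the wire */ }

    static class Message {}
    static class Response {}
}
```

The 50 µs figure is the difference between calling prepareResponse/sendResponse inline in onMessage and going through executor.execute as above.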
In your experience, what is an acceptable expected delay between scheduling a task to a thread pool and its being picked up for execution? What ideas have worked for you in the past to improve this?
The "acceptable delay" entirely depends on your application. Dealing with everything on the same thread can indeed help if you've got very strict latency requirements. Fortunately most applications don't have requirements quite that strict.
Of course, if only one thread is able to receive requests, then tying up that thread to compute the response means you can't accept any other requests. Depending on what you're doing, you can use asynchronous I/O (etc.) to avoid the thread-per-request model, but it's significantly harder IMO, and you still end up with thread context switching.
Sometimes it's appropriate to queue requests to avoid having too many threads processing them: if your handling is CPU-bound, it doesn't make much sense to have hundreds of threads - better to have a producer/consumer queue of tasks and distribute them at roughly one thread per core. That's basically what a ThreadPoolExecutor will do if you set it up properly, of course. That doesn't work as well if your requests spend a lot of their time waiting for external services (including disks, but primarily other network services)... at that point you either need to use asynchronous execution models whenever you would potentially make a core idle with a blocking call, or you take the thread context switching hit and have lots of threads, relying on the thread scheduler to make it work well enough.
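Just to illustrate the CPU-bound case: size the pool to the hardware rather than to the number of requests, e.g. via Runtime.availableProcessors(). This is a sketch, not a drop-in for your setup; handleRequest and the request count are made up.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CpuBoundPool {
    public static void main(String[] args) {
        // Roughly one thread per core for CPU-bound work; more threads
        // would mostly add context-switching overhead.
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);

        // Hundreds of requests, but never more than 'cores' of them
        // executing at once; the rest wait in the pool's internal queue.
        for (int i = 0; i < 500; i++) {
            final int requestId = i;
            pool.execute(new Runnable() {
                public void run() {
                    handleRequest(requestId); // CPU-bound handler (placeholder)
                }
            });
        }
        pool.shutdown();
    }

    static void handleRequest(int id) {
        long acc = 0;
        for (int j = 0; j < 1000000; j++) {
            acc += (long) j * id;
        }
    }
}
```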
The bottom line is that latency requirements can be tough - in my experience they're significantly tougher than throughput requirements, as they're much harder to scale out. It really does depend on the context though.