Let's imagine a hypothetical HFT system in Java that requires (very) low latency, with lots of short-lived small objects, partly due to immutability (Scala?), thousands of connections per second, and an obscene number of messages passing around in an event-driven architecture (Akka and AMQP?).
For the experts out there, what would (hypothetically) be the best tuning for JVM 7? What type of code would make it happy? Would Scala and Akka be ready for this kind of system?
Note: There have been some similar questions, like this one, but I've yet to find one covering Scala (which has its own idiosyncratic footprint in the JVM).
It is possible to achieve very good performance in Java. However, the question needs to be more specific to get a credible answer. Your main sources of latency will come from the following non-exhaustive list:
How much garbage you create and the work the GC must do to collect and promote it. In my experience, immutable designs do not fit well with low latency. GC tuning needs to be a big focus.
Warm up the JVM so that classes are loaded and the JIT has had time to do its work.
Design your algorithms to be O(1) or at least O(log2 n), and have performance tests that assert this.
Your design needs to be lock-free and follow the "Single Writer Principle".
A significant effort needs to be put into understanding the whole stack and showing mechanical sympathy in its use.
Design your algorithms and data structures to be cache friendly. Cache misses these days are the biggest cost. This is closely related to process affinity, which, if not set up correctly, can result in significant cache pollution. This will involve sympathy for the OS and even some JNI code in some cases.
Ensure you have sufficient cores so that any thread that needs to run has a core available without having to wait.
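To make the "Single Writer Principle" point a bit more concrete, here is a minimal sketch, assuming a counter that is only ever updated by one owning thread. The class and method names (`SingleWriterCounter`, `runDemo`) are made up for illustration; only `AtomicLong` and its `lazySet`/`get` methods come from the JDK. It is written without lambdas so it compiles on Java 7.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration; the class and method names are not from any library.
final class SingleWriterCounter {
    private final AtomicLong value = new AtomicLong();

    // Call from the one owning writer thread only.
    void increment() {
        // With a single writer there is no contended read-modify-write, so a
        // plain read followed by an ordered store is enough: no lock, no CAS loop.
        value.lazySet(value.get() + 1);
    }

    // Safe to call from any thread.
    long get() {
        return value.get();
    }

    // Runs one writer thread for n increments and returns the final count.
    static long runDemo(final long n) {
        final SingleWriterCounter counter = new SingleWriterCounter();
        Thread writer = new Thread(new Runnable() {
            @Override
            public void run() {
                for (long i = 0; i < n; i++) {
                    counter.increment();
                }
            }
        });
        writer.start();
        try {
            writer.join(); // join() makes the writer's stores visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runDemo(1000000L));
    }
}
```

If a second thread ever called `increment()`, updates could be lost, so the design only holds while ownership of the write path is strictly enforced. Readers on other threads may briefly see a slightly stale value, which is usually the accepted trade-off for avoiding locks and contended CAS operations.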
I recently blogged about a case study of such an exercise.
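On the garbage point above, one common way to cut allocation on the hot path is to preallocate mutable objects once and reuse them, rather than creating a new immutable message per event. The following is only a sketch under that assumption: `PreallocatedEvents`, `Event`, and the `orderId`/`price` fields are hypothetical names, and the class assumes single-threaded use.

```java
// Hypothetical sketch: a fixed set of mutable events, allocated once and reused.
final class PreallocatedEvents {
    // Mutable on purpose, so a slot can be overwritten instead of reallocated.
    static final class Event {
        long orderId;
        long price;
    }

    private final Event[] ring;
    private final int mask;
    private long next;

    PreallocatedEvents(int sizePowerOfTwo) {
        if (sizePowerOfTwo <= 0 || Integer.bitCount(sizePowerOfTwo) != 1) {
            throw new IllegalArgumentException("size must be a positive power of two");
        }
        ring = new Event[sizePowerOfTwo];
        for (int i = 0; i < ring.length; i++) {
            ring[i] = new Event(); // all allocation happens up front
        }
        mask = sizePowerOfTwo - 1;
    }

    // Overwrites and returns the next slot; the oldest event is recycled once
    // the ring wraps. No allocation happens here. Single-threaded use assumed.
    Event claim(long orderId, long price) {
        Event e = ring[(int) (next++ & mask)];
        e.orderId = orderId;
        e.price = price;
        return e;
    }

    public static void main(String[] args) {
        PreallocatedEvents events = new PreallocatedEvents(4);
        Event first = events.claim(1, 100);
        for (long id = 2; id <= 4; id++) {
            events.claim(id, 100 * id);
        }
        Event fifth = events.claim(5, 500);
        System.out.println(first == fifth); // same slot, reused after wrapping
    }
}
```

The catch is that a caller must finish with an event before its slot is reused, so the ring has to be sized for the worst-case number of events in flight. That is the price of keeping the steady state allocation-free.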