I am wondering about the parameters for constructing a ConcurrentHashMap:

- initialCapacity is 16 by default (understood).
- loadFactor is 0.75 by default.
- concurrencyLevel is 16 by default.

My questions are:

- What criteria should be used to adjust loadFactor up or down?
- What criteria should be used to adjust concurrencyLevel up or down?

Additionally:

- What are the hallmarks of a good hash function?
Thank you!
The short answer: set the initial capacity to roughly the number of mappings you expect to put in the map, and leave the other parameters at their defaults.
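For instance, a minimal sketch (the figure of 1,000 and the key/value types are placeholders of mine, not from the question):

```java
import java.util.concurrent.ConcurrentHashMap;

public class SizingExample {
    public static void main(String[] args) {
        // Expecting roughly 1,000 mappings (illustrative figure): pass that
        // as the initial capacity; loadFactor and concurrencyLevel keep
        // their defaults (0.75 and 16).
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>(1_000);
        map.put("answer", 42);
        System.out.println(map.get("answer")); // prints 42
    }
}
```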
Long answer:
- The load factor is essentially the ratio between the number of expected elements and the number of "buckets" in the map; 0.75 is usually a reasonable compromise. As I recall, it means that with a good hash function, on average we expect about 1.6 redirects to find an element in the map (or around that figure).
- Changing the load factor shifts the compromise: a higher value means more redirects to find an element, but less wasted space. 0.75 is really usually a good value.
- In principle, set concurrencyLevel to the number of concurrent threads you expect to have modifying the map, although overestimating this doesn't appear to have a bad effect other than wasting memory (I wrote a little on ConcurrentHashMap performance a while ago, in case you're interested). A sketch putting the three parameters together follows below.
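Here is that sketch of the full three-argument constructor; the numbers are purely illustrative, not recommendations:

```java
import java.util.concurrent.ConcurrentHashMap;

public class ConstructorExample {
    public static void main(String[] args) {
        // Illustrative values: ~10,000 expected mappings, the default load
        // factor written out explicitly, and 8 expected writer threads.
        ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>(
                10_000, // initialCapacity: roughly the number of expected mappings
                0.75f,  // loadFactor: the default; rarely worth changing
                8);     // concurrencyLevel: expected concurrently modifying threads
        map.put("key", "value");
    }
}
```

In practice, per the short answer above, you would usually call only the one-argument form and let the other two parameters default.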
Informally, your hash function should essentially aim for as much "randomness" in the bits as possible. Or, more strictly, the hash code for a given element should give each bit a roughly 50% chance of being set. It's actually easier to illustrate this with an example: again, you may be interested in some stuff I wrote about how the String hash function works, along with the associated hash function guidelines. Feedback is obviously welcome on any of this stuff.
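As a rough illustration of those guidelines, here's a minimal sketch using a hypothetical Point class of my own; the multiply-by-31 folding pattern is the same one String.hashCode() uses internally, but the class and its fields are made up:

```java
// Hypothetical value class, just to illustrate the multiply-by-31
// mixing pattern that String.hashCode() uses internally.
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        int result = 17;          // arbitrary non-zero seed
        result = 31 * result + x; // fold each significant field in,
        result = 31 * result + y; // spreading its bits across the result
        return result;
    }
}
```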
One thing I also mention at some point is that you don't have to be too paranoid in practice: if your hash function produces a "reasonable" amount of randomness in some of the bits, then it will often be OK. In the worst case, sticking representative pieces of data into a string and taking the hash code of the string actually doesn't work so badly.
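A sketch of that worst-case fallback (the CacheKey class and its fields are made up for illustration):

```java
// Hypothetical key class that delegates hashing to String.hashCode()
// by concatenating its representative fields.
final class CacheKey {
    private final String region;
    private final long id;

    CacheKey(String region, long id) {
        this.region = region;
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof CacheKey)) return false;
        CacheKey k = (CacheKey) o;
        return id == k.id && region.equals(k.region);
    }

    @Override
    public int hashCode() {
        // The '|' separator prevents, e.g., ("a1", 1) and ("a", 11)
        // from concatenating to the same string "a11".
        return (region + "|" + id).hashCode();
    }
}
```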