ConcurrentHashMap constructor parameters?

non sequitor · Oct 15, 2009 · Viewed 10.6k times

I am wondering about the parameters for constructing a ConcurrentHashMap:

  • initialCapacity is 16 by default (understood).
  • loadFactor is 0.75 by default.
  • concurrencyLevel is 16 by default.

My questions are:

  • What criteria should be used to adjust loadFactor up or down?
  • How do we establish the number of concurrently updating threads?
  • What criteria should be used to adjust concurrencyLevel up or down?

Additionally:

  • What are the hallmarks of a good hashcode implementation? (If an SO question addresses this, just link to it.)

Thank you!

Answer

Neil Coffey · Oct 15, 2009

The short answer: set "initial capacity" to roughly the number of mappings you expect to put in the map, and leave the other parameters at their defaults.
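As a minimal sketch of that advice (the expected-size figure here is a made-up example, not from the question):

```java
import java.util.concurrent.ConcurrentHashMap;

public class SimpleConstruction {
    public static void main(String[] args) {
        // Hypothetical figure: we expect roughly 10,000 mappings,
        // so we pass that as the initial capacity and keep the
        // default loadFactor (0.75) and concurrencyLevel (16).
        int expectedMappings = 10_000;
        ConcurrentHashMap<String, Integer> map =
                new ConcurrentHashMap<>(expectedMappings);
        map.put("answer", 42);
        System.out.println(map.get("answer")); // prints 42
    }
}
```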

Long answer:

  • load factor is the ratio between the number of expected elements and the number of "buckets" in the map-- once the element count exceeds loadFactor × capacity, the map resizes;

  • 0.75 is usually a reasonable compromise-- as I recall, it means that with a good hash function, on average we expect about 1.6 redirects to find an element in the map (or around that figure);

    • changing the load factor changes the compromise between more redirects to find an element and less wasted space-- but 0.75 is really usually a good value;

  • in principle, set concurrencyLevel to the number of concurrent threads you expect to have modifying the map, although overestimating this doesn't appear to have a bad effect other than wasting memory (I wrote a little on ConcurrentHashMap performance a while ago in case you're interested)
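Putting the three parameters together might look like this-- the entry count and thread count below are hypothetical, chosen just to show the constructor shape:

```java
import java.util.concurrent.ConcurrentHashMap;

public class TunedConstruction {
    public static void main(String[] args) {
        // Hypothetical tuning: ~50,000 entries expected, the default
        // load factor of 0.75, and about 8 threads writing concurrently.
        ConcurrentHashMap<String, Long> counts =
                new ConcurrentHashMap<>(50_000, 0.75f, 8);
        counts.put("hits", 1L);
        System.out.println(counts.get("hits")); // prints 1
    }
}
```

Overestimating the last argument mainly costs memory, as noted above, so rounding up is the safer direction.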

Informally, your hash function should essentially aim to have as much "randomness" in the bits as possible. Or more strictly, the hash code for a given element should give each bit a roughly 50% chance of being set. It's actually easier to illustrate this with an example: again, you may be interested in some stuff I wrote about how the String hash function works and associated hash function guidelines. Feedback is obviously welcome on any of this stuff.
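The multiply-and-add scheme that String.hashCode() uses is one common way to spread each field's bits through the result; a sketch on a hypothetical two-field class (the class and fields are invented for illustration):

```java
public final class Point {
    private final int x, y;

    public Point(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    // Same multiply-by-31 accumulation String.hashCode() uses: each
    // field perturbs all previously accumulated bits, so differences
    // in any field ripple through the whole hash.
    @Override
    public int hashCode() {
        int h = 17;
        h = 31 * h + x;
        h = 31 * h + y;
        return h;
    }

    public static void main(String[] args) {
        System.out.println(new Point(3, 4).hashCode()); // prints 16434
    }
}
```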

One thing I also mention at some point is that you don't have to be too paranoid in practice: if your hash function produces a "reasonable" amount of randomness in some of the bits, then it will often be OK. In the worst case, sticking representative pieces of data into a string and taking the hash code of the string actually doesn't work so badly.
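The string fallback mentioned above can be as simple as concatenating the representative fields and reusing String's own hash (the field names and separator here are assumptions for illustration):

```java
public class StringHashFallback {
    // Fallback described in the answer: stick representative pieces of
    // data into a String and take that String's hash code. A separator
    // keeps ("ab", 1) and ("a", "b1") style collisions apart.
    static int hashOf(String name, int id) {
        return (name + "|" + id).hashCode();
    }

    public static void main(String[] args) {
        System.out.println(hashOf("alice", 7));
    }
}
```

This is rarely the fastest option, but it inherits String.hashCode()'s reasonable bit distribution, which is why it "doesn't work so badly" in practice.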