HashMap Java 8 implementation

Hasnain Ali Bohra picture Hasnain Ali Bohra · May 11, 2017 · Viewed 23.5k times · Source

As per the following link document: Java HashMap Implementation

I'm confused with the implementation of HashMap (or rather, an enhancement in HashMap). My queries are:

Firstly

static final int TREEIFY_THRESHOLD = 8;
static final int UNTREEIFY_THRESHOLD = 6;
static final int MIN_TREEIFY_CAPACITY = 64;

Why and how are these constants used? I want some clear examples for this. How they are achieving a performance gain with this?

Secondly

If you see the source code of HashMap in JDK, you will find the following static inner class:

static final class TreeNode<K, V> extends java.util.LinkedHashMap.Entry<K, V> {
    HashMap.TreeNode<K, V> parent;
    HashMap.TreeNode<K, V> left;
    HashMap.TreeNode<K, V> right;
    HashMap.TreeNode<K, V> prev;
    boolean red;

    TreeNode(int arg0, K arg1, V arg2, HashMap.Node<K, V> arg3) {
        super(arg0, arg1, arg2, arg3);
    }

    final HashMap.TreeNode<K, V> root() {
        HashMap.TreeNode arg0 = this;

        while (true) {
            HashMap.TreeNode arg1 = arg0.parent;
            if (arg0.parent == null) {
                return arg0;
            }

            arg0 = arg1;
        }
    }
    //...
}

How is it used? I just want an explanation of the algorithm.

Answer

Michael picture Michael · May 11, 2017

HashMap contains a certain number of buckets. It uses hashCode to determine which bucket to put these into. For simplicity's sake imagine it as a modulus.

If our hashcode is 123456 and we have 4 buckets, 123456 % 4 = 0 so the item goes in the first bucket, Bucket 1.

HashMap

If our hashcode function is good, it should provide an even distribution so all the buckets will be used somewhat equally. In this case, the bucket uses a linked list to store the values.

Linked Buckets

But you can't rely on people to implement good hash functions. People will often write poor hash functions which will result in a non-even distribution. It's also possible that we could just get unlucky with our inputs.

Bad hashmap

The less even this distribution is, the further we're moving from O(1) operations and the closer we're moving towards O(n) operations.

The implementation of Hashmap tries to mitigate this by organising some buckets into trees rather than linked lists if the buckets become too large. This is what TREEIFY_THRESHOLD = 8 is for. If a bucket contains more than eight items, it should become a tree.

Tree Bucket

This tree is a Red-Black tree. It is first sorted by hash code. If the hash codes are the same, it uses the compareTo method of Comparable if the objects implement that interface, else the identity hash code.

If entries are removed from the map, the number of entries in the bucket might reduce such that this tree structure is no longer necessary. That's what the UNTREEIFY_THRESHOLD = 6 is for. If the number of elements in a bucket drops below six, we might as well go back to using a linked list.

Finally, there is the MIN_TREEIFY_CAPACITY = 64.

When a hash map grows in size, it automatically resizes itself to have more buckets. If we have a small hash map, the likelihood of us getting very full buckets is quite high, because we don't have that many different buckets to put stuff into. It's much better to have a bigger hash map, with more buckets that are less full. This constant basically says not to start making buckets into trees if our hash map is very small - it should resize to be larger first instead.


To answer your question about the performance gain, these optimisations were added to improve the worst case. I'm only speculating but you would probably only see a noticeable performance improvement because of these optimisations if your hashCode function was not very good.