How should I use Guava's Hashing#consistentHash?

GamingBuck · Sep 7, 2012 · Viewed 10.1k times

I'm looking into using a consistent hash algorithm in some java code I'm writing. The guava Hashing library has a consistentHash(HashCode, int) method, but the documentation is rather lacking. My initial hope was that I could just use consistentHash() for simple session affinity to efficiently distribute load across a set of backend servers.

Does anyone have a real-world example of how to use this method? In particular I'm concerned with managing the removal of a bucket from the target range.

For example:

import com.google.common.base.Charsets;
import com.google.common.collect.Lists;
import com.google.common.hash.Hashing;
import java.util.List;
import org.junit.Test;

@Test
public void testConsistentHash() {
    List<String> servers = Lists.newArrayList("server1", "server2", "server3", "server4", "server5");

    // hashString needs an explicit charset in recent Guava versions
    int bucket = Hashing.consistentHash(Hashing.md5().hashString("someId", Charsets.UTF_8), servers.size());
    System.out.println("First time routed to: " + servers.get(bucket));

    // one of the back end servers is removed from the (middle of the) pool
    servers.remove(1);

    // hash the same identifier again against the shrunken server list
    bucket = Hashing.consistentHash(Hashing.md5().hashString("someId", Charsets.UTF_8), servers.size());
    System.out.println("Second time routed to: " + servers.get(bucket));
}

Leads to the output:

First time routed to: server4
Second time routed to: server5

What I want is for that identifier ("someId") to map to the same server after removal of a server earlier in the list. So in the sample above, after removal I guess I'd want bucket 0 to map to "server1", bucket 1 to map to "server3", bucket 2 to map to "server4" and bucket 3 to map to "server5".

Am I supposed to maintain a separate (more complicated than a list) data structure to manage bucket removal and addition? I guess I had envisioned perhaps a more complicated Hashing API that would manage the remapping after adding and removal of particular buckets for me.

Note: I know the sample code uses a small input and bucket set. I tried this with thousands of inputs across 100 buckets and the result is the same: inputs that map to buckets 0-98 stay put when I shrink to 99 buckets, and only bucket 99's inputs get redistributed across the remaining 99 buckets.
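That stability (only the removed last bucket's inputs move) is exactly the contract of consistentHash. A minimal sketch of a jump-style consistent hash in plain Java shows why; this is the general technique behind that contract, and Guava's internal constants and implementation may differ:

```java
// Sketch of a jump-style consistent hash. Guava's Hashing.consistentHash
// offers the same contract, but its internals may differ from this sketch.
class JumpHashDemo {

    // Maps a 64-bit key to a bucket in [0, buckets).
    public static int jumpConsistentHash(long key, int buckets) {
        long k = key;
        long b = -1;
        long j = 0;
        while (j < buckets) {
            b = j;
            k = k * 2862933555777941757L + 1;  // LCG step
            // Jump forward; a larger bucket count only extends the same
            // jump sequence, so results for smaller counts are unchanged.
            j = (long) ((b + 1) * ((double) (1L << 31) / ((k >>> 33) + 1)));
        }
        return (int) b;
    }

    public static void main(String[] args) {
        int moved = 0;
        for (long key = 0; key < 10_000; key++) {
            int with100 = jumpConsistentHash(key, 100);
            int with99 = jumpConsistentHash(key, 99);
            if (with100 != with99) {
                // A key only moves if it sat in the removed bucket 99.
                if (with100 != 99) throw new AssertionError();
                moved++;
            }
        }
        System.out.println("keys remapped when shrinking 100 -> 99: " + moved);
    }
}
```

Shrinking from 100 to 99 buckets remaps only the keys that were in bucket 99, matching the note above; the catch, as the answer explains, is that "removal" therefore only works at the end of the range.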

Answer

Chris Povirk · Sep 7, 2012

I don't think there's a good way to do this at the moment. consistentHash in its current form is useful only in simple cases -- basically, where you have a knob to increase or decrease the number of servers... but always by adding and removing at the end.

There's some work underway to add a class like this:

public final class WeightedConsistentHash<B, I> {
  /** Initially, all buckets have weight zero. */
  public static <B, I> WeightedConsistentHash<B, I> create(
      Funnel<B> bucketFunnel, Funnel<I> inputFunnel);

  /**
   * Sets the weight of bucket "bucketId" to "weight".
   * Requires "weight" >= 0.0.
   */
  public void setBucketWeight(B bucketId, double weight);

  /**
   * Returns the bucket id that "input" maps to.
   * Requires that at least one bucket has a non-zero weight.
   */
  public B hash(I input);
}

Then you would write:

WeightedConsistentHash<String, String> serverChooser =
    WeightedConsistentHash.create(stringFunnel(), stringFunnel());
serverChooser.setBucketWeight("server1", 1);
serverChooser.setBucketWeight("server2", 1);
// etc.

System.out.println("First time routed to: " + serverChooser.hash("someId"));

// one of the back end servers is removed from the (middle of the) pool
serverChooser.setBucketWeight("server2", 0);

System.out.println("Second time routed to: " + serverChooser.hash("someId"));

And you should get the same server each time. Does that API look suitable?
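Until an API like that exists, one workaround (not part of Guava) that gives the behavior the question asks for is rendezvous ("highest-random-weight") hashing: score every live server against the key and route to the highest scorer, so removing an arbitrary server remaps only the keys that were on it. A minimal illustrative sketch; the class name, key/server separator, and mixing function are all invented for this example:

```java
import java.util.Set;
import java.util.TreeSet;

// Illustrative rendezvous (highest-random-weight) router; a workaround
// sketch, not a Guava API.
class RendezvousRouter {
    private final Set<String> servers = new TreeSet<>();

    public void addServer(String server)    { servers.add(server); }
    public void removeServer(String server) { servers.remove(server); }

    // Routes key to the live server with the highest per-(key, server)
    // score. Removing a server can only remap the keys it had won.
    public String route(String key) {
        String best = null;
        long bestScore = Long.MIN_VALUE;
        for (String server : servers) {
            long score = mix((key + '\u0000' + server).hashCode());
            if (score > bestScore) {
                bestScore = score;
                best = server;
            }
        }
        return best;
    }

    // splitmix64 finalizer, used to spread out String.hashCode values.
    private static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    public static void main(String[] args) {
        RendezvousRouter router = new RendezvousRouter();
        for (int i = 1; i <= 5; i++) router.addServer("server" + i);
        System.out.println("First time routed to: " + router.route("someId"));
        router.removeServer("server2");
        // "someId" keeps its server unless it had been on server2.
        System.out.println("Second time routed to: " + router.route("someId"));
    }
}
```

The trade-off versus consistentHash is O(number of servers) work per lookup instead of O(log n), which is usually fine for a pool of backend servers.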