I'm looking into using a consistent hash algorithm in some Java code I'm writing. Guava's Hashing library has a consistentHash(HashCode, int)
method, but the documentation is rather lacking. My initial hope was that I could just use consistentHash()
for simple session affinity to efficiently distribute load across a set of backend servers.
Does anyone have a real-world example of how to use this method? In particular I'm concerned with managing the removal of a bucket from the target range.
For example:
@Test
public void testConsistentHash() {
    List<String> servers = Lists.newArrayList("server1", "server2", "server3", "server4", "server5");

    int bucket = Hashing.consistentHash(Hashing.md5().hashString("someId"), servers.size());
    System.out.println("First time routed to: " + servers.get(bucket));

    // one of the back end servers is removed from the (middle of the) pool
    servers.remove(1);

    // hash the same identifier again, now against the shrunken pool
    bucket = Hashing.consistentHash(Hashing.md5().hashString("someId"), servers.size());
    System.out.println("Second time routed to: " + servers.get(bucket));
}
Leads to the output:
First time routed to: server4
Second time routed to: server5
What I want is for that identifier ("someId") to map to the same server after removal of a server earlier in the list. So in the sample above, after removal I guess I'd want bucket 0 to map to "server1", bucket 1 to map to "server3", bucket 2 to map to "server4" and bucket 3 to map to "server5".
Am I supposed to maintain a separate (more complicated than a list) data structure to manage bucket removal and addition? I had envisioned a richer Hashing API that would manage the remapping for me after particular buckets are added or removed.
Note: I know the sample code uses a small input and bucket set. I tried this with thousands of inputs across 100 buckets and the result is the same. Inputs that map to buckets 0-98 stay in the same bucket when I shrink the bucket count to 99, and inputs that mapped to bucket 99 get redistributed across the remaining 99 buckets.
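A minimal sketch of that experiment (the key names, loop count, and charset overload here are just illustrative):

for (int i = 0; i < 10_000; i++) {
    HashCode hash = Hashing.md5().hashString("key-" + i, StandardCharsets.UTF_8);
    int before = Hashing.consistentHash(hash, 100);  // 100 buckets
    int after  = Hashing.consistentHash(hash, 99);   // shrink to 99 buckets
    // Only keys that were in the removed bucket (99) should change buckets;
    // keys in buckets 0-98 keep their bucket index.
    if (before != after && before != 99) {
        throw new AssertionError("unexpected remapping for key-" + i);
    }
}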
I don't think there's a good way to do this at the moment. consistentHash
in its current form is useful only in simple cases -- basically, where you have a knob to increase or decrease the number of servers... but always by adding and removing at the end.
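To make the "knob at the end" case concrete, here is a minimal sketch (the server names and charset overload are illustrative):

List<String> servers = Lists.newArrayList("server1", "server2", "server3", "server4", "server5");
HashCode id = Hashing.md5().hashString("someId", StandardCharsets.UTF_8);

String before = servers.get(Hashing.consistentHash(id, servers.size()));

// Scale down by dropping the LAST server rather than one from the middle.
servers.remove(servers.size() - 1);
String after = servers.get(Hashing.consistentHash(id, servers.size()));

// "someId" keeps its server unless it happened to live on the removed last
// server, and no other key is affected, because bucket indices 0..n-2 still
// line up with the same entries in the list.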
There's some work underway to add a class like this:
public final class WeightedConsistentHash<B, I> {
  /** Initially, all buckets have weight zero. */
  public static <B, I> WeightedConsistentHash<B, I> create(
      Funnel<B> bucketFunnel, Funnel<I> inputFunnel);

  /**
   * Sets the weight of bucket "bucketId" to "weight".
   * Requires "weight" >= 0.0.
   */
  public void setBucketWeight(B bucketId, double weight);

  /**
   * Returns the bucket id that "input" maps to.
   * Requires that at least one bucket has a non-zero weight.
   */
  public B hash(I input);
}
Then you would write:
WeightedConsistentHash<String, String> serverChooser =
    WeightedConsistentHash.create(stringFunnel(), stringFunnel());
serverChooser.setBucketWeight("server1", 1);
serverChooser.setBucketWeight("server2", 1);
// etc.
System.out.println("First time routed to: " + serverChooser.hash("someId"));
// one of the back end servers is removed from the (middle of the) pool
serverChooser.setBucketWeight("server2", 0);
System.out.println("Second time routed to: " + serverChooser.hash("someId"));
And you should get the same server each time. Does that API look suitable?
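For what it's worth, until something like that exists, one way to get the "removing any server only moves that server's keys" property with Guava's existing hash functions is rendezvous (highest-random-weight) hashing -- a different technique from consistentHash, sketched minimally below (the method and variable names are illustrative):

static String pickServer(String key, Iterable<String> liveServers) {
    String best = null;
    long bestScore = Long.MIN_VALUE;
    for (String server : liveServers) {
        // Score each (key, server) pair; the server with the highest score wins.
        long score = Hashing.murmur3_128()
                .hashString(key + "/" + server, StandardCharsets.UTF_8)
                .asLong();
        if (score > bestScore) {
            bestScore = score;
            best = server;
        }
    }
    return best;
}

Removing a server from liveServers only remaps the keys whose winning server was the one removed; every other key keeps its assignment. The trade-off is that each lookup iterates over every live server.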