Creating a hash from several Java string objects

PNS · May 14, 2012 · Viewed 13.4k times

What would be the fastest and most robust (in terms of uniqueness) way to implement a method like

public abstract String hash(String[] values);

The values[] array has 100 to 1,000 members, each of which has a few dozen characters, and the method needs to run about 10,000 times/sec, on a different values[] array each time.

Should a long string be built using a StringBuilder buffer and then a hash method invoked on the buffer contents, or is it better to keep invoking the hash method for each string from values[]?
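To make the second option concrete, here is a minimal sketch, assuming a cryptographic digest such as MD5 is used: java.security.MessageDigest accepts input in pieces through update(), so each string can be fed in turn without building one big buffer first. The class name IncrementalDigest and the zero-byte separator are illustrative assumptions, not part of the question; the separator presumes that the strings themselves never contain the character U+0000.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class IncrementalDigest {
    // Feeds each string to the digest in turn; no concatenated buffer is built.
    static String hash(String[] values) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            for (String v : values) {
                md.update(v.getBytes(StandardCharsets.UTF_8));
                // Assumed separator, so that {"ab","c"} and {"a","bc"} hash differently.
                md.update((byte) 0);
            }
            // Render the 16 digest bytes as 32 lowercase hex characters.
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Not expected on a standard JDK, which ships an MD5 implementation.
            throw new IllegalStateException(e);
        }
    }
}
```

Without the separator, feeding the strings one by one would give exactly the same digest as hashing their concatenation, so the two options in the question differ in speed and memory use, not in hash quality.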

Obviously, a hash of at least 64 bits is needed (e.g., MD5) to avoid collisions, but is there anything simpler and faster that gives the same quality?

For example, what about

public String hash(String[] values)
{
    long result = 0;

    for (String v:values)
    {
        result += v.hashCode();
    }

    return String.valueOf(result);
}

Answer

Marko Topolnik · May 14, 2012

Definitely don't use plain addition, because of its linearity: for example, any reordering of the same strings gives the same sum. You can, however, modify your code just slightly to achieve very good dispersion.

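public String hash(String[] values) {
  long result = 17;
  // Multiplying the running value by 37 before each addition makes the result depend on element order.
  for (String v : values) result = 37*result + v.hashCode();
  return String.valueOf(result);
}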
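The difference between the two approaches can be checked with a small sketch. The class name HashOrderDemo and the method names sumHash and polyHash are hypothetical; the loop bodies are the ones from the question and from the answer, except that they return the long directly instead of a String.

```java
public class HashOrderDemo {
    // Plain addition, as proposed in the question.
    static long sumHash(String[] values) {
        long result = 0;
        for (String v : values) result += v.hashCode();
        return result;
    }

    // Multiply-then-add, as suggested in the answer.
    static long polyHash(String[] values) {
        long result = 17;
        for (String v : values) result = 37 * result + v.hashCode();
        return result;
    }

    public static void main(String[] args) {
        String[] ab = {"a", "b"};
        String[] ba = {"b", "a"};
        // Addition ignores order, so the two arrays collide.
        System.out.println(sumHash(ab) == sumHash(ba));   // prints true
        // The multiplier makes each element's contribution depend on its position.
        System.out.println(polyHash(ab) == polyHash(ba)); // prints false
    }
}
```

Both variants still inherit any collision of String.hashCode() itself: "Aa" and "BB" have the same 32-bit hash code, so swapping one for the other at the same position leaves either result unchanged.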