What is the best 32bit hash function for short strings (tag names)?

Andrey Shchekin · Feb 28, 2010

What is the best 32bit hash function for relatively short strings?

The strings are tag names made up of English letters, digits, spaces and a few other characters (#, $, ., ...), for example "Unit testing" or "C# 2.0".

By 'best' I mean 'fewest collisions'; performance is not important for my purposes.

Answer

Nick Dandoulakis · Feb 28, 2010

I'm not sure it's the best choice, but here is a hash function for strings from The Practice of Programming by Brian W. Kernighan and Rob Pike (section "Hash Tables", p. 57):

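enum { MULTIPLIER = 31 };   /* 31 and 37 are the suggested values */

/* hash: compute hash value of string */
unsigned int hash(const char *str)
{
   unsigned int h;
   const unsigned char *p;

   h = 0;
   for (p = (const unsigned char *)str; *p != '\0'; p++)
      h = MULTIPLIER * h + *p;   /* unsigned overflow wraps around, which is well defined */
   return h;   /* or h % ARRAY_SIZE to get a table index */
}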
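Unrolling the loop shows what the function computes. Writing M for MULTIPLIER and c_0 ... c_{n-1} for the bytes of the string, and assuming unsigned int is 32 bits wide (C itself only guarantees at least 16 bits), the result is a polynomial in M reduced modulo 2^32:

```latex
h(c_0 c_1 \dots c_{n-1}) = \left( \sum_{i=0}^{n-1} c_i \, M^{\,n-1-i} \right) \bmod 2^{32}
```

For example, with M = 31 the tag "C#" ('C' = 67, '#' = 35 in ASCII) hashes to 67 * 31 + 35 = 2112.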

Empirically, the values 31 and 37 have proven to be good choices for the multiplier in a hash function for ASCII strings.