Persistent hashcode for strings

HugoRune picture HugoRune · Apr 25, 2016 · Viewed 7.3k times · Source

I want to generate an integer hashcode for strings, that will stay constant forever; i.e. the same string should always result in the same hashcode.

The hash does not have to be cryptographically secure, it will not be used for passwords or sensitive data.

My first attempt was to use the .net framework string.GetHashCode() function. However upon reading the sources I found the following commment:

// We want to ensure we can change our hash function daily. 
// This is perfectly fine as long as you don't persist the
// value from GetHashCode to disk or count on String A 
// hashing before string B.  Those are bugs in your code.
hash1 ^= ThisAssembly.DailyBuildNumber;

This seems to indicate that the hashcode will not remain constant.

If so, does the framework have another method to generate repeatable hashcodes? Or would the code from GetHashCode be a reasonable starting point to implement my own?

I am looking for something as lightweight and fast as possible.
I found System.Security.Cryptography.MD5, but that seems overkill for a simple int32 hashcode, and I am worried about the overhead. At the very least it would require conversion from string to byte array, and from byte array to int, and either creation of a new MD5() object for each hash, or management of some static shared MD5 object().

Answer

Scott Chamberlain picture Scott Chamberlain · Apr 25, 2016

There is no built in, cross version stable, way to get a hash code of a string.

You could just copy the existing GetHashCode() code but exclude the portion that adds the build number as the seed and don't use unsafe calls to keep yourself safe from implementation detail changes.

Here is a fully managed version of the 64bit GetHashCode() that does not use any randomization and will return the same value for all future versions of .NET (as long as the behavior of int ^ char never changes).

public static class StringExtensionMethods
{
    public static int GetStableHashCode(this string str)
    {
        unchecked
        {
            int hash1 = 5381;
            int hash2 = hash1;

            for(int i = 0; i < str.Length && str[i] != '\0'; i += 2)
            {
                hash1 = ((hash1 << 5) + hash1) ^ str[i];
                if (i == str.Length - 1 || str[i+1] == '\0')
                    break;
                hash2 = ((hash2 << 5) + hash2) ^ str[i+1];
            }

            return hash1 + (hash2*1566083941);
        }
    }
}