I need a performance-oriented hash function implementation in C++ for a hash table that I will be coding. I have looked around already and only found questions asking what a good hash function is "in general". I've considered CRC32 (but where can I find a good implementation?) and a few cryptographic algorithms. My table, though, has very specific requirements.
Here's what the table will be like:
100,000 items max
200,000 capacity (so the load is 0.5)
hashing a 6-character string which is part of an English sentence
examples: "become" "and he" ", not "
The number one priority of my hash table is quick search (retrieval). Quick insertion is not important, but it will come along with quick search. Deletion is not important, and re-hashing is not something I'll be looking into. To handle collisions, I'll probably be using separate chaining as described here. I have already looked at this article, but would like the opinion of those who have handled such a task before.
Now, assuming you want a hash and want something blazing fast that would work in your case: because your strings are just 6 chars long, you could use this magic:
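#include <cstddef>
const std::size_t precision = 2; // change the precision with this
std::size_t hash(const char* str)
{
    // Reinterpret the first sizeof(size_t) bytes of the string as an integer,
    // then drop the low `precision` bits. Assumes the buffer is aligned and
    // readable for sizeof(size_t) bytes.
    return (*(const std::size_t*)str) >> precision;
}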
CRC is for slowpokes ;)
Explanation: This works by reinterpreting the bytes the string pointer refers to as a size_t (32 or 64 bits wide, depending on your platform). The contents of the string are then treated as a raw number, with no per-character work, and that number is shifted right by precision bits (tweak this value for the best performance; I've found 2 works well for hashing strings in sets of a few thousand). Note that a 32-bit size_t only covers the first 4 characters, while a 64-bit one reads 8 bytes, two more than the key itself.
Also, the really neat part is that any decent compiler on modern hardware will turn a hash like this into just a couple of instructions (a load and a shift), which is hard to beat ;)
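If you'd rather not depend on the pointer cast (it assumes alignment and reads a full size_t regardless of key length), here is a minimal sketch of the same idea using memcpy into a zeroed 64-bit integer, so exactly six bytes take part in the hash. The names kKeyLen, kPrecision, kBuckets, hash6 and bucket_of are made up for illustration, and the 200,000 bucket count is simply the capacity from the question. Compilers typically turn a small fixed-size memcpy into plain loads, but I haven't measured this variant, so treat it as a starting point.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// All names below are hypothetical, chosen only for this sketch.
constexpr std::size_t kKeyLen = 6;        // key length from the question
constexpr unsigned kPrecision = 2;        // same shift as in the code above
constexpr std::size_t kBuckets = 200000;  // table capacity from the question

// Copy exactly six bytes into a zeroed 64-bit integer, then shift.
// Bytes after the key never influence the result.
inline std::uint64_t hash6(const char* str)
{
    std::uint64_t raw = 0;
    std::memcpy(&raw, str, kKeyLen);
    return raw >> kPrecision;
}

// Reduce the hash to a bucket index for a separate-chaining table.
inline std::size_t bucket_of(const char* str)
{
    return static_cast<std::size_t>(hash6(str) % kBuckets);
}
```

You would then use bucket_of("become") to pick the chain to search. Whether the shift and the modulo spread your particular keys well is something to check against your real data.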