I'm trying to think up a good hash function for strings. I was thinking it might be a good idea to sum up the Unicode values of the first five characters in the string (or of all of them, if the string is shorter than five). Would that be a good idea, or a bad one?
I am doing this in Java, but I wouldn't imagine that would make much of a difference.
Usually hashes wouldn't use plain sums, because otherwise "stop" and "pots" would have the same hash. You also wouldn't limit it to the first n characters, because otherwise "house" and "houses" would have the same hash.
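To make the two collisions concrete, here is a minimal sketch of the scheme described in the question. The class name SumHashDemo and the method name sumHash are made up for illustration; this is only my reading of "sum the first five characters".

```java
// Hypothetical demo of the scheme from the question: sum of the first five chars.
class SumHashDemo {

    // Adds up the UTF-16 code units of at most the first five characters.
    static int sumHash(String s) {
        int sum = 0;
        int limit = Math.min(5, s.length());
        for (int i = 0; i < limit; i++) {
            sum += s.charAt(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Anagrams collide, because addition ignores character order.
        System.out.println(sumHash("stop") == sumHash("pots"));
        // Strings that share their first five characters collide as well.
        System.out.println(sumHash("house") == sumHash("houses"));
    }
}
```

Both comparisons come out equal, which is exactly the problem: any reordering of the same letters, and any longer string with the same five-character prefix, lands in the same bucket.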
Generally, hash functions multiply the running hash by a prime number before adding each new value (this makes the result depend on character order and makes it more likely to generate unique hashes). So you could do something like:
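// s is the String being hashed
int hash = 7;
for (int i = 0; i < s.length(); i++) {
    // int overflow simply wraps around, which is fine for a hash
    hash = hash * 31 + s.charAt(i);
}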
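Wrapped into something you can actually run, the loop above might look like the sketch below. The names PolyHashDemo and polyHash are my own, and the seed 7 and multiplier 31 are just the values from the snippet, not the only reasonable choice.

```java
// Hypothetical wrapper around the multiply-by-a-prime loop shown above.
class PolyHashDemo {

    // Starts from the seed 7 and folds in every character of the string.
    static int polyHash(String s) {
        int hash = 7;
        for (int i = 0; i < s.length(); i++) {
            // Overflow wraps around silently; that is acceptable for hashing.
            hash = hash * 31 + s.charAt(i);
        }
        return hash;
    }

    public static void main(String[] args) {
        // Order now matters, so these anagrams no longer collide.
        System.out.println(polyHash("stop") != polyHash("pots"));
        // Every character contributes, so the extra 's' changes the result.
        System.out.println(polyHash("house") != polyHash("houses"));
    }
}
```

Java's own String.hashCode() uses the same multiplier of 31 (starting from 0 instead of 7), so in real code you would normally just call that. Collisions are still possible with any int-sized hash, for example "Aa" and "BB" collide under this scheme, but they are far less systematic than with a plain sum.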