I'm looking for a function that will generate an "alphanumeric hash". Given a source string, it should produce a deterministic result string that can contain any letter a-z or digit 0-9 and cannot be reverse-engineered to recover the source. This will be used to generate passwords for a system based on secret data, so strings of 8 to 12 characters and a secure hash would both be ideal.
I'm thinking I can use a normal bitwise hash, XOR-fold it to 64 bits (if I use, for instance, SHA256) and then take the result 5 bits at a time (producing a number 0-31) and look up the character code to use from an indexed ordered collection. There are 26 letters and 10 digits meaning I'll have to leave a few out (probably removing characters that could be mistaken for others if handwritten). 64 bits, 5 bits at a time, will produce a 12-character string with 4 bits left over.
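As a rough sketch of what I have in mind: the class and method names below are placeholders, the byte-lane layout of the XOR-fold is just one possible choice, and the leftover 4 bits are simply dropped for now.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class FoldedHashSketch
{
    // 22 letters (l, o, q, z omitted) plus 10 digits = 32 symbols, one per 5-bit value.
    private const string Alphabet = "abcdefghijkmnprstuvwxy0123456789";

    // XOR-folds a digest of any length into 64 bits: byte i is XOR-ed into lane (i % 8).
    public static ulong XorFold64(byte[] digest)
    {
        ulong folded = 0;
        for (int i = 0; i < digest.Length; i++)
        {
            int shift = 8 * (7 - (i % 8));
            folded ^= (ulong)digest[i] << shift;
        }
        return folded;
    }

    // Encodes the top 60 bits as 12 characters; the low 4 bits are ignored here.
    public static string Encode60(ulong folded)
    {
        var builder = new StringBuilder(12);
        for (int i = 0; i < 12; i++)
        {
            int shift = 64 - 5 * (i + 1); // 59, 54, ..., 4
            int index = (int)((folded >> shift) & 0x1F);
            builder.Append(Alphabet[index]);
        }
        return builder.ToString();
    }

    // Assumes the secret is hashed as UTF-8 text with SHA-256.
    public static string FromSecret(string secret)
    {
        using (var sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(secret));
            return Encode60(XorFold64(digest));
        }
    }
}
```

Calling `FoldedHashSketch.FromSecret("some secret")` always returns the same 12-character string for the same input.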
However, I'm worried about two things: first, introducing bias by taking a non-power-of-2 number of bits; and second, what to do with the leftover bits. Do I use them as-is knowing there will only be 16 possibilities, do I leave them off (and lose data possibly introducing bias), or do I incorporate one more bit to make a 13-character string (and where should the last bit come from)?
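To make the "only 16 possibilities" concern concrete, here is a small sketch of the use-as-is option. It assumes the same 32-symbol table as above, and the names are again placeholders. It maps the 4 leftover bits straight to a character and counts how many distinct characters that 13th position could ever take.

```csharp
using System;
using System.Collections.Generic;

public static class LeftoverBitsSketch
{
    private const string Alphabet = "abcdefghijkmnprstuvwxy0123456789";

    // Use-as-is option: the 4 leftover bits index the table directly,
    // so only indices 0-15 (the first half of the table) are reachable.
    public static char UseAsIs(ulong folded)
    {
        return Alphabet[(int)(folded & 0xF)];
    }

    // Counts the distinct characters the use-as-is option can produce.
    public static int ReachableSymbols()
    {
        var seen = new HashSet<char>();
        for (ulong low = 0; low < 16; low++)
        {
            seen.Add(UseAsIs(low));
        }
        return seen.Count; // 16 of the 32 symbols
    }
}
```

With this table the final character would never be a digit or any letter past the sixteenth entry, which is the skew I'm asking about.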
EDIT: Here's my current stab at it; it takes an enumerable of bytes (like the byte array produced by most hash algorithms) and returns a string:
    /// <summary>
    /// Converts an IEnumerable of bytes to a string representation which can have any lowercase letter a-z except for l, o, q and z, and any digit 0-9.
    /// Uses 5 bits of the byte sequence at a time to generate numbers from 0 to 31, which are then translated to letters or digits.
    /// </summary>
    /// <param name="toConvert">the bytes to convert.</param>
    /// <returns>A string containing the alphanumeric case-insensitive representation of the bytes.</returns>
    public static string ToInsensitiveAlphaNumericString(this IEnumerable<byte> toConvert)
    {
        // Requires System.Collections.Generic and System.Text.
        var chars = new[]
        {
            'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'm', 'n', 'p', 'r', 's', 't',
            'u', 'v', 'w', 'x', 'y', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9'
        };
        const int valueLength = 5;
        var builder = new StringBuilder();
        using (var enumerator = toConvert.GetEnumerator())
        {
            // An empty sequence has no bits to encode.
            if (!enumerator.MoveNext())
                return string.Empty;
            int buffer = enumerator.Current;
            int bufferLength = 8;
            while (true)
            {
                // Take the top 5 bits of the buffer as the next character index.
                var value = buffer >> (bufferLength - valueLength);
                builder.Append(chars[value]);
                // Clear the bits that were just consumed.
                buffer -= value << (bufferLength - valueLength);
                bufferLength -= valueLength;
                if (bufferLength < valueLength)
                {
                    if (enumerator.MoveNext())
                    {
                        // Shift the remaining bits up and append the next byte.
                        buffer = (buffer << 8) + enumerator.Current;
                        bufferLength += 8;
                    }
                    else
                    {
                        //here's the main question; to include, or not to include?
                        if (bufferLength > 0)
                            builder.Append(chars[buffer]);
                        break;
                    }
                }
            }
        }
        return builder.ToString();
    }