How do I get a short hash of a long string using Excel VBA?
What's given
What I have done so far
I thought this SO answer was a good start, since it generates a 4-digit hex code (CRC16).
But 4 digits were too few. In my test with 400 strings, 20% of them had a duplicate hash somewhere else in the list.
The chance to generate a collision is too high.
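For comparison, the birthday approximation estimates how likely at least one duplicate is when $n$ strings are hashed into $N$ possible values. It assumes the hash spreads inputs uniformly, which is an idealization rather than a property of any particular function:

$$P_{\text{collision}} \approx 1 - \exp\!\left(-\frac{n(n-1)}{2N}\right)$$

$$n = 400,\; N = 16^{4} = 65536:\quad \frac{n(n-1)}{2N} = \frac{79800}{65536} \approx 1.22,\quad P_{\text{collision}} \approx 1 - e^{-1.22} \approx 0.70$$

$$N = 16^{12} = 2^{48} \approx 2.8 \times 10^{14}:\quad \frac{79800}{2^{48}} \approx 2.8 \times 10^{-10}$$

So even a perfectly uniform 4-digit hash would probably produce at least one duplicate among 400 strings. The same quantity, about 1.2, is also the expected number of colliding pairs, which is far below the roughly 80 strings (20%) observed; that hints the CRC16 code is not using all of its input. With 12 hex digits, the collision probability for 400 strings is negligible.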
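Sub tester()
    Dim i As Long
    ' Hash each test string in column A and write the result to column B
    For i = 2 To 433
        Cells(i, 2) = CRC16(CStr(Cells(i, 1).Value))
    Next i
End Sub

Function CRC16(ByVal txt As String) As String
    ' Each variable needs its own "As"; "Dim a, b As Integer" makes only b an Integer
    Dim mask As Integer, crc As Integer
    Dim j As Integer, nC As Long
    crc = &HFFFF                                  ' Integer literal: all 16 bits set
    For nC = 1 To Len(txt)
        ' Val("&H" + two characters) reads the text as hex digits, so the
        ' values of characters outside 0-9 and A-F are ignored
        j = Val("&H" + Mid(txt, nC, 2))
        crc = crc Xor j
        For j = 1 To 8
            mask = 0
            If crc / 2 <> Int(crc / 2) Then mask = &HA001         ' low bit is set
            crc = Int(crc / 2) And &H7FFF: crc = crc Xor mask      ' shift right, apply polynomial
        Next j
    Next nC
    CRC16 = Right$("000" & Hex$(crc), 4)          ' pad to 4 hex digits
End Function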
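The `CRC16` function passes `Mid(txt, nC, 2)` to `Val("&H" ...)`, so it treats the input as hexadecimal and ignores the value of any character outside 0-9 and A-F. A minimal sketch, assuming the goal is the same CRC-16 parameters (initial value `&HFFFF`, reflected polynomial `&HA001`) applied to every character: `CRC16Text` is a hypothetical name, and it hashes the string's UTF-16LE bytes, which is what VBA produces when a String is assigned to a Byte array.

```vba
' Hypothetical helper, not from the linked answer: CRC-16 with the same
' parameters as CRC16 (init &HFFFF, reflected polynomial &HA001), fed with
' the raw UTF-16LE bytes of the string instead of hex pairs.
Function CRC16Text(ByVal txt As String) As String
    Dim b() As Byte
    Dim crc As Long, i As Long, bit As Long

    b = txt                          ' String -> UTF-16LE byte array (empty string: UBound = -1)
    crc = &HFFFF&                    ' trailing & makes this a Long (65535), not -1

    For i = LBound(b) To UBound(b)
        crc = crc Xor b(i)
        For bit = 1 To 8
            If (crc And 1) = 1 Then
                crc = (crc \ 2) Xor &HA001&   ' shift right, then apply the polynomial
            Else
                crc = crc \ 2                 ' crc is never negative, so \ 2 is a right shift
            End If
        Next bit
    Next i

    CRC16Text = Right$("000" & Hex$(crc), 4) ' always four hex digits
End Function
```

In a cell: `=CRC16Text(A1)`. This only removes the hex-parsing limitation; the result still has 65536 possible values, so duplicates among 400 strings remain likely.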
How to reproduce
You can copy these 400 test strings from Pastebin.
Paste them into column A of a new Excel workbook and run the code above.
Q: How do I get a string hash that is short enough (12 characters) yet long enough to keep the percentage of duplicates small?
Maybe others will find this useful.
I have collected several functions that generate a short hash of a string in VBA.
I don't take credit for the code; all sources are referenced.
=CRC16HASH(A1) with this code
=CRC16NUMERIC(A1) with this code
=CRC16TWICE(A1) with this code
=SHA1TRUNC(A1) with this code
=BASE64SHA1(A1) with this code

Here is my test workbook with all example functions and a large number of test strings.
Feel free to add your own functions.
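The original code behind `SHA1TRUNC` is not reproduced here. As a hedged sketch of the same idea, a cryptographic hash truncated to 12 hex digits (48 bits), the hypothetical `SHA1Hex12` below hashes the UTF-8 bytes of the string with SHA-1 and keeps the first 6 bytes. It assumes Windows with the .NET Framework 3.5 COM classes (`System.Text.UTF8Encoding`, `System.Security.Cryptography.SHA1Managed`) available to `CreateObject`; it will not work on Excel for Mac.

```vba
' Hypothetical helper, not the answer's SHA1TRUNC: SHA-1 of the UTF-8 bytes,
' truncated to the first 12 hex digits (48 bits).
' Assumes Windows with the .NET Framework 3.5 COM classes available.
Function SHA1Hex12(ByVal txt As String) As String
    Dim enc As Object, sha As Object
    Dim bytes() As Byte, hash() As Byte
    Dim i As Long, s As String

    Set enc = CreateObject("System.Text.UTF8Encoding")
    Set sha = CreateObject("System.Security.Cryptography.SHA1Managed")

    bytes = enc.GetBytes_4(txt)          ' String -> UTF-8 bytes
    hash = sha.ComputeHash_2((bytes))    ' 20-byte digest; extra parentheses pass a copy

    ' Keep 6 bytes = 12 hex digits, each byte padded to two digits
    For i = LBound(hash) To LBound(hash) + 5
        s = s & Right$("0" & Hex$(hash(i)), 2)
    Next i

    SHA1Hex12 = s
End Function
```

In a cell: `=SHA1Hex12(A1)`. For the input `abc` this should return `A9993E364706`, the first 12 digits of the well-known SHA-1 test vector. Encoding the same digest in Base64 instead would pack 6 bits per character, so 12 characters would carry 72 bits rather than 48.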