Python shortest unique id from strings

jack · Jun 19, 2012 · Viewed 19.9k times

I have more than 100 million unique strings (a VARCHAR(100) UNIQUE column in a MySQL database). I currently use the code below to create a unique hash of each one (stored as VARCHAR(32) UNIQUE) in order to reduce the index size of the InnoDB table (a unique index on a VARCHAR(100) field is roughly 3 times larger than one on a VARCHAR(32) field).

import hashlib
id_ = hashlib.md5(your_str).hexdigest()  # renamed to avoid shadowing the built-ins id and str

Is there any other method to create shorter ids from those strings and make reasonable uniqueness guarantees?

Answer

simplylizz · Jun 19, 2012

You can store it as an integer (MD5 is 128 bits, so the value is always below 2**128):

id_ = int(hashlib.md5(your_str).hexdigest(), 16)

Or as a binary string (16 raw bytes instead of 32 hex characters):

id_ = hashlib.md5(your_str).digest()
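
To make the size difference concrete, here is a minimal sketch that compares the three forms of the same digest and gives a rough estimate of the collision risk. It assumes Python 3 and that the strings are encoded as UTF-8 before hashing. The helper names md5_forms and collision_probability are made up for this illustration, and the estimate is the usual birthday-bound approximation, which only holds while the result is small and the inputs are not chosen by an attacker.

```python
import hashlib


def md5_forms(text: str):
    """Return the hex, integer, and raw-bytes forms of the MD5 digest of text."""
    data = text.encode("utf-8")  # assumption: the strings are UTF-8 text
    h = hashlib.md5(data)
    hex_id = h.hexdigest()       # 32 hex characters
    int_id = int(hex_id, 16)     # integer below 2**128
    bin_id = h.digest()          # 16 raw bytes
    return hex_id, int_id, bin_id


def collision_probability(n: int, bits: int) -> float:
    """Birthday-bound approximation: about n*(n-1)/2 pairs over 2**bits values."""
    return n * (n - 1) / 2 / 2 ** bits


hex_id, int_id, bin_id = md5_forms("example string")
print(len(hex_id), len(bin_id))                   # character count vs byte count
print(int_id == int.from_bytes(bin_id, "big"))    # same value, different encoding
print(collision_probability(100_000_000, 128))    # 100 million strings, full MD5
```

The 16-byte form should fit a BINARY(16) column, while the integer form needs more than the 64 bits of a BIGINT. If you consider truncating the digest to get even shorter ids, the same estimate for 100 million strings and 64 bits comes out at roughly 2.7e-4, so it is worth checking that number against what you count as a reasonable guarantee.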