Convert unique numbers to md5 hash using pandas

Dave · Feb 23, 2015

Good morning, All.

I want to convert my social security numbers to MD5 hex digests. Each distinct social security number should get its own MD5 hex digest.

My data format is as follows:

ob = onboard[['regions','lname','ssno']][:10]
ob

   regions               lname     ssno
0  Northern Region (R1)  Banderas  123456789
1  Northern Region (R1)  Garfield  234567891
2  Northern Region (R1)  Pacino    345678912
3  Northern Region (R1)  Baldwin   456789123
4  Northern Region (R1)  Brody     567891234
5  Northern Region (R1)  Johnson   6789123456
6  Northern Region (R1)  Guinness  7890123456
7  Northern Region (R1)  Hopkins   891234567
8  Northern Region (R1)  Paul      891234567
9  Northern Region (R1)  Arkin     987654321

I've tried the following code using hashlib:

import hashlib

ob['md5'] = hashlib.md5(['ssno'])

This raised an error saying the argument had to be a string, not a list. So I tried the following:

ob['md5'] = hashlib.md5('ssno').hexdigest()



   regions               lname     ssno       md5
0  Northern Region (R1)  Banderas  123456789  a1b3ec3d8a026d392ad551701ad7881e
1  Northern Region (R1)  Garfield  234567891  a1b3ec3d8a026d392ad551701ad7881e
2  Northern Region (R1)  Pacino    345678912  a1b3ec3d8a026d392ad551701ad7881e
3  Northern Region (R1)  Baldwin   456789123  a1b3ec3d8a026d392ad551701ad7881e
4  Northern Region (R1)  Brody     567891234  a1b3ec3d8a026d392ad551701ad7881e
5  Northern Region (R1)  Johnson   678912345  a1b3ec3d8a026d392ad551701ad7881e
6  Northern Region (R1)  Johnson   789123456  a1b3ec3d8a026d392ad551701ad7881e
7  Northern Region (R1)  Guiness   891234567  a1b3ec3d8a026d392ad551701ad7881e
8  Northern Region (R1)  Hopkins   912345678  a1b3ec3d8a026d392ad551701ad7881e
9  Northern Region (R1)  Paul      159753456  a1b3ec3d8a026d392ad551701ad7881e

This is very close to what I need, but every row got the same hex digest regardless of the social security number. I am trying to get a different hex digest for each distinct social security number.

Any suggestions?

Answer

unutbu · Feb 23, 2015

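hashlib.md5 takes a single string as input (a bytes object in Python 3); you can't pass it an array of values as you can with some NumPy/Pandas functions. Your second attempt hashed the literal string 'ssno', and that one digest was then assigned to every row. So instead, you could use a list comprehension to build a list of md5sums, converting each value to a string and encoding it first so that integer values work too:

ob['md5'] = [hashlib.md5(str(val).encode('utf-8')).hexdigest() for val in ob['ssno']]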
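
For anyone who wants to try the idea without the original data, here is a minimal sketch using only the standard library. The helper name md5_hex and the sample values are made up for illustration; they are not from the question's DataFrame. It assumes the values may be integers, so each one is converted with str() and encoded to bytes before hashing.

```python
import hashlib


def md5_hex(value):
    """Return the MD5 hex digest of a single value (hypothetical helper)."""
    # str() copes with integer values; encode() produces the bytes that
    # hashlib.md5 requires on Python 3.
    return hashlib.md5(str(value).encode("utf-8")).hexdigest()


# Made-up sample values standing in for the ssno column.
ssnos = [123456789, 234567891, 345678912, 891234567, 891234567]

# Hashing the literal column name gives a single digest. Assigning that
# scalar to a DataFrame column repeats it on every row, which is the
# symptom described in the question.
constant_digest = hashlib.md5("ssno".encode("utf-8")).hexdigest()

# Hashing each value separately gives one digest per row; equal inputs
# (the last two values here) still produce equal digests.
digests = [md5_hex(val) for val in ssnos]

for ssno, digest in zip(ssnos, digests):
    print(ssno, digest)
```

With a DataFrame like ob, the same helper could likely be applied as ob['md5'] = ob['ssno'].map(md5_hex), which should give the same result as the list comprehension.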