How to XOR two hex strings so that each byte is XORed separately?

user2535982 · Jul 1, 2013

I have been posting similar questions here for a couple of days now, but it seems like I was not asking the right thing, so excuse me if I have exhausted you with my XOR questions :D.

To the point - I have two hex strings and I want to XOR them such that each byte is XORed separately (i.e. each pair of hex digits is XORed separately). I want to do this in Python, and I want it to work for strings of different lengths. I will work through an example manually to illustrate my point (I used the code environment because it allows me to put in spaces where I want them to be):

Input:
s1 = "48656c6c6f"
s2 = "61736b"

Encoding in binary:
48 65 6c 6c 6f = 01001000 01100101 01101100 01101100 01101111
61 73 6b       = 01100001 01110011 01101011

XORing the strings:
01001000 01100101 01101100 01101100 01101111
                  01100001 01110011 01101011
                  00001101 00011111 00000100

Converting the result to hex:
00001101 00011111 00000100 = 0d 1f 04

Output:
0d1f04

So, to summarize, I want to be able to input two hex strings (these will usually be ASCII letters encoded in hex) of different or equal length, and get their XOR such that each byte is XORed separately.

Answer

Martijn Pieters · Jul 1, 2013

Use binascii.unhexlify() to turn your hex strings into binary data, XOR that, then go back to hex with binascii.hexlify(). In Python 2:

>>> from binascii import unhexlify, hexlify
>>> s1 = "48656c6c6f"
>>> s2 = "61736b"
>>> hexlify(''.join(chr(ord(c1) ^ ord(c2)) for c1, c2 in zip(unhexlify(s1[-len(s2):]), unhexlify(s2))))
'0d1f04'

The actual XOR is applied per byte of the decoded data (using ord() and chr() to go to and from integers).
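To make the per-byte step concrete, here is a small sketch in Python 3 syntax (where iterating over bytes yields integers, so ord() and chr() are not needed). It prints each byte pair in binary, mirroring the manual example in the question. The names b1, b2 and rows are only for illustration.

```python
from binascii import unhexlify

s1 = "48656c6c6f"
s2 = "61736b"

# Decode the hex text into raw bytes; keep only the tail of s1 that lines up with s2.
b1 = unhexlify(s1[-len(s2):])  # b'llo'
b2 = unhexlify(s2)             # b'ask'

# Each x and y is an integer in range(256), so ^ applies directly.
rows = []
for x, y in zip(b1, b2):
    rows.append("{:08b} ^ {:08b} = {:08b} ({:02x})".format(x, y, x ^ y, x ^ y))

print("\n".join(rows))
# 01101100 ^ 01100001 = 00001101 (0d)
# 01101100 ^ 01110011 = 00011111 (1f)
# 01101111 ^ 01101011 = 00000100 (04)
```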

Note that, as in your example, I truncated s1 to the same length as s2 (ignoring characters from the start of s1). You can instead encode all of s1 with the shorter key s2 by cycling the key's bytes:

>>> from itertools import cycle
>>> hexlify(''.join(chr(ord(c1) ^ ord(c2)) for c1, c2 in zip(unhexlify(s1), cycle(unhexlify(s2)))))
'2916070d1c'

You don't have to use unhexlify(), but it is a lot easier than looping over s1 and s2 two characters at a time and using int(twocharacters, 16) to turn each pair into an integer value for the XOR operation.
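For comparison, the manual alternative might look roughly like the sketch below. The function name xor_hex_manual is made up for this illustration, and it assumes both inputs are even-length hex strings that have already been aligned (it stops at the shorter one).

```python
def xor_hex_manual(h1, h2):
    """XOR two hex strings pair by pair without binascii (hypothetical helper)."""
    out = []
    # Step through the strings two hex digits (one byte) at a time.
    for i in range(0, min(len(h1), len(h2)), 2):
        a = int(h1[i:i + 2], 16)
        b = int(h2[i:i + 2], 16)
        # Format the XORed value back as two lowercase hex digits.
        out.append("{:02x}".format(a ^ b))
    return "".join(out)


s1 = "48656c6c6f"
s2 = "61736b"
result = xor_hex_manual(s1[-len(s2):], s2)
print(result)  # 0d1f04
```

This runs unchanged on Python 2 and Python 3, at the cost of doing the slicing and formatting by hand.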

The Python 3 version of the above is a little lighter; use bytes() instead of str.join() and you can drop the chr() and ord() calls as you get to iterate over integers directly:

>>> from binascii import unhexlify, hexlify
>>> s1 = "48656c6c6f"
>>> s2 = "61736b"
>>> hexlify(bytes(c1 ^ c2 for c1, c2 in zip(unhexlify(s1[-len(s2):]), unhexlify(s2))))
b'0d1f04'
>>> from itertools import cycle
>>> hexlify(bytes(c1 ^ c2 for c1, c2 in zip(unhexlify(s1), cycle(unhexlify(s2)))))
b'2916070d1c'
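If you need this in more than one place, the two Python 3 variants could be wrapped in a single function. The following is only a sketch: the name xor_hex and the repeat_key flag are invented here, and it swaps in bytes.fromhex() and bytes.hex() (the latter needs Python 3.5+) in place of unhexlify() and hexlify(), so the result is a str rather than bytes.

```python
from itertools import cycle


def xor_hex(data_hex, key_hex, repeat_key=False):
    """XOR two hex strings byte by byte and return the result as hex text.

    Hypothetical helper: with repeat_key=False the key is lined up with the
    end of the data (as in the question); with repeat_key=True the key is
    cycled over all of the data.
    """
    data = bytes.fromhex(data_hex)
    key = bytes.fromhex(key_hex)
    if repeat_key:
        pairs = zip(data, cycle(key))
    else:
        pairs = zip(data[-len(key):], key)
    # Iterating over bytes yields integers, so ^ works without ord()/chr().
    return bytes(a ^ b for a, b in pairs).hex()


s1 = "48656c6c6f"
s2 = "61736b"
print(xor_hex(s1, s2))                   # 0d1f04
print(xor_hex(s1, s2, repeat_key=True))  # 2916070d1c
```

Because XOR is its own inverse, xor_hex("2916070d1c", s2, repeat_key=True) gives back "48656c6c6f".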