I read the Wikipedia article on Hamming Weight and noticed something interesting:
It is thus equivalent to the
Hamming distance
from the all-zero string of the same length. For the most typical case, a string of bits, this is the number of 1's in the string. In this binary case, it is also called the population count,popcount
or sideways sum.[emphasis mine]
So something occurred to me. Could I calculate the Hamming Distance between two strings by XOR
ing them and then taking the Hamming Weight (POPCOUNT) of the resulting string?
Something along the lines of this (using gcc
intrinsics):
#include <stdint.h>
int hammingDistance (uint64_t x, uint64_t y) {
uint64_t res = x ^ y;
return __builtin_popcountll (res);
}
Now, as for why I would want to do this, well, on some platforms, yes, this would just translate to gcc
emitting a call to a function that calculates popcount
. For instance, on x64 without popcnt
, gcc
spits out (Godbolt's GCC Online):
hammingDistance:
sub rsp, 8
xor rdi, rsi
call __popcountdi2
add rsp, 8
ret
OTOH, if you have a platform that supports POPCOUNT, like x64 models including nehalem
and after (which have POPCNT
), you get (Godbolt's GCC Online):
hammingDistance:
xor rdi, rsi
popcnt rax, rdi
ret
which should be waaay faster, especially once inlined.
But back to the original question. Can you take the Hamming Weight of the XOR of two strings to find their Hamming Distance? ie:
HD = HW (x xor y)
Hamming distance between two equal length strings, x
and y
, is defined to be the number of positions where they differ. In the case of x
and y
being bitstrings, x^y
is a string with 1
s in exactly the positions they differ. Thus, HammingDistance(x,y) = Number of 1s in x^y
, for bitstrings. Also, HammingWeight(x) = number of 1s in x
for a bitstring x
. Thus, your first claim, HammingDistance(x,y) = HammingWeight(x^y)
is true for bitstrings. Having established that, it is clear that your implementation is correct.