Fast calculate hamming distance in C

haneefmubarak picture haneefmubarak · Aug 2, 2014 · Viewed 7.2k times · Source

I read the Wikipedia article on Hamming Weight and noticed something interesting:

It is thus equivalent to the Hamming distance from the all-zero string of the same length. For the most typical case, a string of bits, this is the number of 1's in the string. In this binary case, it is also called the population count, popcount or sideways sum.

[emphasis mine]

So something occurred to me. Could I calculate the Hamming Distance between two strings by XORing them and then taking the Hamming Weight (POPCOUNT) of the resulting string?

Something along the lines of this (using gcc intrinsics):

#include <stdint.h>

int hammingDistance (uint64_t x, uint64_t y) {
        uint64_t res = x ^ y;
        return __builtin_popcountll (res);
}

Now, as for why I would want to do this, well, on some platforms, yes, this would just translate to gcc emitting a call to a function that calculates popcount. For instance, on x64 without popcnt, gcc spits out (Godbolt's GCC Online):

hammingDistance:
    sub rsp, 8
    xor rdi, rsi
    call    __popcountdi2
    add rsp, 8
    ret

OTOH, if you have a platform that supports POPCOUNT, like x64 models including nehalem and after (which have POPCNT), you get (Godbolt's GCC Online):

hammingDistance:
    xor rdi, rsi
    popcnt  rax, rdi
    ret

which should be waaay faster, especially once inlined.


But back to the original question. Can you take the Hamming Weight of the XOR of two strings to find their Hamming Distance? ie:

HD = HW (x xor y)

Answer

Pradhan picture Pradhan · Aug 2, 2014

Hamming distance between two equal length strings, x and y, is defined to be the number of positions where they differ. In the case of x and y being bitstrings, x^y is a string with 1s in exactly the positions they differ. Thus, HammingDistance(x,y) = Number of 1s in x^y, for bitstrings. Also, HammingWeight(x) = number of 1s in x for a bitstring x. Thus, your first claim, HammingDistance(x,y) = HammingWeight(x^y) is true for bitstrings. Having established that, it is clear that your implementation is correct.