Google Protocol Buffers: ZigZag Encoding

Thanatos · Dec 26, 2010 · Viewed 7.4k times

From "Signed Types" on Encoding - Protocol Buffers - Google Code:

ZigZag encoding maps signed integers to unsigned integers so that numbers with a small absolute value (for instance, -1) have a small varint encoded value too. It does this in a way that "zig-zags" back and forth through the positive and negative integers, so that -1 is encoded as 1, 1 is encoded as 2, -2 is encoded as 3, and so on, as you can see in the following table:

Signed Original  Encoded As
0                0
-1               1
1                2
-2               3
2147483647       4294967294
-2147483648      4294967295

In other words, each value n is encoded using

(n << 1) ^ (n >> 31)

for sint32s, or

(n << 1) ^ (n >> 63)

for the 64-bit version.

How does (n << 1) ^ (n >> 31) equal what's in the table? I understand that would work for positives, but how does that work for, say, -1? Wouldn't -1 be 1111 1111, and (n << 1) be 1111 1110? (Is bit-shifting on negatives even well-defined in any language?)

Nonetheless, using the formula and doing (-1 << 1) ^ (-1 >> 31), assuming a 32-bit int, I get 1111 1111 (all ones), which is about 4 billion, whereas the table says I should get 1.

Answer

Greg Hewgill · Dec 26, 2010

Shifting a negative signed integer to the right copies the sign bit (an arithmetic shift), so that

(-1 >> 31) == -1

Then,

(-1 << 1) ^ (-1 >> 31) = -2 ^ -1
                       = 1

This might be easier to visualise in binary (8 bits here):

(-1 << 1) ^ (-1 >> 7) = 11111110 ^ 11111111
                      = 00000001
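
For what it's worth, here is a minimal C sketch that reproduces the table. It assumes 32-bit two's-complement ints and an arithmetic right shift on signed values (the result of right-shifting a negative value is implementation-defined in C, but every mainstream compiler sign-extends). The function names are just illustrative, not anything from the protobuf library:

    #include <stdint.h>
    #include <stdio.h>

    /* ZigZag-encode a signed 32-bit value so that small magnitudes map to
       small unsigned values. The cast to uint32_t keeps the left shift and
       the XOR result from being re-interpreted as negative. */
    static uint32_t zigzag_encode32(int32_t n) {
        /* n >> 31 is assumed to sign-extend: all ones for negative n, zero otherwise. */
        return ((uint32_t)n << 1) ^ (uint32_t)(n >> 31);
    }

    /* The inverse: drop the low "sign" bit and flip all bits if it was set. */
    static int32_t zigzag_decode32(uint32_t z) {
        return (int32_t)(z >> 1) ^ -(int32_t)(z & 1);
    }

    int main(void) {
        int32_t samples[] = {0, -1, 1, -2, 2147483647, -2147483647 - 1};
        for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
            uint32_t z = zigzag_encode32(samples[i]);
            printf("%11d -> %10u -> %11d\n", samples[i], z, zigzag_decode32(z));
        }
        return 0;
    }

Running it prints each signed value, its ZigZag encoding, and the decoded round trip, matching the table above: -1 encodes to 1 and -2147483648 encodes to 4294967295.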