From "Signed Types" on Encoding - Protocol Buffers - Google Code:
ZigZag encoding maps signed integers to unsigned integers so that numbers with a small absolute value (for instance, -1) have a small varint encoded value too. It does this in a way that "zig-zags" back and forth through the positive and negative integers, so that -1 is encoded as 1, 1 is encoded as 2, -2 is encoded as 3, and so on, as you can see in the following table:
Signed Original    Encoded As
0                  0
-1                 1
1                  2
-2                 3
2147483647         4294967294
-2147483648        4294967295
In other words, each value n is encoded using
(n << 1) ^ (n >> 31)
for sint32s, or
(n << 1) ^ (n >> 63)
for the 64-bit version.
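To check my reading of the 32-bit formula, here is a minimal sketch of my own (Java, chosen only because its int shift semantics are well defined, not anything from the protobuf sources); the positive rows of the table come out as expected:

public class ZigZagPositives {
    public static void main(String[] args) {
        int[] samples = {0, 1, 2147483647};
        for (int n : samples) {
            // The 32-bit formula from the docs, printed as an unsigned value.
            int z = (n << 1) ^ (n >> 31);
            System.out.println(n + " -> " + Integer.toUnsignedLong(z));
            // Prints: 0 -> 0, 1 -> 2, 2147483647 -> 4294967294
        }
    }
}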
How does (n << 1) ^ (n >> 31) equal what's in the table? I understand how that works for positives, but how does it work for, say, -1? Wouldn't -1 be 1111 1111, and (n << 1) be 1111 1110? (Is bit-shifting on negatives even well defined in any language?)
Nonetheless, using the formula and computing (-1 << 1) ^ (-1 >> 31), assuming a 32-bit int, I get 1111 1111, which is about 4 billion, whereas the table says I should get 1.
Shifting a negative signed integer to the right copies the sign bit, so that
(-1 >> 31) == -1
Then,
(-1 << 1) ^ (-1 >> 31) = -2 ^ -1
                       = 1
This might be easier to visualise in binary (8 bit here):
(-1 << 1) ^ (-1 >> 7) = 11111110 ^ 11111111
                      = 00000001
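Putting it together, here is a minimal sketch of the 32-bit encode/decode pair (Java again, where << and >> on int are well defined; the class and method names are mine, not the protobuf API):

public class ZigZag32 {
    // Map a signed int onto the zig-zag ordering: 0, -1, 1, -2, 2, ...
    // n >> 31 is 0 for non-negative n and -1 (all ones) for negative n,
    // so the XOR either leaves n << 1 alone or flips every bit of it.
    static int encode(int n) {
        return (n << 1) ^ (n >> 31);
    }

    // Inverse: the logical shift recovers the magnitude, the low bit restores the sign.
    static int decode(int z) {
        return (z >>> 1) ^ -(z & 1);
    }

    public static void main(String[] args) {
        int[] samples = {0, -1, 1, -2, 2147483647, -2147483648};
        for (int n : samples) {
            int z = encode(n);
            // Print the encoded value as the unsigned number shown in the table.
            System.out.println(n + " -> " + Integer.toUnsignedLong(z)
                    + " -> " + decode(z));
        }
    }
}

Running this reproduces every row of the table (e.g. -1 -> 1 -> -1 and -2147483648 -> 4294967295 -> -2147483648), and the round trip through decode shows why the sign-extending right shift is exactly what makes the formula work for negatives.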