'Correct' unsigned integer comparison

Manu Evans picture Manu Evans · May 19, 2017 · Viewed 8.7k times · Source

So, we all know the C/C++ signed/unsigned comparison rules where -1 > 2u == true, and I have a situation where I want to implement 'correct' comparisons efficiently.

My question is, which is more efficient with considerations to as many architectures as people are familiar with. Obviously Intel and ARM have higher weight.

Given:

int x;
unsigned int y;
if (x < y) {}

Is it better to promote:

x < y  =>  (int64)x < (int64)y

or is it better to perform 2 comparisons, ie:

x < y  =>  (x < 0 || x < y)

The former implies a zero-extend, sign-extend, and one comparison+branch, and latter requires no sign-extend operations, but 2 consecutive cmp+branches.
Traditional wisdom suggests that branches are more expensive than sign extends, which will both pipeline, but there is a stall between the extends and the single comparison in the first case, whereas in the second case I can imagine that some architectures might pipeline the 2 comparisons, but then followed by 2 conditional branches?

Another case exists, where the unsigned value is a smaller type than the signed type, which means it can be done with a single zero-extend to the signed type's length, and then a single comparison... in that case, is it preferable to use the extend+cmp version, or is the 2-comparison method still preferred?

Intel? ARM? Others? I'm not sure if there's a right answer here, but I'd like to hear peoples take's. Low-level performance is hard to predict these days, especially on Intel and increasingly so on ARM.

Edit:

I should add, there is one obvious resolution, where the types are sized equal to the architecture int width; in that case it is obvious that the 2-comparison solution is preferred, since the promotion itself can't be performed efficiently. Clearly my int example meets this condition for 32-bit architectures, and you can transpose the thought experiment to short for the exercise applied to 32bit platforms.

Edit 2:

Sorry, I forgot the u in -1 > 2u! >_<

Edit 3:

I want to amend the situation to assume that the result of the comparison an actual branch, and the result is NOT returned as a boolean. This is how I'd prefer the structure look; although this does raise an interesting point that there is another set of permutations when the result is a bool vs a branch.

int g;
void fun(int x, unsigned in y) { if((long long)x < (long long)y) g = 10; }
void gun(int x, unsigned in y) { if(x < 0 || x < y) g = 10; }

This produces the intended branch typically implied when you encounter an if ;)

Answer

Sneftel picture Sneftel · May 19, 2017

Well, you've correctly typified the situation: C/C++ have no way of doing a full signed int/unsigned int comparison with a single compare.

I would be surprised if promotion to int64 was faster than doing two comparisons. In my experience, compilers are quite good at realizing that a subexpression like that is pure (has no side effects) and thus that there's no need for a second branch. (You can also explicitly opt-out of short circuiting using bitwise-or: (x < 0) | (x < y).) In contrast, my experience is that compilers tend NOT to do much special-case optimization on integers greater than the native word size, so (int64)x < (int64)y is quite likely to actually do a full int comparison.

Bottom line, there's no incantation which is guaranteed to produce the best possible machine code on any processor, but for the most common compilers on the most common processors, I would guess that the two-comparison form would be no slower than the promotion-to-int64 form.

EDIT: Some mucking about on Godbolt confirms that on ARM32, GCC puts way too much machinery into the int64 approach. VC does the same on x86. With x64, though, the int64 approach is actually one instruction shorter (since the promotion and 64-bit comparison are trivial). The pipelining might make the actual performance go either way, though. https://godbolt.org/g/wyG4yC