In my embedded systems class, we were asked to re-code the given C-function AbsVal into ARM Assembly. We were told that the best we could do was 3-lines. I was determined to find a 2-line solution and eventually did, but the question I have now is whether I actually decreased performance or increased it.
The C-code:
unsigned long absval(signed long x){
unsigned long int signext;
signext = (x >= 0) ? 0 : -1; //This can be done with an ASR instruction
return (x + signet) ^ signext;
}
The TA/Professor's 3-line solution
ASR R1, R0, #31 ; R1 <- (x >= 0) ? 0 : -1
ADD R0, R0, R1 ; R0 <- R0 + R1
EOR R0, R0, R1 ; R0 <- R0 ^ R1
My 2-line solution
ADD R1, R0, R0, ASR #31 ; R1 <- x + (x >= 0) ? 0 : -1
EOR R0, R1, R0, ASR #31 ; R0 <- R1 ^ (x >= 0) ? 0 : -1
There are a couple of places I can see potential performance differences:
So, which one is actually faster? Does it depend upon the processor or memory access speed?
Here is a nother two instruction version:
cmp r0, #0
rsblt r0, r0, #0
Which translate to the simple code:
if (r0 < 0)
{
r0 = 0-r0;
}
That code should be pretty fast, even on modern ARM-CPU cores like the Cortex-A8 and A9.