I am currently writing some glsl like vector math classes in C++, and I just implemented an abs()
function like this:
template<class T>
static inline T abs(T _a)
{
return _a < 0 ? -_a : _a;
}
I compared its speed to the default C++ abs
from math.h
like this:
clock_t begin = clock();
for(int i=0; i<10000000; ++i)
{
float a = abs(-1.25);
};
clock_t end = clock();
unsigned long time1 = (unsigned long)((float)(end-begin) / ((float)CLOCKS_PER_SEC/1000.0));
begin = clock();
for(int i=0; i<10000000; ++i)
{
float a = myMath::abs(-1.25);
};
end = clock();
unsigned long time2 = (unsigned long)((float)(end-begin) / ((float)CLOCKS_PER_SEC/1000.0));
std::cout<<time1<<std::endl;
std::cout<<time2<<std::endl;
Now the default abs takes about 25ms while mine takes 60. I guess there is some low level optimisation going on. Does anybody know how math.h
abs
works internally? The performance difference is nothing dramatic, but I am just curious!
Since they are the implementation, they are free to make as many assumptions as they want. They know the format of the double
and can play tricks with that instead.
Likely (as in almost not even a question), your double
is the binary64 format. This means the sign has it's own bit, and an absolute value is merely clearing that bit. For example, as a specialization, a compiler implementer may do the following:
template <>
double abs<double>(const double x)
{
// breaks strict aliasing, but compiler writer knows this behavior for the platform
uint64_t i = reinterpret_cast<const std::uint64_t&>(x);
i &= 0x7FFFFFFFFFFFFFFFULL; // clear sign bit
return reinterpret_cast<const double&>(i);
}
This removes branching and may run faster.