Fast fixed point pow, log, exp and sqrt

Goz picture Goz · Jan 11, 2011 · Viewed 26.9k times · Source

I've got a fixed point class (10.22) and I have a need of a pow, a sqrt, an exp and a log function.

Alas I have no idea where to even start on this. Can anyone provide me with some links to useful articles or, better yet, provide me with some code?

I'm assuming that once I have an exp function then it becomes relatively easy to implement pow and sqrt as they just become.

pow( x, y ) => exp( y * log( x ) )
sqrt( x )   => pow( x, 0.5 )

Its just those exp and log functions that I'm finding difficult (as though I remember a few of my log rules, I can't remember much else about them).

Presumably, there would also be a faster method for sqrt and pow so any pointers on that front would be appreciated even if its just to say use the methods i outline above.

Please note: This HAS to be cross platform and in pure C/C++ code so I cannot use any assembler optimisations.

Answer

MSalters picture MSalters · Jan 11, 2011

A very simple solution is to use a decent table-driven approximation. You don't actually need a lot of data if you reduce your inputs correctly. exp(a)==exp(a/2)*exp(a/2), which means you really only need to calculate exp(x) for 1 < x < 2. Over that range, a runga-kutta approximation would give reasonable results with ~16 entries IIRC.

Similarly, sqrt(a) == 2 * sqrt(a/4) == sqrt(4*a) / 2 which means you need only table entries for 1 < a < 4. Log(a) is a bit harder: log(a) == 1 + log(a/e). This is a rather slow iteration, but log(1024) is only 6.9 so you won't have many iterations.

You'd use a similar "integer-first" algorithm for pow: pow(x,y)==pow(x, floor(y)) * pow(x, frac(y)). This works because pow(double, int) is trivial (divide and conquer).

[edit] For the integral component of log(a), it may be useful to store a table 1, e, e^2, e^3, e^4, e^5, e^6, e^7 so you can reduce log(a) == n + log(a/e^n) by a simple hardcoded binary search of a in that table. The improvement from 7 to 3 steps isn't so big, but it means you only have to divide once by e^n instead of n times by e.

[edit 2] And for that last log(a/e^n) term, you can use log(a/e^n) = log((a/e^n)^8)/8 - each iteration produces 3 more bits by table lookup. That keeps your code and table size small. This is typically code for embedded systems, and they don't have large caches.

[edit 3] That's stil not to smart on my side. log(a) = log(2) + log(a/2). You can just store the fixed-point value log2=0.30102999566, count the number of leading zeroes, shift a into the range used for your lookup table, and multiply that shift (integer) by the fixed-point constant log2. Can be as low as 3 instructions.

Using e for the reduction step just gives you a "nice" log(e)=1.0 constant but that's false optimization. 0.30102999566 is just as good a constant as 1.0; both are 32 bits constants in 10.22 fixed point. Using 2 as the constant for range reduction allows you to use a bit shift for a division.

You still get the trick from edit 2, log(a/2^n) = log((a/2^n)^8)/8. Basically, this gets you a result (a + b/8 + c/64 + d/512) * 0.30102999566 - with b,c,d in the range [0,7]. a.bcd really is an octal number. Not a surprise since we used 8 as the power. (The trick works equally well with power 2, 4 or 16.)

[edit 4] Still had an open end. pow(x, frac(y) is just pow(sqrt(x), 2 * frac(y)) and we have a decent 1/sqrt(x). That gives us the far more efficient approach. Say frac(y)=0.101 binary, i.e. 1/2 plus 1/8. Then that means x^0.101 is (x^1/2 * x^1/8). But x^1/2 is just sqrt(x) and x^1/8 is (sqrt(sqrt(sqrt(x))). Saving one more operation, Newton-Raphson NR(x) gives us 1/sqrt(x) so we calculate 1.0/(NR(x)*NR((NR(NR(x))). We only invert the end result, don't use the sqrt function directly.