Dealing with very small numbers in R

dan12345 · Apr 27, 2011

I need to calculate a list of very small numbers such as

(0.1)^1000, 0.2^(1200),

and then normalize them so they will sum up to one i.e.

a1 = 0.1^1000, a2 = 0.2^1200

And I want to calculate a1' = a1/(a1+a2), a2' = a2/(a1+a2).

I'm running into underflow problems, as I get a1 = 0. How can I get around this? In theory I could work with logs, and then log(a1) = 1000*log(0.1) would be a way to represent a1 without underflow problems. But in order to normalize I would need log(a1+a2), which I can't compute since I can't represent a1 directly.

I'm programming in R. As far as I can tell, there is no data type such as Decimal in C#, which allows you to get better than double-precision values.

Any suggestions will be appreciated, thanks

Answer

Joris Meys · Apr 27, 2011

Mathematically speaking, one of those numbers will be approximately zero and the other approximately one. The difference between your numbers is huge, so I'm even wondering if this makes sense.
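To make the size of that gap concrete, here is a small sketch using the numbers from the question. The variable names (a1, a2, la1, la2, log_ratio) are chosen for illustration only. It shows that both values underflow to zero as doubles, so the naive normalization fails, while the difference of the logs is still perfectly representable:

```r
# Direct computation: both values are far below the smallest positive double
a1 <- 0.1^1000
a2 <- 0.2^1200
print(c(a1, a2))        # both underflow to 0
print(a1 / (a1 + a2))   # 0/0, so the naive normalization gives NaN

# The same quantities as logs are ordinary, moderately sized numbers
la1 <- 1000 * log(0.1)
la2 <- 1200 * log(0.2)

# log(a1/a2): a1 is smaller than a2 by roughly 161 orders of magnitude
log_ratio <- la1 - la2
print(log_ratio / log(10))
```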

But to do that in general, you can use the idea from the logspace_add C-function that's underneath the hood of R. One can define logxpy ( =log(x+y) ) when lx = log(x) and ly = log(y) as :

logxpy <- function(lx,ly) max(lx,ly) + log1p(exp(-abs(lx-ly)))

Which means that we can use :

> la1 <- 1000*log(0.1)
> la2 <- 1200*log(0.2)

> exp(la1 - logxpy(la1,la2))
[1] 5.807714e-162

> exp(la2 - logxpy(la1,la2))
[1] 1

This function can be called recursively as well if you have more numbers. Mind you, 1 is still 1, and not 1 minus 5.807...e-162. If you really need more precision and your platform supports long double types, you could code everything in e.g. C or C++ and return the results afterwards. But if I'm right, R can, for the moment, only deal with normal doubles, so ultimately you'll lose the precision again when the result is shown.
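For more than two numbers, one possible sketch is below. The helpers logsum_pairwise() and logsum_shift() are hypothetical names introduced here, not part of R, and the third value in lv is made up for illustration. The first helper folds logxpy() over the vector, the second does the same in one pass by subtracting the maximum before exponentiating:

```r
# logxpy() as defined above: log(x + y) from lx = log(x) and ly = log(y)
logxpy <- function(lx, ly) max(lx, ly) + log1p(exp(-abs(lx - ly)))

# Hypothetical helper: apply logxpy() pairwise across a vector of logs
logsum_pairwise <- function(lv) Reduce(logxpy, lv)

# Hypothetical helper: same quantity in one pass, shifting by the maximum
# (assumes at least one element of lv is finite)
logsum_shift <- function(lv) {
  m <- max(lv)
  m + log(sum(exp(lv - m)))
}

# Logs of three tiny numbers; the third one is invented for this example
lv <- c(1000 * log(0.1), 1200 * log(0.2), 1190 * log(0.2))

# Normalized weights, computed without ever forming the raw values
w <- exp(lv - logsum_shift(lv))
print(w)
print(sum(w))
```

Both helpers should agree up to rounding. The one-pass version avoids repeated function calls on long vectors, at the cost of not using log1p().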


EDIT :

to do the math for you :

log(x+y) = log( exp(lx) + exp(ly) )
         = log( exp(lx) * (1 + exp(ly - lx)) )
         = lx + log( 1 + exp(ly - lx) )

Now you just take the larger of the two as lx, and then you arrive at the expression in logxpy().
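As a quick sanity check of the identity, here is a sketch with values small enough that log(x+y) can also be computed directly. The values x = 3 and y = 5 and the variable names are arbitrary choices for illustration:

```r
# logxpy() as defined above
logxpy <- function(lx, ly) max(lx, ly) + log1p(exp(-abs(lx - ly)))

x <- 3
y <- 5

direct   <- log(x + y)              # computed on the original scale
via_logs <- logxpy(log(x), log(y))  # computed from the logs only

# The two results should agree up to floating-point rounding
print(c(direct = direct, via_logs = via_logs))
print(all.equal(direct, via_logs))
```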

EDIT 2 : Why take the maximum then? Easy: to ensure that exp() is always given a negative (or zero) argument. If the difference inside exp() is large and positive, exp() returns Inf, and that's not a correct result. With the sign flipped, exp() returns 0, which allows for a far better result:

Say lx=1 and ly=1000, then :

> 1+log1p(exp(1000-1))
[1] Inf
> 1000+log1p(exp(1-1000))
[1] 1000