I have noticed a curious thing while working in R. A simple program that computes the squares of 1 to N behaves differently depending on whether it is implemented with a for-loop or a while-loop. (I don't care about vectorisation or the apply functions in this case.)
fn1 <- function(N) {
    for (i in 1:N) {
        y <- i * i
    }
}
and
fn2 <- function(N) {
    i <- 1
    while (i <= N) {
        y <- i * i
        i <- i + 1
    }
}
The results are:
system.time(fn1(60000))
   user  system elapsed
  2.500   0.012   2.493
There were 50 or more warnings (use warnings() to see the first 50)
Warning messages:
1: In i * i : NAs produced by integer overflow
...
system.time(fn2(60000))
   user  system elapsed
  0.138   0.000   0.137
So the while-loop is actually much faster here, contrary to my guess that the for-loop would win thanks to preallocation and other optimisations. But why does the for-loop version overflow?
UPDATE: Now trying another way, with vectors:
fn3 <- function(N) {
    i <- 1:N
    y <- i * i
}
system.time(fn3(60000))
   user  system elapsed
  0.008   0.000   0.009
Warning message:
In i * i : NAs produced by integer overflow
So perhaps it's a funky memory issue? I am running on OS X with 4 GB of memory and all default settings in R. This happens in both the 32-bit and 64-bit versions (only the times differ).
Alex
Because 1 is numeric but not integer (i.e. it is a floating-point number), while 1:60000 is both numeric and integer. That is also why fn2 never overflows: its i starts out as the double 1 and stays a double, whereas fn1's loop variable takes the integer values of 1:N.
> print(class(1))
[1] "numeric"
> print(class(1:60000))
[1] "integer"
60000 squared is 3.6 billion, which is NOT representable as a signed 32-bit integer (the maximum is 2147483647), hence you get an overflow:
> as.integer(60000)*as.integer(60000)
[1] NA
Warning message:
In as.integer(60000) * as.integer(60000) : NAs produced by integer overflow
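As a further aside (my addition, not part of the original answer), you can pin down exactly where the overflow starts: the square root of .Machine$integer.max is just under 46341, so in fn1 the warnings begin at i = 46341.

> .Machine$integer.max
[1] 2147483647
> 46340L * 46340L
[1] 2147395600
> 46341L * 46341L
[1] NA
Warning message:
In 46341L * 46341L : NAs produced by integer overflow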
3.6 billion is easily representable in floating point, however:
> as.single(60000)*as.single(60000)
[1] 3.6e+09
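(Side note, again my addition: a double holds every integer exactly up to 2^53, so 3.6e9 is represented exactly, not just approximately. Only beyond 2^53 do consecutive integers start to collide:)

> 2^53 == 2^53 + 1
[1] TRUE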
To fix your for-loop code, convert the loop index to a floating-point representation:
fn1 <- function(N) {
    for (i in as.single(1:N)) {
        y <- i * i
    }
}
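With that change the loop runs cleanly. Note that as.numeric(1:N) works just as well: R has no true single-precision type, so as.single also produces doubles (merely tagged for passing to C or Fortran), and either conversion keeps i*i in floating point.

> fn1(60000)    # completes silently, with no overflow warnings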