I have a quite large data frame, about 10 million rows. It has columns x and y, and what I want is to compute
hypot <- function(x) {sqrt(x[1]^2 + x[2]^2)}
for each row. Using apply would take a lot of time (about 5 minutes, interpolating from lower sizes) and memory. That seems too much for me, so I've tried different things:

- compiling the hypot function reduces the time by about 10%;
- using plyr greatly increases the running time.

What's the fastest way to do this?
What about with(my_data, sqrt(x^2 + y^2))?
set.seed(101)
d <- data.frame(x=runif(1e5),y=runif(1e5))
library(rbenchmark)
Two different per-row functions, the second taking advantage of vectorization within a row:
hypot <- function(x) sqrt(x[1]^2+x[2]^2)
hypot2 <- function(x) sqrt(sum(x^2))
Try compiling these too:
library(compiler)
chypot <- cmpfun(hypot)
chypot2 <- cmpfun(hypot2)
benchmark(sqrt(d[,1]^2+d[,2]^2),
with(d,sqrt(x^2+y^2)),
apply(d,1,hypot),
apply(d,1,hypot2),
apply(d,1,chypot),
apply(d,1,chypot2),
replications=50)
Results:
test replications elapsed relative user.self sys.self
5 apply(d, 1, chypot) 50 61.147 244.588 60.480 0.172
6 apply(d, 1, chypot2) 50 33.971 135.884 33.658 0.172
3 apply(d, 1, hypot) 50 63.920 255.680 63.308 0.364
4 apply(d, 1, hypot2) 50 36.657 146.628 36.218 0.260
1 sqrt(d[, 1]^2 + d[, 2]^2) 50 0.265 1.060 0.124 0.144
2 with(d, sqrt(x^2 + y^2)) 50 0.250 1.000 0.100 0.144
As expected, the with() solution and the column-indexing solution à la Tyler Rinker are essentially identical; hypot2 is twice as fast as the original hypot (but still about 150 times slower than the vectorized solutions). As already pointed out by the OP, compilation doesn't help very much.
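For completeness, here is a small sketch checking the two vectorized forms at a larger size closer to the OP's 10 million rows (column names and sizes here are my own choices; absolute timings will vary by machine):

```r
set.seed(101)
big <- data.frame(x = runif(1e7), y = runif(1e7))

# Vectorized: operates on whole columns at once, no per-row function calls
t1 <- system.time(r1 <- with(big, sqrt(x^2 + y^2)))

# Equivalent column-indexing form
r2 <- sqrt(big[, 1]^2 + big[, 2]^2)

# Both forms compute the same result
stopifnot(isTRUE(all.equal(r1, r2)))
print(t1)
```

On typical hardware this runs in well under a second, so even at the OP's scale the vectorized expression is nowhere near the 5-minute apply() estimate.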