I have a matrix and look for an efficient way to replicate it n times (where n is the number of observations in the dataset). For example, if I have a matrix A
A <- matrix(1:15, nrow=3)
then I want an output of the form
rbind(A, A, A, ...) #n times
.
Obviously, there are many ways to construct such a large matrix, for example using a for
loop or apply
or similar functions. However, the call to the "matrix-replication-function" takes place in the very core of my optimization algorithm where it is called tens of thousands of times during one run of my program. Therefore, loops, apply-type of functions and anything similar to that are not efficient enough. (Such a solution would basically mean that a loop over n is performed tens of thousands of times, which is obviously inefficient.) I already tried to use the ordinary rep
function, but haven't found a way to arrange the output of rep
in a matrix of the desired format.
The solution
do.call("rbind", replicate(n, A, simplify=F))
is also too inefficient because rbind
is used too often in this case. (Then, about 30% of the total runtime of my program are spent performing the rbinds.)
Does anyone know a better solution?
Two more solutions:
The first is a modification of the example in the question
do.call("rbind", rep(list(A), n))
The second involves unrolling the matrix, replicating it, and reassembling it.
matrix(rep(t(A),n), ncol=ncol(A), byrow=TRUE)
Since efficiency is what was requested, benchmarking is necessary
library("rbenchmark")
A <- matrix(1:15, nrow=3)
n <- 10
benchmark(rbind(A, A, A, A, A, A, A, A, A, A),
do.call("rbind", replicate(n, A, simplify=FALSE)),
do.call("rbind", rep(list(A), n)),
apply(A, 2, rep, n),
matrix(rep(t(A),n), ncol=ncol(A), byrow=TRUE),
order="relative", replications=100000)
which gives:
test replications elapsed
1 rbind(A, A, A, A, A, A, A, A, A, A) 100000 0.91
3 do.call("rbind", rep(list(A), n)) 100000 1.42
5 matrix(rep(t(A), n), ncol = ncol(A), byrow = TRUE) 100000 2.20
2 do.call("rbind", replicate(n, A, simplify = FALSE)) 100000 3.03
4 apply(A, 2, rep, n) 100000 7.75
relative user.self sys.self user.child sys.child
1 1.000 0.91 0 NA NA
3 1.560 1.42 0 NA NA
5 2.418 2.19 0 NA NA
2 3.330 3.03 0 NA NA
4 8.516 7.73 0 NA NA
So the fastest is the raw rbind
call, but that assumes n
is fixed and known ahead of time. If n
is not fixed, then the fastest is do.call("rbind", rep(list(A), n)
. These were for a 3x5 matrix and 10 replications. Different sized matrices might give different orderings.
EDIT:
For n=600, the results are in a different order (leaving out the explicit rbind
version):
A <- matrix(1:15, nrow=3)
n <- 600
benchmark(do.call("rbind", replicate(n, A, simplify=FALSE)),
do.call("rbind", rep(list(A), n)),
apply(A, 2, rep, n),
matrix(rep(t(A),n), ncol=ncol(A), byrow=TRUE),
order="relative", replications=10000)
giving
test replications elapsed
4 matrix(rep(t(A), n), ncol = ncol(A), byrow = TRUE) 10000 1.74
3 apply(A, 2, rep, n) 10000 2.57
2 do.call("rbind", rep(list(A), n)) 10000 2.79
1 do.call("rbind", replicate(n, A, simplify = FALSE)) 10000 6.68
relative user.self sys.self user.child sys.child
4 1.000 1.75 0 NA NA
3 1.477 2.54 0 NA NA
2 1.603 2.79 0 NA NA
1 3.839 6.65 0 NA NA
If you include the explicit rbind
version, it is slightly faster than the do.call("rbind", rep(list(A), n))
version, but not by much, and slower than either the apply
or matrix
versions. So a generalization to arbitrary n
does not require a loss of speed in this case.