I have been working my way through Dirk Eddelbuettel's Rcpp tutorial here:
http://www.rinfinance.com/agenda/
I have learned how to save a C++ file in a directory and then call it and run it from within R. The C++ file I am running is called 'logabs2.cpp' and its contents are taken directly from one of Dirk's slides:
#include <Rcpp.h>
using namespace Rcpp;

inline double f(double x) { return ::log(::fabs(x)); }

// [[Rcpp::export]]
std::vector<double> logabs2(std::vector<double> x) {
    std::transform(x.begin(), x.end(), x.begin(), f);
    return x;
}
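For reference, the numerical part of this function does not depend on Rcpp at all. The following is a minimal sketch of the same transform in plain C++, assuming no R involvement; `logabs2_plain` and `seq_by` are hypothetical names introduced only for illustration, and `seq_by` only roughly mirrors R's `seq(from, to, by)` for positive steps.

```cpp
#include <algorithm>  // std::transform
#include <cassert>
#include <cmath>      // std::log, std::fabs
#include <vector>

// Same helper as in the Rcpp file: natural log of the absolute value.
inline double f(double x) { return std::log(std::fabs(x)); }

// Plain C++ counterpart of logabs2() (hypothetical name). The argument is
// taken by value, so the caller's vector is copied and the copy is
// transformed in place.
std::vector<double> logabs2_plain(std::vector<double> x) {
    std::transform(x.begin(), x.end(), x.begin(), f);
    return x;
}

// Hypothetical stand-in for R's seq(from, to, by), valid for by > 0 only.
std::vector<double> seq_by(double from, double to, double by) {
    assert(by > 0.0);
    std::vector<double> out;
    for (double v = from; v <= to; v += by) out.push_back(v);
    return out;
}
```

Calling `logabs2_plain(seq_by(-5, 5, 2))` from a small `main()` applies the transform to -5, -3, -1, 1, 3, 5, the same input that the R call passes in.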
I run it with this R code:
library(Rcpp)
sourceCpp("c:/users/mmiller21/simple r programs/logabs2.cpp")
logabs2(seq(-5, 5, by=2))
# [1] 1.609438 1.098612 0.000000 0.000000 1.098612 1.609438
I am running the code on a Windows 7 machine from within the R GUI that seems to install by default. I also installed the most recent version of Rtools. The above R code seems to take a relatively long time to run. I suspect most of that time is devoted to compiling the C++ code and that once the C++ code is compiled it runs very quickly. Microbenchmark certainly suggests that Rcpp reduces computation time.
I have never used C++ until now, but I know that when I compile C code I get an *.exe file. I have searched my hard drive for a file called logabs2.exe but cannot find one. I am wondering whether the above C++ code might run even faster if a logabs2.exe file were created. Is it possible to create a logabs2.exe file, store it in a folder somewhere, and then have Rcpp call that file whenever I want to use it? I do not know whether that makes sense. If I could store a C++ function in an *.exe file then perhaps I would not have to compile the function every time I wanted to use it with Rcpp, and then perhaps the Rcpp code would be even faster.
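For context, a function that R calls is normally compiled into a shared library (a *.dll on Windows) rather than an *.exe, because an *.exe needs its own `main()` entry point whereas R only needs a function it can look up by name. The following is a minimal sketch of such a function, assuming R's plain `.C()` interface instead of Rcpp; the name `logabs2_raw` is hypothetical. With `.C()`, every argument arrives as a pointer, including the length.

```cpp
#include <cmath>  // std::log, std::fabs

// extern "C" disables C++ name mangling, so the symbol can be found under
// the literal name "logabs2_raw" once the library has been loaded.
// Hypothetical function: overwrites x[0..*n-1] with log(|x[i]|).
extern "C" void logabs2_raw(double* x, int* n) {
    if (x == nullptr || n == nullptr) return;  // nothing to do
    for (int i = 0; i < *n; ++i) {
        x[i] = std::log(std::fabs(x[i]));
    }
}
```

Assuming the file is saved as logabs2_raw.cpp, `R CMD SHLIB logabs2_raw.cpp` should build the library once. In R it could then be loaded with `dyn.load("logabs2_raw.dll")` and called as `.C("logabs2_raw", x = as.double(x), n = as.integer(length(x)))$x`, with no recompilation in later sessions.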
Sorry if this question does not make sense or is a duplicate. If it is possible to store the C++ function as an *.exe file I am hoping someone will show me how to modify my R code above to run it. Thank you for any help with this or for setting me straight on why what I suggest is not possible or recommended.
I look forward to seeing Dirk's new book.
Thank you to user1981275, Dirk Eddelbuettel and Romain Francois for their responses. Below is how I compiled a C++ file and created a *.dll, then called and used that *.dll file inside R.
Step 1. I created a new folder called 'c:\users\mmiller21\myrpackages' and pasted the file 'logabs2.cpp' into that new folder. The file 'logabs2.cpp' was created as described in my original post.
Step 2. Inside the new folder I created a new R package called 'logabs2' using an R file I wrote called 'new package creation.r'. The contents of 'new package creation.r' are:
setwd('c:/users/mmiller21/myrpackages/')
library(Rcpp)
Rcpp.package.skeleton("logabs2", example_code = FALSE, cpp_files = c("logabs2.cpp"))
I found the above syntax for Rcpp.package.skeleton on one of Hadley Wickham's websites: https://github.com/hadley/devtools/wiki/Rcpp
Step 3. I installed the new R package "logabs2" in R using the following line in the DOS command window:
C:\Program Files\R\R-3.0.1\bin\x64>R CMD INSTALL -l c:\users\mmiller21\documents\r\win-library\3.0\ c:\users\mmiller21\myrpackages\logabs2
where:
the location of the rcmd.exe file is:
C:\Program Files\R\R-3.0.1\bin\x64>
the location of installed R packages on my computer is:
c:\users\mmiller21\documents\r\win-library\3.0\
and the location of my new R package prior to being installed is:
c:\users\mmiller21\myrpackages\
Syntax used in the DOS command window was found by trial and error and may not be ideal. At some point I pasted a copy of 'logabs2.cpp' in 'C:\Program Files\R\R-3.0.1\bin\x64>' but I do not think that mattered.
Step 4. After installing the new R package I ran it using an R file I named 'new package usage.r' in the 'c:/users/mmiller21/myrpackages/' folder (although I do not think the folder was important). The contents of 'new package usage.r' are:
library(logabs2)
logabs2(seq(-5, 5, by=2))
The output was:
# [1] 1.609438 1.098612 0.000000 0.000000 1.098612 1.609438
This file loaded the package Rcpp without me asking.
In this case base R was faster, assuming I did this correctly.
#> microbenchmark(logabs2(seq(-5, 5, by=2)), times = 100)
#Unit: microseconds
#                        expr    min     lq  median     uq     max neval
# logabs2(seq(-5, 5, by = 2)) 43.086 44.453 50.6075 69.756 190.803   100
#> microbenchmark(log(abs(seq(-5, 5, by=2))), times=100)
#Unit: microseconds
#                         expr    min     lq median    uq     max neval
# log(abs(seq(-5, 5, by = 2))) 38.298 38.982 39.666 40.35 173.023   100
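However, using the installed package and its *.dll file was much faster than compiling the C++ code on the fly. Compiling an equivalent function inline with cppFunction took almost six seconds of elapsed time: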
system.time(
  cppFunction("
    NumericVector logabs(NumericVector x) {
      return log(abs(x));
    }
  ")
)
# user system elapsed
# 0.06 0.08 5.85
Although base R seems as fast as or faster than the *.dll file in this case, I expect that using the *.dll file with Rcpp will be faster than base R in most cases.
This was my first attempt creating an R package or using Rcpp and no doubt I did not use the most efficient methods. Also, I apologize for any typographic errors in this post.
EDIT
In a comment below I think Romain Francois suggested I modify the *.cpp file to the following:
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector logabs(NumericVector x) {
    return log(abs(x));
}
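One difference from the earlier std::vector version is worth spelling out. A `std::vector<double>` argument taken by value has to be copied in, whereas this version reads its input without copying it and writes the result into a new vector. The following plain C++ sketch shows that read-only-input pattern, assuming no Rcpp; `logabs_plain` is a hypothetical name, and the loop is only an element-by-element approximation of what the sugar expression `log(abs(x))` computes.

```cpp
#include <cmath>    // std::log, std::fabs
#include <cstddef>  // std::size_t
#include <vector>

// Hypothetical plain C++ analogue: the input is read through a const
// reference (no copy, never modified) and the result goes into a newly
// allocated vector of the same length.
std::vector<double> logabs_plain(const std::vector<double>& x) {
    std::vector<double> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        out[i] = std::log(std::fabs(x[i]));
    }
    return out;
}
```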
and recreate my R package, which I have now done. I then compared base R against my new package using the following code:
library(logabs)
logabs(seq(-5, 5, by=2))
log(abs(seq(-5, 5, by=2)))
library(microbenchmark)
microbenchmark(logabs(seq(-5, 5, by=2)), log(abs(seq(-5, 5, by=2))), times = 100000)
Base R is still a tiny bit faster or no different:
Unit: microseconds
                        expr    min     lq median     uq       max neval
  logabs(seq(-5, 5, by = 2)) 42.401 45.137 46.505 69.073 39754.598 1e+05
log(abs(seq(-5, 5, by = 2))) 37.614 40.350 41.718 62.234  3422.133 1e+05
Perhaps this is because base R is already vectorized, so log and abs already run as compiled code. I suspect that with more complex functions base R will be much slower. Or perhaps I am still not using the most efficient approach, or perhaps I simply made an error somewhere.