Running compiled C++ code with Rcpp

Mark Miller picture Mark Miller · Jun 17, 2013 · Viewed 7k times · Source

I have been working my way through Dirk Eddelbuettel's Rcpp tutorial here:

http://www.rinfinance.com/agenda/

I have learned how to save a C++ file in a directory and call it and run it from within R. The C++ file I am running is called 'logabs2.ccp' and its contents are directly from one of Dirk's slides:

#include <Rcpp.h>

using namespace Rcpp;

inline double f(double x) { return ::log(::fabs(x)); }

// [[Rcpp::export]]
std::vector<double> logabs2(std::vector<double> x) {
    std::transform(x.begin(), x.end(), x.begin(), f);
    return x;
}

I run it with this R code:

library(Rcpp)
sourceCpp("c:/users/mmiller21/simple r programs/logabs2.cpp")
logabs2(seq(-5, 5, by=2))
# [1] 1.609438 1.098612 0.000000 0.000000 1.098612 1.609438

I am running the code on a Windows 7 machine from within the R GUI that seems to install by default. I also installed the most recent version of Rtools. The above R code seems to take a relatively long time to run. I suspect most of that time is devoted to compiling the C++ code and that once the C++ code is compiled it runs very quickly. Microbenchmark certainly suggests that Rcpp reduces computation time.

I have never used C++ until now, but I know that when I compile C code I get an *.exe file. I have searched my hard-drive from a file called logabs2.exe but cannot find one. I am wondering whether the above C++ code might run even faster if a logabs2.exe file was created. Is it possible to create a logabs2.exe file and store it in a folder somewhere and then have Rcpp call that file whenever I wanted to use it? I do not know whether that makes sense. If I could store a C++ function in an *.exe file then perhaps I would not have to compile the function every time I wanted to use it with Rcpp and then perhaps the Rcpp code would be even faster.

Sorry if this question does not make sense or is a duplicate. If it is possible to store the C++ function as an *.exe file I am hoping someone will show me how to modify my R code above to run it. Thank you for any help with this or for setting me straight on why what I suggest is not possible or recommended.

I look forward to seeing Dirk's new book.

Answer

Mark Miller picture Mark Miller · Jun 18, 2013

Thank you to user1981275, Dirk Eddelbuettel and Romain Francois for their responses. Below is how I compiled a C++ file and created a *.dll, then called and used that *.dll file inside R.

Step 1. I created a new folder called 'c:\users\mmiller21\myrpackages' and pasted the file 'logabs2.cpp' into that new folder. The file 'logabs2.cpp' was created as described in my original post.

Step 2. Inside the new folder I created a new R package called 'logabs2' using an R file I wrote called 'new package creation.r'. The contents of 'new package creation.r' are:

setwd('c:/users/mmiller21/myrpackages/')

library(Rcpp)

Rcpp.package.skeleton("logabs2", example_code = FALSE, cpp_files = c("logabs2.cpp"))

I found the above syntax for Rcpp.package.skeleton on one of Hadley Wickham's websites: https://github.com/hadley/devtools/wiki/Rcpp

Step 3. I installed the new R package "logabs2" in R using the following line in the DOS command window:

C:\Program Files\R\R-3.0.1\bin\x64>R CMD INSTALL -l c:\users\mmiller21\documents\r\win-library\3.0\ c:\users\mmiller21\myrpackages\logabs2

where:

the location of the rcmd.exe file is:

C:\Program Files\R\R-3.0.1\bin\x64>

the location of installed R packages on my computer is:

c:\users\mmiller21\documents\r\win-library\3.0\

and the location of my new R package prior to being installed is:

c:\users\mmiller21\myrpackages\

Syntax used in the DOS command window was found by trial and error and may not be ideal. At some point I pasted a copy of 'logabs2.cpp' in 'C:\Program Files\R\R-3.0.1\bin\x64>' but I do not think that mattered.

Step 4. After installing the new R package I ran it using an R file I named 'new package usage.r' in the 'c:/users/mmiller21/myrpackages/' folder (although I do not think the folder was important). The contents of 'new package usage.r' are:

library(logabs2)
logabs2(seq(-5, 5, by=2))

The output was:

# [1] 1.609438 1.098612 0.000000 0.000000 1.098612 1.609438

This file loaded the package Rcpp without me asking.

In this case base R was faster assuming I did this correctly.

#> microbenchmark(logabs2(seq(-5, 5, by=2)), times = 100)
#Unit: microseconds
#                        expr    min     lq  median     uq     max neval
# logabs2(seq(-5, 5, by = 2)) 43.086 44.453 50.6075 69.756 190.803   100

#> microbenchmark(log(abs(seq(-5, 5, by=2))), times=100)
#Unit: microseconds
#                         expr    min     lq median    uq     max neval
# log(abs(seq(-5, 5, by = 2))) 38.298 38.982 39.666 40.35 173.023   100

However, using the dll file was faster than calling the external cpp file:

system.time(

cppFunction("
NumericVector logabs(NumericVector x) {
    return log(abs(x));
}
")

)

#   user  system elapsed 
#   0.06    0.08    5.85 

Although base R seems faster or as fast as the *.dll file in this case, I have no doubt that using the *.dll file with Rcpp will be faster than base R in most cases.

This was my first attempt creating an R package or using Rcpp and no doubt I did not use the most efficient methods. Also, I apologize for any typographic errors in this post.

EDIT

In a comment below I think Romain Francois suggested I modify the *.cpp file to the following:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]

NumericVector logabs(NumericVector x) {
return log(abs(x));
}

and recreate my R package, which I have now done. I then compared base R against my new package using the following code:

library(logabs)

logabs(seq(-5, 5, by=2))
log(abs(seq(-5, 5, by=2)))

library(microbenchmark)

microbenchmark(logabs(seq(-5, 5, by=2)), log(abs(seq(-5, 5, by=2))), times = 100000)

Base R is still a tiny bit faster or no different:

Unit: microseconds
                         expr    min     lq median     uq       max neval
   logabs(seq(-5, 5, by = 2)) 42.401 45.137 46.505 69.073 39754.598 1e+05
 log(abs(seq(-5, 5, by = 2))) 37.614 40.350 41.718 62.234  3422.133 1e+05

Perhaps this is because base R is already vectorized. I suspect with more complex functions base R will be much slower. Or perhaps I am still not using the most efficient approach, or perhaps I simply made an error somewhere.