R install packages from Shell

B.Mr.W. picture B.Mr.W. · Nov 18, 2014 · Viewed 15.8k times · Source

I am trying to implement a reducer for Hadoop Streaming using R. However, I need to figure out a way to access certain libraries that are not built in R, dplyr..etc. Based on my research seems like there are two approaches:

(1) In the reducer code, install the required libraries to a temporary folder and they will be disposed when the session is done, like this:

.libPaths(c(.libPaths(), temp <- tempdir()))
install.packages("dplyr", lib=temp, repos='http://cran.us.r-project.org')
library(dplyr)
...

However, this approach will have a dramatic overhead depending on how many libraries you are trying to install. So most of the time will be wasted on installing libraries(sophisticated libraries like dplyr has tons of dependencies which will take minutes to install on a vanilla R session).

So sounds like I need to install it before hand, which leads us to approach2.

(2) My cluster is fairly big. And I have to use some tool like Ansible to make it work. So I prefer to have one Linux shell command to install the library. I have seen R CMD INSTALL... before, however, it feels like will only install packages from source file instead of doing install.packages() in R console, figure out the mirror, pull the source file, install it in one command.

Can anyone show me how to use one command line in shell to non-interactively install a R package? (sorry for this much background knowledge, if anyone thinks I am not even following the right phylosophy, feel free to leave in the comment how this whole cluster R package should be managed.)

Answer

jangorecki picture jangorecki · Sep 3, 2015

tl;dr

Rscript -e 'install.packages("drat", repos="https://cloud.r-project.org")'

You mentioned you are trying to install dplyr into custom lib location on your disk. Be aware that dplyr package does not support that. You can read more in dplyr#4641.


Moreover if you are installing private package published in internal CRAN-like repository (created by drat or tools::write_PACKAGES), you can easily combine repos argument and resolve dependencies from CRAN automatically.

Rscript -e 'install.packages("priv.pkg", repos=c("cran.priv","https://cloud.r-project.org"))'

This is very handy feature of R repositories, although for production use I would recommend to cache packages from CRAN locally, and use those, so you will never be surprised by a breaking changes in your dependencies. For quality information about handling R in production I suggest to look into talk by Wit Jakuczun at WhyR2019 How to make R great for machine learning in (not only) Enterprise: slides, video.