In my thesis I need to perform a lot of simulation studies, which all take quite a while. My computer has 4 cores, so I have been wondering whether it is possible to run, for example, two R scripts in RStudio at the same time by letting them use two different cores. If this could be done, I could save a lot of time by just leaving the computer running all these scripts overnight.
##In RStudio
If you right-click on the RStudio icon, you should be able to open several separate "sessions" of RStudio (whether or not you use Projects). By default each session will use one core.
Update (July 2018): RStudio v1.2.830-1, available as a Preview Release, supports a "jobs" pane. This is dedicated to running R scripts in the background, separate from the interactive R session:
- Run any R script as a background job in a clean R session
- Monitor progress and see script output in real time
- Optionally give jobs your global environment when started, and export values back when complete
This will be available in RStudio version 1.2.
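A job can also be launched programmatically through the rstudioapi package; a minimal sketch, assuming RStudio 1.2+ with rstudioapi installed, and a script file named script.R (the file name is just an example):

rstudioapi::jobRunScript(
  path = "script.R",   # script to run as a background job in a clean session
  importEnv = TRUE     # copy the global environment into the job at start
)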
##Running Scripts in the Terminal
If you have several scripts that you know run without errors, I'd recommend running them on different parameters through the command line:
R CMD BATCH script.R  # writes output to script.Rout
Rscript script.R      # writes output to stdout
R --vanilla < script.R
Running in the background:
nohup Rscript script.R &
Here "&" runs the script in the background (it can be retrieved with fg
, monitored with htop
, and killed with kill <pid>
or pkill rsession
) and nohup
saves the output in a file and continues to run if the terminal is closed.
Passing arguments to a script:
Rscript script.R 1 2 3
This passes `c("1", "2", "3")` to R as the trailing elements of `commandArgs()` (note that the arguments arrive as character strings), so a bash loop can run multiple instances of Rscript:
for ii in 1 2 3
do
nohup Rscript script.R $ii &
done
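Inside the script, the arguments can be retrieved with `commandArgs(trailingOnly = TRUE)`; a minimal sketch of what `script.R` might look like (the parameter handling here is just an illustration):

# script.R: read the parameter passed on the command line
args <- commandArgs(trailingOnly = TRUE)  # e.g. "1", "2", or "3", as strings
ii <- as.numeric(args[1])                 # convert from character
cat("Running simulation with parameter", ii, "\n")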
##Running parallel code within R
You will often find that a particular step in your R script is what slows the computation; in that case, may I suggest running parallel code within your R script rather than running separate scripts? I'd recommend the [snow package][1] for running loops in parallel in R. Generally, use:
library(snow)
cl <- makeCluster(n) # n = number of cores (I'd recommend one less than machine capacity)
clusterExport(cl, list = ls()) # export input data to all cores
output_list <- parLapply(cl, input_list, function(x) ... )
stopCluster(cl) # close the cluster when complete (particularly on shared machines)
Use this anywhere you would normally use an `lapply` call in R to run it in parallel.
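The same pattern works with the base parallel package (bundled with R since version 2.14), which includes snow's cluster interface. A minimal self-contained sketch, with the simulation function and replicate count chosen purely for illustration:

library(parallel)

# one simulation replicate; the body here is just a placeholder
simulate_once <- function(seed) {
  set.seed(seed)
  mean(rnorm(1e5))
}

n_cores <- detectCores() - 1  # leave one core free
cl <- makeCluster(n_cores)
results <- parLapply(cl, 1:100, simulate_once)
stopCluster(cl)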
[1]: https://www.r-bloggers.com/quick-guide-to-parallel-r-with-snow/