Improving model training speed in caret (R)

Alexander David · Oct 2, 2015 · Viewed 10k times

I have a dataset consisting of 20 features and roughly 300,000 observations. I'm using caret to train models with doParallel and four cores. Even training on 10% of my data takes well over eight hours for the methods I've tried (rf, nnet, adabag, svmPoly). I'm resampling with bootstrapping 3 times and my tuneLength is 5. Is there anything I can do to speed up this agonizingly slow process? Someone suggested that using the underlying libraries directly could speed up the process by as much as 10x, but before I go down that route I'd like to make sure there is no other alternative.

Answer

topepo · Oct 5, 2015

@phiver hits the nail on the head, but for this situation there are a few things to suggest:

  • make sure that parallel processing is not exhausting your system memory: with X workers you are making X extra copies of the data in memory.
  • with a class imbalance, additional sampling can help. Downsampling might help improve performance and take less time.
  • use different libraries: ranger instead of randomForest, xgboost or C5.0 instead of gbm. Keep in mind that ensemble methods fit a ton of constituent models and are bound to take a while.
  • the package has a racing-type algorithm for tuning parameters in less time
  • the development version on GitHub has random search methods for the models with a lot of tuning parameters.
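The first three suggestions can be combined in one `train` call. A hypothetical sketch (the data frame `dat` and its factor outcome column `y` are assumed, not from the question): fewer parallel workers to cap memory use, down-sampling inside resampling via `trainControl(sampling = "down")`, and the faster `ranger` random-forest backend:

```r
library(caret)
library(doParallel)

# Fewer workers means fewer in-memory copies of the training data.
cl <- makePSOCKcluster(2)
registerDoParallel(cl)

# Down-sample the majority class within each bootstrap resample.
ctrl <- trainControl(method = "boot", number = 3,
                     sampling = "down")

# `ranger` is a much faster random forest implementation than
# `randomForest`, with the same caret interface.
fit <- train(y ~ ., data = dat,
             method = "ranger",
             trControl = ctrl,
             tuneLength = 5)

stopCluster(cl)
```

Down-sampling inside `trainControl` (rather than once, up front) keeps the held-out data representative of the original class distribution, so the resampled performance estimates stay honest.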
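Random search, mentioned in the last bullet, can be sketched as follows (hedged: at the time of the answer this lived in caret's development version on GitHub and was released later). With `search = "random"`, `tuneLength` becomes the number of randomly sampled parameter combinations rather than the number of levels per parameter, which avoids the combinatorial blow-up of a full grid:

```r
library(caret)

# Sample tuning parameter combinations at random instead of
# evaluating a full grid.
ctrl <- trainControl(method = "boot", number = 3,
                     search = "random")

# `dat`/`y` are assumed placeholders as before. With random search,
# tuneLength = 5 means 5 total candidate combinations, not 5 values
# per parameter (5^p combinations) as with grid search.
fit <- train(y ~ ., data = dat,
             method = "svmPoly",
             trControl = ctrl,
             tuneLength = 5)
```

For a model like svmPoly with three tuning parameters, that is 5 fits per resample instead of 125, a substantial saving on its own.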

Max