Applying k-fold Cross Validation to a model using the caret package

pman1971 · Nov 2, 2015 · Viewed 44.3k times

Let me start by saying that I have read many posts on Cross Validation and it seems there is much confusion out there. My understanding of it is simply this:

  1. Perform k-fold Cross Validation, e.g. with 10 folds, to estimate the average error across the folds.
  2. If acceptable then train the model on the complete data set.
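
To make the two steps above concrete, here is a minimal sketch that does the fold loop by hand with rpart only. It assumes a classification problem and uses the built-in iris data as a stand-in for mydat; the names folds, fold_error, cv_error and final_fit are illustrative and not part of any package.

```r
library(rpart)

set.seed(42)                       # reproducible fold assignment
k   <- 10
dat <- iris                        # assumption: stand-in for mydat

# randomly assign every row to one of the k folds
folds <- sample(rep(seq_len(k), length.out = nrow(dat)))

fold_error <- numeric(k)
for (i in seq_len(k)) {
  train_rows <- dat[folds != i, ]  # fit on the other k - 1 folds
  test_rows  <- dat[folds == i, ]  # evaluate on the held-out fold
  fit  <- rpart(Species ~ ., data = train_rows, method = "class")
  pred <- predict(fit, test_rows, type = "class")
  fold_error[i] <- mean(pred != test_rows$Species)
}

# step 1: average misclassification error across the k folds
cv_error <- mean(fold_error)

# step 2: if that error is acceptable, refit on the complete data set
final_fit <- rpart(Species ~ ., data = dat, method = "class")
```

The k models fitted inside the loop are only used to estimate the error; the model you keep is final_fit.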

I am attempting to build a decision tree using rpart in R and taking advantage of the caret package. Below is the code I am using.

# load libraries
library(caret)
library(rpart)

# define training control: 10-fold cross validation
train_control <- trainControl(method = "cv", number = 10)

# train the model
model <- train(resp ~ ., data = mydat, trControl = train_control, method = "rpart")

# make predictions
predictions <- predict(model, mydat)

# append predictions
mydat <- cbind(mydat, predictions)

# summarize results (stored as cm so the confusionMatrix function is not masked)
cm <- confusionMatrix(mydat$predictions, mydat$resp)

I have one question regarding the caret train application. I have read the train section of "A Short Introduction to the caret Package", which states that during the resampling process the "optimal parameter set" is determined.

Have I coded my example correctly? Do I need to define the rpart parameters within my code, or is my code sufficient?

Answer

zacdav · Nov 2, 2015

When you perform k-fold cross validation you are already making a prediction for each sample, just over 10 different models (presuming k = 10): each sample is predicted by the model that was fitted without its fold. There is no need to make a prediction on the complete data, as you already have the predictions from the k different models.

What you can do is the following:

train_control <- trainControl(method = "cv", number = 10, savePredictions = TRUE)

Then

model <- train(resp ~ ., data = mydat, trControl = train_control, method = "rpart")

If you want to see the observed and predicted values in a tidy format, simply type:

model$pred
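
As a rough sketch of how the saved held-out predictions can be turned into a cross-validated confusion matrix, the example below uses the built-in iris data as a stand-in for mydat (an assumption), and the names ctrl, fit, oof and cv_cm are illustrative. It passes savePredictions = "final" rather than TRUE so that only the predictions for the selected tuning value are kept; with TRUE, model$pred holds one set of rows per candidate cp value and would need filtering first.

```r
library(caret)

set.seed(42)                                   # reproducible folds
ctrl <- trainControl(method = "cv", number = 10,
                     savePredictions = "final")  # keep best-tune predictions only
fit  <- train(Species ~ ., data = iris,        # assumption: iris stands in for mydat
              method = "rpart", trControl = ctrl)

# one row per observation, predicted while its fold was held out
oof <- fit$pred[order(fit$pred$rowIndex), ]

# confusion matrix built from held-out predictions, not from refitting on seen data
cv_cm <- confusionMatrix(oof$pred, oof$obs)
cv_cm$overall["Accuracy"]
```

The rowIndex column refers back to the rows of the training data, and Resample names the fold in which each row was held out.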

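Also, for the second part of your question: caret handles the parameter tuning for you. With method = "rpart", train tunes the complexity parameter cp over a small default grid and then refits the final model on the complete data using the best value. You can also tune the parameters manually if you wish.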
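
If you do want to control the tuning yourself, one way is to pass a grid of candidate cp values through the tuneGrid argument of train. The sketch below again assumes iris in place of mydat, and the cp values are arbitrary picks for illustration, not recommendations.

```r
library(caret)

set.seed(42)                                   # reproducible folds
ctrl    <- trainControl(method = "cv", number = 10)
cp_grid <- expand.grid(cp = c(0.001, 0.01, 0.05, 0.1))  # hypothetical candidates

tuned <- train(Species ~ ., data = iris,       # assumption: iris stands in for mydat
               method = "rpart",
               trControl = ctrl, tuneGrid = cp_grid)

tuned$results      # resampled accuracy for each candidate cp
tuned$bestTune     # the cp value that was selected
tuned$finalModel   # rpart tree refitted on all of the data with that cp
```

If you would rather have caret pick the candidate values, the tuneLength argument of train asks for a given number of values without spelling them out.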