I am building a CART model and trying to tune two rpart parameters, cp and maxdepth. The caret package works well when I tune one parameter at a time, but when I use both it keeps throwing an error, and I can't figure out why.
library(caret)
data(iris)
tc <- trainControl("cv",10)
rpart.grid <- expand.grid(cp=seq(0,0.1,0.01), minsplit=c(10,20))
train(Petal.Width ~ Petal.Length + Sepal.Width + Sepal.Length, data=iris, method="rpart",
trControl=tc, tuneGrid=rpart.grid)
I am getting the following error:
Error in train.default(x, y, weights = w, ...) :
The tuning parameter grid should have columns cp
caret can't do that with the integrated methods, so you are going to have to add your own.
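One way to "add your own" is caret's custom-model interface, where `method` is a list instead of a string. The following is a minimal sketch, not caret's own rpart implementation: the name `rpart2p` and its labels are made up for illustration, it handles regression only, and it simply forwards `cp` and `minsplit` to `rpart.control()`.

```r
library(caret)
library(rpart)

# Hypothetical custom model: rpart with cp AND minsplit as tuning parameters
rpart2p <- list(
  label = "CART (cp + minsplit)",
  library = "rpart",
  type = "Regression",
  parameters = data.frame(
    parameter = c("cp", "minsplit"),
    class = c("numeric", "numeric"),
    label = c("Complexity Parameter", "Minimum Split Size")
  ),
  # Default grid, used only when no tuneGrid is supplied
  grid = function(x, y, len = NULL, search = "grid") {
    expand.grid(cp = seq(0, 0.1, 0.01), minsplit = c(10, 20))
  },
  # Fit one tree for a single (cp, minsplit) combination
  fit = function(x, y, wts, param, lev, last, classProbs, ...) {
    dat <- as.data.frame(x)
    dat$.outcome <- y
    rpart::rpart(.outcome ~ ., data = dat,
                 control = rpart::rpart.control(cp = param$cp,
                                                minsplit = param$minsplit))
  },
  predict = function(modelFit, newdata, submodels = NULL) {
    predict(modelFit, as.data.frame(newdata))
  },
  prob = NULL,  # no class probabilities for a regression-only model
  sort = function(x) x[order(-x$cp, -x$minsplit), ]
)

set.seed(1)
tc <- trainControl("cv", 10)
rpart.grid <- expand.grid(cp = seq(0, 0.1, 0.01), minsplit = c(10, 20))
fit <- caret::train(Petal.Width ~ Petal.Length + Sepal.Width + Sepal.Length,
                    data = iris, method = rpart2p,
                    trControl = tc, tuneGrid = rpart.grid)
print(fit$bestTune)
```

Because the list declares both columns in `parameters`, the same `tuneGrid` that failed with `method="rpart"` should now be accepted.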
Alternatively, you can try this on mlr, a similar machine learning framework that supports many resampling strategies, tune control methods, and algorithm parameter tuning out of the box. Many learners are already implemented, with several different evaluation metrics to choose from.
For your specific problem, try this example (it tunes the same two parameters, here on a classification task with Species as the target):
library(mlr)
iris.task = makeClassifTask(id = "iris-example", data = iris, target = "Species")
resamp = makeResampleDesc("CV", iters = 10L)
lrn = makeLearner("classif.rpart")
control.grid = makeTuneControlGrid()
#you can pass resolution = N to makeTuneControlGrid() if you want it to
#pick N grid values between the lower and upper bounds of a numeric param
#instead of listing discrete values yourself
ps = makeParamSet(
makeDiscreteParam("cp", values = seq(0,0.1,0.01)),
makeDiscreteParam("minsplit", values = c(10,20))
)
#you can also check all the tunable params
getParamSet(lrn)
#and the actual tuning, with accuracy as evaluation metric
res = tuneParams(lrn, task = iris.task, resampling = resamp, control = control.grid, par.set = ps, measures = list(acc,timetrain))
opt.grid = as.data.frame(res$opt.path)
print(opt.grid)
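Since the model in the question predicts Petal.Width, a regression variant of the same tuning may be closer to what is needed. This is a sketch under the assumption that `regr.rpart` with RMSE as the measure is an acceptable stand-in for the original caret call; the `regr.*` variable names are introduced here for illustration.

```r
library(mlr)

# Keep only the columns used in the original formula
regr.data <- iris[, c("Petal.Width", "Petal.Length", "Sepal.Width", "Sepal.Length")]
regr.task <- makeRegrTask(id = "iris-regr", data = regr.data, target = "Petal.Width")
regr.lrn <- makeLearner("regr.rpart")

# Same grid as in the question: 11 cp values x 2 minsplit values
regr.ps <- makeParamSet(
  makeDiscreteParam("cp", values = seq(0, 0.1, 0.01)),
  makeDiscreteParam("minsplit", values = c(10, 20))
)

set.seed(1)
regr.res <- tuneParams(regr.lrn, task = regr.task,
                       resampling = makeResampleDesc("CV", iters = 10L),
                       control = makeTuneControlGrid(),
                       par.set = regr.ps, measures = rmse)
print(regr.res$x)  # best cp / minsplit combination found
```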
mlr is incredibly versatile: its wrapper approach lets you fuse learners with tuning strategies, pre-processing, filtering, imputation steps, and much more.
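As a rough illustration of the wrapper idea, `makeTuneWrapper()` can fuse the rpart learner with the grid search so that training the wrapped learner runs the tuning internally. The variable names below (`task`, `tuned.lrn`, `mod`) are chosen for this sketch only.

```r
library(mlr)

task <- makeClassifTask(id = "iris-example", data = iris, target = "Species")
ps <- makeParamSet(
  makeDiscreteParam("cp", values = seq(0, 0.1, 0.01)),
  makeDiscreteParam("minsplit", values = c(10, 20))
)

# Learner + tuning strategy fused into a single learner object
tuned.lrn <- makeTuneWrapper(
  makeLearner("classif.rpart"),
  resampling = makeResampleDesc("CV", iters = 10L),
  par.set = ps,
  control = makeTuneControlGrid(),
  measures = acc
)

set.seed(1)
# mlr::train is spelled out in case caret is also attached
mod <- mlr::train(tuned.lrn, task)
print(getTuneResult(mod)$x)  # parameters selected by the inner grid search
```

The wrapped learner can then be passed to `resample()` like any other learner, which gives nested resampling without extra bookkeeping.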