A similar question was asked however the link in the answer points to random forest example, it doesn't seem to work in my case.
Here is an example what I'm trying to do:
gbmGrid <- expand.grid(interaction.depth = c(5, 9),
n.trees = (1:3)*200,
shrinkage = c(0.05, 0.1))
fitControl <- trainControl(
method = "cv",
number = 3,
classProbs = TRUE)
gbmFit <- train(strong~.-Id-PlayerName, data = train[1:10000,],
method = "gbm",
trControl = fitControl,
verbose = TRUE,
tuneGrid = gbmGrid)
gbmFit
Everything goes fine, I get the best parameters. Now if I do the prediction:
predictStrong = predict(gbmFit, newdata=train[11000:50000,])
I get a binary vector of predictions, which is good:
[1] 0 1 0 0 1 0 0 0 0 0 0 0 1 1 0 0 1 1 1 0 0 0 1 ...
However when I try to get probabilities, I get an error:
predictStrong = predict(gbmFit, newdata=train[11000:50000,], type="prob")
Error in `[.data.frame`(out, , obsLevels, drop = FALSE) :
undefined columns selected
Where seems to be the problem?
Additional info:
traceback()
5: stop("undefined columns selected")
4: `[.data.frame`(out, , obsLevels, drop = FALSE)
3: out[, obsLevels, drop = FALSE]
2: predict.train(gbmFit, newdata = train[11000:50000, ], type = "prob")
1: predict(gbmFit, newdata = train[11000:50000, ], type = "prob")
Versions:
R version 3.1.0 (2014-04-10) -- "Spring Dance"
Copyright (C) 2014 The R Foundation for Statistical Computing
Platform: x86_64-unknown-linux-gnu (64-bit)
caret version: 6.0-29
EDIT:
I've seen this topic as well and I don't get an error about variable names, although I have couple of variable names with underscores, which I assume it's valid, as I use make.names
and get the same names as the original.
colnames(train) == make.names(colnames(train))
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
When class probabilities are requested, train
puts them into a data frame with a column for each class. If the factor levels are not valid variable names, they are automatically changed (e.g. "0"
becomes "X0"
). train
issues a warning in this case that goes something like "At least one of the class levels are not valid R variables names. This may cause errors if class probabilities are generated."