I’m new to R, and I’m having trouble with the predict command for a boosting model. I receive this error:
Error in `[.data.frame`(newdata, , as.character(object$formula[[2]])) :
undefined columns selected
when I execute this command:
model.predict <- predict.boosting(model,newdata=test)
Here is my model:
model <- boosting(Y~x1+x2+x3+x4+x5+x6+x7, data=train)
And here is the structure of my test data: str(test)
'data.frame': 343 obs. of 7 variables:
$ x1: Factor w/ 4 levels "Americas","Asia_Pac",..: 4 2 4 2 4 3 3 3 4 1 ...
$ x2: Factor w/ 5 levels "Fifth","First",..: 3 3 2 2 4 2 4 4 1 1 ...
$ x3: Factor w/ 3 levels "Best","Better",..: 2 3 1 1 3 2 2 1 3 3 ...
$ x4: Factor w/ 2 levels "Female","Male": 1 1 2 1 1 2 1 2 2 2 ...
$ x5: int 82 55 47 31 6 53 77 68 76 86 ...
$ x6: num 22.8 14.6 25.5 38.3 7.9 32.8 4.6 34.2 36.7 21.7 ...
$ x7: num 0.679 0.925 0.897 0.684 0.195 ...
And the structure of my training data:
$ RecordID: int 1 2 3 4 5 6 7 8 9 10 ...
$ x1 : Factor w/ 4 levels "Americas","Asia_Pac",..: 1 2 2 3 1 1 1 2 2 4 ...
$ x2 : Factor w/ 5 levels "Fifth","First",..: 5 5 3 2 5 5 5 4 3 2 ...
$ x3 : Factor w/ 3 levels "Best","Better",..: 2 3 2 2 3 1 2 3 1 1 ...
$ x4 : Factor w/ 2 levels "Female","Male": 1 2 2 2 1 1 2 2 1 1 ...
$ x5 : int 1 67 75 51 84 33 21 80 48 5 ...
$ x6 : num 21 13.8 30.3 11.9 1.7 13.2 33.9 17 3.4 19.5 ...
$ x7 : num 0.35 0.85 0.73 0.39 0.47 0.13 0.2 0.12 0.64 0.11 ...
$ Y : Factor w/ 2 levels "Green","Yellow": 2 2 1 2 2 2 1 2 2 2 ..
I think there’s a problem with the structure of the test data, but I can’t find it, or else I misunderstand how the “predict” command is meant to be used. Note that if I run the predict command on the training data, it works. Any suggestions as to where to look?
Thanks!
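predict.boosting() expects newdata to contain the actual labels for the test data (the response column named on the left-hand side of the formula), so it can calculate how well it did (as in the confusion matrix shown below). The error message points at exactly this: the function tries to select the column as.character(object$formula[[2]]), which is Y in your model, from newdata, and your test data frame has no such column. Your training data does have Y, which is why predicting on it works.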
library(adabag)
data(iris)
iris.adaboost <- boosting(Species ~ Sepal.Length + Sepal.Width + Petal.Length +
                          Petal.Width, data=iris, boos=TRUE, mfinal=10)
# make a 'test' dataframe without the classes, as in the question
iris2 <- iris
iris2$Species <- NULL
# replicates the error
irispred <- predict.boosting(iris.adaboost, newdata=iris2)
#Error in `[.data.frame`(newdata, , as.character(object$formula[[2]])) :
# undefined columns selected
Here's a working example, drawn largely from the help file, so that there is something runnable here (and to demonstrate the confusion matrix).
# first create subsets of iris data for training and testing
sub <- c(sample(1:50, 25), sample(51:100, 25), sample(101:150, 25))
iris3 <- iris[sub,]
iris4 <- iris[-sub,]
iris.adaboost <- boosting(Species ~ ., data=iris3, mfinal=10)
# works, because iris4 still contains the Species column
iris.predboosting <- predict.boosting(iris.adaboost, newdata=iris4)
iris.predboosting$confusion
# Observed Class
#Predicted Class setosa versicolor virginica
# setosa 50 0 0
# versicolor 0 50 0
# virginica 0 0 50
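If the test data genuinely has no labels, as in the question, one possible workaround is to add a placeholder response column before predicting and then read only the predicted classes. The sketch below is an illustration on iris rather than the asker's data; the names train, test and fit are made up for the example, and the placeholder values are arbitrary, so the confusion matrix and error in the result are meaningless and should be ignored. Later adabag releases may not need this step, so it is worth checking against the installed version.

```r
library(adabag)
data(iris)

# split iris into a training set and an unlabeled test set
sub <- c(sample(1:50, 25), sample(51:100, 25), sample(101:150, 25))
train <- iris[sub, ]
test <- iris[-sub, ]
test$Species <- NULL   # mimic test data that arrives without labels

fit <- boosting(Species ~ ., data = train, mfinal = 10)

# placeholder labels (assumption: any valid level will do), using the
# same factor levels as the training response
test$Species <- factor(rep(levels(train$Species)[1], nrow(test)),
                       levels = levels(train$Species))

# predict() dispatches to predict.boosting() for objects of class "boosting"
pred <- predict(fit, newdata = test)

# the predicted classes are the only part that is meaningful here
head(pred$class)
```

For the model in the question, the same idea would mean adding a Y column to test with the levels "Green" and "Yellow" before calling predict.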