My response is a categorical variable (letters), so I used distribution='multinomial' when fitting the model. Now I want to predict the response and get the output as those letters instead of as a matrix of probabilities.
However, predict(model, newdata, type='response') returns probabilities, the same as type='link'.
Is there a way to obtain categorical outputs?
BST = gbm(V1~.,data=training,distribution='multinomial',n.trees=2000,interaction.depth=4,cv.folds=5,shrinkage=0.005)
predBST = predict(BST,newdata=test,type='response')
The predict.gbm documentation says:
If type="response" then gbm converts back to the same scale as the outcome. Currently the only effect this will have is returning probabilities for bernoulli and expected counts for poisson. For the other distributions "response" and "link" return the same.
What you should do, as Dominic suggests, is pick the response with the highest probability in each row of the resulting predBST matrix, by calling apply(.., 1, which.max) on the prediction output.
Here is a code sample with the iris dataset:
library(gbm)
data(iris)
df <- iris[,-c(1)] # drops column 1 (Sepal.Length); iris has no index column
df <- df[sample(nrow(df)),] # shuffle
df.train <- df[1:100,]
df.test <- df[101:150,]
BST = gbm(Species~.,data=df.train,
          distribution='multinomial',
          n.trees=200,
          interaction.depth=4,
          #cv.folds=5,
          shrinkage=0.005)
predBST = predict(BST,n.trees=200, newdata=df.test,type='response')
p.predBST <- apply(predBST, 1, which.max)
> predBST[1:6,,]
         setosa versicolor  virginica
[1,] 0.89010862 0.05501921 0.05487217
[2,] 0.09370400 0.45616148 0.45013452
[3,] 0.05476228 0.05968445 0.88555327
[4,] 0.05452803 0.06006513 0.88540684
[5,] 0.05393377 0.06735331 0.87871292
[6,] 0.05416855 0.06548646 0.88034499
> head(p.predBST)
[1] 1 2 3 3 3 3
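The values in p.predBST are column indices, not class names. Since the columns of the prediction output are named after the classes (as the printout above shows), one way to get categorical output is to look each index up in those column names. The sketch below is self-contained and does not fit a model: it rebuilds the six rows printed above as an array (the names probs, idx and labels are only illustrative, and the n x classes x 1 shape is assumed from the predBST[1:6,,] indexing) and then maps indices to labels.

```r
# Probabilities copied from the six rows printed above. The array mimics the
# n x classes x 1 shape implied by the predBST[1:6,,] indexing (an assumption).
probs <- array(
  c(0.89010862, 0.09370400, 0.05476228, 0.05452803, 0.05393377, 0.05416855,
    0.05501921, 0.45616148, 0.05968445, 0.06006513, 0.06735331, 0.06548646,
    0.05487217, 0.45013452, 0.88555327, 0.88540684, 0.87871292, 0.88034499),
  dim = c(6, 3, 1),
  dimnames = list(NULL, c("setosa", "versicolor", "virginica"), NULL)
)

# Index of the most probable class in each row
idx <- apply(probs, 1, which.max)

# Look each index up in the column names to recover the class labels
labels <- factor(colnames(probs)[idx], levels = colnames(probs))
print(labels)
```

With the real model, the same lookup would be colnames(predBST)[p.predBST], and comparing the result against df.test$Species gives a quick accuracy check.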