In R, what is the functionality of probability=TRUE in the svm function of the e1071 package?
model <- svm(Type ~ ., data, probability = TRUE, cost = 100, gamma = 1)
Setting the probability argument to TRUE in both the model fit and the prediction call returns, for each predicted observation, the vector of estimated probabilities of belonging to each class of the response variable. These are stored in a matrix, attached to the prediction object as its "probabilities" attribute.
For example:
library(e1071)
model <- svm(Species ~ ., data = iris, probability=TRUE)
# (below I'm just predicting on the training dataset; it could of course just
# as easily be a separate test dataset)
pred <- predict(model, iris, probability=TRUE)
head(attr(pred, "probabilities"))
#      setosa versicolor   virginica
# 1 0.9803339 0.01129740 0.008368729
# 2 0.9729193 0.01807053 0.009010195
# 3 0.9790435 0.01192820 0.009028276
# 4 0.9750030 0.01531171 0.009685342
# 5 0.9795183 0.01164689 0.008834838
# 6 0.9740730 0.01679643 0.009130620
Note, however, that it's important to set probability=TRUE for the call to svm, and not just the call to predict, since the latter alone would produce:
#      setosa versicolor virginica
# 1 0.3333333  0.3333333 0.3333333
# 2 0.3333333  0.3333333 0.3333333
# 3 0.3333333  0.3333333 0.3333333
# 4 0.3333333  0.3333333 0.3333333
# 5 0.3333333  0.3333333 0.3333333
# 6 0.3333333  0.3333333 0.3333333
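Once the model has been fitted with probability=TRUE, the probability matrix can be handled like any other numeric matrix. The following is only a minimal sketch, reusing the iris example from above; the variable names probs and top_class are illustrative choices, not part of e1071. It pulls out the matrix, checks its shape, and picks the most probable class for each row. Exact probability values may vary slightly between runs, so none are shown here.

```r
library(e1071)

# Set probability = TRUE in BOTH the fit and the prediction call
model <- svm(Species ~ ., data = iris, probability = TRUE)
pred  <- predict(model, iris, probability = TRUE)

# The class probabilities are attached to the prediction as an attribute
probs <- attr(pred, "probabilities")

# One row per observation, one column per class of the response
dim(probs)

# Each row should sum to (approximately) 1
range(rowSums(probs))

# Name of the most probable class for each observation
# (top_class is an illustrative variable, not an e1071 object)
top_class <- colnames(probs)[max.col(probs, ties.method = "first")]
head(top_class)
```

Indexing by colnames(probs) rather than by position is a deliberate choice here, since it avoids relying on the column order of the probability matrix matching the factor level order of the response.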