I have some data where the Y variable is a factor with two levels, Good or Bad. I am building a support vector machine using the 'train' function from the 'caret' package. Using 'train', I was able to finalize the values of the tuning parameters and obtain the final support vector machine. For the test data I can predict the class, but when I try to predict probabilities for the test data, I get the warning below. For example, my model tells me that the 1st data point in the test data has y = 'Good', but I want to know the probability of getting 'Good'. As I understand it, when the Y variable has 2 outcomes, the model can estimate a probability for each outcome, and the outcome with the highest probability is taken as the prediction.
**Warning message:
In probFunction(method, modelFit, ppUnk) :
kernlab class probability calculations failed; returning NAs**
Sample code is below:
library(caret)
trainset <- data.frame(
class=factor(c("Good", "Bad", "Good", "Good", "Bad", "Good", "Good", "Good", "Good", "Bad", "Bad", "Bad")),
age=c(67, 22, 49, 45, 53, 35, 53, 35, 61, 28, 25, 24))
testset <- data.frame(
class=factor(c("Good", "Bad", "Good" )),
age=c(64, 23, 50))
library(kernlab)
set.seed(231)
### sigest() estimates a range of reasonable sigma values for the RBF kernel
sigDist <- sigest(class ~ ., data = trainset, frac = 1)
### create a grid of the two tuning parameters: .sigma is fixed to the first value from sigest(), and we search for the best value of .C
svmTuneGrid <- data.frame(.sigma = sigDist[1], .C = 2^(-2:7))
set.seed(1056)
svmFit <- train(class ~ .,
                data = trainset,
                method = "svmRadial",
                preProc = c("center", "scale"),
                tuneGrid = svmTuneGrid,
                trControl = trainControl(method = "repeatedcv", repeats = 5))
### train() finds the best values of the tuning parameters and fits the final model with them
### predict the class of the test data
predictedClasses <- predict(svmFit, testset )
str(predictedClasses)
### predict probabilities; this is where I get the warning shown above
predictedProbs <- predict(svmFit, newdata = testset , type = "prob")
head(predictedProbs)
New question below this line: according to the output below, there are 9 support vectors. How can I tell which of the 12 training data points those 9 are?
svmFit$finalModel
Support Vector Machine object of class "ksvm"
SV type: C-svc (classification)
 parameter : cost C = 1
Gaussian Radial Basis kernel function.
 Hyperparameter : sigma = 0.72640759446315
Number of Support Vectors : 9
Objective Function Value : -5.6994
Training error : 0.083333
In the trainControl call, you have to specify that you want the class probabilities returned, by setting classProbs = TRUE.
svmFit <- train(class ~ .,
                data = trainset,
                method = "svmRadial",
                preProc = c("center", "scale"),
                tuneGrid = svmTuneGrid,
                trControl = trainControl(method = "repeatedcv", repeats = 5,
                                         classProbs = TRUE))
predictedClasses <- predict(svmFit, testset )
predictedProbs <- predict(svmFit, newdata = testset , type = "prob")
This gives the probabilities of being in the Bad or Good class for each row of the test dataset:
print(predictedProbs)
Bad Good
1 0.2302979 0.7697021
2 0.7135050 0.2864950
3 0.2230889 0.7769111
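For context on why the flag matters: as far as I understand, caret's classProbs = TRUE is passed on to kernlab as prob.model = TRUE, which asks ksvm to fit an extra probability model on top of the SVM decision values. Without it there is nothing to compute probabilities from, hence the NAs. Below is a minimal sketch that calls kernlab directly with the data from the question. The sigma and C values are simply copied from the printed model above, and the names fit and probs are only for illustration.

```r
library(kernlab)

# Data copied from the question
trainset <- data.frame(
  class = factor(c("Good", "Bad", "Good", "Good", "Bad", "Good",
                   "Good", "Good", "Good", "Bad", "Bad", "Bad")),
  age = c(67, 22, 49, 45, 53, 35, 53, 35, 61, 28, 25, 24))
testset <- data.frame(
  class = factor(c("Good", "Bad", "Good")),
  age = c(64, 23, 50))

set.seed(1056)
# prob.model = TRUE makes ksvm fit a probability model in addition
# to the classifier (assumed here to be what classProbs = TRUE triggers in caret)
fit <- ksvm(class ~ ., data = trainset,
            kernel = "rbfdot",
            kpar = list(sigma = 0.7264),  # value taken from the printed model
            C = 1,
            prob.model = TRUE)

# type = "probabilities" returns one column per class level
probs <- predict(fit, testset, type = "probabilities")
print(probs)
```

The exact numbers will likely differ from the caret output above, because the probability model is fitted with internal cross-validation and the preprocessing is not identical.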
To answer your new question, you can get the positions of the support vectors in your original data set with alphaindex(svmFit$finalModel), and the corresponding coefficients with coef(svmFit$finalModel).
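To turn those positions into the actual training rows, you can use them to index the training data frame. Here is a small sketch that fits a ksvm directly so it runs on its own. With the caret model you would call the same accessors on svmFit$finalModel instead. The names fit, sv_idx and sv_rows are only for illustration, and the sigma and C values are copied from the printed model in the question.

```r
library(kernlab)

# Training data copied from the question
trainset <- data.frame(
  class = factor(c("Good", "Bad", "Good", "Good", "Bad", "Good",
                   "Good", "Good", "Good", "Bad", "Bad", "Bad")),
  age = c(67, 22, 49, 45, 53, 35, 53, 35, 61, 28, 25, 24))

fit <- ksvm(class ~ ., data = trainset,
            kernel = "rbfdot",
            kpar = list(sigma = 0.7264),  # value taken from the printed model
            C = 1)

# alphaindex() returns a list (one element per binary sub-problem),
# so flatten it to get a plain vector of row positions
sv_idx <- sort(unique(unlist(alphaindex(fit))))

# The training rows that act as support vectors
sv_rows <- trainset[sv_idx, ]
print(sv_rows)

# Sanity check: the number of rows should match the reported count
nSV(fit)
```

This assumes the rows passed to the model are in the same order as in trainset, which should hold here since no rows are dropped or reordered before fitting.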