I was told to use the caret package to perform support vector machine regression with 10-fold cross-validation on a data set I have. I'm regressing my response variable on 151 predictor variables. I did the following:
> ctrl <- trainControl(method = "repeatedcv", repeats = 10)
> set.seed(1500)
> mod <- train(RT..seconds.~., data=cadets, method = "svmLinear", trControl = ctrl)
which gave me:
C    RMSE  Rsquared  RMSE SD  Rsquared SD
0.2  50    0.8       20       0.1
0.5  60    0.7       20       0.2
1    60    0.7       20       0.2
However, I want to look at the individual folds and see, for each one, how close the predicted values were to the actual values. How do I do this?
Also, the output says:
RMSE was used to select the optimal model using the smallest value.
The final value used for the model was C = 0.
What does this mean, and what does the C stand for in the table above?
RT (seconds)  76_TI2  114_DECC  120_Lop  212_PCD  236_X3Av
38            4.086   1.2       2.322    0        0.195
40            2.732   0.815     1.837    1.113    0.13
41            4.049   1.153     2.117    2.354    0.094
41            4.049   1.153     2.117    3.838    0.117
42            4.56    1.224     2.128    2.38     0.246
42            2.96    0.909     1.686    0.972    0.138
42            3.237   0.96      1.922    1.202    0.143
44            2.989   0.8       1.761    2.034    0.11
44            1.993   0.5       1.5      0        0.102
44            2.957   0.8       1.761    0.988    0.141
44            2.597   0.889     1.888    1.916    0.114
44            2.428   0.691     1.436    1.848    0.089
This is a snippet of my data set. I'm trying to model RT (seconds) against the 151 variables.
Thanks
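You have to save your CV predictions via the "savePredictions" option in your trainControl
object (the abbreviation "savePred" also works, through R's partial argument matching). The held-out predictions are then stored in mod$pred. I'm not sure what package your "cadets" data is from, but here is a trivial example using iris: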
> library(caret)
> ctrl <- trainControl(method = "cv", savePredictions = TRUE, classProbs = TRUE)
> mod <- train(Species~., data=iris, method = "svmLinear", trControl = ctrl)
> head(mod$pred)
        pred        obs      setosa  versicolor   virginica rowIndex   .C Resample
1     setosa     setosa 0.982533940 0.009013592 0.008452468       11 0.25   Fold01
2     setosa     setosa 0.955755054 0.032289120 0.011955826       35 0.25   Fold01
3     setosa     setosa 0.941292675 0.044903583 0.013803742       46 0.25   Fold01
4     setosa     setosa 0.983559919 0.008310323 0.008129757       49 0.25   Fold01
5     setosa     setosa 0.972285699 0.018109218 0.009605083       50 0.25   Fold01
6 versicolor versicolor 0.007223973 0.971168170 0.021607858       59 0.25   Fold01
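EDIT: "C" is the cost parameter of the SVM, one of its tuning parameters. It sets the penalty for violating the margin constraints, so larger values fit the training data more closely. train() evaluates each candidate value of C by resampling and keeps the one with the smallest RMSE, which is what the "final value" message reports. Check out the help for the ksvm
function in the kernlab package for more details.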
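To make the "smallest RMSE" selection rule concrete, here is a minimal base-R sketch that applies it to the rounded values from the table in the question. The data frame `results` is typed in by hand purely for illustration; on a fitted caret model the corresponding table should be available as mod$results and the chosen row as mod$bestTune.

```r
# Resampling summary copied (rounded) from the question's output.
# `results` is a hand-made stand-in for mod$results.
results <- data.frame(
  C        = c(0.2, 0.5, 1),
  RMSE     = c(50, 60, 60),
  Rsquared = c(0.8, 0.7, 0.7)
)

# "RMSE was used to select the optimal model using the smallest value":
# keep the row whose cross-validated RMSE is lowest.
best <- results[which.min(results$RMSE), ]
print(best)
```

With these rounded numbers the first row is selected. Note that which.min returns the first minimum if there are ties.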
EDIT2: Trivial regression example
> library(caret)
> ctrl <- trainControl(method = "cv", savePredictions = TRUE)
> mod <- train(Sepal.Length~., data=iris, method = "svmLinear", trControl = ctrl)
> head(mod$pred)
      pred obs rowIndex   .C Resample
1 4.756119 4.8       13 0.25   Fold01
2 4.910948 4.8       31 0.25   Fold01
3 5.094275 4.9       38 0.25   Fold01
4 4.728503 4.8       46 0.25   Fold01
5 5.192965 5.3       49 0.25   Fold01
6 5.969479 5.9       62 0.25   Fold01
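Once the held-out predictions are saved, a per-fold error summary answers the "how close were the predictions in each fold" part of the question. The following is a minimal base-R sketch, not part of caret: `fold_summary` is a hypothetical helper, and `toy` is a made-up stand-in with the same pred, obs and Resample columns as mod$pred, so the snippet runs without fitting a model.

```r
# Per-fold RMSE and MAE from held-out predictions.
# `pred_df` is assumed to have the columns caret stores in mod$pred:
# pred (prediction), obs (actual value) and Resample (fold label).
fold_summary <- function(pred_df) {
  folds <- split(pred_df, pred_df$Resample)
  out <- lapply(folds, function(d) {
    err <- d$obs - d$pred
    data.frame(
      Resample = d$Resample[1],
      n        = nrow(d),
      RMSE     = sqrt(mean(err^2)),
      MAE      = mean(abs(err))
    )
  })
  do.call(rbind, out)
}

# Toy stand-in for mod$pred (values made up for illustration only).
toy <- data.frame(
  pred     = c(4.7, 4.9, 5.1, 6.0, 5.8, 6.4),
  obs      = c(4.8, 4.8, 4.9, 5.9, 6.0, 6.3),
  Resample = rep(c("Fold01", "Fold02"), each = 3),
  stringsAsFactors = FALSE
)

res <- fold_summary(toy)
print(res)
```

With a real model you would call fold_summary(mod$pred). If several values of C were tried, mod$pred holds predictions for every candidate, so you would probably want to keep only the rows for the selected value first, for example with merge(mod$pred, mod$bestTune); the tuning column may be named C or .C depending on the caret version.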