I have created a random forest out of my data:
fit=randomForest(churn~., data=data_churn[3:17], ntree=1,
importance=TRUE, proximity=TRUE)
I can easily see my confusion matrix:
conf <- fit$confusion
> conf
     No Yes class.error
No  945  80  0.07804878
Yes  84 101  0.45405405
Now I need to know the accuracy of the random forest. I searched around and found that the caret library has a confusionMatrix method that takes a confusion matrix and returns the accuracy (along with many other values). However, the method needs another parameter called "reference". My question is: how can I provide a reference to the method to get the accuracy of my random forest?
And... is it the correct way to get the accuracy of a random forest?
Use randomForest(..., do.trace=TRUE) to see the out-of-bag (OOB) error during training, both per class and as ntree grows.
(FYI you chose ntree=1, so you'll get just one decision tree, not a forest. That rather defeats the purpose of using RF, which relies on averaging many trees, each grown on a random subset of both features and samples. You probably want to try larger ntree values.)
And after training, you can get per-class error from the rightmost column of the confusion matrix as you already found:
> fit$confusion[, 'class.error']
class.error
No Yes
0.07804878 0.45405405
(Also, you probably want to set options(digits=3) so you don't see those excessive decimal places.)
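On the "reference" part of the question: in caret's confusionMatrix, data is the factor of predicted classes and reference is the factor of true classes. Here is a minimal sketch of that call, assuming the randomForest and caret packages are installed (caret may additionally ask for the e1071 package). The built-in iris data set and the seed are stand-ins chosen only so the example runs on its own; with your data you would pass your own fit and data_churn$churn instead.

```r
library(randomForest)
library(caret)

set.seed(42)  # arbitrary seed, only so the sketch is reproducible

# Stand-in model on iris (assumption: replace with your own data and formula)
fit <- randomForest(Species ~ ., data = iris, ntree = 500)

# fit$predicted holds the out-of-bag (OOB) predictions for the training rows,
# so comparing it with the true labels gives an OOB accuracy estimate.
cm <- confusionMatrix(data = fit$predicted, reference = iris$Species)

# Overall accuracy as a plain number
acc <- unname(cm$overall["Accuracy"])
print(acc)
```

Using fit$predicted rather than predict(fit, your_training_data) matters: predictions on the rows the forest was trained on are far too optimistic, while the OOB predictions only use trees that did not see each row.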
As for converting those class errors (accuracy = 1 - error) into one overall accuracy number, that's easy to do. You could use the mean, the class-weighted mean, the harmonic mean (of accuracies, not of errors), etc., or a more complicated measure of inter-rater agreement. Which one to use depends on your application and the relative penalty for each kind of misclassification. Your example is simple since it has only two classes.
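The options above can be sketched in base R using the counts from the confusion matrix in the question. The matrix is retyped by hand here so the snippet runs on its own; with a real fit you would drop the class.error column from fit$confusion instead. Cohen's kappa is included as one example of an inter-rater agreement measure, which is my choice for illustration and not something the question asked for.

```r
# Counts copied from the question's confusion matrix
# (rows = actual class, columns = predicted class).
conf <- matrix(c(945,  80,
                  84, 101),
               nrow = 2, byrow = TRUE,
               dimnames = list(actual = c("No", "Yes"),
                               predicted = c("No", "Yes")))

n <- sum(conf)

# Per-class accuracy, i.e. 1 - class.error
class_acc <- diag(conf) / rowSums(conf)

# Overall accuracy: correctly classified cases over all cases
overall_acc <- sum(diag(conf)) / n

# Unweighted mean of the per-class accuracies
mean_acc <- mean(class_acc)

# Class-weighted mean (weights = class sizes); identical to overall accuracy
weighted_acc <- weighted.mean(class_acc, w = rowSums(conf))

# Harmonic mean of the per-class accuracies (not of the errors)
harmonic_acc <- length(class_acc) / sum(1 / class_acc)

# Cohen's kappa: observed agreement corrected for chance agreement
expected_acc <- sum(rowSums(conf) * colSums(conf)) / n^2
kappa <- (overall_acc - expected_acc) / (1 - expected_acc)

print(round(c(overall = overall_acc, mean = mean_acc,
              weighted = weighted_acc, harmonic = harmonic_acc,
              kappa = kappa), 3))
```

Note how far the overall accuracy sits above the unweighted and harmonic means: the "No" class dominates the data, so the overall number hides the poor performance on "Yes".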