I would like to use rfcv to cull the unimportant variables from a data set before creating a final random forest with more trees (please correct and inform me if that's not the way to use this function). For example,
> library(randomForest)
> data(fgl, package="MASS")
> tst <- rfcv(trainx = fgl[,-10], trainy = fgl[,10], scale = "log", step=0.7)
> tst$error.cv
9 6 4 3 2 1
0.2289720 0.2149533 0.2523364 0.2570093 0.3411215 0.5093458
In this case, if I understand the result correctly, it seems that we can remove three variables without negative side effects. However,
> attributes(tst)
$names
[1] "n.var" "error.cv" "predicted"
None of these components tells me which three variables can be harmlessly removed from the dataset.
I think the purpose of rfcv is to establish how your accuracy is related to the number of variables you use. This might not seem useful when you have 10 variables, but when you have thousands of variables it is quite handy to understand how much those variables "add" to the predictive power.
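To make that relationship easier to read, here is a minimal sketch that tabulates and plots the cross-validated error against the number of predictors. The seed value and the names `cv` and `cv_tab` are my own choices for illustration; the rfcv call is the one from the question, and your error values will vary from run to run.

```r
library(randomForest)
data(fgl, package = "MASS")

set.seed(1)  # arbitrary seed (assumption) so the CV folds are reproducible
cv <- rfcv(trainx = fgl[, -10], trainy = fgl[, 10], scale = "log", step = 0.7)

# One row per candidate model size
cv_tab <- data.frame(n.var = cv$n.var, error.cv = as.numeric(cv$error.cv))
print(cv_tab)

# Error against number of predictors, on a log x-axis
plot(cv_tab$n.var, cv_tab$error.cv, log = "x", type = "o", lwd = 2,
     xlab = "number of variables", ylab = "cross-validated error")
```

The curve tells you how many variables are worth keeping, not which ones.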
As you probably found out, this code
rf <- randomForest(type ~ ., data = fgl)
importance(rf)
gives you the relative importance of each of the variables.
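Putting the two together, one way to do the culling is to rank the variables by importance, keep as many as the rfcv result suggests, and refit. This is only a sketch under some assumptions: `n_keep = 6` is read off the rfcv output in the question, `ntree = 2000` and the seed are arbitrary, and the names `ranked`, `keep`, `dropped` and `rf_final` are mine. Note that rfcv ranks the variables by importance inside each fold, so the ranking from a single forest on the full data need not match what any particular fold dropped.

```r
library(randomForest)
data(fgl, package = "MASS")

set.seed(1)  # arbitrary seed (assumption) for reproducibility
rf <- randomForest(type ~ ., data = fgl, importance = TRUE)

# Rank predictors by permutation importance (mean decrease in accuracy)
imp <- importance(rf, type = 1)
ranked <- rownames(imp)[order(imp[, 1], decreasing = TRUE)]

n_keep <- 6  # assumption: model size suggested by the rfcv output
keep <- ranked[seq_len(n_keep)]
dropped <- setdiff(ranked, keep)
print(dropped)  # the variables left out of the final model

# Final forest on the reduced predictor set, with more trees
rf_final <- randomForest(x = fgl[, keep], y = fgl$type, ntree = 2000)
print(rf_final)
```

Because the variables were selected using all of the data, the out-of-bag error printed for `rf_final` is likely to be somewhat optimistic; the rfcv error at the same model size is the more honest figure.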