I understand that the common practice for selecting the CP value is to choose the lowest level with the minimum xerror
value. However, in my case below, using cp <- fit$cptable[which.min(fit$cptable[,"xerror"]),"CP"]
gives me 0.17647059,
which results in no split at all (just the root node) after pruning with this value.
> myFormula <- Kyphosis~Age+Number+Start
> set.seed(1)
> fit <- rpart(myFormula,data=data,method="class",control=rpart.control(minsplit=20,xval=10,cp=0.01))
> fit$cptable
          CP nsplit rel error   xerror      xstd
1 0.17647059      0 1.0000000 1.000000 0.2155872
2 0.01960784      1 0.8235294 1.000000 0.2155872
3 0.01000000      4 0.7647059 1.058824 0.2200975
Is there an alternative or a good practice for selecting the CP value?
Generally, a cptable like the one you have is a warning that the tree is probably of no use at all and is unlikely to generalise well to future data: the cross-validated error (xerror) never drops below that of the root node. So the answer is not to find another way to choose cp, but rather to create a useful tree if you can, or to admit defeat and say that, based on the examples and features we have, we cannot create a model that is predictive of kyphosis.
In your case, all is not (necessarily) lost. The data set is very small, and the cross-validation that gives rise to the xerror column is very volatile. If you set your seed to 2 or to 3, you will see very different answers in that column (some even worse).
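To see this volatility directly, here is a minimal sketch that refits the same tree under seeds 1 to 3 and collects the xerror column from each fit. It assumes your data is the kyphosis data set that ships with rpart; the name xerror_by_seed is just an illustrative choice.

```r
library(rpart)  # provides rpart() and the kyphosis data set

myFormula <- Kyphosis ~ Age + Number + Start

# Refit the same tree under several seeds; only the cross-validation
# folds change, so only the xerror (and xstd) columns can differ.
xerror_by_seed <- sapply(1:3, function(seed) {
  set.seed(seed)
  fit <- rpart(myFormula, data = kyphosis, method = "class",
               control = rpart.control(minsplit = 20, xval = 10, cp = 0.01))
  fit$cptable[, "xerror"]
})
colnames(xerror_by_seed) <- paste0("seed_", 1:3)

# One row per cptable row, one column per seed
print(xerror_by_seed)
```

If the columns disagree noticeably, the minimum-xerror rule is picking up noise from the fold assignment rather than a real difference between tree sizes.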
One interesting thing to try on this data is to increase the number of cross-validation folds to the number of observations, so that you get leave-one-out cross-validation (LOOCV). If you do this:
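library(rpart)  # provides rpart() and the kyphosis data set
myFormula <- Kyphosis ~ Age + Number + Start
# xval = 81 equals the number of rows in kyphosis, i.e. leave-one-out CV
rpart_1 <- rpart(myFormula, data = kyphosis,
                 method = "class",
                 control = rpart.control(minsplit = 20, xval = 81, cp = 0.01))
rpart_1$cptable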
you will find a CP table that you will like better! (Note that setting a seed is no longer necessary, since with LOOCV every observation forms its own fold, so the folds are the same each time.)
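Once the CP table is stable, the minimum-xerror rule from your question can be applied to it as before. The following is only a sketch under the same assumption (the built-in kyphosis data); best_row, best_cp and pruned are illustrative names, and which row gets selected depends on the table you obtain.

```r
library(rpart)

# LOOCV fit: one fold per observation
rpart_1 <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                 method = "class",
                 control = rpart.control(minsplit = 20,
                                         xval = nrow(kyphosis),
                                         cp = 0.01))

cp_table <- rpart_1$cptable

# Row with the smallest cross-validated error (first one on ties)
best_row <- which.min(cp_table[, "xerror"])
best_cp  <- cp_table[best_row, "CP"]

# Prune the full tree back to the chosen complexity
pruned <- prune(rpart_1, cp = best_cp)

print(cp_table)
print(pruned)
```

Because which.min returns the first minimum, ties in xerror resolve to the smaller tree, which matches the "lowest level" convention in the question.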