How do I interpret rpart splits on factor variables when building classification trees in R?

user281537 · Apr 8, 2010 · Viewed 7.7k times

If the factor variable is Climate, with 4 possible values: Tropical, Arid, Temperate, Snow, and a node in my rpart tree is labeled as "Climate:ab", what is the split?

Answer

Marek · Oct 1, 2010

I assume you are plotting the tree the standard way, which is

plot(f)
text(f)

As the help for text.rpart explains (see the pretty argument), factor levels are shown as letters by default, so a stands for levels(Climate)[1], b for levels(Climate)[2], and so on. A node labeled "Climate:ab" therefore sends observations whose Climate is levels(Climate)[1] or levels(Climate)[2] to the left branch and all other observations to the right branch.
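To make the letter coding concrete, here is a minimal sketch in base R. The helper decode_split is hypothetical (it is not part of rpart) and assumes the default a, b, c, ... coding with at most 26 levels. Note that a factor built from character data has its levels sorted alphabetically unless you set them explicitly, so a is not necessarily the first value you listed.

```r
# Hypothetical helper (not part of rpart): expand a letter label such as "ab"
# into the factor levels it stands for, assuming the default letter coding.
decode_split <- function(label, f) {
  idx <- match(strsplit(label, "")[[1]], letters)  # "ab" -> c(1, 2)
  levels(f)[idx]
}

# Levels are sorted alphabetically because none were given explicitly.
Climate <- factor(c("Tropical", "Arid", "Temperate", "Snow"))
levels(Climate)               # "Arid" "Snow" "Temperate" "Tropical"

# Levels that go to the left branch of a node labeled "Climate:ab"
decode_split("ab", Climate)   # "Arid" "Snow"
```

With these alphabetical levels, "Climate:ab" would mean Arid and Snow go left, Temperate and Tropical go right. If your factor was created with explicit levels, check levels(Climate) on your own data.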

You can print the level names directly with

plot(f)
text(f, pretty=1)

[Figure: tree plot created by rpart]

but I recommend using draw.tree from the maptree package:

require(maptree)
draw.tree(f)

[Figure: tree plot created by maptree]

I used fake data to make the plots:

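library(rpart)

X <- data.frame(
    y=rep(1:4,25),
    # factor() is explicit because data.frame() no longer converts strings in R >= 4.0
    Climate=factor(rep(c("Tropical", "Arid", "Temperate", "Snow"),25))
)
# y is numeric, so this fits a regression tree; use factor(y) for a classification tree
f <- rpart(y~Climate, X)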
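If you would rather read the split from the fitted object than from the plot, the sketch below may help. It rebuilds the same fake data so it runs on its own. It relies on two documented parts of an rpart object: the "xlevels" attribute (the factor levels) and the csplit matrix (1 means the level goes left, 3 means it goes right). Treating the first row of csplit as the root split is an assumption that holds here because Climate is the only predictor.

```r
library(rpart)

# Same fake data as above
X <- data.frame(
    y = rep(1:4, 25),
    Climate = factor(rep(c("Tropical", "Arid", "Temperate", "Snow"), 25))
)
f <- rpart(y ~ Climate, X)

# The text listing spells out each split with full level names
print(f)

# Factor levels in the order that the letters a, b, c, ... refer to
lev <- attr(f, "xlevels")$Climate

# Row 1 of csplit is assumed to be the root split: 1 = left, 3 = right
left  <- lev[f$csplit[1, ] == 1]
right <- lev[f$csplit[1, ] == 3]
left
right
```

The levels in left are the ones the letters in the node label refer to.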