Building a classification tree with categorical variables using rpart

user4251309 · Nov 14, 2014

I have a data set with 14 features; a few of them are shown below. Sex and marital status are categorical variables stored as integer codes.

height,sex,maritalStatus,age,edu,homeType

SEX
         1. Male
         2. Female

MARITAL STATUS
         1. Married
         2. Living together, not married
         3. Divorced or separated
         4. Widowed
         5. Single, never married

I am using the rpart library in R to build a classification tree with the following call:

rfit = rpart(homeType ~., data = trainingData, method = "class", cp = 0.0001)

This gives me a decision tree that does not consider sex and marital status as factors.

I am thinking of using as.factor for this:

sex = as.factor(trainingData$sex)
ms = as.factor(trainingData$maritalStatus)

But I am not sure how to pass this information to rpart. The data argument in rpart() takes the "trainingData" data frame, so it will always use the values stored in that data frame rather than my separate sex and ms variables. I am a little new to R and would appreciate someone's help with this.

Answer

Jean V. Adams · Nov 14, 2014

You could make the changes to the trainingData data frame directly, then run rpart().

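library(rpart)

# Overwrite the integer-coded columns with factors so rpart treats them as categorical
trainingData$sex = as.factor(trainingData$sex)
trainingData$maritalStatus = as.factor(trainingData$maritalStatus)
rfit = rpart(homeType ~., data = trainingData, method = "class", cp = 0.0001)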
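
If readable labels in the tree output would help, one option is factor() with explicit levels and labels instead of as.factor(). The sketch below is only an illustration: the toy data frame is made up to stand in for trainingData (the real data set is not shown in the question), and the homeType codes 1 to 3 are an assumption. The labels are copied from the codebook in the question.

```r
library(rpart)

# Hypothetical toy data standing in for trainingData; all values are invented
set.seed(1)
n <- 200
trainingData <- data.frame(
  height        = round(rnorm(n, mean = 170, sd = 10)),
  sex           = sample(1:2, n, replace = TRUE),
  maritalStatus = sample(1:5, n, replace = TRUE),
  age           = sample(18:80, n, replace = TRUE),
  homeType      = sample(1:3, n, replace = TRUE)  # assumed coding
)

# Replace the integer codes with labelled factors, using the question's codebook
trainingData$sex <- factor(trainingData$sex,
                           levels = 1:2,
                           labels = c("Male", "Female"))
trainingData$maritalStatus <- factor(trainingData$maritalStatus,
                                     levels = 1:5,
                                     labels = c("Married",
                                                "Living together, not married",
                                                "Divorced or separated",
                                                "Widowed",
                                                "Single, never married"))

# Same call as in the answer; rpart now sees both columns as categorical
rfit <- rpart(homeType ~ ., data = trainingData, method = "class", cp = 0.0001)

# Check that the conversion took effect before trusting the tree
factorCheck <- sapply(trainingData[c("sex", "maritalStatus")], is.factor)
print(factorCheck)
```

With factor predictors, rpart splits on subsets of levels rather than on a numeric threshold, which is usually what you want for unordered categories like marital status.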