I have a data set with 14 features, a few of which are shown below. Of these, sex and maritalStatus are categorical variables coded as integers.
height,sex,maritalStatus,age,edu,homeType
SEX
1. Male
2. Female
MARITAL STATUS
1. Married
2. Living together, not married
3. Divorced or separated
4. Widowed
5. Single, never married
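If you want the factor levels to carry the meanings from the codebook above rather than bare integers, factor() with explicit levels and labels is one way to attach them. This is a minimal sketch on a hypothetical toy data frame named df. Any code not listed in `levels` would become NA.

```r
# Hypothetical toy data using the integer codes from the codebook above
df <- data.frame(sex = c(1, 2, 2, 1), maritalStatus = c(1, 5, 3, 4))

# Map each integer code to its label; the order of `labels` follows `levels`
df$sex <- factor(df$sex, levels = 1:2, labels = c("Male", "Female"))
df$maritalStatus <- factor(
  df$maritalStatus, levels = 1:5,
  labels = c("Married", "Living together, not married",
             "Divorced or separated", "Widowed", "Single, never married")
)
str(df)
```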
I am using the rpart package in R to build a classification tree with the following call:
rfit = rpart(homeType ~., data = trainingData, method = "class", cp = 0.0001)
This gives me a decision tree that treats sex and maritalStatus as numeric variables rather than as categorical factors.
I am thinking of using as.factor() for this:
sex = as.factor(trainingData$sex)
ms = as.factor(trainingData$maritalStatus)
But I am not sure how to pass this information to rpart. The data argument of rpart() takes the trainingData data frame, so it will always use the values stored in that data frame. I am a little new to R and would appreciate some help with this.
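The as.factor() calls above create new, separate vectors; they do not modify the columns inside trainingData, which is why rpart() never sees them. A minimal sketch on a hypothetical toy data frame shows this:

```r
# Hypothetical toy data standing in for trainingData
trainingData <- data.frame(sex = c(1, 2, 1), maritalStatus = c(1, 5, 3))

sex <- as.factor(trainingData$sex)  # a new vector, separate from the data frame
class(sex)                          # "factor"
class(trainingData$sex)             # still "numeric": the column is unchanged
```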
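You can make the changes to the trainingData data frame directly and then run rpart(). Because rpart() takes its variables from the data frame, converting the columns in place is enough for the tree to treat them as categorical: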
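library(rpart)
# Overwrite the integer-coded columns with factor versions so rpart treats them as categorical
trainingData$sex = as.factor(trainingData$sex)
trainingData$maritalStatus = as.factor(trainingData$maritalStatus)
rfit = rpart(homeType ~ ., data = trainingData, method = "class", cp = 0.0001)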
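To see why the conversion matters, here is a sketch on hypothetical simulated data (the names sim, sim_fac and n_splits are made up for illustration). homeType depends on a non-contiguous group of marital codes (1 and 4). With integer codes rpart can only split on thresholds such as maritalStatus < 1.5, so it needs several splits. With factors it can send any subset of levels down each branch, so a single split should suffice.

```r
library(rpart)

# Hypothetical simulated data: integer codes as in the question,
# with homeType determined by marital codes 1 and 4 only
sim <- data.frame(
  sex = rep(1:2, times = 50),
  maritalStatus = rep(1:5, each = 20)
)
sim$homeType <- factor(ifelse(sim$maritalStatus %in% c(1, 4), "house", "apartment"))

# Integer-coded predictors: only threshold splits are possible
fit_num <- rpart(homeType ~ ., data = sim, method = "class", cp = 0.0001)

# Factor-coded predictors: subsets of levels can be grouped in one split
sim_fac <- sim
sim_fac$sex <- as.factor(sim_fac$sex)
sim_fac$maritalStatus <- as.factor(sim_fac$maritalStatus)
fit_fac <- rpart(homeType ~ ., data = sim_fac, method = "class", cp = 0.0001)

# Count internal (splitting) nodes; leaves are marked "<leaf>" in fit$frame$var
n_splits <- function(fit) sum(fit$frame$var != "<leaf>")
n_splits(fit_num)
n_splits(fit_fac)
```

Printing fit_fac should show the root split grouping Married (1) and Widowed (4) together, which a numeric threshold cannot do in one step.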