How to implement the output of a decision tree built using ctree (party package)?

Kshitij Kashayp · Aug 23, 2013

I have built a decision tree using the ctree function from the party package. It has 1700 nodes. First, is there a way to pass a maxdepth argument to ctree? I tried the control_ctree option, but it threw an error message saying it couldn't find the ctree function.

Also, how can I consume the output of this tree? How can it be implemented on other platforms such as SAS or SQL? I would also like to know what the value "* weights = 4349" at the end of a node signifies. How do I know which terminal node votes for which predicted value?

Answer

David Arenburg · Mar 24, 2014

There is a maxdepth option for ctree. It is set through ctree_control().

You can use it as follows:

library(party)
airq <- subset(airquality, !is.na(Ozone))  # drop rows with a missing response
airct <- ctree(Ozone ~ ., data = airq, controls = ctree_control(maxdepth = 3))

You can also require a minimum number of observations for a node to be considered for splitting (minsplit) and for each terminal node (minbucket):

airct <- ctree(Ozone ~ ., data = airq, controls = ctree_control(minsplit= 50, minbucket = 20))

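You can also make the splitting criterion stricter by raising mincriterion, which lowers the p-value a split has to reach (mincriterion = 0.99 corresponds to p < 0.01):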

airct <- ctree(Ozone ~ ., data = airq, controls = ctree_control(mincriterion = 0.99))

The weights = 4349 you mentioned is just the number of observations in that specific node. By default ctree gives every observation a weight of 1, but if you feel that some observations deserve larger weights you can pass a weights vector to ctree(). It must have the same length as the data set and contain non-negative integers. Once you do that, weights = 4349 is a sum of weights rather than a plain count, so it has to be interpreted with caution.
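To illustrate the weights argument, here is a minimal sketch on the same airquality data. The weighting scheme (counting every May observation twice) and the name airct_w are made up purely for illustration; they are not part of the original question or data.

```r
library(party)

airq <- subset(airquality, !is.na(Ozone))

# Hypothetical case weights: May observations count twice, all others once.
# The vector must be as long as the data set and hold non-negative integers.
w <- ifelse(airq$Month == 5, 2L, 1L)

airct_w <- ctree(Ozone ~ ., data = airq, weights = w,
                 controls = ctree_control(maxdepth = 3))

# The "weights = ..." values in the printed tree are now based on w,
# so they no longer equal the number of rows in each node.
print(airct_w)

nrow(airq)  # number of observations
sum(w)      # total weight, larger than nrow(airq) under this scheme
```

Comparing nrow(airq) with sum(w) is a quick way to check whether the printed weights can still be read as observation counts.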

One way of using the weights is to see which observations fell into a certain node. Using the data from the example above, we can do the following:

airq <- subset(airquality, !is.na(Ozone))
airct <- ctree(Ozone ~ ., data = airq, controls = ctree_control(maxdepth = 3))
unique(where(airct))  # IDs of the terminal nodes
[1] 5 3 6 9 8

So we can check what fell into node number 5, for example:

n <- nodes(airct, 5)[[1]]
x <- airq[which(as.logical(n$weights)), ]
x
    Ozone Solar.R Wind Temp Month Day
1      41     190  7.4   67     5   1
2      36     118  8.0   72     5   2
3      12     149 12.6   74     5   3
4      18     313 11.5   62     5   4
...

Using this method you can create data sets that contain the information of your terminal nodes and then import them into SAS or SQL.
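One possible way to build such data sets in a single pass is sketched below. It attaches the terminal node ID and the predicted value to every row, summarises each terminal node, and writes both tables to CSV. The object names (scored, node_summary) and the file names are assumptions chosen for illustration, and the files go to a temporary directory; adjust the paths for a real export.

```r
library(party)

airq  <- subset(airquality, !is.na(Ozone))
airct <- ctree(Ozone ~ ., data = airq, controls = ctree_control(maxdepth = 3))

# Terminal node ID and predicted Ozone for every training row
scored <- airq
scored$node <- where(airct)
scored$prediction <- as.numeric(predict(airct))

# One row per terminal node: how many rows it holds and what it predicts
node_summary <- data.frame(
  node       = sort(unique(scored$node)),
  n          = as.vector(table(scored$node)),
  prediction = as.vector(tapply(scored$prediction, scored$node, mean))
)

# Hypothetical output files, ready to be imported into SAS or a SQL table
out_dir <- tempdir()
write.csv(scored, file.path(out_dir, "airq_scored.csv"), row.names = FALSE)
write.csv(node_summary, file.path(out_dir, "airq_node_summary.csv"),
          row.names = FALSE)
```

The node_summary table also shows which terminal node gives which predicted value. For new data, predict(airct, newdata = ..., type = "node") should return the node IDs, which can then be joined against this table on the other platform.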

You can also get the list of splitting conditions using the function from my answer to "ctree() - How to get the list of splitting conditions for each terminal node?"