Decision tree using continuous variable

BSKim · Nov 30, 2016 · Viewed 11.3k times

I have a question about decision trees with continuous variables.

I've heard that when the output variable is continuous and the input variable is categorical, the split criterion is variance reduction or something similar. But I don't know how it works when the input variable is continuous.

1) input variable : continuous / output variable : categorical

2) input variable : continuous / output variable : continuous

For these two cases, how do we get a split criterion like the Gini index or information gain?

When I use rpart in R, it works well whatever the input and output variables are, but I don't know the algorithm in detail.

Answer

Vito · Nov 30, 2016

1) input variable : continuous / output variable : categorical
The C4.5 algorithm handles this case.

In order to handle continuous attributes, C4.5 creates a threshold and then splits the list into those whose attribute value is above the threshold and those that are less than or equal to it.
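The threshold search described above can be sketched in a few lines of Python. This is only an illustration, not C4.5 itself: it scores each candidate threshold with plain information gain (C4.5 additionally uses the gain ratio), it takes midpoints between adjacent sorted values as candidates, and the function names and toy data are made up for the example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, information_gain) for the best split 'value <= t'."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    n = len(pairs)
    base = entropy(labels)
    best_t, best_gain = None, 0.0
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # equal values cannot be separated by a threshold
        t = (lo + hi) / 2  # midpoint candidate (an assumption of this sketch)
        left = [label for _, label in pairs[:i]]   # values <= t
        right = [label for _, label in pairs[i:]]  # values > t
        gain = base - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain

# Toy data, invented for illustration: continuous input, categorical output.
temperature = [64, 65, 68, 69, 70, 71, 72, 75]
play = ["yes", "yes", "yes", "yes", "no", "no", "no", "no"]

threshold, gain = best_threshold(temperature, play)
print(threshold, gain)
```

Swapping `entropy` for a Gini impurity function would give the Gini-based version of the same search; the loop over candidate thresholds stays the same.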

2) input variable : continuous / output variable : continuous
The CART (classification and regression trees) algorithm handles this case.

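Case 2 is the regression problem. Enumerate each attribute j and each candidate value s of that attribute, and split the data into the points whose attribute value is less than or equal to s and those whose value is above it. This gives two regions:

$$R_1(j,s) = \{x \mid x^{(j)} \le s\}, \qquad R_2(j,s) = \{x \mid x^{(j)} > s\}$$

Find the best attribute j and the best split value s, which minimize the total squared error:

$$\min_{j,s}\left[\min_{c_1}\sum_{x_i \in R_1(j,s)} (y_i - c_1)^2 + \min_{c_2}\sum_{x_i \in R_2(j,s)} (y_i - c_2)^2\right]$$

c_1 and c_2 can be solved as follows; each is the mean of the outputs in its region, where N_m is the number of training points in R_m:

$$c_m = \frac{1}{N_m}\sum_{x_i \in R_m(j,s)} y_i, \qquad m = 1, 2$$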
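As a rough sketch of this search (not rpart's actual implementation), the following Python enumerates every attribute j and every observed value s, sets c_1 and c_2 to the region means, and keeps the split with the smallest sum of squared errors. The function name and the toy data are assumptions made for the example.

```python
def best_regression_split(X, y):
    """Try every attribute j and threshold s; return (j, s, c1, c2, sse)."""
    best = None
    n_features = len(X[0])
    for j in range(n_features):
        # Candidate thresholds: the distinct observed values of attribute j.
        for s in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= s]   # region R_1
            right = [yi for row, yi in zip(X, y) if row[j] > s]   # region R_2
            if not left or not right:
                continue  # a split must leave points on both sides
            c1 = sum(left) / len(left)     # mean output in R_1
            c2 = sum(right) / len(right)   # mean output in R_2
            sse = (sum((v - c1) ** 2 for v in left)
                   + sum((v - c2) ** 2 for v in right))
            if best is None or sse < best[4]:
                best = (j, s, c1, c2, sse)
    return best

# Toy data, invented for illustration: two continuous inputs, continuous output.
X = [[1.0, 5.0], [2.0, 3.0], [3.0, 8.0], [10.0, 4.0], [11.0, 7.0], [12.0, 6.0]]
y = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]

j, s, c1, c2, sse = best_regression_split(X, y)
print(j, s, c1, c2, sse)
```

Minimizing this sum of squared errors is the same as maximizing the variance reduction mentioned in the question. A full tree repeats the search recursively inside each region until a stopping rule is met.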

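Then, when doing regression, the prediction for a new point x is

$$f(x) = \sum_{m=1}^{M} c_m \, I(x \in R_m)$$

where R_m is the m-th of the M regions produced by the splits, c_m is the mean output in that region, and I is the indicator function:

$$I(x \in R_m) = \begin{cases} 1, & x \in R_m \\ 0, & \text{otherwise} \end{cases}$$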
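To make the prediction step concrete, here is a minimal Python sketch of a piecewise-constant predictor: each region contributes its constant only when the point falls inside it. The single split (attribute 0, threshold 3.0) and the constants 1.0 and 5.0 are hypothetical values chosen for illustration, not results taken from the answer.

```python
def predict(x, regions):
    """Sum c_m over all regions, counting only the region that contains x."""
    total = 0.0
    for contains, c in regions:
        indicator = 1 if contains(x) else 0  # 1 if x lies in this region, else 0
        total += c * indicator
    return total

# Hypothetical one-split tree: attribute j = 0, threshold s = 3.0.
j, s = 0, 3.0
regions = [
    (lambda x: x[j] <= s, 1.0),  # region R_1 with constant c_1
    (lambda x: x[j] > s, 5.0),   # region R_2 with constant c_2
]

low = predict([2.5, 7.0], regions)    # falls in R_1
high = predict([10.0, 4.0], regions)  # falls in R_2
print(low, high)
```

Because the regions do not overlap, exactly one indicator is 1 for any point, so the sum simply returns the constant of the region that contains x.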