I have a question about decision trees with continuous variables.
I heard that when the output variable is continuous and the input variable is categorical, the split criterion is variance reduction or something similar, but I don't know how it works when the input variable is continuous.
1) input variable : continuous / output variable : categorical
2) input variable : continuous / output variable : continuous
For these two cases, how do we get a split criterion like the Gini index or information gain?
When I use rpart in R, it works well whatever the input and output variables are, but I don't know the algorithm in detail.
1) input variable : continuous / output variable : categorical
The C4.5 algorithm handles this case.
In order to handle continuous attributes, C4.5 creates a threshold and then splits the list into those whose attribute value is above the threshold and those that are less than or equal to it.
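The threshold search described above can be sketched roughly as follows. This is a minimal illustration, not C4.5 itself: it scores each candidate threshold with plain information gain (C4.5 uses the gain ratio), and it assumes candidate thresholds are the midpoints between consecutive distinct sorted values. The function names `entropy` and `best_threshold` are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, information_gain) for one continuous attribute.

    Assumption: candidates are midpoints between consecutive distinct values.
    Returns (None, 0.0) when no split improves on the unsplit node.
    """
    # Sort samples by attribute value only, keeping labels attached.
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    n = len(pairs)
    base = entropy(labels)
    best_t, best_gain = None, 0.0
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate equal values
        t = (lo + hi) / 2
        left = [y for _, y in pairs[:i]]   # attribute value <= t
        right = [y for _, y in pairs[i:]]  # attribute value > t
        gain = (base
                - (len(left) / n) * entropy(left)
                - (len(right) / n) * entropy(right))
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain
```

In a full tree builder, a search like this would run once per continuous attribute at each node, and the attribute with the best score would be chosen for the split.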
2) input variable : continuous / output variable : continuous
The CART (classification and regression trees) algorithm handles this case.
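Case 2 is a regression problem. Enumerate each attribute $j$ and each candidate value $s$ of that attribute, and split the data into the samples whose attribute value is above the threshold and those whose value is less than or equal to it. This gives two regions:

$$R_1(j, s) = \{x \mid x_j \le s\}, \qquad R_2(j, s) = \{x \mid x_j > s\}$$

Find the best attribute $j$ and the best split value $s$, which solve

$$\min_{j, s} \left[ \min_{c_1} \sum_{x_i \in R_1(j, s)} (y_i - c_1)^2 + \min_{c_2} \sum_{x_i \in R_2(j, s)} (y_i - c_2)^2 \right]$$

For a given $j$ and $s$, $c_1$ and $c_2$ can be solved as follows:

$$\hat{c}_m = \frac{1}{N_m} \sum_{x_i \in R_m(j, s)} y_i, \qquad m = 1, 2$$

where $N_m$ is the number of training samples in $R_m(j, s)$.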
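The split search for the regression case can be sketched as a brute-force loop over attributes and thresholds. This is a minimal illustration under stated assumptions, not how rpart or any particular CART implementation is written: it takes the observed attribute values (except the largest) as candidate thresholds, uses the region mean as the constant for each side, and recomputes the loss from scratch for every candidate. The names `sse` and `best_split` are hypothetical.

```python
def sse(ys):
    """Sum of squared errors around the mean (the best constant for a region)."""
    if not ys:
        return 0.0
    c = sum(ys) / len(ys)
    return sum((y - c) ** 2 for y in ys)


def best_split(X, y):
    """Exhaustive search over attributes j and thresholds s.

    X: list of rows (each a list of floats); y: list of float targets.
    Returns (j, s, c1, c2, loss), or None if no valid split exists.
    """
    best = None
    n_features = len(X[0])
    for j in range(n_features):
        # Assumption: candidates are the observed values of attribute j,
        # excluding the largest so that both regions are non-empty.
        for s in sorted(set(row[j] for row in X))[:-1]:
            left = [yi for row, yi in zip(X, y) if row[j] <= s]
            right = [yi for row, yi in zip(X, y) if row[j] > s]
            loss = sse(left) + sse(right)
            if best is None or loss < best[4]:
                c1 = sum(left) / len(left)     # mean target where x_j <= s
                c2 = sum(right) / len(right)   # mean target where x_j > s
                best = (j, s, c1, c2, loss)
    return best
```

Applying the same search recursively to each resulting region, until a stopping rule such as a minimum node size is met, grows the full regression tree.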