What's the best way to handle missing feature attribute values with Weka's C4.5 (J48) decision tree? The problem of missing values occurs during both training and classification.
If values are missing from training instances, am I correct in assuming that I place a '?' value for the feature?
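For reference, a missing value in an ARFF training file is indeed written as a bare `?` in the attribute's position (attribute names and values below are made up for illustration):

```
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute humidity numeric
@attribute play {yes, no}

@data
sunny, 85, no
?, 90, yes
rainy, ?, no
```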
Suppose that I am able to successfully build the decision tree and then create my own tree code in C++ or Java from Weka's tree structure. During classification time, if I am trying to classify a new instance, what value do I put for features that have missing values? How would I descend the tree past a decision node for which I have an unknown value?
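One way C4.5 handles this at classification time is to send a *fraction* of the instance down every branch, weighted by how the training examples split at that node, and then sum the resulting class distributions. Below is a minimal self-contained Java sketch of that idea; the `Node` class and the weather attributes are made-up illustrations, not Weka's actual tree structures.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of C4.5-style descent past a decision node whose tested
// attribute is missing: the instance is split into weighted fractions,
// one per branch, and the leaf class distributions are summed.
public class FractionalDescent {

    // A node is a leaf with a class distribution, or an internal node
    // testing one attribute with one child per attribute value.
    static class Node {
        String testAttribute;                               // null for a leaf
        Map<String, Node> children = new HashMap<>();
        Map<String, Double> branchWeight = new HashMap<>(); // training fraction per branch
        Map<String, Double> classDist = new HashMap<>();    // leaf class distribution

        static Node leaf(String cls, double p, String other, double q) {
            Node n = new Node();
            n.classDist.put(cls, p);
            n.classDist.put(other, q);
            return n;
        }
    }

    // Returns a class distribution; a missing value (null in the
    // instance map) sends weighted fractions down every branch.
    static Map<String, Double> classify(Node node, Map<String, String> instance, double weight) {
        Map<String, Double> dist = new HashMap<>();
        if (node.testAttribute == null) {                   // leaf
            for (Map.Entry<String, Double> e : node.classDist.entrySet())
                dist.merge(e.getKey(), weight * e.getValue(), Double::sum);
            return dist;
        }
        String value = instance.get(node.testAttribute);
        if (value != null && node.children.containsKey(value))
            return classify(node.children.get(value), instance, weight);
        // Missing: descend every branch with its training-set fraction.
        for (Map.Entry<String, Node> e : node.children.entrySet()) {
            double w = weight * node.branchWeight.get(e.getKey());
            for (Map.Entry<String, Double> c : classify(e.getValue(), instance, w).entrySet())
                dist.merge(c.getKey(), c.getValue(), Double::sum);
        }
        return dist;
    }

    // Toy tree: root tests "outlook"; 60% of training cases went to
    // the "sunny" branch and 40% to the "rainy" branch.
    static Node exampleTree() {
        Node root = new Node();
        root.testAttribute = "outlook";
        root.children.put("sunny", Node.leaf("yes", 0.9, "no", 0.1));
        root.children.put("rainy", Node.leaf("yes", 0.2, "no", 0.8));
        root.branchWeight.put("sunny", 0.6);
        root.branchWeight.put("rainy", 0.4);
        return root;
    }

    public static void main(String[] args) {
        Map<String, String> instance = new HashMap<>();     // "outlook" missing
        Map<String, Double> dist = classify(exampleTree(), instance, 1.0);
        // yes = 0.6*0.9 + 0.4*0.2 = 0.62; no = 0.6*0.1 + 0.4*0.8 = 0.38
        System.out.println(dist);
    }
}
```

With the unknown `outlook`, the instance contributes 0.6 of its weight to the sunny leaf and 0.4 to the rainy leaf, so the final distribution is yes = 0.62, no = 0.38 and the predicted class is the argmax of that distribution.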
Would using Naive Bayes be better for handling missing values? I would just assign a very small non-zero probability for them, right?
From Pedro Domingos' ML course at the University of Washington, here are three approaches he suggests for a missing value of attribute A at node n:

1. Assign the most common value of A among the other examples sorted to node n.
2. Assign the most common value of A among the other examples with the same target value.
3. Assign probability p_i to each possible value v_i of A; assign fraction p_i of the example to each descendant in the tree.

The slides and video are now viewable here.