I'm using scikit-learn's gradient-boosted trees classifier, GradientBoostingClassifier, which exposes feature importance scores via its feature_importances_ attribute. How are these feature importances calculated?
I'd like to understand what algorithm scikit-learn is using, to help me understand how to interpret those numbers. The algorithm isn't listed in the documentation.
This is documented elsewhere in the scikit-learn documentation. In particular, here is how it works:
For each tree, the importance of a feature F is calculated as the fraction of samples that traverse a node that splits on feature F (see here). Those per-tree numbers are then averaged across all trees (as described here).
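The average-then-normalize step can be checked empirically. The sketch below pulls the unnormalized per-tree importances out of a fitted ensemble via the internal `tree_.compute_feature_importances` method (an implementation detail that may change between scikit-learn versions), averages them across all trees, and compares the result to the reported `feature_importances_`:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
clf = GradientBoostingClassifier(n_estimators=50, random_state=0).fit(X, y)

# clf.estimators_ is a 2-D array of regression trees (one row per boosting
# stage, one column per class). Collect each tree's unnormalized importances.
per_tree = [t.tree_.compute_feature_importances(normalize=False)
            for stage in clf.estimators_ for t in stage]

# Average across trees, then renormalize so the scores sum to 1.
avg = np.mean(per_tree, axis=0)
manual = avg / avg.sum()

print(np.allclose(manual, clf.feature_importances_))
```

On recent scikit-learn versions this prints `True`, matching the averaging scheme described above.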
It is not described exactly how scikit-learn estimates the fraction of samples that will traverse a node that splits on feature F.
The interpretation: scores lie in the range [0, 1], and higher scores mean the feature is more important. feature_importances_ is an array of shape (n_features,) whose values are nonnegative and sum to 1.0.
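A quick way to confirm those properties on a fitted model (using the iris dataset here purely as an example):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier

X, y = load_iris(return_X_y=True)
clf = GradientBoostingClassifier(random_state=0).fit(X, y)

imp = clf.feature_importances_
print(imp.shape)                    # (4,) — one score per feature
print(bool(np.all(imp >= 0)))       # True — scores are nonnegative
print(bool(np.isclose(imp.sum(), 1.0)))  # True — scores sum to 1
```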