How to get feature importance in xgboost?

modkzs · Jun 4, 2016 · Viewed 78.6k times

I'm using xgboost to build a model and trying to find the importance of each feature using get_fscore(), but it returns {}.

My training code is:

import xgboost as xgb

# X, Y: the training features and labels (defined earlier, not shown)
dtrain = xgb.DMatrix(X, label=Y)
watchlist = [(dtrain, 'train')]
param = {'max_depth': 6, 'learning_rate': 0.03}
num_round = 200
bst = xgb.train(param, dtrain, num_round, watchlist)

So is there any mistake in my training code? How do I get feature importance in xgboost?

Answer

MLKing · Aug 2, 2018

In your code you can get the importance of each feature as a dict:

bst.get_score(importance_type='gain')

>>{'ftr_col1': 77.21064539577829,
   'ftr_col2': 10.28690566363971,
   'ftr_col3': 24.225014841466294,
   'ftr_col4': 11.234086283060112}
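
Note that get_fscore() (the method used in the question) is simply get_score() with the default importance_type='weight', so an empty result from one means an empty result from the other:

# get_fscore() is an alias for get_score(importance_type='weight'):
# both map each feature to the number of splits that use it.
assert bst.get_fscore() == bst.get_score(importance_type='weight')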

Explanation: get_score() is a method of the Booster object that train() returns, and is defined as:

get_score(fmap='', importance_type='weight')

  • fmap (str, optional) – the name of the feature map file.
  • importance_type – one of the following (compared side by side in the sketch after this list):
    • ‘weight’ - the number of times a feature is used to split the data across all trees.
    • ‘gain’ - the average gain across all splits the feature is used in.
    • ‘cover’ - the average coverage across all splits the feature is used in.
    • ‘total_gain’ - the total gain across all splits the feature is used in.
    • ‘total_cover’ - the total coverage across all splits the feature is used in.
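
To see how these types differ in practice, here is a minimal, self-contained sketch; the synthetic data and feature names (f0–f3) are made up for illustration, and 'total_gain'/'total_cover' require a reasonably recent xgboost version:

import numpy as np
import xgboost as xgb

# Toy regression data: 100 rows, 4 named features (illustrative only).
rng = np.random.RandomState(0)
X = rng.rand(100, 4)
Y = 2 * X[:, 0] + X[:, 1] + 0.1 * rng.rand(100)

dtrain = xgb.DMatrix(X, label=Y, feature_names=['f0', 'f1', 'f2', 'f3'])
bst = xgb.train({'max_depth': 6, 'learning_rate': 0.03}, dtrain,
                num_boost_round=200)

# One dict per importance type: feature name -> score.
for imp_type in ('weight', 'gain', 'cover', 'total_gain', 'total_cover'):
    print(imp_type, bst.get_score(importance_type=imp_type))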

https://xgboost.readthedocs.io/en/latest/python/python_api.html
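
If you want a chart rather than a dict, xgboost also ships a plotting helper built on the same scores (it requires matplotlib); a quick sketch, reusing the trained bst from above:

import matplotlib.pyplot as plt
import xgboost as xgb

# Horizontal bar chart of per-feature importance; the importance_type
# argument takes the same values as get_score().
xgb.plot_importance(bst, importance_type='gain')
plt.show()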