Getting features in RFECV scikit-learn

Javiss · May 17, 2018 · Viewed 7.7k times

Inspired by this: http://scikit-learn.org/stable/auto_examples/feature_selection/plot_rfe_with_cross_validation.html#sphx-glr-auto-examples-feature-selection-plot-rfe-with-cross-validation-py

I am wondering if there is any way to get the features selected for a particular score:

[Figure: cross-validation score vs. number of features selected]

In that case, I would like to know which 10 features are selected to give that peak at #Features = 10.

Any ideas?

EDIT:

This is the code used to get that plot:

from sklearn import svm  # for the linear SVM used below
from sklearn.feature_selection import RFECV
from sklearn.model_selection import KFold, StratifiedKFold  # for k-fold cross-validation
from sklearn.ensemble import RandomForestClassifier  # Random Forest
import matplotlib.pyplot as plt

# The "accuracy" scoring is proportional to the number of correct classifications
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)  # k=10, split the data into 10 equal parts
model_Linear_SVM = svm.SVC(kernel='linear', probability=True)
rfecv = RFECV(estimator=model_Linear_SVM, step=1, cv=kfold, scoring='accuracy')  # 10-fold cross-validation
rfecv = rfecv.fit(X, y)

print('Optimal number of features :', rfecv.n_features_)
print('Best features :', X.columns[rfecv.support_])
print('Original features :', X.columns)

plt.figure()
plt.xlabel("Number of features selected")
plt.ylabel("Cross validation score \n of number of selected features")
plt.plot(range(1, len(rfecv.grid_scores_) + 1), rfecv.grid_scores_)
plt.show()

Answer

Vladislav Gladkikh · May 17, 2018

First, you can see which features were selected at the point where the cross-validation score is largest (in your case that corresponds to 17 or 21 features; it is hard to tell from the figure) via

rfecv.support_

or

rfecv.ranking_ 
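For example, with X as a pandas DataFrame (as in your code), a minimal sketch that maps these attributes back to feature names:

import pandas as pd

# Boolean mask over the original columns: True for features kept at the optimum
print(X.columns[rfecv.support_])

# ranking_ assigns 1 to every selected feature; larger values mean the
# feature was eliminated earlier in the recursive elimination
print(pd.Series(rfecv.ranking_, index=X.columns).sort_values())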

Then you can calculate the importances of the selected features (at the peak of the CV score curve) with

np.absolute(rfecv.estimator_.coef_)

for simple linear estimators, or

rfecv.estimator_.feature_importances_ 

if your estimator is an ensemble, such as a random forest.
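Note that rfecv.estimator_ is the final estimator refit on the selected features only, so its coefficients or importances line up with X.columns[rfecv.support_]. A sketch covering both cases (assuming a binary problem, where the linear SVM's coef_ has shape (1, n_features)):

import numpy as np
import pandas as pd

selected = X.columns[rfecv.support_]

# Linear estimators such as svm.SVC(kernel='linear') expose coef_;
# the absolute row sum gives one magnitude per selected feature
importances = np.absolute(rfecv.estimator_.coef_).sum(axis=0)

# For ensembles such as RandomForestClassifier, use instead:
# importances = rfecv.estimator_.feature_importances_

print(pd.Series(importances, index=selected).sort_values(ascending=False))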

Then you can remove the least important features one at a time in a loop and refit RFECV on each remaining feature set.
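A rough sketch of that loop, reusing the names from your code (grid_scores_ is the attribute in the scikit-learn version used here; it was later replaced by cv_results_):

import numpy as np

features = list(X.columns)
scores_by_subset = {}

while len(features) > 1:
    rfecv = RFECV(estimator=svm.SVC(kernel='linear'), step=1,
                  cv=kfold, scoring='accuracy')
    rfecv.fit(X[features], y)

    # Best mean CV score reachable from this feature subset
    scores_by_subset[tuple(features)] = rfecv.grid_scores_.max()

    # Drop the feature with the largest ranking_ value, i.e. the one
    # eliminated first as least important, then repeat with the rest
    least_important = features[int(np.argmax(rfecv.ranking_))]
    features.remove(least_important)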