In the following code:
# Imports needed to run the snippet
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV

# Load dataset
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Forest used only to score feature importances for the selection step
rf_feature_imp = RandomForestClassifier(n_estimators=100)
feat_selection = SelectFromModel(rf_feature_imp, threshold=0.5)

# Final classifier
clf = RandomForestClassifier(n_estimators=5000)

model = Pipeline([
    ('fs', feat_selection),
    ('clf', clf),
])

# Grid over both the selector and the classifier
params = {
    'fs__threshold': [0.5, 0.3, 0.7],
    'fs__estimator__max_features': ['auto', 'sqrt', 'log2'],
    'clf__max_features': ['auto', 'sqrt', 'log2'],
}

gs = GridSearchCV(model, params, ...)
gs.fit(X, y)
What should be used for a prediction: gs, gs.best_estimator_, or gs.best_estimator_.named_steps['clf']? What is the difference between these three?
gs.predict(X_test) is equivalent to gs.best_estimator_.predict(X_test). With either call, X_test is passed through the entire pipeline and the predictions are returned.

gs.best_estimator_.named_steps['clf'].predict(), however, runs only the last step of the pipeline. For it to work, the feature selection step must already have been applied, i.e. you must first have run your data through gs.best_estimator_.named_steps['fs'].transform().
Three equivalent methods for generating predictions are shown below:
Using gs directly.
pred = gs.predict(X_test)
Using best_estimator_.
pred = gs.best_estimator_.predict(X_test)
Calling each step in the pipeline individually.
X_test_fs = gs.best_estimator_.named_steps['fs'].transform(X_test)
pred = gs.best_estimator_.named_steps['clf'].predict(X_test_fs)
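For completeness, here is a minimal, self-contained sketch showing that all three routes produce identical predictions. The train/test split, the cv=3 setting, and the smaller forests are illustrative assumptions on my part (the original snippet fits on the full X, y and elides the GridSearchCV arguments).

# Hedged sketch: the split, cv=3, and n_estimators values are assumptions,
# not part of the original question.
import numpy as np
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV, train_test_split

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=0)

model = Pipeline([
    ('fs', SelectFromModel(RandomForestClassifier(n_estimators=100))),
    ('clf', RandomForestClassifier(n_estimators=100)),
])
params = {'fs__threshold': [0.3, 0.5, 0.7]}

gs = GridSearchCV(model, params, cv=3)
gs.fit(X_train, y_train)

# Route 1: the fitted grid search delegates to its best estimator
pred_a = gs.predict(X_test)
# Route 2: the best pipeline found by the search
pred_b = gs.best_estimator_.predict(X_test)
# Route 3: run the pipeline steps by hand (transform, then predict)
X_test_fs = gs.best_estimator_.named_steps['fs'].transform(X_test)
pred_c = gs.best_estimator_.named_steps['clf'].predict(X_test_fs)

assert np.array_equal(pred_a, pred_b)
assert np.array_equal(pred_b, pred_c)

The asserts pass because gs.predict simply calls best_estimator_.predict, and Pipeline.predict itself transforms the input through 'fs' before handing it to 'clf'.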