Classification report with Nested Cross Validation in SKlearn (Average/Individual values)

utengr picture utengr · Mar 2, 2017 · Viewed 8k times · Source

Is it possible to get classification report from cross_val_score through some workaround? I'm using nested cross-validation and I can get various scores here for a model, however, I would like to see the classification report of the outer loop. Any recommendations?

# Choose cross-validation techniques for the inner and outer loops,
# independently of the dataset.
# E.g "LabelKFold", "LeaveOneOut", "LeaveOneLabelOut", etc.
inner_cv = KFold(n_splits=4, shuffle=True, random_state=i)
outer_cv = KFold(n_splits=4, shuffle=True, random_state=i)

# Non_nested parameter search and scoring
clf = GridSearchCV(estimator=svr, param_grid=p_grid, cv=inner_cv)

# Nested CV with parameter optimization
nested_score = cross_val_score(clf, X=X_iris, y=y_iris, cv=outer_cv)

I would like to see a classification report here along side the score values. http://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html

Answer

utengr picture utengr · Mar 2, 2017

Its just an addition to Sandipan's answer as I couldn't edit it. If we want to calculate the average classification report for a complete run of the cross-validation instead of individual folds, we can use the following code:

# Variables for average classification report
originalclass = []
predictedclass = []

#Make our customer score
def classification_report_with_accuracy_score(y_true, y_pred):
    originalclass.extend(y_true)
    predictedclass.extend(y_pred)
    return accuracy_score(y_true, y_pred) # return accuracy score

inner_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=i)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=i)

# Non_nested parameter search and scoring
clf = GridSearchCV(estimator=svr, param_grid=p_grid, cv=inner_cv)

# Nested CV with parameter optimization
nested_score = cross_val_score(clf, X=X_iris, y=y_iris, cv=outer_cv, scoring=make_scorer(classification_report_with_accuracy_score))

# Average values in classification report for all folds in a K-fold Cross-validation  
print(classification_report(originalclass, predictedclass)) 

Now the result for the example in Sandipan's answer would look like this:

            precision    recall  f1-score   support

          0       1.00      1.00      1.00        50
          1       0.96      0.94      0.95        50
          2       0.94      0.96      0.95        50

avg / total       0.97      0.97      0.97       150