How to compute precision,recall and f1 score of an imbalanced dataset for K fold cross validation with 10 folds in python

Question 1

How to compute precision,recall and f1 score of an imbalanced dataset for K fold cross validation with 10 folds in python

python scikit-learn random-forest cross-validation supervised-learning

Jayashree · Oct 6, 2017 · Viewed 13k times · Source

Answer

Answer

When you use cross_val_score method, you can specify, which scorings you can calculate on each fold:

from sklearn.metrics import make_scorer, accuracy_score, precision_score, recall_score, f1_score

scoring = {'accuracy' : make_scorer(accuracy_score), 
           'precision' : make_scorer(precision_score),
           'recall' : make_scorer(recall_score), 
           'f1_score' : make_scorer(f1_score)}

kfold = model_selection.KFold(n_splits=10, random_state=42)
model=RandomForestClassifier(n_estimators=50) 

results = model_selection.cross_val_score(estimator=model,
                                          X=features,
                                          y=labels,
                                          cv=kfold,
                                          scoring=scoring)

After cross validation, you will get results dictionary with keys: 'accuracy', 'precision', 'recall', 'f1_score', which store metrics values on each fold for certain metric. For each metric you can calculate mean and std value by using np.mean(results[value]) and np.std(results[value]), where value - one of your specified metric name.

Question 2

I have an imbalanced dataset containing binary classification problem.I have built Random Forest Classifier and used k fold cross validation with 10 folds.

kfold = model_selection.KFold(n_splits=10, random_state=42)
model=RandomForestClassifier(n_estimators=50)

I got the results of the 10 folds

results = model_selection.cross_val_score(model,features,labels, cv=kfold)
print results
[ 0.60666667  0.60333333  0.52333333  0.73        0.75333333  0.72        0.7
  0.73        0.83666667  0.88666667]

I have calculated accuracy by taking mean and standard deviation of the results

print("Accuracy: %.3f%% (%.3f%%)") % (results.mean()*100.0, results.std()*100.0)
Accuracy: 70.900% (10.345%)

I have computed my predictions as follows

predictions = cross_val_predict(model, features,labels ,cv=10)

Since this is an imbalanced dataset,I would like to calculate precision,recall and f1 score of each fold and average the results. How to calculate the values in python?

How to compute precision,recall and f1 score of an imbalanced dataset for K fold cross validation with 10 folds in python

Answer

Related questions