roc curve with sklearn [python]

petbottle · Jan 2, 2016 · Viewed 23.9k times

I am having trouble understanding how to use the ROC libraries.

I want to plot a ROC curve in Python with scikit-learn: http://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html

I am writing a program which evaluates detectors (Haar cascades, neural networks). I already have the data saved in a file in the following format:

 0.5 TP
 0.43 FP
 0.72 FN
 0.82 TN 
 ...

where TP means True Positive, FP False Positive, FN False Negative, and TN True Negative.

I parse it and fill 4 arrays with this data set.

Then I want to feed this into

   fpr, tpr, thresholds = sklearn.metrics.roc_curve(y_true, y_score, sample_weight=None)

but how do I do this? What are y_true and y_score in my case? Afterwards, I pass fpr and tpr into

auc = sklearn.metrics.auc(fpr, tpr)

Answer

H. Cavalera · Jan 3, 2016

Quoting Wikipedia:

The ROC curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings.
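
In terms of confusion-matrix counts, these rates are defined as TPR = TP / (TP + FN) and FPR = FP / (FP + TN), recomputed at each threshold.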

In order to compute FPR and TPR, you must provide the true binary labels and the target scores to the function sklearn.metrics.roc_curve.
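
In your case, y_true is the ground-truth class of each sample and y_score is the detector's confidence for it: TP and FN entries are samples that are actually positive (label 1), while FP and TN entries are samples that are actually negative (label 0). A minimal parsing sketch, assuming one whitespace-separated "score label" pair per line and a hypothetical filename detections.txt, could look like this:

    # Hypothetical sketch: 'detections.txt' stands in for your data file,
    # with one "score label" pair per line, e.g. "0.5 TP".
    y_true = []   # ground truth: 1 for actual positives, 0 for actual negatives
    y_score = []  # detector confidence for each sample
    with open('detections.txt') as f:
        for line in f:
            score, label = line.split()
            # TP and FN rows are actually positive samples;
            # FP and TN rows are actually negative ones.
            y_true.append(1 if label in ('TP', 'FN') else 0)
            y_score.append(float(score))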

With y_true and y_score in hand, I would do something like this:

from sklearn.metrics import roc_curve
from sklearn.metrics import auc
import matplotlib.pyplot as plt

# Compute fpr, tpr, thresholds and the ROC AUC
fpr, tpr, thresholds = roc_curve(y_true, y_score)
roc_auc = auc(fpr, tpr)

# Plot ROC curve
plt.plot(fpr, tpr, label='ROC curve (area = %0.3f)' % roc_auc)
plt.plot([0, 1], [0, 1], 'k--')  # random predictions curve
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.0])
plt.xlabel('False Positive Rate or (1 - Specificity)')
plt.ylabel('True Positive Rate or Sensitivity')
plt.title('Receiver Operating Characteristic')
plt.legend(loc="lower right")
plt.show()
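
As a side note, if you only need the area and not the curve itself, sklearn.metrics.roc_auc_score computes it in a single call; for binary labels it returns the same value as auc(fpr, tpr):

    from sklearn.metrics import roc_auc_score

    roc_auc = roc_auc_score(y_true, y_score)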

If you want a deeper understanding of how the false positive rate and the true positive rate are computed for all possible threshold values, I suggest you read this article.