Accuracy, precision, recall and f-score are measures of system quality in machine learning. They are computed from a confusion matrix of True/False Positives/Negatives.
Given a binary classification task, I have tried the following to get a function that returns accuracy, precision, recall and f-score:
gold = [1] + [0] * 9
predicted = [1] * 10
def evaluation(gold, predicted):
    true_pos = sum(1 for p,g in zip(predicted, gold) if p==1 and g==1)
    true_neg = sum(1 for p,g in zip(predicted, gold) if p==0 and g==0)
    false_pos = sum(1 for p,g in zip(predicted, gold) if p==1 and g==0)
    false_neg = sum(1 for p,g in zip(predicted, gold) if p==0 and g==1)
    try:
        recall = true_pos / float(true_pos + false_neg)
    except:
        recall = 0
    try:
        precision = true_pos / float(true_pos + false_pos)
    except:
        precision = 0
    try:
        fscore = 2*precision*recall / (precision + recall)
    except:
        fscore = 0
    try:
        accuracy = (true_pos + true_neg) / float(len(gold))
    except:
        accuracy = 0
    return accuracy, precision, recall, fscore
But it seems like I have redundantly looped through the dataset 4 times to get the True/False Positives/Negatives.
Also, the multiple try-excepts to catch the ZeroDivisionError are a little redundant.
So what is the pythonic way to get the counts of the True/False Positives/Negatives without multiple loops through the dataset?
How do I pythonically catch the ZeroDivisionError without the multiple try-excepts?
I could also do the following to count the True/False Positives/Negatives in one loop, but is there an alternative way without the multiple ifs?
true_pos = true_neg = false_pos = false_neg = 0
for p,g in zip(predicted, gold):
    if p==1 and g==1:
        true_pos+=1
    if p==0 and g==0:
        true_neg+=1
    if p==1 and g==0:
        false_pos+=1
    if p==0 and g==1:
        false_neg+=1
What is the pythonic way to get the counts of the True/False Positives/Negatives without multiple loops through the dataset?
I would use a collections.Counter, which does roughly what you're doing with all of the ifs at the end of your question (you should be using elifs there, as your conditions are mutually exclusive):
counts = Counter(zip(predicted, gold))
Then e.g. true_pos = counts[1, 1].
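To make that concrete, here is a minimal sketch using the sample data from the question. It assumes the labels are exactly 0 and 1, and that each key is a (predicted, gold) pair in that order, matching the zip call above.

```python
from collections import Counter

gold = [1] + [0] * 9
predicted = [1] * 10

# Count every (predicted, gold) pair in a single pass over the data.
counts = Counter(zip(predicted, gold))

# A Counter returns 0 for keys it has never seen, so pairs that
# do not occur in the data need no special handling.
true_pos = counts[1, 1]
true_neg = counts[0, 0]
false_pos = counts[1, 0]
false_neg = counts[0, 1]

print(true_pos, true_neg, false_pos, false_neg)  # 1 0 9 0
```

Because missing keys default to 0, there is no need to initialise the four counters by hand.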
How do I pythonically catch the ZeroDivisionError without the multiple try-excepts?
For a start, you should (almost) never use a bare except: clause. If you're catching ZeroDivisionErrors, then write except ZeroDivisionError. You could also consider a "look before you leap" approach, checking whether the denominator is 0 before trying the division, e.g.:
accuracy = (true_pos + true_neg) / float(len(gold)) if gold else 0
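If you want to avoid repeating that check for every metric, one option is to move it into a small helper. The sketch below is only one way to arrange it: safe_div is a hypothetical name introduced here, not a standard library function, and returning 0 for an undefined metric is the same convention the question's code uses rather than the only reasonable choice.

```python
from collections import Counter


def safe_div(numerator, denominator, default=0.0):
    """Hypothetical helper: divide, or return default if the denominator is 0."""
    return numerator / float(denominator) if denominator else default


def evaluation(gold, predicted):
    # One pass over the data; keys are (predicted, gold) pairs.
    counts = Counter(zip(predicted, gold))
    true_pos, true_neg = counts[1, 1], counts[0, 0]
    false_pos, false_neg = counts[1, 0], counts[0, 1]

    # Each metric falls back to 0.0 when its denominator is 0.
    recall = safe_div(true_pos, true_pos + false_neg)
    precision = safe_div(true_pos, true_pos + false_pos)
    fscore = safe_div(2 * precision * recall, precision + recall)
    accuracy = safe_div(true_pos + true_neg, len(gold))
    return accuracy, precision, recall, fscore


print(evaluation([1] + [0] * 9, [1] * 10))
```

With the question's sample data this gives an accuracy of 0.1, a precision of 0.1 and a recall of 1.0, and calling it with two empty lists returns zeros instead of raising.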