Why is the log loss negative?

toom · Oct 9, 2014 · Viewed 12.1k times

I just applied scikit-learn's log loss metric to a logistic regression model: http://scikit-learn.org/stable/modules/generated/sklearn.metrics.log_loss.html

My code looks something like this:

from sklearn import cross_validation
from sklearn.cross_validation import KFold

def perform_cv(clf, X, Y, scoring):
    # Outer 5-fold split; only the training indices of each fold are used
    kf = KFold(X.shape[0], n_folds=5, shuffle=True)
    kf_scores = []
    for train, _ in kf:
        X_sub = X[train, :]
        Y_sub = Y[train]
        # Apply 'log_loss' as the scoring metric on an inner 5-fold CV
        scores = cross_validation.cross_val_score(clf, X_sub, Y_sub, cv=5, scoring='log_loss')
        kf_scores.append(scores.mean())
    return kf_scores

However, I'm wondering why the resulting log losses are negative. I'd expect them to be positive, since in the documentation (see the link above) the log loss is multiplied by -1 to turn it into a positive number.

Am I doing something wrong here?

Answer

AN6U5 · Dec 8, 2014

Yes, this is supposed to happen. It is not a 'bug' as others have suggested. The actual log loss is simply the positive version of the number you're getting.
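
You can verify that the metric itself is non-negative by calling sklearn.metrics.log_loss directly. A minimal sketch (the labels and predicted probabilities below are made-up values for illustration):

from sklearn.metrics import log_loss

y_true = [0, 1, 1, 0]
y_prob = [0.1, 0.8, 0.6, 0.3]  # hypothetical predicted P(y=1) for each sample

# log_loss computes -mean(y*log(p) + (1-y)*log(1-p)), which is always >= 0
print(log_loss(y_true, y_prob))  # ~0.30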

scikit-learn's unified scoring API always maximizes the score, so metrics that should be minimized, such as log loss, are negated to fit that convention: the API returns the negated value for scores that should be minimized and the value itself for scores that should be maximized.
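
To see the sign convention in action, compare the metric computed directly with what the cross-validation scorer returns. A minimal sketch, assuming a recent scikit-learn where the scorer is spelled 'neg_log_loss' (the 2014-era name 'log_loss' used in the question behaved the same way):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, random_state=0)
clf = LogisticRegression()

# The metric itself is non-negative when computed directly
clf.fit(X, y)
direct = log_loss(y, clf.predict_proba(X))

# The scorer negates it so that 'greater is better' holds uniformly
scored = cross_val_score(clf, X, y, cv=5, scoring='neg_log_loss')

print(direct)           # positive
print(scored.mean())    # negative
print(-scored.mean())   # the actual (positive) cross-validated log loss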

This behavior is also described in sklearn GridSearchCV with Pipeline and in scikit-learn cross validation, negative values with mean squared error.