Keras: weighted binary crossentropy

Kevin Meier · Sep 2, 2017 · Viewed 29.1k times

I tried to implement a weighted binary crossentropy with Keras, but I am not sure if the code is correct. The training output seems a bit confusing: after a few epochs I only get an accuracy of ~0.15. I think that's far too low, even for a random guess.

The outputs contain about 11% ones and 89% zeros, so I set the weights to w_zero=0.89 and w_one=0.11.

My code:

from keras import backend as K

def create_weighted_binary_crossentropy(zero_weight, one_weight):

    def weighted_binary_crossentropy(y_true, y_pred):

        # Original binary crossentropy (see losses.py):
        # K.mean(K.binary_crossentropy(y_true, y_pred), axis=-1)

        # Calculate the binary crossentropy
        b_ce = K.binary_crossentropy(y_true, y_pred)

        # Apply the weights
        weight_vector = y_true * one_weight + (1. - y_true) * zero_weight
        weighted_b_ce = weight_vector * b_ce

        # Return the mean error
        return K.mean(weighted_b_ce)

    return weighted_binary_crossentropy

Maybe someone can see what's wrong?

Thank you

Answer

Yu-Yang · Sep 2, 2017

Normally, the minority class should have the higher class weight. It'll be better to use one_weight=0.89, zero_weight=0.11 (by the way, you can also pass class_weight={0: 0.11, 1: 0.89} to model.fit(), as suggested in the comments).
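For reference, a minimal sketch of the class_weight route, assuming an already-compiled binary classifier model and NumPy arrays x_train/y_train (hypothetical names):

# Keras scales each sample's contribution to the loss by the
# weight assigned to its class, so no custom loss is needed.
model.fit(x_train, y_train,
          epochs=10,
          batch_size=32,
          class_weight={0: 0.11, 1: 0.89})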

Under class imbalance, your model sees many more zeros than ones. It will also learn to predict more zeros than ones, because the training loss can be minimized by doing so. That's also why you're seeing an accuracy close to the proportion 0.11. If you take the average over the model's predictions, it should be very close to zero.

The purpose of using class weights is to change the loss function so that the training loss can no longer be minimized by the "easy solution" (i.e., predicting all zeros), and that's why it's better to put the higher weight on the ones.
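Applied to the function from the question, that just means swapping the two arguments when compiling. A minimal sketch, assuming a model variable as above:

# Put the heavier weight on the minority class (the ones)
loss = create_weighted_binary_crossentropy(zero_weight=0.11,
                                           one_weight=0.89)
model.compile(optimizer='adam', loss=loss, metrics=['accuracy'])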

Note that the best weights are not necessarily 0.89 and 0.11. Sometimes you might have to try something like taking the logarithm or the square root of the class frequencies (or any weights satisfying one_weight > zero_weight) to make it work.
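For example, one possible starting point (an assumption, not the only recipe) is to derive the weights from inverse class frequencies and soften them with a square root; y_train here is a hypothetical binary label array:

import numpy as np

p_one = y_train.mean()             # fraction of ones, ~0.11 in this case
p_zero = 1.0 - p_one

# Inverse-frequency weights, softened with a square root so the
# minority class isn't boosted too aggressively; this still
# satisfies one_weight > zero_weight.
one_weight = np.sqrt(1.0 / p_one)
zero_weight = np.sqrt(1.0 / p_zero)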