How is the categorical_crossentropy implemented in keras?

Eric · May 29, 2017 · Viewed 14.5k times

I'm trying to apply the concept of distillation: basically, to train a new, smaller network that does the same job as the original one but with less computation.

I have the softmax outputs for every sample instead of the logits.

My question is: how is the categorical cross-entropy loss function implemented? Does it take only the maximum value of the original labels and multiply it by the corresponding predicted value at the same index, or does it sum over all of the logits (one-hot encoding) as the formula says:

L(y, ŷ) = -Σᵢ yᵢ · log(ŷᵢ)
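The formula can be sketched directly in NumPy. This is a minimal illustration of the summation over all classes, not the actual Keras source; note that with a one-hot target, the sum reduces to the log-probability at the single true-class index:

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred):
    # Elementwise product with the one-hot target zeroes out every class
    # except the true one; the sum then runs over the class axis.
    return -np.sum(y_true * np.log(y_pred), axis=-1)

y_true = np.array([[0.0, 1.0, 0.0]])   # one-hot label: class 1
y_pred = np.array([[0.1, 0.8, 0.1]])   # softmax output
loss = categorical_crossentropy(y_true, y_pred)
# With a one-hot target this equals -log(0.8)
```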

Answer

dat09 · Mar 12, 2019

As an answer to "Do you happen to know what the epsilon and tf.clip_by_value are doing?":
they ensure that output != 0, because tf.log(0) returns -inf (and subsequently NaN gradients), which would break the loss.
(I don't have enough points to comment, but thought I'd contribute.)
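The clipping can be sketched as follows. This is a simplified NumPy version of the idea the answer describes (the real Keras backend clips with a small epsilon before taking the log), not the library's actual code:

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, epsilon=1e-7):
    # Clip predictions away from exactly 0 and 1 so log() never
    # receives a zero argument.
    y_pred = np.clip(y_pred, epsilon, 1.0 - epsilon)
    return -np.sum(y_true * np.log(y_pred), axis=-1)

# Without clipping, a predicted probability of exactly 0 at the true
# class would give log(0) = -inf; with clipping the loss stays finite.
y_true = np.array([[1.0, 0.0]])
y_pred = np.array([[0.0, 1.0]])
loss = categorical_crossentropy(y_true, y_pred)  # finite, equals -log(epsilon)
```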