Softmax and sigmoid functions for the output layer

tensorflow computer-vision deep-learning theano keras

user288609 · Dec 31, 2016 · Viewed 15.8k times

In deep learning implementations for object detection and semantic segmentation, I have seen output layers that use either sigmoid or softmax. I am not clear on when to use which. It seems to me that both can support these tasks. Are there any guidelines for this choice?

Answer

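softmax() is useful when you want a probability distribution: every output lies between 0 and 1 and the outputs sum to 1, which suits mutually exclusive classes. sigmoid is used when you want each output to range from 0 to 1 independently, with no requirement that the outputs sum to 1.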
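To make the difference concrete, here is a minimal sketch in plain Python (no framework). The `logits` values and the helper names `softmax` and `sigmoid` are made up for illustration; they are not taken from any particular library.

```python
import math

def softmax(logits):
    """Map raw scores to a probability distribution that sums to 1."""
    # Subtracting the max improves numerical stability and does not change the result.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(logits):
    """Squash each raw score into (0, 1) independently of the others."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]

# Hypothetical raw scores for three classes at one pixel or one detection box.
logits = [2.0, 1.0, -1.0]

softmax_out = softmax(logits)
sigmoid_out = sigmoid(logits)

print(softmax_out, sum(softmax_out))  # outputs compete; the sum is 1 up to rounding
print(sigmoid_out, sum(sigmoid_out))  # outputs are independent; the sum is not constrained
```

With softmax, raising one score lowers the probabilities of the others. With sigmoid, each output can be high or low on its own, so several labels can be "on" at once.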

In your case, you want to classify, that is, to choose between two alternatives. I would recommend softmax(), since it gives you a probability distribution to which you can apply a cross-entropy loss function.
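As a rough sketch of that recommendation, the snippet below applies cross-entropy to a two-class softmax output. The logits, the true label, and the helper names are assumptions chosen for illustration. It also checks a standard identity: with exactly two classes, the softmax probability of one class equals a sigmoid of the difference between the two logits.

```python
import math

def softmax(logits):
    """Map raw scores to a probability distribution that sums to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target_index):
    """Negative log-probability that the model assigns to the true class."""
    return -math.log(probs[target_index])

# Hypothetical logits for a two-class decision; class 1 is assumed to be the true label.
logits = [0.5, 2.0]
probs = softmax(logits)
loss = cross_entropy(probs, target_index=1)

# With two classes, softmax reduces to a sigmoid of the logit difference.
sigmoid_of_diff = 1.0 / (1.0 + math.exp(-(logits[1] - logits[0])))

print(probs, loss)
print(probs[1], sigmoid_of_diff)  # the two values agree up to rounding
```

So for a strictly two-way choice, a two-unit softmax and a single sigmoid unit describe the same model. The choice between them matters more once there are several classes.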
