Softmax and sigmoid functions for the output layer

user288609 · Dec 31, 2016 · Viewed 15.8k times

In deep learning implementations for object detection and semantic segmentation, I have seen output layers that use either sigmoid or softmax. It is not clear to me when to use which. It seems to me that both of them can support these tasks. Are there any guidelines for this choice?

Answer

martianwars · Dec 31, 2016

softmax() helps when you want a probability distribution over the outputs, i.e. values that sum to 1. sigmoid is used when you want each output to lie between 0 and 1, but the outputs need not sum to 1.
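To make the difference concrete, here is a minimal sketch in plain Python. The `logits` values and the helper names are made up for illustration and are not taken from any particular framework.

```python
import math

def softmax(logits):
    """Normalize a list of raw scores into values that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    """Squash one raw score into (0, 1), independently of the other scores."""
    return 1.0 / (1.0 + math.exp(-x))

logits = [2.0, 1.0, -1.0]  # hypothetical raw scores for three outputs

soft = softmax(logits)
sig = [sigmoid(x) for x in logits]

print(soft, sum(soft))  # the entries sum to 1
print(sig, sum(sig))    # each entry is in (0, 1); the sum is unconstrained
```

With softmax, raising one score lowers the others' probabilities; with sigmoid, each output is computed on its own, so several outputs can be close to 1 at the same time.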

In your case, you wish to classify and choose between two alternatives. I would recommend using softmax(), as you will get a probability distribution to which you can apply a cross-entropy loss function.
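As a rough sketch of what that looks like for two alternatives, the snippet below applies softmax to two scores and computes the cross-entropy loss against a true class index. The scores, the class index, and the function names are hypothetical; real frameworks usually fuse these two steps for numerical stability.

```python
import math

def softmax(logits):
    """Normalize a list of raw scores into values that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target_index):
    """Negative log-probability assigned to the true class."""
    return -math.log(probs[target_index])

logits = [2.0, 0.5]  # hypothetical scores for the two alternatives
probs = softmax(logits)

loss_if_class_0 = cross_entropy(probs, 0)  # true class is the higher-scored one
loss_if_class_1 = cross_entropy(probs, 1)  # true class is the lower-scored one

print(probs)
print(loss_if_class_0, loss_if_class_1)
```

The loss is small when the network puts most of the probability on the true class and grows as that probability shrinks.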