In deep learning implementations for object detection and semantic segmentation, I have seen output layers that use either sigmoid or softmax, and it is not clear to me when to use which. It seems to me that both can support these tasks. Are there any guidelines for this choice?
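softmax() helps when you want a probability distribution that sums to 1, i.e. when the classes are mutually exclusive and each input (or each pixel, in segmentation) belongs to exactly one of them. By contrast, sigmoid squashes each output into the range 0 to 1 independently, so the outputs need not sum to 1. That suits multi-label cases where several classes can be present at once.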
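To make the difference concrete, here is a minimal NumPy sketch (the logit values are made up for illustration) that applies both functions to the same raw scores. softmax() normalizes across the classes, while sigmoid treats each score on its own:

```python
import numpy as np

def softmax(z):
    # Subtract the max logit for numerical stability; this does not change the result.
    e = np.exp(z - np.max(z, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)

def sigmoid(z):
    # Element-wise logistic function: each output is squashed into (0, 1) independently.
    return 1.0 / (1.0 + np.exp(-z))

logits = np.array([2.0, 1.0, -1.0])  # hypothetical raw scores for three classes

p_softmax = softmax(logits)
p_sigmoid = sigmoid(logits)

print(p_softmax, p_softmax.sum())  # sums to 1: one distribution over mutually exclusive classes
print(p_sigmoid, p_sigmoid.sum())  # each value lies in (0, 1), but the sum is not constrained to 1
```

Raising one softmax output necessarily lowers the others, whereas each sigmoid output can be high or low regardless of the rest.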
In your case, you wish to classify and choose between two alternatives, so I would recommend using softmax(): it gives you a probability distribution to which you can apply the cross-entropy loss function.
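As a rough sketch of that recommendation, assuming two classes and made-up logits, the snippet below computes cross-entropy on the softmax output. It also shows that, for exactly two classes, a single sigmoid applied to the difference of the two logits yields the same probability and the same loss, so the two choices are mathematically equivalent in the binary case:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    e = np.exp(z - np.max(z, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)

def sigmoid(z):
    # Element-wise logistic function.
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(probs, target_index):
    # Negative log-probability assigned to the correct class.
    return -np.log(probs[target_index])

logits = np.array([1.5, -0.5])  # hypothetical scores for the two alternatives
target = 0                      # assume the correct alternative is class 0

p = softmax(logits)
loss_softmax = cross_entropy(p, target)

# Equivalent single-output formulation: sigmoid(a - b) equals e^a / (e^a + e^b),
# the softmax probability of class 0, so binary cross-entropy on it gives the same loss.
p0 = sigmoid(logits[0] - logits[1])
loss_sigmoid = -np.log(p0)

print(loss_softmax, loss_sigmoid)  # the two losses agree
```

In practice, frameworks usually fuse the activation and the loss (for example, a cross-entropy loss that takes raw logits) for numerical stability, rather than taking the log of an explicit softmax output as this sketch does.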