Usage of sigmoid activation function in Keras

Ahmad Hijazi · Nov 30, 2018 · Viewed 12.2k times

I have a big dataset composed of 18,260 input fields with 4 possible output classes. I am using Keras and TensorFlow to build a neural network that detects the correct output class.

However, of the many solutions I have tried, the accuracy never gets above 55% unless I use the sigmoid activation function in all model layers except the first one, as below:

from keras.models import Sequential
from keras.layers import Dense

def baseline_model(optimizer='adam', init='random_uniform'):
    # create model
    model = Sequential()
    model.add(Dense(40, input_dim=18260, activation="relu", kernel_initializer=init))
    model.add(Dense(40, activation="sigmoid", kernel_initializer=init))
    model.add(Dense(40, activation="sigmoid", kernel_initializer=init))
    model.add(Dense(10, activation="sigmoid", kernel_initializer=init))
    model.add(Dense(4, activation="sigmoid", kernel_initializer=init))
    model.summary()
    # Compile model
    model.compile(loss='sparse_categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return model

Is using sigmoid as the activation in all layers correct? The accuracy reaches 99.9% when using sigmoid as shown above, so I was wondering if there is something wrong in the model implementation.

Answer

Mitiku · Nov 30, 2018

Sigmoid might work. But I suggest using relu activation for the hidden layers. The problem is that your output layer's activation is sigmoid, but it should be softmax (because you are using the sparse_categorical_crossentropy loss):

model.add(Dense(4, activation="softmax", kernel_initializer=init))
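
For reference, here is a minimal sketch of the whole model with those two changes applied (relu in the hidden layers, softmax in the output layer), keeping everything else from your original code; the imports are assumed to be the usual keras ones:

from keras.models import Sequential
from keras.layers import Dense

def baseline_model(optimizer='adam', init='random_uniform'):
    # create model
    model = Sequential()
    model.add(Dense(40, input_dim=18260, activation="relu", kernel_initializer=init))
    model.add(Dense(40, activation="relu", kernel_initializer=init))
    model.add(Dense(40, activation="relu", kernel_initializer=init))
    model.add(Dense(10, activation="relu", kernel_initializer=init))
    # softmax output: 4 values in (0, 1) that sum to 1, one per class
    model.add(Dense(4, activation="softmax", kernel_initializer=init))
    model.summary()
    # sparse_categorical_crossentropy expects integer class labels (0..3)
    model.compile(loss='sparse_categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return model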

Edit after discussion in the comments

Your outputs are integer class labels. The sigmoid logistic function outputs values in the range (0, 1). The outputs of softmax are also in the range (0, 1), but the softmax function adds another constraint on the outputs: the sum of the outputs must be 1. Therefore the outputs of softmax can be interpreted as the probability of the input belonging to each class.

E.g.

import numpy as np

def sigmoid(x):
    return 1.0 / (1 + np.exp(-x))

def softmax(a):
    # subtract the max for numerical stability
    return np.exp(a - np.max(a)) / np.sum(np.exp(a - np.max(a)))

a = np.array([0.6, 10, -5, 4, 7])
print(sigmoid(a))
# [0.64565631, 0.9999546 , 0.00669285, 0.98201379, 0.99908895]
print(softmax(a))
# [7.86089760e-05, 9.50255231e-01, 2.90685280e-07, 2.35544722e-03, 4.73104222e-02]
print(sum(softmax(a)))
# 1.0