I have a simple NN model for detecting handwritten digits from 28x28px images, written in Python using Keras (Theano backend):
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten
from keras.utils import np_utils

nb_classes = 10    # digits 0-9
img_rows, img_cols = 28, 28
nb_epoch = 12      # number of epochs to train for
batch_size = 128   # amount of data each iteration in an epoch sees

# load MNIST and reshape to (samples, channels, rows, cols) for Theano dim ordering
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train.reshape(X_train.shape[0], 1, img_rows, img_cols).astype('float32') / 255
X_test = X_test.reshape(X_test.shape[0], 1, img_rows, img_cols).astype('float32') / 255
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)

model0 = Sequential()
model0.add(Flatten(input_shape=(1, img_rows, img_cols)))
model0.add(Dense(nb_classes))
model0.add(Activation('softmax'))

model0.compile(loss='categorical_crossentropy',
               optimizer='sgd',
               metrics=['accuracy'])

model0.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=nb_epoch,
           verbose=1, validation_data=(X_test, Y_test))

score = model0.evaluate(X_test, Y_test, verbose=0)
print('Test score:', score[0])
print('Test accuracy:', score[1])
This runs well and I get ~90% accuracy. I then get a summary of my network's structure with print(model0.summary()), which outputs the following:
____________________________________________________________________________
Layer (type)                Output Shape    Param #   Connected to
============================================================================
flatten_1 (Flatten)         (None, 784)     0         flatten_input_1[0][0]
dense_1 (Dense)             (None, 10)      7850      flatten_1[0][0]
activation_1 (Activation)   (None, 10)      0         dense_1[0][0]
============================================================================
Total params: 7850
I don't understand how they arrive at 7850 total params, or what that number actually means.
The number of parameters is 7850 because every output unit has 784 input weights (one per flattened pixel) plus one bias weight. Each unit therefore contributes 785 parameters, and with 10 units you get 785 × 10 = 7850.
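You can check this arithmetic directly from the layer's weight arrays; a minimal sketch, assuming the model0 defined above:

# the Dense layer (index 1, after the Flatten) holds a
# (784, 10) weight matrix and a (10,) bias vector
W, b = model0.layers[1].get_weights()
print(W.shape, b.shape)    # (784, 10) (10,)
print(W.size + b.size)     # 784*10 + 10 = 7850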
The role of this additional bias term is really important: it significantly increases the capacity of your model. You can read more about it here: Role of Bias in Neural Networks.
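To see exactly where the extra 10 parameters come from, you can build the same layer without bias terms and compare the counts; a sketch, assuming Keras 1.x where the Dense flag is bias=False (it is use_bias=False in later versions):

# identical architecture, but no bias vector in the Dense layer
model_nb = Sequential()
model_nb.add(Flatten(input_shape=(1, img_rows, img_cols)))
model_nb.add(Dense(nb_classes, bias=False))
model_nb.add(Activation('softmax'))
print(model_nb.count_params())   # 7840 = 784*10, the 10 bias weights are gone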