I'm currently training a recurrent neural network for weather forecasting, using an LSTM layer. The network itself is pretty simple and looks roughly like this:
from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation

model = Sequential()
model.add(LSTM(hidden_neurons, input_shape=(time_steps, feature_count), return_sequences=False))
model.add(Dense(feature_count))
model.add(Activation("linear"))
The weights of the LSTM layer have the following shapes:
for weight in model.get_weights():  # weights from the Dense layer omitted
    print(weight.shape)
> (feature_count, hidden_neurons)
> (hidden_neurons, hidden_neurons)
> (hidden_neurons,)
> (feature_count, hidden_neurons)
> (hidden_neurons, hidden_neurons)
> (hidden_neurons,)
> (feature_count, hidden_neurons)
> (hidden_neurons, hidden_neurons)
> (hidden_neurons,)
> (feature_count, hidden_neurons)
> (hidden_neurons, hidden_neurons)
> (hidden_neurons,)
In short, it looks like there are four "elements" in this LSTM layer. I'm wondering now how to interpret them:
1. Where is the time_steps parameter in this representation? How does it influence the weights?
2. I've read that an LSTM consists of several blocks, such as an input gate and a forget gate. If those are represented in these weight matrices, which matrix belongs to which gate?
3. Is there any way to see what the network has learned? For example, how much does it take from the last time step (t-1, if we want to forecast t) and how much from t-2, etc.? It would be interesting to know whether we could read off from the weights that the input at t-5 is completely irrelevant, for example.
Clarifications and hints would be greatly appreciated.
If you are using Keras 2.2.0
When you print
print(model.layers[0].trainable_weights)
you should see three tensors: lstm_1/kernel:0, lstm_1/recurrent_kernel:0 and lstm_1/bias:0.
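For the model above, the printed list should look roughly like this (the exact variable names and dtype formatting depend on your backend and layer name; this output is illustrative):
[<tf.Variable 'lstm_1/kernel:0' shape=(feature_count, hidden_neurons * 4) dtype=float32_ref>,
 <tf.Variable 'lstm_1/recurrent_kernel:0' shape=(hidden_neurons, hidden_neurons * 4) dtype=float32_ref>,
 <tf.Variable 'lstm_1/bias:0' shape=(hidden_neurons * 4,) dtype=float32_ref>]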
One dimension of each tensor is 4 * number_of_units, where number_of_units is your number of neurons (the first argument of the LSTM layer). You can recover it with:
units = int(int(model.layers[0].trainable_weights[0].shape[1]) / 4)
print("Number of units:", units)
That is because each tensor contains the weights for four LSTM gates, concatenated in this order:
i (input), f (forget), c (cell state) and o (output)
Therefore, in order to extract the weights of each gate, you can simply use the slice operator:
W = model.layers[0].get_weights()[0]  # kernel, shape (feature_count, 4 * units)
U = model.layers[0].get_weights()[1]  # recurrent kernel, shape (units, 4 * units)
b = model.layers[0].get_weights()[2]  # bias, shape (4 * units,)
W_i = W[:, :units]
W_f = W[:, units: units * 2]
W_c = W[:, units * 2: units * 3]
W_o = W[:, units * 3:]
U_i = U[:, :units]
U_f = U[:, units: units * 2]
U_c = U[:, units * 2: units * 3]
U_o = U[:, units * 3:]
b_i = b[:units]
b_f = b[units: units * 2]
b_c = b[units * 2: units * 3]
b_o = b[units * 3:]
Source: the Keras source code
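If you want to sanity-check the slicing (and the gate order), here is a minimal sketch that recomputes the layer's forward pass by hand with numpy and compares it to Keras. It assumes the default activations of Keras 2.2.0 (tanh for the cell, hard_sigmoid for the gates), that time_steps and feature_count from the question are defined, and it uses the units, W, U and b extracted above:
import numpy as np
from keras.models import Model

def hard_sigmoid(x):
    # Keras 2.2.0 default recurrent activation
    return np.clip(0.2 * x + 0.5, 0.0, 1.0)

def lstm_step(x_t, h_prev, c_prev):
    # One LSTM step; z holds all four gate pre-activations, in i, f, c, o order
    z = x_t @ W + h_prev @ U + b
    i = hard_sigmoid(z[:units])
    f = hard_sigmoid(z[units: units * 2])
    c_bar = np.tanh(z[units * 2: units * 3])
    o = hard_sigmoid(z[units * 3:])
    c_t = f * c_prev + i * c_bar      # new cell state
    h_t = o * np.tanh(c_t)            # new hidden state (= layer output)
    return h_t, c_t

x = np.random.rand(1, time_steps, feature_count).astype("float32")
h, c = np.zeros(units), np.zeros(units)
for t in range(time_steps):
    h, c = lstm_step(x[0, t], h, c)

# Compare with the output of the LSTM layer itself
lstm_only = Model(model.inputs, model.layers[0].output)
print(np.allclose(h, lstm_only.predict(x)[0], atol=1e-4))
If this prints True, the slices above line up with what the layer actually computes.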