How to interpret weights in an LSTM layer in Keras

Isa · Mar 17, 2017 · Viewed 10.8k times

I'm currently training a recurrent neural network for weather forecasting, using an LSTM layer. The network itself is pretty simple and looks roughly like this:

from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation

# hidden_neurons, time_steps and feature_count are assumed to be defined
model = Sequential()
model.add(LSTM(hidden_neurons, input_shape=(time_steps, feature_count), return_sequences=False))
model.add(Dense(feature_count))
model.add(Activation("linear"))

The weights of the LSTM layer have the following shapes:

for weight in model.get_weights(): # weights from Dense layer omitted
    print(weight.shape)

> (feature_count, hidden_neurons)
> (hidden_neurons, hidden_neurons)
> (hidden_neurons,)
> (feature_count, hidden_neurons)
> (hidden_neurons, hidden_neurons)
> (hidden_neurons,)
> (feature_count, hidden_neurons)
> (hidden_neurons, hidden_neurons)
> (hidden_neurons,)
> (feature_count, hidden_neurons)
> (hidden_neurons, hidden_neurons)
> (hidden_neurons,)

In short, it looks like there are four "elements" in this LSTM layer. I'm wondering now how to interpret them:

  • Where is the time_steps parameter in this representation? How does it influence the weights?

  • I've read that an LSTM consists of several blocks, such as an input gate and a forget gate. If those are represented in these weight matrices, which matrix belongs to which gate?

  • Is there any way to see what the network has learned? For example, how much weight does it give to the last time step (t-1 if we want to forecast t), how much to t-2, and so on? It would be interesting to know if we could read from the weights that the input at t-5, say, is completely irrelevant.

Clarifications and hints would be greatly appreciated.

Answer

Tomasz Bartkowiak · Jul 23, 2018

If you are using Keras 2.2.0:

When you print

print(model.layers[0].trainable_weights)

you should see three tensors: lstm_1/kernel:0, lstm_1/recurrent_kernel:0 and lstm_1/bias:0. One of the dimensions of each tensor is

4 * number_of_units

where number_of_units is the number of units (neurons) in the layer. To recover it from the kernel's shape, try:

units = int(int(model.layers[0].trainable_weights[0].shape[1]) / 4)  # kernel has shape (input_dim, 4 * units)
print("Number of units:", units)

That is because each tensor concatenates the weights for the four LSTM gates, in this order:

i (input), f (forget), c (cell state) and o (output)

Therefore, to extract the per-gate weights you can simply slice each tensor:

W = model.layers[0].get_weights()[0]  # kernel, shape (input_dim, 4 * units)
U = model.layers[0].get_weights()[1]  # recurrent kernel, shape (units, 4 * units)
b = model.layers[0].get_weights()[2]  # bias, shape (4 * units,)

W_i = W[:, :units]
W_f = W[:, units: units * 2]
W_c = W[:, units * 2: units * 3]
W_o = W[:, units * 3:]

U_i = U[:, :units]
U_f = U[:, units: units * 2]
U_c = U[:, units * 2: units * 3]
U_o = U[:, units * 3:]

b_i = b[:units]
b_f = b[units: units * 2]
b_c = b[units * 2: units * 3]
b_o = b[units * 3:]

Source: keras code
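
To see how these slices enter the computation, here is a minimal NumPy sketch of a single LSTM step under the standard LSTM equations (for readability it uses a plain sigmoid for the gates, whereas the Keras 2.2 default recurrent_activation is hard_sigmoid):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b, units):
    # slice the concatenated tensors into per-gate weights, in i, f, c, o order
    W_i, W_f, W_c, W_o = [W[:, k * units:(k + 1) * units] for k in range(4)]
    U_i, U_f, U_c, U_o = [U[:, k * units:(k + 1) * units] for k in range(4)]
    b_i, b_f, b_c, b_o = [b[k * units:(k + 1) * units] for k in range(4)]

    i = sigmoid(x @ W_i + h_prev @ U_i + b_i)                   # input gate
    f = sigmoid(x @ W_f + h_prev @ U_f + b_f)                   # forget gate
    c = f * c_prev + i * np.tanh(x @ W_c + h_prev @ U_c + b_c)  # new cell state
    o = sigmoid(x @ W_o + h_prev @ U_o + b_o)                   # output gate
    h = o * np.tanh(c)                                          # new hidden state
    return h, c

Running this step over x_t for t = 1, ..., time_steps reproduces the layer's forward pass, which also answers the first question: time_steps never appears in the weight shapes, because the very same W, U and b are reused at every time step. For the same reason there is no per-time-step weight to read off directly, so the relevance of t-5 versus t-1 is not visible in the weights alone.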