How to set up 1D-Convolution and LSTM in Keras

Thuan N. · Jul 15, 2018

I would like to use a 1D-Conv layer followed by an LSTM layer to classify a 16-channel, 400-timestep signal.

The input shape is composed of:

  • X = (n_samples, n_timesteps, n_features), where n_samples=476, n_timesteps=400, n_features=16 are the number of samples, timesteps, and features (or channels) of the signal.

  • y = (n_samples, n_timesteps, 1). Each timestep is labeled by either 0 or 1 (binary classification).

I use the 1D-Conv to extract the temporal information, as shown in the figure below. F=32 and K=8 are the number of filters and the kernel size. 1D-MaxPooling is used after the 1D-Conv. A 32-unit LSTM is used for signal classification. The model should return y_pred = (n_samples, n_timesteps, 1).

[figure: 1D-Conv (F=32, K=8) → 1D-MaxPooling → LSTM architecture]

The code snippet is as follows:

from keras.layers import Input, Conv1D, MaxPooling1D, LSTM, Dense
from keras.models import Model

input_layer = Input(shape=(dataset.n_timestep, dataset.n_feature))
conv1 = Conv1D(filters=32,
               kernel_size=8,
               strides=1,
               activation='relu')(input_layer)
pool1 = MaxPooling1D(pool_size=4)(conv1)
lstm1 = LSTM(32)(pool1)
output_layer = Dense(1, activation='sigmoid')(lstm1)
model = Model(inputs=input_layer, outputs=output_layer)

The model summary is shown below:

[image: model summary]

However, I got the following error:

ValueError: Error when checking target: expected dense_15 to have 2 dimensions, but got array with shape (476, 400, 1).

I guess the problem was the incorrect shape. Please let me know how to fix it.

Another question concerns the number of timesteps. Since the input_shape is assigned in the 1D-Conv layer, how can we let the LSTM know that the number of timesteps must be 400?


I would like to add the model graph based on @today's suggestion. In this case, the LSTM will see 98 timesteps. Do we need to use TimeDistributed here? I failed to apply TimeDistributed to the Conv1D.

[image: updated model graph]

Is there any way to perform the convolution across channels instead of timesteps? For example, a (2, 1) filter traverses each timestep, as shown in the figure below.

[figure: a (2, 1) filter sliding across channels at each timestep]

Thanks.

Answer

today · Jul 15, 2018

If you want to predict one value for each timestep, two slightly different solutions come to my mind:

1) Remove the MaxPooling1D layer, add the padding='same' argument to the Conv1D layer, and pass return_sequences=True to the LSTM so that it returns the output of each timestep:

from keras.layers import Input, Dense, LSTM, MaxPooling1D, Conv1D
from keras.models import Model

input_layer = Input(shape=(400, 16))
conv1 = Conv1D(filters=32,
               kernel_size=8,
               strides=1,
               activation='relu',
               padding='same')(input_layer)
lstm1 = LSTM(32, return_sequences=True)(conv1)
output_layer = Dense(1, activation='sigmoid')(lstm1)
model = Model(inputs=input_layer, outputs=output_layer)

model.summary()

The model summary would be:

Layer (type)                 Output Shape              Param #   
=================================================================
input_4 (InputLayer)         (None, 400, 16)           0         
_________________________________________________________________
conv1d_4 (Conv1D)            (None, 400, 32)           4128      
_________________________________________________________________
lstm_4 (LSTM)                (None, 400, 32)           8320      
_________________________________________________________________
dense_4 (Dense)              (None, 400, 1)            33        
=================================================================
Total params: 12,481
Trainable params: 12,481
Non-trainable params: 0
_________________________________________________________________
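For completeness, here is a minimal runnable sketch of this solution using random dummy data with the shapes from the question (the 'adam' optimizer is just an assumption), confirming that the model's output matches the (n_samples, 400, 1) targets:

```python
import numpy as np
from keras.layers import Input, Dense, LSTM, Conv1D
from keras.models import Model

# random dummy data with the shapes from the question
X = np.random.rand(476, 400, 16).astype('float32')
y = np.random.randint(0, 2, size=(476, 400, 1)).astype('float32')

input_layer = Input(shape=(400, 16))
conv1 = Conv1D(filters=32, kernel_size=8, strides=1,
               activation='relu', padding='same')(input_layer)
lstm1 = LSTM(32, return_sequences=True)(conv1)
output_layer = Dense(1, activation='sigmoid')(lstm1)
model = Model(inputs=input_layer, outputs=output_layer)

model.compile(optimizer='adam', loss='binary_crossentropy')
preds = model.predict(X[:2])
print(preds.shape)   # one prediction per timestep: (2, 400, 1)
```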

2) Just change the number of units in the Dense layer to 400 and reshape y to (n_samples, n_timesteps):

from keras.layers import Input, Dense, LSTM, MaxPooling1D, Conv1D
from keras.models import Model

input_layer = Input(shape=(400, 16))
conv1 = Conv1D(filters=32,
               kernel_size=8,
               strides=1,
               activation='relu')(input_layer)
pool1 = MaxPooling1D(pool_size=4)(conv1)
lstm1 = LSTM(32)(pool1)
output_layer = Dense(400, activation='sigmoid')(lstm1)
model = Model(inputs=input_layer, outputs=output_layer)

model.summary()

The model summary would be:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_6 (InputLayer)         (None, 400, 16)           0         
_________________________________________________________________
conv1d_6 (Conv1D)            (None, 393, 32)           4128      
_________________________________________________________________
max_pooling1d_5 (MaxPooling1 (None, 98, 32)            0         
_________________________________________________________________
lstm_6 (LSTM)                (None, 32)                8320      
_________________________________________________________________
dense_6 (Dense)              (None, 400)               13200     
=================================================================
Total params: 25,648
Trainable params: 25,648
Non-trainable params: 0
_________________________________________________________________

Don't forget that in both cases you must use 'binary_crossentropy' (not 'categorical_crossentropy') as the loss function. I expect solution #2 to have lower accuracy than solution #1, but you should experiment with both and try tuning the parameters, since it depends entirely on the specific problem you are trying to solve and the nature of your data.
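Note that for solution #2 the targets must also be reshaped to match the (None, 400) output. A sketch with dummy data (the 'adam' optimizer is just an assumption):

```python
import numpy as np
from keras.layers import Input, Dense, LSTM, MaxPooling1D, Conv1D
from keras.models import Model

# random dummy data with the shapes from the question
X = np.random.rand(476, 400, 16).astype('float32')
y = np.random.randint(0, 2, size=(476, 400, 1)).astype('float32')

input_layer = Input(shape=(400, 16))
conv1 = Conv1D(filters=32, kernel_size=8, strides=1,
               activation='relu')(input_layer)
pool1 = MaxPooling1D(pool_size=4)(conv1)
lstm1 = LSTM(32)(pool1)
output_layer = Dense(400, activation='sigmoid')(lstm1)
model = Model(inputs=input_layer, outputs=output_layer)

# reshape targets from (n_samples, n_timesteps, 1) to (n_samples, n_timesteps)
y = y.reshape(476, 400)

model.compile(optimizer='adam', loss='binary_crossentropy')
```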


Update:

You asked for a convolution layer that only covers one timestep and k adjacent features. Yes, you can do it using a Conv2D layer:

import numpy as np

# first add a channel axis to your data
X = np.expand_dims(X, axis=-1)   # now X has a shape of (n_samples, n_timesteps, n_feats, 1)

# adjust input layer shape ...
conv2 = Conv2D(n_filters, (1, k), ...)   # covers one timestep and k features
# adjust other layers according to the output of convolution layer...

Although I have no idea why you are doing this, to use the output of the convolution layer (which has shape (?, n_timesteps, n_features - k + 1, n_filters)), one solution is to use an LSTM layer wrapped inside a TimeDistributed layer. Another solution is to flatten the last two axes.
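A sketch of both alternatives (the unit counts and 'valid' padding are assumptions for illustration):

```python
from keras.layers import Input, Conv2D, Reshape, LSTM, TimeDistributed, Dense
from keras.models import Model

n_timesteps, n_feats, k, n_filters = 400, 16, 2, 8

inp = Input(shape=(n_timesteps, n_feats, 1))
conv = Conv2D(n_filters, (1, k), activation='relu')(inp)
# conv output shape: (None, 400, n_feats - k + 1, n_filters) = (None, 400, 15, 8)

# Option A: wrap an LSTM in TimeDistributed, so it scans the feature
# axis independently at each timestep
td = TimeDistributed(LSTM(32))(conv)                 # (None, 400, 32)
out_a = Dense(1, activation='sigmoid')(td)
model_a = Model(inputs=inp, outputs=out_a)

# Option B: flatten the last two axes, then use a plain LSTM
flat = Reshape((n_timesteps, (n_feats - k + 1) * n_filters))(conv)
lstm = LSTM(32, return_sequences=True)(flat)         # (None, 400, 32)
out_b = Dense(1, activation='sigmoid')(lstm)
model_b = Model(inputs=inp, outputs=out_b)
```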