Keras LSTM: a time-series multi-step multi-features forecasting - poor results

Bolesław Maliszewski picture Bolesław Maliszewski · Jun 7, 2018 · Viewed 7.2k times · Source

I have a time series dataset containing data from a whole year (date is the index). The data was measured every 15 min (during whole year) which results in 96 timesteps a day. The data is already normalized. The variables are correlated. All the variables except the VAR are weather measures.

VAR is seasonal in a day period and in a week period (as it looks a bit different on weekend, but more less the same every weekend). VAR values are stationary. I would like to predict values of VAR for next two days (192 steps ahead) and for next seven days (672 steps ahead).

Here is the sample of the dataset:

DateIdx               VAR       dewpt       hum         press       temp
2017-04-17 00:00:00   0.369397  0.155039    0.386792    0.196721    0.238889
2017-04-17 00:15:00   0.363214  0.147287    0.429245    0.196721    0.233333
2017-04-17 00:30:00   0.357032  0.139535    0.471698    0.196721    0.227778
2017-04-17 00:45:00   0.323029  0.127907    0.429245    0.204918    0.219444
2017-04-17 01:00:00   0.347759  0.116279    0.386792    0.213115    0.211111
2017-04-17 01:15:00   0.346213  0.127907    0.476415    0.204918    0.169444
2017-04-17 01:30:00   0.259660  0.139535    0.566038    0.196721    0.127778
2017-04-17 01:45:00   0.205564  0.073643    0.523585    0.172131    0.091667
2017-04-17 02:00:00   0.157650  0.007752    0.481132    0.147541    0.055556
2017-04-17 02:15:00   0.122101  0.003876    0.476415    0.122951    0.091667

Input dataset plot

I have decided to use LSTM in Keras. Having data from the whole year, I have used data from past 329 days as a training data and the rest for a validation during the training. train_X -> contains whole measures including VAR from 329 days train_Y -> contains only VAR from 329 days. The value is shifted one step ahead. The rest timesteps goes to test_X and test_Y.

Here is the code I prepare train_X and train_Y:

#X -> is the whole dataframe
#Y -> is a vector of VAR from whole dataframe, already shifted 1 step ahead

#329 * 96 = 31584
train_X = X[:31584]
train_X = train_X.reshape(train_X.shape[0],1,5)
train_Y = Y[:31584]  
train_Y = train_Y.reshape(train_Y.shape[0],1)

To predict next VAR value I would like to use past 672 timesteps (whole week measures). For this reason I have set batch_size=672, so that the ‘fit’ command look like this:

history = model.fit(train_X, train_Y, epochs=50, batch_size=672, validation_data=(test_X, test_Y), shuffle=False)  

Here is the architecture of my network:

model = models.Sequential()
model.add(layers.LSTM(672, input_shape=(None, 672), return_sequences=True))
model.add(layers.Dropout(0.2))
model.add(layers.LSTM(336, return_sequences=True))
model.add(layers.Dropout(0.2))
model.add(layers.LSTM(168, return_sequences=True))
model.add(layers.Dropout(0.2))
model.add(layers.LSTM(84, return_sequences=True))
model.add(layers.Dropout(0.2))
model.add(layers.LSTM(21, return_sequences=False))
model.add(layers.Dense(1))
model.compile(loss='mae', optimizer='adam')
model.summary()

From the plot below we can see that the network has learn ‘something’ after 50 epochs:

Plot from the learning process

For the prediction purpose I have prepared a set of data containing last 672 steps with all values and 96 without VAR value – which should be predicted. I also used autoregression, so I updated VAR after each prediction and used it for next prediction.

The predX dataset (used for prediction) looks like this:

print(predX['VAR'][668:677])

DateIdx            VAR
2017-04-23 23:00:00    0.307573
2017-04-23 23:15:00    0.278207
2017-04-23 23:30:00    0.284390
2017-04-23 23:45:00    0.309118
2017-04-24 00:00:00         NaN
2017-04-24 00:15:00         NaN
2017-04-24 00:30:00         NaN
2017-04-24 00:45:00         NaN
2017-04-24 01:00:00         NaN
Name: VAR, dtype: float64

Here is the code (autoregression) I have used to predict next 96 steps:

stepsAhead = 96
historySteps = 672

for i in range(0,stepsAhead):
    j = i + historySteps
    ypred = model.predict(predX.values[i:j].reshape(1,historySteps,5))
    predX['VAR'][j] = ypred

Unfortunately the results are very poor and very far from the expectations:

A Plot with predicted data

Results combined with a previous day:

Predicted data combined with a previous day

Except from the ‘What have I done wrong‘ question, I would like to ask a few questions:

Q1. During model fifing, I have just put the whole history in batches of 672 size. Is it correct? How should I organize the dataset for the model fitting? What options do I have? Should I use the “sliding window” approach (like in the link here: https://machinelearningmastery.com/promise-recurrent-neural-networks-time-series-forecasting/ )?

Q2. Is the 50 epochs enough? What is the common practice here? Maybe the network is underfitted resulting in poor prediction? So far I tried 200 epoch with the same result.

Q3. Should I try a different architecture? Is the proposed network ‘big enough’ to handle such a data? Maybe a “stateful” network is the right approach here?

Q4. Did I implement the autoregression correctly? Is there any other approach to make a prediction for many steps ahead e.g. 192 or 672 like in my case?

Answer

nuric picture nuric · Jun 7, 2018

It looks like there is a confusion on how to organise the data to train a RNN. So let's cover the questions:

  1. Once you have a 2D dataset (total_samples, 5) you can use the TimeseriesGenerator to create a sliding window what will generate (batch_size, past_timesteps, 5) for you. In this case, you will use .fit_generator to train the network.
  2. If you get the same result, 50 epochs should be fine. You usually adjust based on the performance of your network. But you should keep it fixed if you are comparing two different network architectures.
  3. Architecture is really large as you aim to predict all 672 future values at once. You can design the network so it learns to predict one measurement at a time. At prediction time you can predict one point and feed that again to predict the next until you get 672.
  4. This ties into answer 3, you can learn to predict one at a time and chain the predictions to n number of predictions after training.

The single point prediction model could look like:

model = Sequential()
model.add(LSTM(128, return_sequences=True, input_shape=(past_timesteps, 5))
model.add(LSTM(64))
model.add(Dense(1))