Description
I have a dataset of 10 sequences, where each sequence corresponds to one day of stock value recordings and consists of 50 samples taken at 5-minute intervals, starting in the morning at 9:05 am. There is also one extra recording (the 51st sample), available only in the training set, that is taken 2 hours, not 5 minutes, after the last of the 50 samples. This 51st sample has to be predicted for the testing set, where the first 50 samples are also given.
I am using a pybrain recurrent neural network for this problem, which groups samples into sequences, and the label (commonly known as the target y) of each sample x_i is the sample at the next time step x_(i+1) - a typical formulation in time series prediction.
Example
A sequence for one day is something like:
Signal id - Time - Value
1 - 9:05 - 23
2 - 9:10 - 31
3 - 9:15 - 24
... - ... - ...
50 - 13:15 - 15
Below is the 2-hours-later label ('target'), which is given for the training set and has to be predicted for the testing set:
51 - 15:15 - 11
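For concreteness, here is a minimal sketch of how this formulation might be set up with pybrain's SequentialDataSet; the variable names are placeholders, not from the actual data:

    # A sketch assuming the 10 days are stored in `days`, a list of 10 lists
    # with the 50 five-minute values for each day (placeholder name).
    from pybrain.datasets import SequentialDataSet

    ds = SequentialDataSet(1, 1)        # one input value, one target value
    for values in days:
        ds.newSequence()                # keep each day as a separate sequence
        for current, nxt in zip(values, values[1:]):
            ds.addSample(current, nxt)  # target of x_i is x_(i+1)
    # How to also use the 2-hours-later 51st value is the open question below.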
Question
Now that my recurrent neural network (RNN) has been trained on these 10 sequences, if it is given a new sequence, how would I use the RNN to predict the stock value 2 hours after the last sample in that sequence? Please note that I also have the 2-hours-later stock value for each of the training sequences, but I am not sure how to incorporate it into training the RNN, since the RNN expects identical time intervals between samples. Thanks!
I hope this will help you out.
The more mature Long Short-Term Memory (LSTM) neural network is a great fit for this kind of task. An LSTM is able to detect common "shapes" and "variations" in the stock value "graph", and there is a lot of research that tries to prove that such shapes actually occur in real life! See this link for an example.
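As a minimal sketch of this suggestion in pybrain (the library mentioned in the question), assume `ds` is the SequentialDataSet from the question with the 2-hours-later value appended as the target of sample 50, and `new_day` holds the 50 observed values of an unseen day; the hidden layer size and epoch count are arbitrary choices, not part of the original post:

    from pybrain.tools.shortcuts import buildNetwork
    from pybrain.structure.modules import LSTMLayer
    from pybrain.supervised import RPropMinusTrainer

    # 1 input -> LSTM hidden layer -> 1 output; a hidden size of 5 is a guess.
    net = buildNetwork(1, 5, 1, hiddenclass=LSTMLayer,
                       outputbias=False, recurrent=True)

    trainer = RPropMinusTrainer(net, dataset=ds)
    for _ in range(100):              # number of epochs is arbitrary; tune it
        trainer.train()

    # Prediction: replay the 50 known samples to build up the LSTM state;
    # the output after the last sample is the estimate for the 51st value.
    net.reset()
    for value in new_day:
        prediction = net.activate([value])
    print(prediction)

The idea is that the recurrent state carries the history of the whole day, so the single output produced after the 50th sample can be trained to stand for the 2-hours-later value rather than the next 5-minute value.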
If you want the network to achieve higher accuracy, I would recommend also feeding it the stock values from the previous year (for the exact same date), so that the number of inputs doubles from 50 to 100. Even though the network might be well optimised on your dataset, it will never be able to predict the genuinely unpredictable behaviour of the future ;)
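One way this could be wired into the per-time-step recurrent formulation from the question, rather than as a flat block of 100 inputs, is to give each time step a second input feature holding last year's value for the same slot; a sketch with placeholder names:

    # `days` as before, `prev_year_days` holding last year's 50 values for
    # the same calendar dates (both placeholder names).
    from pybrain.datasets import SequentialDataSet

    ds2 = SequentialDataSet(2, 1)   # inputs: (today's value, last year's value)
    for values, old_values in zip(days, prev_year_days):
        ds2.newSequence()
        for i, (current, nxt) in enumerate(zip(values, values[1:])):
            ds2.addSample((current, old_values[i]), nxt)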