import torch
import torch.nn as nn
from torch.autograd import Variable

rnn = nn.LSTM(input_size=10, hidden_size=20, num_layers=2)
input = Variable(torch.randn(5, 3, 10))   # (seq_len, batch, input_size)
h0 = Variable(torch.randn(2, 3, 20))      # (num_layers, batch, hidden_size)
c0 = Variable(torch.randn(2, 3, 20))      # (num_layers, batch, hidden_size)
output, hn = rnn(input, (h0, c0))         # hn is the tuple (h_n, c_n)
This is the LSTM example from the docs. I don't understand the following things:
Edit:
import torch, ipdb
import torch.nn as nn
from torch.autograd import Variable

num_layers = 3
num_hyperparams = 4
batch = 1
hidden_size = 20

rnn = nn.LSTM(input_size=num_hyperparams, hidden_size=hidden_size, num_layers=num_layers)

input = Variable(torch.randn(1, batch, num_hyperparams))    # (seq_len, batch, input_size)
h0 = Variable(torch.randn(num_layers, batch, hidden_size))  # (num_layers, batch, hidden_size)
c0 = Variable(torch.randn(num_layers, batch, hidden_size))  # (num_layers, batch, hidden_size)

output, hn = rnn(input, (h0, c0))
affine1 = nn.Linear(hidden_size, num_hyperparams)
# presumably the error below comes from feeding the 3-D output straight into affine1

ipdb.set_trace()
print(output.size())
print(h0.size())
*** RuntimeError: matrices expected, got 3D, 2D tensors at
The output of the LSTM is the output of all the hidden units in the final layer, for every time step.
hidden_size - the number of LSTM blocks (hidden units) per layer.
input_size - the number of input features per time step.
num_layers - the number of stacked LSTM layers.
In total there are hidden_size * num_layers LSTM blocks, as the sketch below illustrates.
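You can see these hyperparameters reflected in the module's parameters directly. A minimal sketch (all_weights, weight_ih_l0 and weight_hh_l0 are standard nn.LSTM attributes; the sizes are illustrative):

import torch.nn as nn

input_size, hidden_size, num_layers = 4, 20, 3
rnn = nn.LSTM(input_size=input_size, hidden_size=hidden_size, num_layers=num_layers)

print(len(rnn.all_weights))     # num_layers: one group of weights per stacked layer
print(rnn.weight_ih_l0.size())  # (4 * hidden_size, input_size): input, forget, cell, output gates
print(rnn.weight_hh_l0.size())  # (4 * hidden_size, hidden_size)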
The input dimensions are (seq_len, batch, input_size):
seq_len - the number of time steps in each input stream.
batch - the size of each batch of input sequences.
The hidden and cell state dimensions are (num_layers, batch, hidden_size).
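Putting the three shapes together, a small sanity check (same old-style Variable API as the question; the sizes are arbitrary):

import torch
import torch.nn as nn
from torch.autograd import Variable

seq_len, batch, input_size = 7, 3, 5
hidden_size, num_layers = 16, 2

rnn = nn.LSTM(input_size=input_size, hidden_size=hidden_size, num_layers=num_layers)
x = Variable(torch.randn(seq_len, batch, input_size))       # (seq_len, batch, input_size)
h0 = Variable(torch.randn(num_layers, batch, hidden_size))  # (num_layers, batch, hidden_size)
c0 = Variable(torch.randn(num_layers, batch, hidden_size))  # (num_layers, batch, hidden_size)

output, (hn, cn) = rnn(x, (h0, c0))
print(output.size())  # (seq_len, batch, hidden_size): last layer, every time step
print(hn.size())      # (num_layers, batch, hidden_size): every layer, last time step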
From the docs: output (seq_len, batch, hidden_size * num_directions): tensor containing the output features (h_t) from the last layer of the RNN, for each t.
So there will be hidden_size * num_directions outputs per time step. You didn't initialise the RNN to be bidirectional, so num_directions is 1 and output_size = hidden_size.
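To see the num_directions factor in action, a quick comparison (a sketch; bidirectional is a standard nn.LSTM flag):

import torch
import torch.nn as nn
from torch.autograd import Variable

seq_len, batch, input_size, hidden_size, num_layers = 5, 3, 10, 20, 2

for bidirectional in (False, True):
    num_directions = 2 if bidirectional else 1
    rnn = nn.LSTM(input_size, hidden_size, num_layers, bidirectional=bidirectional)
    x = Variable(torch.randn(seq_len, batch, input_size))
    h0 = Variable(torch.randn(num_layers * num_directions, batch, hidden_size))
    c0 = Variable(torch.randn(num_layers * num_directions, batch, hidden_size))
    output, _ = rnn(x, (h0, c0))
    print(output.size())  # (seq_len, batch, hidden_size * num_directions)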
Edit: You can change the number of outputs by using a linear layer:
out_rnn, hn = rnn(input, (h0, c0))
lin = nn.Linear(hidden_size, output_size)
# nn.Linear expects a 2-D input, so flatten the sequence and batch
# dimensions, apply the layer, then restore the original layout
output = lin(out_rnn.view(seq_len * batch, hidden_size))
output = output.view(seq_len, batch, output_size)
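This reshaping is also what the RuntimeError in your question is complaining about: nn.Linear expects matrices (2-D tensors), so the 3-D LSTM output has to be flattened to (seq_len * batch, hidden_size) before the affine layer and reshaped back afterwards.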
Note: for this answer I assumed that we're only talking about non-bidirectional LSTMs.
Source: PyTorch docs.