Following the example from:
https://github.com/jcjohnson/pytorch-examples
This code trains successfully:
# Code in file tensor/two_layer_net_tensor.py
import torch
device = torch.device('cpu')
# device = torch.device('cuda') # Uncomment this to run on GPU
# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10
# Create random input and output data
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)
# Randomly initialize weights
w1 = torch.randn(D_in, H, device=device)
w2 = torch.randn(H, D_out, device=device)
learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predicted y
    h = x.mm(w1)
    h_relu = h.clamp(min=0)
    y_pred = h_relu.mm(w2)

    # Compute and print loss; loss is a scalar, and is stored in a PyTorch Tensor
    # of shape (); we can get its value as a Python number with loss.item().
    loss = (y_pred - y).pow(2).sum()
    print(t, loss.item())

    # Backprop to compute gradients of w1 and w2 with respect to loss
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

    # Update weights using gradient descent
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2
How can I predict a single example? My experience thus far is with feedforward networks built using just numpy. After training a model, I use forward propagation for a single example. Here is a numpy snippet, where new holds the example whose output I'm attempting to predict:
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

new = np.asarray(toclassify)
Z1 = np.dot(weight_layer_1, new.T) + bias_1
sigmoid_activation_1 = sigmoid(Z1)
Z2 = np.dot(weight_layer_2, sigmoid_activation_1) + bias_2
sigmoid_activation_2 = sigmoid(Z2)
sigmoid_activation_2 contains the predicted output vector. Is the idiomatic PyTorch approach the same, i.e. use forward propagation to make a single prediction?
The code you posted is a simple demo meant to reveal the inner mechanics of deep learning frameworks. These frameworks, including PyTorch, Keras, TensorFlow and many more, automatically handle the forward computation and the tracking and updating of gradients for you, as long as you define the network structure. The code you showed, however, still does all of this manually. That's why predicting a single example feels cumbersome: you are still doing everything from scratch.
In practice, we define a model class that inherits from torch.nn.Module, initialize all of the network components (linear layers, GRU or LSTM layers, etc.) in the __init__ method, and define how those components interact with the network input in the forward method.
Taking the example from the page you provided:
# Code in file nn/two_layer_net_module.py
import torch
class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        """
        In the constructor we instantiate two nn.Linear modules and
        assign them as member variables.
        """
        super(TwoLayerNet, self).__init__()
        self.linear1 = torch.nn.Linear(D_in, H)
        self.linear2 = torch.nn.Linear(H, D_out)

    def forward(self, x):
        """
        In the forward function we accept a Tensor of input data and we must return
        a Tensor of output data. We can use Modules defined in the constructor as
        well as arbitrary (differentiable) operations on Tensors.
        """
        h_relu = self.linear1(x).clamp(min=0)
        y_pred = self.linear2(h_relu)
        return y_pred
# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10
# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)
# Construct our model by instantiating the class defined above.
model = TwoLayerNet(D_in, H, D_out)
# Construct our loss function and an Optimizer. The call to model.parameters()
# in the SGD constructor will contain the learnable parameters of the two
# nn.Linear modules which are members of the model.
loss_fn = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
for t in range(500):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x)

    # Compute and print loss
    loss = loss_fn(y_pred, y)
    print(t, loss.item())

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
The code defines a model named TwoLayerNet: it initializes the two linear layers in the __init__ method and defines how they interact with the input x in the forward method. With the model defined, a single feed-forward pass is performed simply by calling the model instance, as illustrated at the end of the snippet:
y_pred = model(x)
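So, to answer your question: yes, predicting a single example works the same way, just call the model instance on that example. A minimal sketch, assuming the trained model from the snippet above (the name new_x is hypothetical, standing in for your real input):

# A single example must have the batch dimension the model expects,
# so we build a tensor of shape (1, D_in) rather than (D_in,).
new_x = torch.randn(1, D_in)  # hypothetical stand-in for your real example

model.eval()                   # put layers like dropout/batchnorm in eval mode
with torch.no_grad():          # no gradient tracking is needed for inference
    prediction = model(new_x)  # shape (1, D_out)

print(prediction.squeeze(0))   # drop the batch dimension to get a (D_out,) vector

model.eval() and torch.no_grad() make no difference for this tiny network, which has neither dropout nor batchnorm, but they are the idiomatic way to run inference in PyTorch and save memory by skipping gradient bookkeeping.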