Top "Gradient-descent" questions

Gradient descent is an iterative first-order optimization algorithm for finding a local minimum of a differentiable function by repeatedly stepping in the direction of the negative gradient.
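
As a rough orientation (a sketch added here, not part of the tag description), the core update is x ← x − α·∇f(x), repeated until the iterates settle near a minimum:

    def gradient_descent(grad_f, x0, learning_rate=0.1, num_iters=100):
        x = x0
        for _ in range(num_iters):
            x = x - learning_rate * grad_f(x)   # step a small amount against the gradient
        return x

    # Example: minimize f(x) = (x - 3)**2; its gradient is 2*(x - 3), so the minimum is at x = 3.
    print(gradient_descent(lambda x: 2 * (x - 3), x0=0.0))   # approaches 3.0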

Gradient descent using Python and NumPy

def gradient(X_norm,y,theta,alpha,m,n,num_it): temp=np.array(np.zeros_like(theta,float)) for …

python numpy machine-learning linear-regression gradient-descent
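
For this kind of question, a minimal sketch of batch gradient descent for linear regression with NumPy may help; the function and variable names below are illustrative and not taken from the question's own code:

    import numpy as np

    def batch_gradient_descent(X, y, theta, alpha, num_iters):
        # X: (m, n) design matrix, y: (m,) targets, theta: (n,) parameters.
        m = len(y)
        for _ in range(num_iters):
            predictions = X @ theta                     # linear hypothesis X . theta
            grad = X.T @ (predictions - y) / m          # gradient of the squared-error cost (1/(2m) convention)
            theta = theta - alpha * grad                # simultaneous parameter update
        return theta

    # Tiny worked example: fit y = 2*x with a single feature and no intercept.
    X = np.array([[1.0], [2.0], [3.0]])
    y = np.array([2.0, 4.0, 6.0])
    print(batch_gradient_descent(X, y, theta=np.zeros(1), alpha=0.1, num_iters=200))  # ~[2.0]
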
Why do we need to call zero_grad() in PyTorch?

The method zero_grad() needs to be called during training, but the documentation is not very helpful: zero_grad(self) …

python neural-network deep-learning pytorch gradient-descent
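
For context: PyTorch accumulates gradients into each parameter's .grad, so a typical training loop clears them before every backward pass. A minimal sketch (the toy model, data, and hyperparameters are placeholders):

    import torch
    import torch.nn as nn

    model = nn.Linear(10, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()
    inputs, targets = torch.randn(32, 10), torch.randn(32, 1)   # toy data, illustrative only

    for _ in range(100):
        optimizer.zero_grad()        # clear gradients left over from the previous iteration
        loss = loss_fn(model(inputs), targets)
        loss.backward()              # accumulate fresh gradients into each parameter's .grad
        optimizer.step()             # update parameters with the current gradients
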
Neural network always predicts the same class

I'm trying to implement a neural network that classifies images into one of two discrete categories. The problem is, …

python-3.x numpy neural-network deep-learning gradient-descent
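
One common sanity check in this situation (a generic sketch, not taken from the question) is to compare the model's accuracy against the majority-class baseline and to look at the distribution of its predictions:

    import numpy as np

    # `preds` and `labels` stand in for the outputs of an evaluation loop (toy values here).
    preds = np.array([0, 0, 0, 0, 0, 0, 0, 0])
    labels = np.array([0, 1, 0, 1, 0, 1, 0, 1])

    print("prediction counts:", np.bincount(preds, minlength=2))   # a single non-zero bin means one class only
    print("accuracy:", (preds == labels).mean())                   # 0.5 here
    print("majority-class baseline:", np.bincount(labels).max() / len(labels))  # also 0.5

If accuracy matches the baseline and the prediction counts collapse onto one class, the network has likely learned nothing beyond the class prior.
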
Why should weights of Neural Networks be initialized to random numbers?

I am trying to build a neural network from scratch. Across all AI literature there is a consensus that weights …

machine-learning neural-network artificial-intelligence mathematical-optimization gradient-descent
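
To illustrate the point behind this question, a small NumPy sketch of symmetric (zero) versus random initialization for a single layer; the 0.01 scale is just an example:

    import numpy as np

    n_in, n_out = 64, 32

    # Zero initialization: every hidden unit computes the same output and receives the
    # same gradient, so the weights stay identical forever (symmetry is never broken).
    W_zero = np.zeros((n_in, n_out))

    # Small random initialization breaks the symmetry; schemes such as Xavier/He choose
    # the scale from the layer sizes instead of a fixed 0.01.
    W_rand = np.random.randn(n_in, n_out) * 0.01
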
PyTorch: how to set .requires_grad to False

I want to freeze part of my model. Following the official docs: with torch.no_grad(): linear = nn.Linear(1, 1) …

python pytorch gradient-descent
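
A minimal sketch of the usual way to freeze part of a model by setting requires_grad to False on its parameters (the toy model and layer choice are illustrative):

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

    # Freeze the first linear layer: its parameters receive no gradients and are never updated.
    for param in model[0].parameters():
        param.requires_grad = False

    # Optionally pass only the still-trainable parameters to the optimizer.
    optimizer = torch.optim.SGD(
        (p for p in model.parameters() if p.requires_grad), lr=0.01
    )
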
How to do gradient clipping in PyTorch?

What is the correct way to perform gradient clipping in PyTorch? I have an exploding gradients problem, and I need …

python machine-learning deep-learning pytorch gradient-descent
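
For reference, the usual approach is torch.nn.utils.clip_grad_norm_, applied between backward() and step(); a minimal sketch with a toy model (the max_norm value is illustrative):

    import torch
    import torch.nn as nn

    model = nn.Linear(10, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    inputs, targets = torch.randn(16, 10), torch.randn(16, 1)

    loss = nn.MSELoss()(model(inputs), targets)
    loss.backward()
    # Rescale all gradients so their combined norm does not exceed max_norm.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
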
Common causes of NaNs during training

I've noticed that a frequent occurrence during training is NaNs being introduced. Oftentimes it seems to be introduced by …

machine-learning neural-network deep-learning caffe gradient-descent
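
The question is about Caffe, but as a general illustration of how NaNs tend to appear (numerical overflow, log(0), a too-large learning rate), a small NumPy check like this can help locate where they first show up:

    import numpy as np

    def check_finite(name, array):
        # Report NaN/Inf as soon as it appears in a loss, activation, or gradient.
        if not np.all(np.isfinite(array)):
            print(f"warning: {name} contains NaN or Inf")

    logits = np.array([0.0, 1000.0])               # very large logits, e.g. from a blown-up layer
    probs = np.exp(logits) / np.exp(logits).sum()  # naive softmax overflows: inf / inf -> NaN
    check_finite("probs", probs)                   # prints the warning
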
What is the difference between Gradient Descent and Newton's Gradient Descent?

I understand what Gradient Descent does. Basically it tries to move towards the local optimal solution by slowly moving down …

machine-learning data-mining mathematical-optimization gradient-descent newtons-method
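
As a rough illustration of the difference (a sketch, not from the question): gradient descent steps along the negative gradient scaled by a fixed learning rate, while Newton's method additionally divides by the second derivative (the Hessian in higher dimensions), so its step size adapts to the curvature:

    # Minimize f(x) = x**4 in one dimension, starting from x = 2.
    f_prime = lambda x: 4 * x**3            # first derivative
    f_second = lambda x: 12 * x**2          # second derivative

    x_gd, x_newton, alpha = 2.0, 2.0, 0.01
    for _ in range(20):
        x_gd -= alpha * f_prime(x_gd)                        # gradient descent: fixed learning rate
        x_newton -= f_prime(x_newton) / f_second(x_newton)   # Newton: curvature-scaled step

    print(x_gd, x_newton)   # Newton's iterate shrinks by a factor of 2/3 per step on this function
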
What is the `weight_decay` meta-parameter in Caffe?

Looking at an example 'solver.prototxt' posted on the BVLC/caffe git, there is a training meta-parameter weight_decay: 0.04. What …

machine-learning neural-network deep-learning caffe gradient-descent
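
Broadly, weight decay adds an L2 penalty so every update also shrinks the weights towards zero. The sketch below shows the idea in generic NumPy, not Caffe's actual solver code, and the exact form can differ between solvers:

    import numpy as np

    def sgd_step(w, grad, lr=0.01, weight_decay=0.04):
        # Effective gradient = data gradient + weight_decay * w (L2 regularization),
        # which pulls every weight slightly towards zero on each step.
        return w - lr * (grad + weight_decay * w)

    w = np.array([1.0, -2.0, 0.5])
    grad = np.zeros_like(w)           # even with zero data gradient, the weights decay
    print(sgd_step(w, grad))          # [0.9996, -1.9992, 0.4998]
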
What is `lr_policy` in Caffe?

I am just trying to find out how I can use Caffe. To do so, I took a look at …

machine-learning neural-network deep-learning caffe gradient-descent
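
For orientation: lr_policy selects how the learning rate changes with the iteration number. If I recall Caffe's commonly used "step" policy correctly, it follows base_lr * gamma^floor(iter/stepsize); the sketch below only illustrates that shape with made-up numbers:

    def step_lr(iteration, base_lr=0.01, gamma=0.1, stepsize=10000):
        # "step" policy: drop the learning rate by a factor of gamma every stepsize iterations.
        return base_lr * gamma ** (iteration // stepsize)

    for it in (0, 9999, 10000, 20000):
        print(it, step_lr(it))   # 0.01, 0.01, 0.001, 0.0001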