I am trying to implement neural network with RELU.
input layer -> 1 hidden layer -> relu -> output layer -> softmax layer
Above is the architecture of my neural network. I am confused about backpropagation of this relu. For derivative of RELU, if x <= 0, output is 0. if x > 0, output is 1. So when you calculate the gradient, does that mean I kill gradient decent if x<=0?
Can someone explain the backpropagation of my neural network architecture 'step by step'?
if x <= 0, output is 0. if x > 0, output is 1
The ReLU function is defined as: For x > 0 the output is x, i.e. f(x) = max(0,x)
So for the derivative f '(x) it's actually:
if x < 0, output is 0. if x > 0, output is 1.
The derivative f '(0) is not defined. So it's usually set to 0 or you modify the activation function to be f(x) = max(e,x) for a small e.
Generally: A ReLU is a unit that uses the rectifier activation function. That means it works exactly like any other hidden layer but except tanh(x), sigmoid(x) or whatever activation you use, you'll instead use f(x) = max(0,x).
If you have written code for a working multilayer network with sigmoid activation it's literally 1 line of change. Nothing about forward- or back-propagation changes algorithmically. If you haven't got the simpler model working yet, go back and start with that first. Otherwise your question isn't really about ReLUs but about implementing a NN as a whole.