I am studying convolutional neural networks (CNNs), and I am confused about some of their layers.
Regarding ReLU: all I know is that it is the sum of an infinite number of logistic functions, but ReLU doesn't connect to any upper layers. Why do we need ReLU, and how does it work?
Regarding dropout: how does it work? I watched a video talk by G. Hinton. He described a strategy that randomly ignores half of the nodes while training the weights, and halves the weights when predicting. He said it was inspired by random forests and works exactly the same as computing the geometric mean of these randomly trained models.
Is this strategy the same as dropout?
Can someone help me understand this?
ReLU: The rectifier function is an activation function, f(x) = max(0, x), which can be used by neurons just like any other activation function; a node using the rectifier activation function is called a ReLU node. The main reason it is used is that it can be computed much more efficiently than more conventional activation functions such as the sigmoid and hyperbolic tangent, without making a significant difference to generalisation accuracy. The rectifier is used instead of a linear activation function to add non-linearity to the network; otherwise, because a composition of linear functions is itself linear, the network would only ever be able to compute a linear function.
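A minimal sketch in plain Python (no deep-learning library assumed) of the two points above: ReLU is just a comparison against zero, and without it two stacked linear layers collapse into a single linear layer. The weights, biases, and helper names (`relu`, `linear`, `matmul`) are hypothetical, chosen only for illustration.

```python
def relu(x):
    """Rectifier activation f(x) = max(0, x): a single comparison, no exponential."""
    return max(0.0, x)


def linear(W, b, x):
    """Affine layer y = W x + b using plain Python lists."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]


def matmul(A, B):
    """Matrix product A @ B for lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


# Hypothetical weights for two stacked layers
W1, b1 = [[1.0, -2.0], [0.5, 1.0]], [0.0, -1.0]
W2, b2 = [[2.0, 1.0]], [0.5]
x = [1.0, 1.0]

# Without an activation, layer2(layer1(x)) ...
two_layers = linear(W2, b2, linear(W1, b1, x))

# ... equals ONE linear layer with W = W2 W1 and b = W2 b1 + b2
W_combined = matmul(W2, W1)
b_combined = linear(W2, b2, b1)
one_layer = linear(W_combined, b_combined, x)

# With ReLU between the layers, the collapsed layer no longer reproduces the output
hidden = [relu(h) for h in linear(W1, b1, x)]
with_relu = linear(W2, b2, hidden)
```

Here `two_layers` and `one_layer` both evaluate to `[-1.0]`, while inserting ReLU changes the output to `[1.0]`, because the negative hidden activation is clipped to zero.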
Dropout: Yes, the technique described is the same as dropout. Randomly ignoring nodes is useful because it prevents interdependencies from emerging between nodes (i.e. nodes do not learn functions that rely on input values from another node), which allows the network to learn a more robust relationship. Implementing dropout has much the same effect as taking the average from a committee of networks; however, the cost is significantly lower in both time and storage.
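A minimal sketch, again in plain Python, of the scheme from the question: during training each hidden unit is dropped with probability 0.5, and at prediction time all units are kept but their outgoing weights are halved. For a single linear output unit, the halved-weight network gives exactly the average over all 2^n "thinned" networks (one per dropout mask); once non-linear units follow, weight scaling is in general only an approximation to that average. The activations, weights, and helper names (`dropout_train`, `linear_unit`) are hypothetical.

```python
import itertools
import random


def dropout_train(activations, p_drop=0.5, rng=random):
    """Training-time dropout: zero each activation independently with probability p_drop."""
    return [0.0 if rng.random() < p_drop else a for a in activations]


def linear_unit(weights, inputs):
    """A single linear output unit: weighted sum of its inputs."""
    return sum(w * a for w, a in zip(weights, inputs))


# Hypothetical hidden-layer activations and outgoing weights
hidden = [1.0, 2.0, 3.0, 4.0]
weights = [0.5, -1.0, 2.0, 1.5]
p_drop = 0.5

# Prediction time: keep every unit, but scale the weights by (1 - p_drop), i.e. halve them
scaled = [w * (1 - p_drop) for w in weights]
test_output = linear_unit(scaled, hidden)

# "Committee": average the outputs of all 2**4 thinned networks, one per dropout mask
masks = list(itertools.product([0.0, 1.0], repeat=len(hidden)))
committee = sum(
    linear_unit(weights, [m * a for m, a in zip(mask, hidden)]) for mask in masks
) / len(masks)

# Monte Carlo check: average many randomly thinned networks, as sampled during training
rng = random.Random(0)
n_samples = 20000
mc_estimate = sum(
    linear_unit(weights, dropout_train(hidden, p_drop, rng)) for _ in range(n_samples)
) / n_samples
```

Both `test_output` and `committee` come out to `5.25`, and the sampled estimate lands close to it. Note that this mirrors the variant in the talk (scale at prediction time); many libraries instead use "inverted" dropout, dividing kept activations by (1 - p_drop) during training so that no scaling is needed at prediction time.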