I really can't understand the following equation, especially the $\frac{1}{2m}$ factor.
What's the purpose of this equation? And where does $\frac{1}{2m}$ come from?
$$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left[ h_\theta(x^i) - y^i \right]^2$$
Please explain: how does this cost function work?
The cost function is
$$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left[ h_\theta(x^i) - y^i \right]^2$$
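As a minimal sketch, the formula can be computed directly. This assumes the usual univariate linear hypothesis $h_\theta(x) = \theta_0 + \theta_1 x$; the question does not state the form of $h_\theta$, so that choice is an assumption, and the names `hypothesis` and `cost` as well as the sample data are hypothetical.

```python
def hypothesis(theta0, theta1, x):
    # Assumed linear model: h_theta(x) = theta_0 + theta_1 * x
    return theta0 + theta1 * x


def cost(theta0, theta1, xs, ys):
    """J(theta_0, theta_1) = 1/(2m) * sum of squared errors over all m samples."""
    m = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        error = hypothesis(theta0, theta1, x) - y  # h_theta(x^i) - y^i
        total += error ** 2
    return total / (2 * m)


# Hypothetical data lying exactly on y = 2x
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
print(cost(0.0, 2.0, xs, ys))  # → 0.0 (perfect fit)
print(cost(0.0, 1.0, xs, ys))  # errors -1, -2, -3, so (1 + 4 + 9) / 6, about 2.333
```

A worse fit gives a larger $J$, which is why minimizing $J$ over $\theta_0, \theta_1$ finds the best-fitting line.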
By $h_\theta(x^i)$ we denote the model's output for the input $x^i$, so $h_\theta(x^i) - y^i$ is its error (assuming that $y^i$ is the correct output).
Now we square this error, $\left[ h_\theta(x^i) - y^i \right]^2$, which removes its sign: the error can be either positive or negative, and without squaring, positive and negative errors would cancel each other out. We then sum the squared errors over all samples and normalize the result by dividing by $m$, so that the cost does not simply grow with the number of samples. This gives the mean (because we divide by the number of samples) squared (because we square) error (because we compute an error):
$$\frac{1}{m} \sum_{i=1}^{m} \left[ h_\theta(x^i) - y^i \right]^2$$
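A small illustration of the two points above, using made-up predictions and targets (the helper names are hypothetical): raw errors of opposite sign cancel in a plain sum, squaring prevents that, and dividing by $m$ keeps the value the same when the same data is simply repeated.

```python
def errors(predictions, targets):
    # Per-sample errors h_theta(x^i) - y^i
    return [p - t for p, t in zip(predictions, targets)]


def sum_squared_error(errs):
    return sum(e ** 2 for e in errs)


def mean_squared_error(errs):
    # Normalize by m, the number of samples
    return sum_squared_error(errs) / len(errs)


# Hypothetical data: one prediction too high by 1, one too low by 1
errs = errors([3.0, 1.0], [2.0, 2.0])  # [1.0, -1.0]
print(sum(errs))                       # → 0.0: the errors cancel, hiding both mistakes
print(mean_squared_error(errs))        # → 1.0: squaring removes the sign

# Repeating the same samples doubles the sum but leaves the mean unchanged
errs_twice = errs * 2
print(sum_squared_error(errs), sum_squared_error(errs_twice))    # → 2.0 4.0
print(mean_squared_error(errs), mean_squared_error(errs_twice))  # → 1.0 1.0
```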
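The $2$ in the denominator of $\frac{1}{2m}$ is there only to simplify the derivative. When you minimize this function you will use the steepest (gradient) descent method, which is based on the function's derivative. The derivative of $a^2$ is $2a$, and our function is a sum of squares, so this $2$ cancels the $\frac{1}{2}$ and leaves a plain $\frac{1}{m}$ in front of the gradient. Multiplying a function by a positive constant does not change where its minimum is, so the factor has no effect on which $\theta_0, \theta_1$ are found; simplifying the derivative is the only reason it exists.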
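To make the cancellation concrete, here is a sketch that assumes the linear hypothesis $h_\theta(x) = \theta_0 + \theta_1 x$ (not stated in the question). Differentiating $J$ gives $\frac{\partial J}{\partial \theta_0} = \frac{1}{m}\sum_{i=1}^{m}\left[h_\theta(x^i) - y^i\right]$ and $\frac{\partial J}{\partial \theta_1} = \frac{1}{m}\sum_{i=1}^{m}\left[h_\theta(x^i) - y^i\right]x^i$, with no stray factor of $2$. The code checks this analytic gradient against a finite-difference estimate and then runs plain gradient descent. The function names, the learning rate `alpha`, the step count and the data are illustrative choices, not part of the original answer.

```python
def cost(theta0, theta1, xs, ys):
    # J = 1/(2m) * sum of squared errors, assuming h_theta(x) = theta0 + theta1 * x
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def gradient(theta0, theta1, xs, ys):
    # Analytic gradient: the 2 from d(a^2)/da = 2a cancels the 1/2,
    # leaving a plain 1/m in front of each sum.
    m = len(xs)
    errs = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
    d0 = sum(errs) / m
    d1 = sum(e * x for e, x in zip(errs, xs)) / m
    return d0, d1


def numerical_gradient(theta0, theta1, xs, ys, eps=1e-6):
    # Central finite differences, used only to check the analytic formula
    d0 = (cost(theta0 + eps, theta1, xs, ys) - cost(theta0 - eps, theta1, xs, ys)) / (2 * eps)
    d1 = (cost(theta0, theta1 + eps, xs, ys) - cost(theta0, theta1 - eps, xs, ys)) / (2 * eps)
    return d0, d1


def gradient_descent(xs, ys, alpha=0.1, steps=5000):
    # Steepest descent: repeatedly step against the gradient
    theta0, theta1 = 0.0, 0.0
    for _ in range(steps):
        d0, d1 = gradient(theta0, theta1, xs, ys)
        theta0 -= alpha * d0
        theta1 -= alpha * d1
    return theta0, theta1


# Hypothetical data lying exactly on y = 2x
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
print(gradient(0.5, 1.0, xs, ys))            # analytic: (-1.5, -11/3)
print(numerical_gradient(0.5, 1.0, xs, ys))  # should agree closely
print(gradient_descent(xs, ys))              # approaches (0.0, 2.0)
```

Dropping the $\frac{1}{2}$ from `cost` would make every gradient component exactly twice as large; gradient descent would still reach the same minimum, just with an effectively doubled learning rate.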