Local and global minima of the cost function in logistic regression

Keir Simmons · Oct 9, 2016 · Viewed 7.4k times

I think I'm misunderstanding the idea behind the minima in the derivation of logistic regression.

The idea is to make the hypothesis's probability for the correct prediction as close to 1 as possible, which in turn requires minimising the cost function $J(\theta)$ as much as possible.
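For concreteness, by $J(\theta)$ I mean the usual cross-entropy cost, where $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$ is the hypothesis:

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right]$$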

Now I've been told that for all this to work, the cost function must be convex. My understanding of convexity requires there to be no local maxima, and therefore there can only be one minimum, the global minimum. Is this really the case? If it's not, please explain why not. Also, if it's not the case, then that implies the possibility of multiple minima in the cost function, i.e. multiple sets of parameters yielding higher and higher probabilities. Is this possible? Or can I be certain that the returned parameters correspond to the global minimum, and hence the highest probability/prediction?

Answer

Peter Dimmar · Jan 11, 2017

Using a convex cost function does not by itself guarantee a convex problem.

There is a distinction between a convex cost function and a convex optimization problem.

The typical cost functions you encounter (cross entropy, absolute loss, least squares) are designed to be convex.
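For a single prediction $\hat{y}$ with target $y$, these take forms like

$$(\hat{y} - y)^2, \qquad |\hat{y} - y|, \qquad -y \log \hat{y} - (1 - y) \log(1 - \hat{y}),$$

each of which is convex as a function of $\hat{y}$.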

However, the convexity of the overall problem also depends on the type of model you plug into that cost function.

Linear models (linear regression, logistic regression, etc.) give you convex optimization problems, so gradient descent will converge to the global minimum. With neural nets that contain hidden layers, however, you are no longer guaranteed a convex problem: the same convex loss, composed with a nonlinear model, can have many local minima.

Thus, convexity is a property of the whole problem (model plus cost function), not of the cost function alone!
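Here is a minimal sketch (NumPy, with made-up toy data) that checks the defining chord inequality of convexity, $J(\lambda \theta_a + (1 - \lambda) \theta_b) \le \lambda J(\theta_a) + (1 - \lambda) J(\theta_b)$, for the logistic cost:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up toy data for a binary classification problem.
X = rng.normal(size=(100, 3))
y = (rng.random(100) < 0.5).astype(float)

def logistic_loss(theta):
    """Cross-entropy cost J(theta) for logistic regression."""
    p = 1.0 / (1.0 + np.exp(-X @ theta))
    p = np.clip(p, 1e-12, 1 - 1e-12)  # guard against log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Convexity means the loss along any segment lies below the chord:
# J(lam*a + (1-lam)*b) <= lam*J(a) + (1-lam)*J(b) for all lam in [0, 1].
theta_a, theta_b = rng.normal(size=3), rng.normal(size=3)
for lam in np.linspace(0.0, 1.0, 11):
    lhs = logistic_loss(lam * theta_a + (1 - lam) * theta_b)
    rhs = lam * logistic_loss(theta_a) + (1 - lam) * logistic_loss(theta_b)
    assert lhs <= rhs + 1e-9  # holds for every pair, since J is convex
```

If you swap `logistic_loss` for the loss of a network with a hidden layer, randomly drawn parameter pairs will typically violate this inequality, which is exactly the non-convexity described above.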

LR is a linear classification method, so you get a convex optimization problem every time you use it! Bear in mind, though, that convexity says nothing about the quality of the fit: if the data is not linearly separable, even the global optimum of that convex problem will misclassify some points. (And in the opposite case, when the data is perfectly separable, the unregularized optimum is not attained at any finite parameter vector, because the loss keeps shrinking as the weights grow.)
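To make the separable case concrete, here is a tiny sketch (made-up 1-D data, a single weight, no intercept) showing the loss decreasing without bound as the weight grows:

```python
import numpy as np

# Made-up, perfectly separable 1-D data: negatives below 0, positives above.
X = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([0.0, 0.0, 1.0, 1.0])

def logistic_loss(w):
    """Cross-entropy cost for a single weight w (no intercept)."""
    p = 1.0 / (1.0 + np.exp(-w * X))
    p = np.clip(p, 1e-12, 1 - 1e-12)  # guard against log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# The loss is still convex in w, but scaling up a separating weight keeps
# decreasing it, so the infimum 0 is never attained at any finite w.
for w in [1.0, 10.0, 100.0]:
    print(w, logistic_loss(w))
```

This is one reason an L2 penalty (the default in scikit-learn's `LogisticRegression`, for instance) is commonly added: it keeps the problem convex while guaranteeing a finite optimum.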