In sklearn what is the difference between a SVM model with linear kernel and a SGD classifier with loss=hinge

JackNova · Apr 17, 2015 · Viewed 7.1k times

I see that in scikit-learn I can build an SVM classifier with a linear kernel in at least 3 different ways:
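(The original code listing did not survive; presumably the three constructions, in the order the answer discusses them, were along these lines. The exact constructor arguments here are assumptions.)

```python
from sklearn.svm import LinearSVC, SVC
from sklearn.linear_model import SGDClassifier

# Three ways to get a linear SVM in scikit-learn (arguments are illustrative):
clf1 = LinearSVC()                  # implemented in terms of liblinear
clf2 = SVC(kernel='linear')         # implemented in terms of libsvm
clf3 = SGDClassifier(loss='hinge')  # stochastic gradient descent on the hinge loss
```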

Now, I see that the difference between the first two classifiers is that the former is implemented in terms of liblinear and the latter in terms of libsvm.

How do the first two classifiers differ from the third one?

Answer

eickenberg · Apr 17, 2015

The first two always use the full training set and solve a convex optimization problem over those data points.

The latter can process the data in batches and performs gradient descent, aiming to minimize the expected loss with respect to the sample distribution, assuming that the examples are i.i.d. samples from that distribution.
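To make that concrete, here is a plain-NumPy sketch of a single SGD update on the regularized hinge loss, for one sample `(x, y)` with `y ∈ {-1, +1}`. This is an illustration of the idea, not scikit-learn's actual implementation (which also uses learning-rate schedules, averaging, etc.):

```python
import numpy as np

def sgd_hinge_step(w, b, x, y, lr=0.01, alpha=0.0001):
    """One stochastic (sub)gradient step on the regularized hinge loss
    L(w, b) = alpha/2 * ||w||^2 + max(0, 1 - y * (w.x + b))."""
    margin = y * (np.dot(w, x) + b)
    grad_w = alpha * w            # gradient of the L2 penalty term
    grad_b = 0.0
    if margin < 1:                # sample is misclassified or inside the margin
        grad_w = grad_w - y * x   # subgradient of the hinge term
        grad_b = -y
    return w - lr * grad_w, b - lr * grad_b
```

Each step only touches one sample, which is why the expected loss over the sample distribution, rather than the exact loss over a fixed dataset, is the quantity being minimized.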

The latter is typically used when the number of samples is very large, or when the data arrive as an unbounded stream. Note that you can call the partial_fit method and feed it chunks of data.
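A minimal sketch of what that looks like (the streamed chunks here are synthetic, made up for illustration; note that partial_fit needs the full set of class labels up front via the classes argument):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

clf = SGDClassifier(loss='hinge')
classes = np.array([0, 1])          # all labels must be declared on the first call

rng = np.random.RandomState(0)
for _ in range(20):                 # pretend 20 chunks arrive from a stream
    X_chunk = rng.randn(50, 2)
    y_chunk = (X_chunk[:, 0] > 0).astype(int)   # toy labeling rule
    clf.partial_fit(X_chunk, y_chunk, classes=classes)
```

Each call updates the model in place with one pass over the chunk, so memory usage stays bounded no matter how much data you stream through.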

Hope this helps!