libsvm Shrinking Heuristics

Mihai Todor · Sep 19, 2012 · Viewed 13.8k times

I'm using libsvm in C-SVC mode with a polynomial kernel of degree 2, and I need to train multiple SVMs. During training, I get one or both of these warnings for some of the SVMs:

WARNING: using -h 0 may be faster
*
WARNING: reaching max number of iterations
optimization finished, #iter = 10000000

I've found the description of the -h parameter:

-h shrinking : whether to use the shrinking heuristics, 0 or 1 (default 1)

and I've tried to read the explanation in the libsvm documentation, but it's a bit too high-level for me. Can anyone please provide a layman's explanation and, perhaps, some guidance along the lines of "setting this would be beneficial because..."? Also, it would be helpful to know whether setting this parameter for all the SVMs that I train might hurt accuracy for those SVMs that do not explicitly give this warning.
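For reference, this is a minimal sketch of how -h 0 is passed through libsvm's Python bindings. The import path assumes the libsvm-official package (older installs expose a plain svmutil module instead), and train.txt is a placeholder file name:

from libsvm.svmutil import svm_read_problem, svm_train

# Load data in libsvm's sparse text format; 'train.txt' is a placeholder.
y, x = svm_read_problem('train.txt')

# -s 0: C-SVC, -t 1: polynomial kernel, -d 2: degree 2,
# -h 0: turn the shrinking heuristics off.
model = svm_train(y, x, '-s 0 -t 1 -d 2 -h 0')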

I'm not sure what to make of the other warning.

Just to give more details: my training sets have 10 attributes (features), and each consists of 5000 vectors.


Update:

In case anybody else is getting the "reaching max number of iterations" warning: it seems to be caused by numeric stability issues, and it also makes training very slow.

Polynomial kernels do benefit from cross-validation to determine the best value for the regularization parameter C, and in my case it helped to keep C smaller than 8. Also, if the kernel is inhomogeneous, i.e. K(x, s) = (\gamma \sum_i x_i s_i + coef0)^d with coef0 != 0, then cross-validation can be implemented as a grid search over both gamma and C, since in this case the default value for gamma (1 / number_of_features) might not be the best choice. Still, from my experiments, you probably do not want gamma to be too large, since it causes numeric issues (I use a maximum value of 8 for it as well).
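To make the grid search concrete, here is a minimal sketch using libsvm's Python bindings and its built-in cross-validation (-v). The search ranges, the 5 folds, coef0 = 1, and the file name train.txt are assumptions chosen to match the caps mentioned above (C and gamma at most 8), not the exact values I used:

from libsvm.svmutil import svm_read_problem, svm_train

y, x = svm_read_problem('train.txt')  # placeholder file name

best_acc, best_c, best_g = -1.0, None, None
for log2c in range(-3, 4):        # C in {1/8, ..., 8}
    for log2g in range(-7, 4):    # gamma in {1/128, ..., 8}
        c, g = 2.0 ** log2c, 2.0 ** log2g
        # -t 1 -d 2: degree-2 polynomial; -r 1 makes it inhomogeneous
        # (coef0 != 0); -v 5 returns 5-fold cross-validation accuracy;
        # -q suppresses libsvm's progress output.
        acc = svm_train(y, x, '-s 0 -t 1 -d 2 -r 1 -c %g -g %g -v 5 -q' % (c, g))
        if acc > best_acc:
            best_acc, best_c, best_g = acc, c, g

print('best CV accuracy %.2f%% at C=%g, gamma=%g' % (best_acc, best_c, best_g))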

For further inspiration on possible values for gamma and C, one should try poking around in grid.py.

Answer

Qnan · Sep 20, 2012

The shrinking heuristics are there to speed up the optimization. As it says in the FAQ, they sometimes help, and sometimes they do not. I believe it's a matter of runtime, rather than convergence.

The fact that the optimization reaches the maximum number of iterations is interesting, though. You might want to play with the stopping tolerance (libsvm's -e parameter; the default is 0.001), or have a look at the individual problems that cause this. Are the datasets large?
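If you want to experiment with the tolerance, a minimal sketch using libsvm's Python bindings (the loosened value 0.01 and the file name train.txt are just examples):

from libsvm.svmutil import svm_read_problem, svm_train

y, x = svm_read_problem('train.txt')  # placeholder file name

# -e 0.01 relaxes the termination tolerance from the default 0.001, so the
# optimizer can stop earlier instead of hitting the iteration limit;
# -q suppresses libsvm's progress output.
model = svm_train(y, x, '-s 0 -t 1 -d 2 -e 0.01 -q')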