I found that scaling in SVM (Support Vector Machine) problems really improves performance... I have read this explanation:
"The main advantage of scaling is to avoid attributes in greater numeric ranges dominating those in smaller numeric ranges."
Unfortunately this didn't help me... Can somebody provide me with a better explanation? Thank you in advance!
Feature scaling is a general trick applied to optimization problems (not just SVMs). The underlying algorithm used to solve the SVM optimization problem is gradient descent. Andrew Ng has a great explanation in his Coursera videos here.
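Before getting into the geometry, you can see the practical effect directly. Here is a minimal sketch assuming scikit-learn is available; the dataset choice (load_wine, whose features span ranges from roughly 0.1 to over 1000) and the use of StandardScaler/SVC are my own for illustration, not from the quoted explanation:

```python
# Minimal sketch: the same SVM fit with and without feature scaling.
# load_wine is an illustrative choice; its features span very different
# numeric ranges, so the large-range attributes dominate the RBF kernel.
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

unscaled = SVC().fit(X_train, y_train)
scaled = make_pipeline(StandardScaler(), SVC()).fit(X_train, y_train)

print("accuracy without scaling:", unscaled.score(X_test, y_test))
print("accuracy with scaling:   ", scaled.score(X_test, y_test))
```

Note that the scaler is fit inside the pipeline, so the test set is transformed with the training set's statistics rather than its own.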
I will illustrate the core ideas here (borrowing Andrew's slides). Suppose you have only two parameters, and one of them can take a relatively large range of values. Then the contours of the cost function
can look like very tall, skinny ovals (see the blue ovals below). Gradient descent (its path is drawn in red) could take a long time, bouncing back and forth, to find the optimal solution.
If instead you scale your features, the contours of the cost function might look like circles; then gradient descent can take a much straighter path and reach the optimum much faster.
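You can also see this numerically rather than pictorially. Below is a toy sketch (the data, step sizes, and the helper name `run_gd` are all illustrative, not from Andrew's slides): plain gradient descent on a least-squares cost with two features, one spanning a ~1000x larger range than the other. The steep (large-range) direction forces a tiny learning rate on the unscaled problem, so it crawls along the shallow direction; after standardizing, the contours are nearly circular and a large step converges in a handful of iterations.

```python
# Toy sketch of the contour argument: gradient descent on a least-squares
# cost with two features whose numeric ranges differ by ~1000x.
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.uniform(0, 1, 200)      # small-range feature
x2 = rng.uniform(0, 1000, 200)   # large-range feature -> tall, skinny contours
y = 3 * x1 + 0.002 * x2 + rng.normal(0, 0.01, 200)

def run_gd(X, y, lr, tol=1e-8, max_iter=100_000):
    """Plain gradient descent on (1/2n)||Xw - y||^2; returns iterations used."""
    w = np.zeros(X.shape[1])
    for i in range(max_iter):
        grad = X.T @ (X @ w - y) / len(y)
        if np.linalg.norm(grad) < tol:
            return i
        w -= lr * grad
    return max_iter

X_raw = np.column_stack([x1, x2])
X_raw = X_raw - X_raw.mean(axis=0)   # center features so no intercept is needed
X_std = X_raw / X_raw.std(axis=0)    # standardize: unit variance per feature
y_c = y - y.mean()                   # center the target as well

# The largest stable step size is limited by the steepest direction, so the
# unscaled run needs a tiny learning rate and hits the iteration cap; the
# standardized run takes big steps almost straight to the optimum.
print("unscaled:", run_gd(X_raw, y_c, lr=1e-6), "iterations")
print("scaled:  ", run_gd(X_std, y_c, lr=0.5), "iterations")
```

This is exactly the ovals-vs-circles picture: the unscaled Hessian is badly conditioned, so the zig-zag path needs orders of magnitude more steps than the scaled one.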