How much time does it take to train an SVM classifier?

Il'ya Zhenin · Aug 10, 2013 · Viewed 33.6k times

I wrote the following code and tested it on a small dataset:

from sklearn import svm
from sklearn.multiclass import OneVsRestClassifier

classif = OneVsRestClassifier(svm.SVC(kernel='rbf'))
classif.fit(X, y)

Here X and y are NumPy arrays (X is a 30000x784 matrix, y is 30000x1). On small data the algorithm works well and gives me the right results.

But I started my program about 10 hours ago, and it is still running.

I want to know how long it will take, or whether it is stuck in some way. (Laptop specs: 4 GB memory, Core i5-480M)

Answer

lejlot · Aug 11, 2013

SVM training can take arbitrarily long; it depends on dozens of parameters:

  • C parameter - the greater the misclassification penalty, the slower the process
  • kernel - the more complicated the kernel, the slower the process (rbf is the most complex of the predefined ones)
  • data size/dimensionality - again, the same rule applies
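Because the run time depends so strongly on the data size, one rough way to answer "how long will it take" is to time the fit on a few small subsets and extrapolate. The sketch below assumes the timings follow a power law in the number of samples, which is only an approximation. The helper names `fit_power_law` and `extrapolate` are made up for this example, and the timings are synthetic (generated from an exact cubic law with a hypothetical constant), not measurements. In practice you would replace them with wall-clock times of `classif.fit` on subsets of X and y.

```python
import math


def fit_power_law(sizes, seconds):
    """Least-squares fit of seconds ~ a * n**k in log-log space; returns (a, k)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in seconds]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k = sxy / sxx                        # slope = exponent of the power law
    a = math.exp(mean_y - k * mean_x)    # intercept = multiplicative constant
    return a, k


def extrapolate(a, k, n):
    """Predicted run time in seconds for n samples under the fitted law."""
    return a * n ** k


# Synthetic timings for illustration only: an exact cubic law with a
# hypothetical constant of 1e-9 seconds. Replace with real measurements.
sizes = [500, 1000, 2000, 4000]
seconds = [1e-9 * n ** 3 for n in sizes]

a, k = fit_power_law(sizes, seconds)
predicted = extrapolate(a, k, 30000)
print(f"fitted exponent: {k:.2f}")
print(f"predicted time for 30000 samples: {predicted / 3600:.1f} hours")
```

If the fitted exponent on real timings comes out well above 1, doubling the data more than doubles the training time, and an extrapolation like this gives a first guess at whether the full run is feasible on the laptop.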

In general, the basic SMO algorithm is O(n^3), so for 30 000 data points it has to run a number of operations proportional to 27 000 000 000 000, which is a really huge number. What are your options?

  • change the kernel to a linear one; 784 features is quite a lot, so rbf may be redundant
  • reduce the feature dimensionality (PCA?)
  • lower the C parameter
  • train the model on a subset of your data to find good parameters, and then train on the whole dataset on a cluster/supercomputer
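The options above could be combined roughly as follows with scikit-learn. This is a minimal sketch, not a tuned solution. The data is a small random stand-in for the 30000x784 matrix from the question, and the values `n_components=50`, `C=0.1`, and the subset size of 200 are arbitrary assumptions chosen only to keep the example fast. `LinearSVC` handles multiclass problems one-vs-rest by default, so the `OneVsRestClassifier` wrapper is not needed for it.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC

# Random stand-in data (assumption): far smaller than the real dataset.
rng = np.random.RandomState(0)
X = rng.rand(600, 784)
y = rng.randint(0, 10, size=600)

# Linear kernel + reduced dimensionality + a lower C, in one pipeline.
linear_model = make_pipeline(
    StandardScaler(),
    PCA(n_components=50),            # 784 features -> 50 components
    LinearSVC(C=0.1, max_iter=5000),
)
linear_model.fit(X, y)

# Fit the RBF model on a random subset to explore parameters cheaply
# before committing to a full-size run.
subset = rng.choice(len(X), size=200, replace=False)
rbf_model = SVC(kernel="rbf", C=1.0)
rbf_model.fit(X[subset], y[subset])

print(linear_model.predict(X[:5]).shape)
print(rbf_model.predict(X[:5]).shape)
```

On random labels the accuracy is meaningless. The point is only the structure: a cheaper linear pipeline for the full data, and subset fits for the expensive RBF kernel.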