How to speed up k-means from scikit-learn?

user8058941 · Oct 1, 2017 · Viewed 7.6k times

In my project I have used k-means to classify data into groups, but I have a problem with the k-means computation in scikit-learn: it is very slow. I need to speed it up.

I have tried setting n_jobs to -1, but it is still very slow!

Any suggestions on how to speed it up?

Answer

lejlot · Oct 1, 2017

The main solution in scikit-learn is to switch to mini-batch k-means (MiniBatchKMeans), which greatly reduces the computation required. To some extent it is analogous to SGD (Stochastic Gradient Descent) vs. GD (Gradient Descent) for optimising non-linear functions: SGD is usually faster (in terms of the computational cycles needed to converge to a local solution). Note that this introduces more variance into the optimisation, so results might be harder to reproduce (the optimisation will end up in different solutions more often than with "full batch" k-means).
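
As a rough sketch of the switch, the snippet below fits both KMeans and MiniBatchKMeans on the same data and times them. The synthetic dataset from make_blobs, the cluster count, the batch_size of 1024 and the fit_and_time helper are all illustrative assumptions, not values from the question; actual speed-ups depend on the data and the scikit-learn version.

```python
import time

from sklearn.cluster import KMeans, MiniBatchKMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the real data (assumption: 20,000 points, 10 features).
X, _ = make_blobs(n_samples=20000, n_features=10, centers=8, random_state=0)


def fit_and_time(model, data):
    """Hypothetical helper: fit the model and return it with the elapsed seconds."""
    start = time.perf_counter()
    model.fit(data)
    return model, time.perf_counter() - start


# Full-batch k-means: every iteration uses all samples.
full, t_full = fit_and_time(
    KMeans(n_clusters=8, n_init=3, random_state=0), X
)

# Mini-batch k-means: every iteration uses a random subset of batch_size samples.
# Fixing random_state makes a single run repeatable despite the extra variance.
mini, t_mini = fit_and_time(
    MiniBatchKMeans(n_clusters=8, batch_size=1024, n_init=3, random_state=0), X
)

print(f"KMeans:          {t_full:.2f} s, inertia {full.inertia_:.1f}")
print(f"MiniBatchKMeans: {t_mini:.2f} s, inertia {mini.inertia_:.1f}")
```

Comparing the two inertia values gives a quick feel for how much clustering quality is traded for speed on a given dataset; a larger batch_size usually narrows the gap at the cost of more work per iteration.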