In my project I use k-means to cluster data into groups, but the k-means implementation from scikit-learn is very slow. I need to speed it up.
I have tried setting n_jobs to -1, but it is still very slow!
Any suggestions on how to speed it up?
The main option within scikit-learn is to switch to mini-batch k-means (MiniBatchKMeans), which greatly reduces the computational cost. It is roughly analogous to SGD (Stochastic Gradient Descent) vs. GD (Gradient Descent) for optimising non-linear functions: each update uses a small random subset of the data instead of the full set, so it usually needs fewer computational cycles to converge to a local solution. Note that this introduces more variance into the optimisation, so results may be harder to reproduce (runs will end up in different solutions more often than with "full batch" k-means).
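As a rough illustration of the switch, here is a minimal sketch that fits both estimators on the same data and times them. The synthetic data from make_blobs, the cluster count, the batch_size of 1024 and the timed_fit helper are all assumptions made for the example, not values from the question; substitute your own feature matrix and settings.

```python
import time

from sklearn.cluster import KMeans, MiniBatchKMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in data (an assumption; replace with your own feature matrix).
X, _ = make_blobs(n_samples=20_000, n_features=10, centers=5, random_state=0)


def timed_fit(model, data):
    """Hypothetical helper: fit the model and return it with the elapsed seconds."""
    start = time.perf_counter()
    model.fit(data)
    return model, time.perf_counter() - start


# Full-batch k-means: every iteration uses all samples.
full, t_full = timed_fit(KMeans(n_clusters=5, n_init=3, random_state=0), X)

# Mini-batch k-means: every update uses only batch_size randomly drawn samples.
mini, t_mini = timed_fit(
    MiniBatchKMeans(n_clusters=5, batch_size=1024, n_init=3, random_state=0), X
)

print(f"KMeans:          {t_full:.2f}s, inertia={full.inertia_:.1f}")
print(f"MiniBatchKMeans: {t_mini:.2f}s, inertia={mini.inertia_:.1f}")
```

The speed-up tends to matter most on large datasets; on small ones the two may take similar time. Passing a fixed random_state makes runs repeatable, which helps with the extra variance mentioned above, and comparing inertia_ gives a quick check of how much clustering quality is traded for speed.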