What is the time complexity of k-means?

algorithm time-complexity k-means

parallel · Sep 5, 2013

I was going through the k-means Wikipedia page. Based on the algorithm, I think the complexity is O(n*k*i) (n = total number of elements, k = number of clusters, i = number of iterations).

So can someone explain this statement from Wikipedia to me, and how is this problem NP-hard?

If k and d (the dimension) are fixed, the problem can be exactly solved in time O(n^(dk+1) log n), where n is the number of entities to be clustered.
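To see why this bound is polynomial in n yet far from practical, here is the arithmetic for some illustrative values I picked (n = 1000, d = 2, k = 3, natural log, constant factors ignored), next to a fixed number t = 100 of iterations of the standard algorithm, which costs t*k*n*d:

$$n^{dk+1}\log n = 1000^{2\cdot 3+1}\,\ln 1000 = 10^{21}\cdot 6.9 \approx 6.9\times 10^{21}$$

$$t\,k\,n\,d = 100\cdot 3\cdot 1000\cdot 2 = 6\times 10^{5}$$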

Answer

It depends on what you call k-means.

The problem of finding the global optimum of the k-means objective function

$$\underset{S}{\arg\min} \sum_{i=1}^{k} \sum_{x_j \in S_i} \lVert x_j - \mu_i \rVert^2$$

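is NP-hard, where S_i is cluster i (there are k clusters), x_j is a d-dimensional point in cluster S_i, and μ_i is the centroid (the mean of the points) of cluster S_i. The exact O(n^(dk+1) log n) algorithm quoted in the question is polynomial only because k and d are treated as constants. The exponent dk+1 grows with them, so it does not contradict NP-hardness when k or d is part of the input.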
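A minimal Python sketch of what "global optimum" means here. The helper names (`kmeans_objective`, `brute_force_kmeans`) are hypothetical and mine. It evaluates the objective above directly and finds the exact optimum by trying all k^n labelings. This brute force is exponential in n and is not the O(n^(dk+1) log n) algorithm mentioned on Wikipedia. It only illustrates what an exact solver has to beat.

```python
from itertools import product
from math import inf


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points (mu_i)."""
    d = len(points[0])
    return tuple(sum(p[c] for p in points) / len(points) for c in range(d))


def sq_dist(a, b):
    """Squared Euclidean distance ||a - b||^2."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_objective(clusters):
    """Sum over clusters S_i of the sum over x_j in S_i of ||x_j - mu_i||^2."""
    total = 0.0
    for s in clusters:
        if s:
            mu = centroid(s)
            total += sum(sq_dist(x, mu) for x in s)
    return total


def brute_force_kmeans(points, k):
    """Exact global optimum by enumerating all k**n labelings (tiny inputs only)."""
    best_cost, best_labels = inf, []
    for labels in product(range(k), repeat=len(points)):
        clusters = [[p for p, lab in zip(points, labels) if lab == i] for i in range(k)]
        if any(not s for s in clusters):
            continue  # require exactly k non-empty clusters
        cost = kmeans_objective(clusters)
        if cost < best_cost:
            best_cost, best_labels = cost, list(labels)
    return best_cost, best_labels
```

For example, `brute_force_kmeans([(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)], 2)` returns `(1.0, [0, 0, 1, 1])`. Adding a single point multiplies the work by k.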

However, running a fixed number t of iterations of the standard algorithm takes only O(t*k*n*d) time for n d-dimensional points, where k is the number of centroids (clusters). This is what practical implementations do, often with several random restarts, keeping the best result.
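A rough sketch of the standard (Lloyd's) algorithm in plain Python, showing where the O(t*k*n*d) comes from. The function names and the convention that an empty cluster keeps its old centroid are my own assumptions, not taken from any particular library.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two d-dimensional points: O(d)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd(points, init_centroids, t):
    """Run t >= 1 iterations of Lloyd's algorithm from the given centroids.

    Each iteration costs O(k*n*d), so the whole run is O(t*k*n*d).
    Returns (objective value, labels, centroids).
    """
    k, d = len(init_centroids), len(points[0])
    centroids = [tuple(c) for c in init_centroids]
    labels = [0] * len(points)
    for _ in range(t):
        # Assignment step: n points x k centroids x d coordinates -> O(k*n*d).
        labels = [min(range(k), key=lambda i: sq_dist(p, centroids[i])) for p in points]
        # Update step: each point is added to one running sum -> O(n*d).
        sums = [[0.0] * d for _ in range(k)]
        counts = [0] * k
        for p, lab in zip(points, labels):
            counts[lab] += 1
            for c in range(d):
                sums[lab][c] += p[c]
        # An empty cluster keeps its previous centroid (one common convention).
        centroids = [
            tuple(s / counts[i] for s in sums[i]) if counts[i] else centroids[i]
            for i in range(k)
        ]
    cost = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return cost, labels, centroids


def kmeans_with_restarts(points, k, t=100, restarts=10, seed=0):
    """Best of several Lloyd runs from random initial centroids.

    Total cost is O(restarts * t * k * n * d).
    """
    rng = random.Random(seed)
    runs = (lloyd(points, rng.sample(points, k), t) for _ in range(restarts))
    return min(runs, key=lambda run: run[0])
```

On the four points (0,0), (0,1), (10,0), (10,1) with k = 2, starting from centroids (0,0) and (10,0) reaches the optimal cost 1.0. Starting from (0,0) and (0,1) instead gets stuck at cost 100.0 (left/right halves mixed), which no further iteration fixes. That is why the restarts help.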

The standard algorithm only approximates a local optimum of the above function, and so do all the k-means algorithms that I've seen.
