I was going through the k-means Wikipedia page. Based on the algorithm, I think the complexity is O(n*k*i) (n = total number of elements, k = number of clusters, i = number of iterations).
So can someone explain this statement from Wikipedia to me, and how is this NP-hard?
If $k$ and $d$ (the dimension) are fixed, the problem can be exactly solved in time $O(n^{dk+1}\log n)$, where $n$ is the number of entities to be clustered.
It depends on what you call k-means.
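The problem of finding the global optimum of the k-means objective function

$$\underset{S}{\arg\min} \sum_{i=1}^{k} \sum_{x_j \in S_i} \lVert x_j - \mu_i \rVert^2,$$

where $S_i$ is cluster $i$ (and there are $k$ clusters), $x_j$ is a $d$-dimensional point in cluster $S_i$, and $\mu_i$ is the centroid (the average of the points) of cluster $S_i$, is NP-hard when k and d are part of the input. The $O(n^{dk+1}\log n)$ bound quoted in the question does not contradict this: it is polynomial in n only because k and d are treated as constants, and its exponent dk+1 grows with both.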
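To make the objective (the sum of squared distances from each point to its cluster's centroid) concrete, here is a minimal Python 3.9+ sketch that evaluates it and finds the global optimum by brute force over all k^n labelings. This is plain exhaustive search, not the $O(n^{dk+1}\log n)$ algorithm from the Wikipedia quote, and it is only usable for a handful of points; the names `kmeans_objective` and `brute_force_kmeans` are hypothetical.

```python
from itertools import product
from math import inf

Point = tuple[float, ...]

def centroid(cluster: list[Point]) -> Point:
    """Coordinate-wise mean of a non-empty cluster (the mu_i of the objective)."""
    d = len(cluster[0])
    return tuple(sum(p[j] for p in cluster) / len(cluster) for j in range(d))

def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance ||a - b||^2."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_objective(clusters: list[list[Point]]) -> float:
    """Sum over clusters S_i of the sum over x_j in S_i of ||x_j - mu_i||^2."""
    total = 0.0
    for cluster in clusters:
        mu = centroid(cluster)
        total += sum(sq_dist(x, mu) for x in cluster)
    return total

def brute_force_kmeans(points: list[Point], k: int) -> tuple[float, list[list[Point]]]:
    """Exact global optimum by trying all k**n labelings: O(k**n * n * d) time."""
    best_cost, best_clusters = inf, []
    for labels in product(range(k), repeat=len(points)):
        clusters = [[p for p, lab in zip(points, labels) if lab == i] for i in range(k)]
        if any(not c for c in clusters):
            continue  # every S_i must be non-empty
        cost = kmeans_objective(clusters)
        if cost < best_cost:
            best_cost, best_clusters = cost, clusters
    return best_cost, best_clusters
```

For example, `brute_force_kmeans([(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)], 2)` returns cost 1.0 with the left/right split. Adding a single point multiplies the number of labelings by k, which is why this approach does not scale.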
However, running a fixed number t of iterations of the standard algorithm takes only O(t*k*n*d) for n d-dimensional points, where k is the number of centroids (or clusters). This is what practical implementations do (often repeated with several random restarts, keeping the best run).
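A minimal sketch of the standard (Lloyd's) algorithm run for exactly t iterations, to show where the O(t*k*n*d) comes from. It assumes random initialization from the input points and keeps a centroid unchanged if its cluster becomes empty; both are illustrative choices, and the function name `lloyd` is hypothetical.

```python
import random

Point = tuple[float, ...]

def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance: O(d)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def lloyd(points: list[Point], k: int, t: int, seed: int = 0) -> tuple[list[Point], list[int]]:
    """Run exactly t iterations of Lloyd's k-means algorithm.

    Assignment step: n points x k centroids x O(d) distance = O(k*n*d).
    Update step: one pass over the points, O(n*d + k*d).
    Total for t iterations: O(t*k*n*d).
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: k input points chosen at random
    d = len(points[0])
    labels = [0] * len(points)
    for _ in range(t):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda i: sq_dist(p, centroids[i])) for p in points]
        # Update step: accumulate per-cluster sums and counts in one pass.
        sums = [[0.0] * d for _ in range(k)]
        counts = [0] * k
        for p, lab in zip(points, labels):
            counts[lab] += 1
            for j in range(d):
                sums[lab][j] += p[j]
        for i in range(k):
            if counts[i]:  # keep the old centroid if a cluster goes empty
                centroids[i] = tuple(s / counts[i] for s in sums[i])
    return centroids, labels
```

Random restarts would simply call `lloyd` with different seeds and keep the run with the lowest objective value; each restart adds another O(t*k*n*d).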
The standard algorithm only approximates a local optimum of the above function, and so do all the k-means algorithms that I've seen.
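A small sketch with made-up data illustrating why the result is only a local optimum: for the four corners of a 10 by 1 rectangle and k = 2, both the left/right split and the top/bottom split are fixed points of the standard algorithm (no point is strictly closer to another centroid), yet the top/bottom split costs 100 times more. Starting from the centroids (0, 0) and (0, 1), for instance, leads straight to the top/bottom split. The helper names are hypothetical.

```python
Point = tuple[float, ...]

def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance ||a - b||^2."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(cluster: list[Point]) -> Point:
    """Coordinate-wise mean of a non-empty cluster."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))

def kmeans_cost(clusters: list[list[Point]]) -> float:
    """k-means objective: squared distances to each cluster's own centroid."""
    total = 0.0
    for c in clusters:
        mu = centroid(c)
        total += sum(sq_dist(x, mu) for x in c)
    return total

def is_lloyd_fixed_point(clusters: list[list[Point]]) -> bool:
    """True if another assignment step would leave every point in its cluster."""
    mus = [centroid(c) for c in clusters]
    for i, cluster in enumerate(clusters):
        for x in cluster:
            # A point strictly closer to another centroid would move.
            if any(sq_dist(x, mu) < sq_dist(x, mus[i]) for mu in mus):
                return False
    return True

# Hypothetical example data: corners of a 10 x 1 rectangle.
left_right = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 0.0), (10.0, 1.0)]]
top_bottom = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 1.0), (10.0, 1.0)]]
for name, clusters in [("left/right", left_right), ("top/bottom", top_bottom)]:
    print(name, is_lloyd_fixed_point(clusters), kmeans_cost(clusters))
# left/right True 1.0
# top/bottom True 100.0
```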