k-means with selected initial centers

lel picture lel · Mar 4, 2015 · Viewed 13.4k times · Source

I am trying to k-means clustering with selected initial centroids. It says here that to specify your initial centers:

init : {‘k-means++’, ‘random’ or an ndarray} 

If an ndarray is passed, it should be of shape (n_clusters, n_features) and gives the initial centers.

My code in Python:

X = np.array([[-19.07480000,  -8.536],
              [22.010800000,-10.9737],
              [12.659700000,19.2601]], np.float64)
km = KMeans(n_clusters=3,init=X).fit(data)
# print km
centers = km.cluster_centers_
print centers

Returns an error:

RuntimeWarning: Explicit initial center position passed: performing only one init in k-means instead of n_init=10
  n_jobs=self.n_jobs)

and return the same initial centers. Any idea how to form the initial centers so it can be accepted?

Answer

ali_m picture ali_m · Mar 4, 2015

The default behavior of KMeans is to initialize the algorithm multiple times using different random centroids (i.e. the Forgy method). The number of random initializations is then controlled by the n_init= parameter (docs):

n_init : int, default: 10

Number of time the k-means algorithm will be run with different centroid seeds. The final results will be the best output of n_init consecutive runs in terms of inertia.

If you pass an array as the init= argument then only a single initialization will be performed using the centroids explicitly specified in the array. You are getting a RuntimeWarning because you are still passing the default value of n_init=10 (here are the relevant lines of source code).

It's actually totally fine to ignore this warning, but you can make it go away completely by passing n_init=1 if your init= parameter is an array.