I am trying to run k-means clustering with selected initial centroids. The documentation says that to specify your initial centers:
init : {‘k-means++’, ‘random’ or an ndarray}
If an ndarray is passed, it should be of shape (n_clusters, n_features) and gives the initial centers.
My code in Python:
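import numpy as np
from sklearn.cluster import KMeans

# Initial centers, shape (n_clusters, n_features) = (3, 2)
X = np.array([[-19.0748, -8.536],
              [22.0108, -10.9737],
              [12.6597, 19.2601]], np.float64)
km = KMeans(n_clusters=3, init=X).fit(data)
# print(km)
centers = km.cluster_centers_
print(centers)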
Returns an error:
RuntimeWarning: Explicit initial center position passed: performing only one init in k-means instead of n_init=10
n_jobs=self.n_jobs)
and it returns the same initial centers. Any idea how to form the initial centers so that they are accepted?
The default behavior of KMeans is to run the algorithm multiple times from different randomly seeded starting centroids (k-means++ seeding by default, or Forgy-style random selection with init='random'). The number of initializations is controlled by the n_init= parameter (docs):
n_init : int, default: 10
Number of times the k-means algorithm will be run with different centroid seeds. The final results will be the best output of n_init consecutive runs in terms of inertia.
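To make the role of n_init= concrete, here is a minimal sketch that fits the same data with one initialization and with ten, then prints the inertia of each fit. The data (`points`) and the `random_state` values are made up for illustration and are not from the question. The ten-run fit keeps the lowest-inertia run among its own ten, so it will usually be no worse than the single run, but this sketch does not guarantee that.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical data for illustration only: 200 random 2-D points.
rng = np.random.default_rng(42)
points = rng.normal(size=(200, 2))

# One random initialization versus the best of ten random initializations.
single = KMeans(n_clusters=3, init="random", n_init=1, random_state=0).fit(points)
multi = KMeans(n_clusters=3, init="random", n_init=10, random_state=0).fit(points)

# inertia_ is the sum of squared distances of samples to their closest center.
print("n_init=1  inertia:", single.inertia_)
print("n_init=10 inertia:", multi.inertia_)
```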
If you pass an array as the init= argument, only a single initialization is performed, using the centroids explicitly specified in the array. You are getting a RuntimeWarning because you are still passing the default value of n_init=10 (here are the relevant lines of source code).
It's actually totally fine to ignore this warning, but you can make it go away completely by passing n_init=1 if your init= parameter is an array.
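As a rough sketch of that fix, the snippet below passes the initial centers from the question together with n_init=1. The question does not show `data`, so a synthetic stand-in is generated here (three blobs scattered around the chosen starting points). That stand-in, the name `init_centers`, and the random seed are assumptions made only so the example runs on its own.

```python
import numpy as np
from sklearn.cluster import KMeans

# Initial centers from the question, shape (n_clusters, n_features) = (3, 2).
init_centers = np.array([[-19.0748, -8.536],
                         [22.0108, -10.9737],
                         [12.6597, 19.2601]], dtype=np.float64)

# Synthetic stand-in for the asker's `data` (assumption for illustration):
# 50 points of Gaussian noise around each starting center.
rng = np.random.default_rng(0)
data = np.vstack([c + rng.normal(scale=2.0, size=(50, 2)) for c in init_centers])

# n_init=1 matches the single run that an explicit init array implies.
km = KMeans(n_clusters=3, init=init_centers, n_init=1).fit(data)
centers = km.cluster_centers_
print(centers)
```

With data like this, the fitted centers end up near the supplied starting points but not identical to them, because each center moves to the mean of the points assigned to it.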