How to find cluster centroid with Scikit-learn

sheldonzy · May 14, 2018

I have a data set with labeled clusters. I'm trying to find the centroid of each cluster (the vector whose distance to all data points of the cluster is the smallest).

I found many solutions that perform the clustering first and only then find the centroids, but I haven't found one yet for clusters that already exist.

Python scikit-learn is preferred. Thanks.

Answer

sascha · May 14, 2018

Straight from the docs:

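# Import from sklearn.neighbors; the sklearn.neighbors.nearest_centroid
# submodule path no longer works in recent scikit-learn releases.
from sklearn.neighbors import NearestCentroid
import numpy as np
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
y = np.array([1, 1, 1, 2, 2, 2])  # existing cluster labels
clf = NearestCentroid()
clf.fit(X, y)  # computes one centroid per label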

print(clf.centroids_)
# [[-2.         -1.33333333]
#  [ 2.          1.33333333]]
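
With the default Euclidean metric, those centroids are just the per-label means of the points, so the same result can be sketched with plain NumPy if you'd rather not fit a classifier. This is a minimal sketch that reuses the toy X and y from above; the names labels and centroids are only illustrative.

```python
import numpy as np

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
y = np.array([1, 1, 1, 2, 2, 2])

# Sorted unique cluster labels; row i of the result belongs to labels[i]
labels = np.unique(y)

# Average the points of each cluster along the sample axis
centroids = np.array([X[y == label].mean(axis=0) for label in labels])

print(labels)
print(centroids)
```

Note that the mean minimizes the sum of squared Euclidean distances to the cluster's points. If you need the point that minimizes the sum of plain distances, that is the geometric median, which is a different quantity and is not what this sketch computes.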