Will a pandas DataFrame object work with sklearn KMeans clustering?

Dark Knight · Jan 19, 2015 · Viewed 46.8k times

My dataset is a pandas DataFrame, and I am clustering it with sklearn.cluster.KMeans:

 from sklearn.cluster import KMeans

 km = KMeans(n_clusters=n_Clusters)
 km.fit(dataset)
 prediction = km.predict(dataset)

This is how I decide which entity belongs to which cluster:

 cluster_fit_dict = {}
 for i in range(len(prediction)):
     cluster_fit_dict[dataset.index[i]] = prediction[i]

This is how the dataset looks:

 A 1 2 3 4 5 6
 B 2 3 4 5 6 7
 C 1 4 2 7 8 1
 ...

where A, B, and C are the index labels.

Is this the correct way of using k-means?

Answer

user666 · May 7, 2015

Assuming all the values in the DataFrame are numeric:

import pandas
import sklearn.cluster

# Convert the DataFrame to a NumPy array
mat = dataset.values
# Cluster with sklearn
km = sklearn.cluster.KMeans(n_clusters=5)
km.fit(mat)
# Get the cluster assignment label for each row
labels = km.labels_
# Format the results as a DataFrame of (index label, cluster label) pairs
results = pandas.DataFrame([dataset.index, labels]).T
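
For reference, here is a minimal end-to-end sketch using the three sample rows from the question. It assumes the DataFrame is all numeric and passes it to KMeans directly, since sklearn accepts a DataFrame as array-like input, so the explicit conversion is optional. The values n_clusters=2, n_init=10, and random_state=0 are illustrative choices, not taken from the original post.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Toy data mirroring the three rows shown in the question; A, B, C are index labels.
dataset = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6],
     [2, 3, 4, 5, 6, 7],
     [1, 4, 2, 7, 8, 1]],
    index=["A", "B", "C"],
)

# Illustrative settings (assumptions): 2 clusters, fixed seed for repeatability.
km = KMeans(n_clusters=2, n_init=10, random_state=0)

# fit_predict takes the DataFrame as-is and returns one label per row, in row order.
labels = km.fit_predict(dataset)

# Re-attach the index so each entity maps to its cluster label.
clusters = pd.Series(labels, index=dataset.index, name="cluster")
cluster_fit_dict = clusters.to_dict()
print(clusters)
```

Because the labels come back in row order, pairing them with dataset.index (as the loop in the question does) is a valid way to recover which entity belongs to which cluster. The Series here is just a shorter way to build the same mapping.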

Alternatively, you could try KMeans++ for Pandas.