I am trying to make a scatter plot of KMeans output that clusters sentences on the same topic together. The problem I am facing is giving the points that belong to each cluster their own color.
sentence_list=["Hi how are you", "Good morning" ...] # I have 10 sentences
km = KMeans(n_clusters=5, init='k-means++',n_init=10, verbose=1)
# with 5 clusters, I want 5 different colors
km.fit(vectorized)
km.labels_ # [0,1,2,3,3,4,4,5,2,5]
pipeline = Pipeline([('tfidf', TfidfVectorizer())])
X = pipeline.fit_transform(sentence_list).todense()
pca = PCA(n_components=2).fit(X)
data2D = pca.transform(X)
plt.scatter(data2D[:,0], data2D[:,1])
km.fit(X)
centers2D = pca.transform(km.cluster_centers_)
plt.hold(True)
labels=np.array([km.labels_])
print(labels)
My problem is in the plt.scatter() call at the bottom; what should I use for the parameter c?
1. When I use c=labels in the code, I get this error: number in rbg sequence outside 0-1 range
2. When I set c=km.labels_ instead, I get the error: ValueError: Color array must be two-dimensional
plt.scatter(centers2D[:,0], centers2D[:,1],
marker='x', s=200, linewidths=3, c=labels)
plt.show()
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
# Fit KMeans on the data X (an array of shape (n_samples, n_features))
model = KMeans(n_clusters=5).fit(X)
# Visualize it, coloring each point by its cluster label:
plt.figure(figsize=(8, 6))
plt.scatter(X[:,0], X[:,1], c=model.labels_.astype(float))
plt.show()
Now each cluster is drawn in a different color.
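To tie this back to the question's pipeline, here is a minimal sketch of the whole flow (TF-IDF, KMeans, PCA to 2-D, then a scatter colored by label). The function name plot_sentence_clusters, the random_state value, and the "viridis" colormap are my own choices for illustration, not anything from the original code. The key point is that c should be a 1-D array with one entry per plotted point; np.array([km.labels_]) has shape (1, n_samples), which is why it was being read as a color row rather than as per-point labels.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer


def plot_sentence_clusters(sentences, n_clusters=3, random_state=0):
    """Cluster sentences with KMeans and scatter them in 2-D, colored by cluster.

    Hypothetical helper for illustration; returns the figure and the labels.
    """
    # TF-IDF features as a plain dense ndarray
    X = TfidfVectorizer().fit_transform(sentences).toarray()

    km = KMeans(n_clusters=n_clusters, init="k-means++", n_init=10,
                random_state=random_state).fit(X)
    labels = km.labels_  # 1-D integer array, shape (n_samples,)

    # Project both the points and the cluster centers onto 2 components
    pca = PCA(n_components=2).fit(X)
    data2d = pca.transform(X)
    centers2d = pca.transform(km.cluster_centers_)

    fig, ax = plt.subplots(figsize=(8, 6))
    # Same cmap/vmin/vmax in both calls so a center matches its points
    color_kw = dict(cmap="viridis", vmin=0, vmax=n_clusters - 1)
    ax.scatter(data2d[:, 0], data2d[:, 1], c=labels, **color_kw)
    ax.scatter(centers2d[:, 0], centers2d[:, 1], c=np.arange(n_clusters),
               marker="x", s=200, linewidths=3, **color_kw)
    return fig, labels
```

Call it with your own list, e.g. plot_sentence_clusters(sentence_list, n_clusters=5), and then plt.show(). Note that the centers get c=np.arange(n_clusters), not the per-point labels, because there is one center per cluster.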