WEKA K-Means Clustering

Chris Taylor · Apr 26, 2011

Can anybody explain what the output of the K-Means clustering in WEKA actually means?

For example:

kMeans


Number of iterations: 9

Within cluster sum of squared errors: 9434.911100488926

Missing values globally replaced with mean/mode

Cluster centroids:

                  Cluster#
Attribute         Full Data          0          1                           
                      (400)      (310)       (90)
=================================================
competency134        0.0425     0.0548          0  
competency207        0.0425     0.0548          0  
competency263          0.01     0.0129          0  
competency264          0.01     0.0129          0  
competency282          0.01     0.0129          0  
competency289          0.01     0.0129          0  

What do the numbers in the columns actually mean? It says "Cluster centroids" above the table, but how can I determine what the centroids of the two clusters are?

If anybody could explain what the numbers mean I would be most grateful.

If anybody has any ideas on how to complete a silhouette evaluation of the clusters found, that would also be great.

Thanks

Answer

Yuval F · May 16, 2011

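The "Full Data" column gives you the centroid of the overall population. The columns labelled 0 and 1 give you the centroids for cluster 0 and cluster 1, respectively. The numbers in parentheses under each heading are the instance counts (400 in total, 310 in cluster 0 and 90 in cluster 1). Each row gives the centroid coordinate for one dimension, that is, the mean of that attribute over the instances in the column's group.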
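As a quick sanity check on that reading, here is a small Python sketch that recomputes the "Full Data" column from the two cluster columns. It assumes each centroid coordinate is the attribute's mean over the cluster's instances and that the parenthesised numbers are instance counts. The helper name `full_data_mean` is made up for illustration; the values are copied from the table in the question.

```python
# Values copied from the WEKA output in the question.
cluster_sizes = [310, 90]  # instance counts shown in parentheses
centroids = {
    "competency134": [0.0548, 0.0],  # [cluster 0, cluster 1]
    "competency263": [0.0129, 0.0],
}


def full_data_mean(per_cluster_means, sizes):
    """Size-weighted average of per-cluster means (hypothetical helper)."""
    total = sum(sizes)
    return sum(m * n for m, n in zip(per_cluster_means, sizes)) / total


for name, means in centroids.items():
    # Rounded to 4 decimals to match the precision WEKA prints.
    print(name, round(full_data_mean(means, cluster_sizes), 4))
```

If the interpretation holds, the printed values should agree with the "Full Data" column (0.0425 and 0.01) up to rounding.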

I believe you need to brush up on your K-means: finding the centroids is an essential part of the algorithm. The centroids are the result of one specific run and are not unique; a different run (for example, with a different random seed) may produce a different set of centroids.
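To make the role of the centroids concrete, here is a minimal sketch of the textbook K-means iteration (Lloyd's algorithm) in plain Python. It is not WEKA's implementation; the function names, the toy data and the `seed` parameter are assumptions for illustration only. The returned `sse` is analogous in spirit to the "Within cluster sum of squared errors" line, although WEKA's figure may differ because of its own preprocessing.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def k_means(points, k, seed, max_iter=100):
    """Plain Lloyd's iteration; returns (centroids, assignments, sse)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random starting centroids
    assignments = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # nothing moved, so the centroids are stable
        assignments = new_assignments
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = mean_point(members)
    sse = sum(squared_distance(p, centroids[a]) for p, a in zip(points, assignments))
    return centroids, assignments, sse


# Toy data (made up): two loose groups of 2-D points.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
for seed in (1, 2, 3):
    found, _, sse = k_means(data, k=2, seed=seed)
    print(seed, sorted(found), round(sse, 3))
```

Different seeds pick different starting centroids. On well-separated toy data like this the runs will usually agree, but on harder data they can settle in different local optima, which is why the centroid table reflects one particular run.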

Please see Michael Abernethy's description of Weka clustering for more details.