What do we mean when we say that we are evaluating the clusters in WEKA frmework? Clustering is an unsupervised approach to grouping objects. What do we mean when we say we want to evaluate the result? Also, in addition to this, when we say that we are evaluating the clusters on top of the training data itself, what does that mean?
Thanks Abhishek S
As written on this page:
Evaluation The way Weka evaluates the clusterings depends on the cluster mode you select. Four different cluster modes are available (as buttons in the Cluster mode panel):
Use training set
(default). After generating the clustering Weka classifies the training instances into clusters according to the cluster representation and computes the percentage of instances falling in each cluster. For example, the above clustering produced by k-means shows 43% (6 instances) in cluster 0 and 57% (8 instances) in cluster 1.Supplied test set
or Percentage split
Weka can evaluate clusterings on separate test data if the cluster representation is probabilistic (e.g. for EM).Classes to clusters evaluation
. In this mode Weka first ignores the class attribute and generates the clustering. Then during the test phase it assigns classes to the clusters, based on the majority value of the class attribute within each cluster. Then it computes the classification error, based on this assignment and also shows the corresponding confusion matrix. An example of this for k-means is shown below.