I have what feels like a simple problem, but I can't seem to find an answer. I'm pretty new to Weka, but I feel like I've done a bit of research on this (at least read through the first couple of pages of Google results) and come up dry.
I am using Weka to run clustering with SimpleKMeans. In the result list I have no problem visualizing my output ("Visualize cluster assignments"), and it is clear, both from my understanding of the K-Means algorithm and from Weka's output, that each of my instances ends up as a member of exactly one cluster (centered around a particular centroid, if you will).
I can see something of the cluster composition from the text output. However, Weka gives me no explicit "mapping" from instance number to cluster number. I would like something like:
instance 1 --> cluster 0
instance 2 --> cluster 0
instance 3 --> cluster 2
instance 4 --> cluster 1
... etc.
How do I obtain these results without calculating the distance from each item to each centroid on my own?
I had the same problem and figured it out. I'm posting the method here in case anyone needs it:
It's actually quite simple: you have to use Weka's Java API.
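import weka.clusterers.SimpleKMeans;

// Assumes "instances" is your already-loaded weka.core.Instances dataset
// and "numberOfClusters" is the number of clusters you want
SimpleKMeans kmeans = new SimpleKMeans();
kmeans.setSeed(10);
// This is the important parameter to set; it must be set before buildClusterer()
kmeans.setPreserveInstancesOrder(true);
kmeans.setNumClusters(numberOfClusters);
kmeans.buildClusterer(instances);
// This array holds the cluster number (starting with 0) for each instance,
// in the same order as the instances in the dataset
int[] assignments = kmeans.getAssignments();
int i = 0;
for (int clusterNum : assignments) {
    // %n ends the line so each mapping is printed on its own line
    System.out.printf("Instance %d -> Cluster %d%n", i, clusterNum);
    i++;
}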
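If you ever want to sanity-check the mapping by hand, here is a minimal plain-Java sketch of the manual alternative the question mentions: assigning each instance to its nearest centroid. It does not use Weka at all; the class name NearestCentroid, the method assign, and the sample data are all hypothetical and only there for illustration. It uses plain squared Euclidean distance on raw numeric values, whereas SimpleKMeans (as far as I know) normalizes numeric attributes by default, so the results may not match Weka's exactly on unscaled data.

```java
// Hypothetical helper, not part of Weka: maps each point to its nearest centroid.
class NearestCentroid {

    // Returns, for each point, the index of the closest centroid,
    // using squared Euclidean distance (no normalization is applied).
    static int[] assign(double[][] points, double[][] centroids) {
        int[] assignments = new int[points.length];
        for (int i = 0; i < points.length; i++) {
            double best = Double.POSITIVE_INFINITY;
            for (int c = 0; c < centroids.length; c++) {
                double dist = 0.0;
                for (int d = 0; d < points[i].length; d++) {
                    double diff = points[i][d] - centroids[c][d];
                    dist += diff * diff;
                }
                // Keep the closest centroid seen so far
                if (dist < best) {
                    best = dist;
                    assignments[i] = c;
                }
            }
        }
        return assignments;
    }

    public static void main(String[] args) {
        // Made-up sample data: four 2-D points and two centroids
        double[][] points = {{1.0, 1.0}, {1.5, 2.0}, {8.0, 8.0}, {9.0, 9.5}};
        double[][] centroids = {{1.0, 1.5}, {8.5, 9.0}};
        int[] assignments = assign(points, centroids);
        for (int i = 0; i < assignments.length; i++) {
            // Instances are numbered from 1 here to match the format in the question
            System.out.printf("instance %d --> cluster %d%n", i + 1, assignments[i]);
        }
    }
}
```

In practice getAssignments() saves you from doing this yourself; the sketch is only meant to show what the returned array represents.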