How to compute precision and recall in clustering?

Christian Stade-Schuldt picture Christian Stade-Schuldt · Mar 18, 2009 · Viewed 18.5k times · Source

I am really confused how to compute precision and recall in clustering applications.

I have the following situation:

Given two sets A and B. By using a unique key for each element I can determine which of the elements of A and B match. I want to cluster those elements based on features (not using the unique key of course).

I am doing the clustering but I am not sure how to compute precision and recall. The formulas,according to the paper "Extended Performance Graphs for Cluster Retrieval" (http://staff.science.uva.nl/~nicu/publications/CVPR01_nies.pdf) are:

p = precision = relevant retrieved items/retrieved items and r = recall = relevant retrieved items/relevant items

I really do not get what elements fall under which category.

What I did so far is, I checked within the clusters how many matching pairs I have (using the unique key). Is that already one of precision or recall? And if so, which one is it and how can I compute the other one?

Update: I just found another paper with the title "An F-Measure for Evaluation of Unsupervised Clustering with Non-Determined Number of Clusters" at http://mtg.upf.edu/files/publications/unsuperf.pdf.

Answer

theycallmemorty picture theycallmemorty · Mar 23, 2009

I think you'll find wikipedia has a helpful article on precision and recall. In short:

Precision = true positives / (true positives + false positives)

Recall = true positives /( true positivies + false negatives)