How do you compare the "similarity" between two dendrograms (in R)?

Tal Galili · Feb 7, 2010 · Viewed 12.7k times

I have two dendrograms which I wish to compare to each other in order to find out how "similar" they are. But I don't know of any method to do so (let alone code to implement it, say, in R).

Any leads?

UPDATE (2014-09-13):

Since asking this question, I have written an R package called dendextend, for the visualization, manipulation and comparison of dendrograms. This package is on CRAN and comes with a detailed vignette. It includes functions such as cor_cophenetic, cor_bakers_gamma and Bk / Bk_plot, as well as a tanglegram function for visually comparing two trees.
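As a rough illustration of how these functions might be used together, here is a minimal sketch. The USArrests data set, the two linkage methods, and the object names (hc1, hc2, dend1, dend2, coph, baker) are assumptions chosen for the example; only the dendextend function names come from the text above.

```r
# Minimal sketch; assumes dendextend is installed from CRAN.
library(dendextend)

# Two trees over the same objects, built with different linkage methods
# (example data and methods are illustrative assumptions).
d <- dist(scale(USArrests))
hc1 <- hclust(d, method = "complete")
hc2 <- hclust(d, method = "average")
dend1 <- as.dendrogram(hc1)
dend2 <- as.dendrogram(hc2)

# Numeric similarity measures between the two trees.
coph <- cor_cophenetic(dend1, dend2)    # correlation of cophenetic distances
baker <- cor_bakers_gamma(dend1, dend2) # Baker's gamma index
print(c(cophenetic = coph, bakers_gamma = baker))

# Visual comparisons: side-by-side trees, and the Bk vs k plot.
tanglegram(dend1, dend2)
Bk_plot(hc1, hc2)
```

Values near 1 for either correlation suggest the two trees are structurally similar; the package vignette describes how to interpret them in more detail.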

Answer

Aniko · Feb 8, 2010

Comparing dendrograms is not quite the same as comparing hierarchical clusterings, because the former includes the lengths of branches as well as the splits, but I think comparing the clusterings is a good start. I would suggest you read E. B. Fowlkes & C. L. Mallows (1983). "A Method for Comparing Two Hierarchical Clusterings". Journal of the American Statistical Association 78 (383): 553–584.

Their approach is based on cutting the trees at each level k, getting a measure Bk that compares the groupings into k clusters, and then examining the Bk vs k plots. The measure Bk is based upon looking at pairs of objects and seeing whether they fall into the same cluster or not.

I am sure that one can write code based on this method, but first we would need to know how the dendrograms are represented in R.
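As one possible sketch of that method, assume the two trees are hclust objects (the class returned by base R's hclust) built on the same set of objects. The helper names fm_index and bk_curve, the USArrests example data, and the range of k are hypothetical choices for illustration. The index is computed from the contingency table of the two k-cluster partitions, counting pairs of objects that share a cluster in both.

```r
# Fowlkes-Mallows index for two label vectors over the same objects.
# fm_index and bk_curve are hypothetical helper names, not part of any package.
fm_index <- function(labels1, labels2) {
  n <- length(labels1)
  m <- table(labels1, labels2)     # contingency table of the two partitions
  t_k <- sum(m^2) - n              # ordered pairs together in both partitions
  p_k <- sum(rowSums(m)^2) - n     # ordered pairs together in partition 1
  q_k <- sum(colSums(m)^2) - n     # ordered pairs together in partition 2
  t_k / sqrt(p_k * q_k)
}

# Cut both trees into k clusters for each k and compute B_k.
bk_curve <- function(hc1, hc2, ks) {
  sapply(ks, function(k) fm_index(cutree(hc1, k = k), cutree(hc2, k = k)))
}

# Example data and linkage methods are illustrative assumptions.
d <- dist(scale(USArrests))
hc_complete <- hclust(d, method = "complete")
hc_average <- hclust(d, method = "average")

ks <- 2:10
bk <- bk_curve(hc_complete, hc_average, ks)

# The Bk vs k plot: values near 1 mean the k-cluster partitions agree closely.
plot(ks, bk, type = "b", ylim = c(0, 1), xlab = "k", ylab = "Bk")
```

A dendrogram object can be converted with as.hclust before being passed to bk_curve. Note that k equal to the number of objects is excluded, since every cluster is then a singleton and the index is undefined.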