How to traverse a tree from sklearn AgglomerativeClustering?

alvas · Dec 9, 2014 · Viewed 7.1k times

I have a NumPy array saved as a text file at: https://github.com/alvations/anythingyouwant/blob/master/WN_food.matrix

It's a pairwise distance matrix between terms. My list of terms is here: http://pastebin.com/2xGt7Xjh

I used the following code to generate a hierarchical clustering:

import numpy as np
from sklearn.cluster import AgglomerativeClustering

matrix = np.loadtxt('WN_food.matrix')
n_clusters = 518
model = AgglomerativeClustering(n_clusters=n_clusters,
                                linkage="average", affinity="cosine")
model.fit(matrix)

To get the cluster for each term, I can do:

for term, clusterid in enumerate(model.labels_):
    print(term, clusterid)

But how do I traverse the tree that the AgglomerativeClustering outputs?

Is it possible to convert it into a scipy dendrogram (http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.cluster.hierarchy.dendrogram.html)? And after that how do I traverse the dendrogram?

Answer

A.P. · Dec 15, 2014

I've answered a similar question for sklearn.cluster.ward_tree: How do you visualize a ward tree from sklearn.cluster.ward_tree?

AgglomerativeClustering outputs the tree in the same way, in the children_ attribute. Here's an adaptation of the code in the ward tree question for AgglomerativeClustering. It outputs the structure of the tree in the form (node_id, left_child, right_child) for each non-leaf (merge) node of the tree. Child ids smaller than the number of samples are leaves, i.e. the original samples; larger ids refer to earlier merges.

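import itertools

import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Toy data: two well-separated groups (3 + 2 samples, 10 features each).
X = np.concatenate([np.random.randn(3, 10), np.random.randn(2, 10) + 100])
# Note: newer scikit-learn releases rename `affinity` to `metric`.
model = AgglomerativeClustering(linkage="average", affinity="cosine")
model.fit(X)

# Merge nodes are numbered from n_samples upward, one per row of children_.
ii = itertools.count(X.shape[0])
[{'node_id': next(ii), 'left': x[0], 'right': x[1]} for x in model.children_]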
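
To actually walk the tree rather than just list its nodes, one option is to follow the child ids until they drop below the number of samples. The sketch below is only an illustration: `leaves_under` is a hypothetical helper (not part of scikit-learn), and the hand-written `children` list stands in for `model.children_` so the example runs without fitting a model.

```python
def leaves_under(children, n_samples, node_id):
    """Return the sorted sample indices (leaves) found below node_id."""
    stack, leaves = [node_id], []
    while stack:
        node = stack.pop()
        if node < n_samples:
            # Ids below n_samples are leaves, i.e. original samples.
            leaves.append(int(node))
        else:
            # Row (node - n_samples) holds the two children of this merge.
            left, right = children[node - n_samples]
            stack.extend((int(left), int(right)))
    return sorted(leaves)


# Hand-written stand-in for model.children_ with 5 samples (assumption).
children = [[0, 1], [3, 4], [2, 5], [6, 7]]
n_samples = len(children) + 1
root = 2 * n_samples - 2  # id of the last merge, i.e. the top of the tree

for offset in range(len(children)):
    node_id = n_samples + offset
    print(node_id, leaves_under(children, n_samples, node_id))
```

With a fitted model you would pass `model.children_` and `len(model.labels_)` instead of the toy values. An explicit stack is used instead of recursion so that deep, unbalanced trees do not hit Python's recursion limit.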

https://stackoverflow.com/a/26152118
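
On the scipy dendrogram part of the question: scipy expects a linkage matrix whose rows are (left, right, distance, size), so children_ alone is not enough; you also need a merge distance and a leaf count per row. Below is a rough sketch of building such rows. `linkage_rows` is a hypothetical helper, and the `distances` values are made up; with a real model they would have to come from something like `model.distances_`, which, as far as I know, recent scikit-learn versions only populate when the model is fitted with `compute_distances=True` or a `distance_threshold`.

```python
def linkage_rows(children, distances):
    """Build SciPy-style linkage rows [left, right, distance, size]."""
    n_samples = len(children) + 1
    sizes = {}  # merge-node id -> number of leaves below it
    rows = []
    for offset, (left, right) in enumerate(children):
        left, right = int(left), int(right)
        # A child that is not a recorded merge node is a single sample.
        size = sizes.get(left, 1) + sizes.get(right, 1)
        sizes[n_samples + offset] = size
        rows.append([float(left), float(right),
                     float(distances[offset]), float(size)])
    return rows


# Stand-ins for model.children_ and merge distances (made-up values).
children = [[0, 1], [3, 4], [2, 5], [6, 7]]
distances = [0.1, 0.2, 0.5, 1.5]

rows = linkage_rows(children, distances)
for row in rows:
    print(row)
```

Converting the result with `np.asarray(rows, dtype=float)` should give something `scipy.cluster.hierarchy.dendrogram` can plot, and `scipy.cluster.hierarchy.to_tree` can turn the same matrix into node objects to traverse.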