Create clusters using correlation matrix in Python

Jasper C. · Oct 12, 2018 · Viewed 10.6k times

I have a correlation matrix of 21 industry sectors. I want to split these 21 sectors into 4 or 5 groups so that sectors with similar behavior end up together.

Can anyone shed some light on how to do this in Python? Thanks in advance!

Answer

Wes Doyle · Oct 13, 2018

You might explore pandas' DataFrame.corr together with the hierarchical clustering functions in scipy.cluster.hierarchy:

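import pandas as pd
import scipy.cluster.hierarchy as spc
from scipy.spatial.distance import pdist

# my_data is a placeholder for your observations, one column per sector
df = pd.DataFrame(my_data)
corr = df.corr().values

# Condensed pairwise Euclidean distances between rows of the correlation matrix
dists = pdist(corr)
# Complete-linkage hierarchical clustering on those distances
linkage = spc.linkage(dists, method='complete')
# Cut the tree at half the maximum distance; idx[i] is the cluster label of sector i
idx = spc.fcluster(linkage, 0.5 * dists.max(), 'distance')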
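
Since the question asks for a fixed number of groups (4 or 5), one possible variation is to let fcluster choose the cut for you with criterion='maxclust' rather than picking a distance threshold by hand. The sketch below is only an illustration, not the method above: the data is synthetic (21 made-up sectors driven by 4 hidden factors), the names `membership`, `dissimilarity` and `groups` are hypothetical, and using 1 - correlation as the distance with average linkage is an assumption that may or may not suit your data.

```python
import numpy as np
import pandas as pd
import scipy.cluster.hierarchy as spc
from scipy.spatial.distance import squareform

# Synthetic stand-in for real sector data: 21 columns driven by 4 hidden factors.
rng = np.random.default_rng(0)
n_obs, n_sectors, n_factors = 500, 21, 4
factors = rng.normal(size=(n_obs, n_factors))
membership = np.arange(n_sectors) % n_factors  # hypothetical "true" group of each sector
returns = factors[:, membership] + 0.5 * rng.normal(size=(n_obs, n_sectors))
df = pd.DataFrame(returns, columns=[f"sector_{i}" for i in range(n_sectors)])

corr = df.corr()

# Turn correlation into a dissimilarity: 0 when perfectly correlated,
# 2 when perfectly anti-correlated (an assumed choice of distance).
dissimilarity = 1.0 - corr.values
np.fill_diagonal(dissimilarity, 0.0)  # remove floating-point noise on the diagonal
condensed = squareform(dissimilarity, checks=False)  # square matrix -> condensed form

linkage = spc.linkage(condensed, method="average")

# Ask for at most 4 flat clusters instead of choosing a distance threshold.
labels = spc.fcluster(linkage, t=4, criterion="maxclust")

# Map each sector name to its cluster label for easier reading.
groups = pd.Series(labels, index=corr.columns).sort_values()
print(groups)
```

With your own data, replace the synthetic `df` with your DataFrame (or start directly from your existing correlation matrix) and set t to 4 or 5. Note that 'maxclust' guarantees no more than t clusters, so you can occasionally get fewer.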