Hi all, I have a correlation matrix of 21 industry sectors. I want to split these 21 sectors into 4 or 5 groups, so that sectors with similar behavior end up in the same group.
Could anyone shed some light on how to do this in Python? Thanks in advance!
You might explore the use of Pandas DataFrame.corr and the scipy.cluster hierarchical clustering package:
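import pandas as pd
import scipy.cluster.hierarchy as spc
from scipy.spatial.distance import pdist

df = pd.DataFrame(my_data)  # my_data: your observations, one column per sector
corr = df.corr().values
# Pairwise Euclidean distances between the rows of the correlation matrix
dists = pdist(corr)
# Complete-linkage hierarchical clustering on those distances
linkage = spc.linkage(dists, method='complete')
# Cut the tree at half the maximum distance; idx[i] is the cluster label of sector i
idx = spc.fcluster(linkage, 0.5 * dists.max(), 'distance')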
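
The snippet above cuts the tree at a distance threshold, so the number of groups depends on the data. If you need a fixed number of groups (4 or 5, as in the question), one option is fcluster's 'maxclust' criterion. The sketch below is only an illustration under a few assumptions of mine: the data is synthetic (21 made-up sectors driven by 4 hidden factors), the names returns, loadings and groups are hypothetical, and I use 1 - correlation as the dissimilarity with average linkage, which is a different choice from the Euclidean distance between rows used above.

```python
import numpy as np
import pandas as pd
import scipy.cluster.hierarchy as spc
from scipy.spatial.distance import squareform

# Hypothetical stand-in data: 250 observations of 21 sectors,
# each sector driven by one of 4 hidden factors plus noise.
rng = np.random.default_rng(0)
factors = rng.normal(size=(250, 4))
loadings = np.repeat(np.eye(4), [6, 5, 5, 5], axis=0)  # shape (21, 4)
returns = factors @ loadings.T + 0.3 * rng.normal(size=(250, 21))
df = pd.DataFrame(returns, columns=[f"sector_{i}" for i in range(21)])

corr = df.corr().values

# Turn correlation into a dissimilarity: highly correlated sectors are "close".
dist = 1.0 - corr
np.fill_diagonal(dist, 0.0)                 # remove floating-point noise on the diagonal
condensed = squareform(dist, checks=False)  # condensed form expected by linkage

linkage = spc.linkage(condensed, method="average")

# Ask for at most 4 flat clusters instead of choosing a distance threshold.
labels = spc.fcluster(linkage, t=4, criterion="maxclust")

# Map each sector name to its group label.
groups = pd.Series(labels, index=df.columns)
print(groups.sort_values())
```

Change t=4 to t=5 to get up to five groups. Note that 'maxclust' guarantees no more than t clusters, not exactly t. If you already have the correlation matrix, skip the synthetic data and start from corr.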