Visualising 10-dimensional data with matplotlib

dkiswanto · Oct 29, 2016

I have this kind of data:

ID    x1   x2   x3    x4    x5    x6    x7   x8   x9   x10
1   -0.18   5 -0.40 -0.26  0.53 -0.66  0.10   2 -0.20    1
2   -0.58   5 -0.52 -1.66  0.65 -0.15  0.08   3  3.03   -2
3   -0.62   5 -0.09 -0.38  0.65  0.22  0.44   4  1.49    1
4   -0.22  -3  1.64 -1.38  0.08  0.42  1.24   5 -0.34    0
5    0.00   5  1.76 -1.16  0.78  0.46  0.32   5 -0.51   -2

What's the best method for visualizing this data? I'm using matplotlib to visualize it and reading it from a CSV file with pandas.

Thanks.

Answer

Angus Williams · Oct 29, 2016

Visualising data in a high-dimensional space is always a difficult problem. One solution that is commonly used (and is now available in pandas) is to inspect all of the 1D and 2D projections of the data. It doesn't give you all of the information about the data, but that's impossible to visualise unless you can see in 10D! Here's an example of how to do this with pandas (version 0.7.3 upwards):

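import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# scatter_matrix moved to pandas.plotting in pandas 0.20;
# older releases imported it from pandas.tools.plotting
from pandas.plotting import scatter_matrix

# first make some fake data with the same layout as yours
columns = ['x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'x7', 'x8', 'x9', 'x10']
data = pd.DataFrame(np.random.randn(100, 10), columns=columns)

# now plot using pandas (diagonal='kde' needs SciPy installed)
scatter_matrix(data, alpha=0.2, figsize=(6, 6), diagonal='kde')
plt.show()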

This generates a plot with all of the 2D projections as scatter plots, and kernel density estimates (KDEs) of the 1D projections on the diagonal:

[Image: scatter matrix of the ten columns, with KDE curves on the diagonal]
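Since the question mentions reading the data from a CSV file, here is a minimal sketch of the same idea applied to the five rows shown in the question. The rows are embedded as CSV text so the snippet is self-contained; the file name in the comment is hypothetical, and the histogram diagonal is used only because five points are too few for a meaningful KDE.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to use plt.show()
import pandas as pd
from pandas.plotting import scatter_matrix

# The five rows from the question, written out as CSV text. In practice you
# would pass a path instead, e.g. pd.read_csv("data.csv") (hypothetical name).
csv_text = """ID,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10
1,-0.18,5,-0.40,-0.26,0.53,-0.66,0.10,2,-0.20,1
2,-0.58,5,-0.52,-1.66,0.65,-0.15,0.08,3,3.03,-2
3,-0.62,5,-0.09,-0.38,0.65,0.22,0.44,4,1.49,1
4,-0.22,-3,1.64,-1.38,0.08,0.42,1.24,5,-0.34,0
5,0.00,5,1.76,-1.16,0.78,0.46,0.32,5,-0.51,-2
"""

# Use ID as the index so it is not plotted as an eleventh variable.
df = pd.read_csv(io.StringIO(csv_text), index_col="ID")

# One scatter panel per pair of columns, histograms on the diagonal.
axes = scatter_matrix(df, alpha=0.8, figsize=(10, 10), diagonal="hist")
fig = axes[0, 0].figure
# fig.savefig("scatter_matrix.png")  # uncomment to write the figure to disk
```

Setting `index_col="ID"` matters here: if the ID column is left in the frame, it shows up as an extra row and column of panels that carry no information.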

I also have a pure matplotlib approach to this on my GitHub page, which produces a very similar type of plot (it is designed for MCMC output, but is also appropriate here). Here's how you'd use it with the data above:

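# corner_plot.py is the module from the GitHub page mentioned above
import corner_plot as cp

# DataFrame.as_matrix() was removed in pandas 1.0; to_numpy() replaces it
cp.corner_plot(data.to_numpy(), axis_labels=data.columns, nbins=10,
               figsize=(7, 7), scatter=True, fontsize=10, tickfontsize=7)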

[Image: corner plot of the same data]
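If you would rather not pull in an extra module, the general idea of a corner plot can be sketched in a few lines of plain matplotlib. This is not the `corner_plot` module referred to above, only a rough illustration of the layout: histograms on the diagonal, scatter plots below it, and the redundant upper triangle hidden. The function name `simple_corner` and its parameters are made up for this example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np


def simple_corner(values, labels, bins=10, figsize=(7, 7)):
    """Hypothetical helper: draw a lower-triangle grid of 1D/2D projections."""
    values = np.asarray(values, dtype=float)
    n = values.shape[1]
    fig, axes = plt.subplots(n, n, figsize=figsize, squeeze=False)
    for row in range(n):
        for col in range(n):
            ax = axes[row, col]
            if col > row:
                # The upper triangle mirrors the lower one, so hide it.
                ax.set_visible(False)
                continue
            if row == col:
                # 1D projection: histogram of a single column.
                ax.hist(values[:, col], bins=bins, color="0.5")
            else:
                # 2D projection: column `col` on x, column `row` on y.
                ax.scatter(values[:, col], values[:, row], s=4, alpha=0.4)
            ax.tick_params(labelsize=6)
            # Label only the outer edge of the grid to reduce clutter.
            if row == n - 1:
                ax.set_xlabel(labels[col])
            else:
                ax.tick_params(labelbottom=False)
            if col == 0 and row > 0:
                ax.set_ylabel(labels[row])
            else:
                ax.tick_params(labelleft=False)
    return fig, axes


# Fake data with the same layout as in the question.
rng = np.random.default_rng(0)
fake = rng.standard_normal((100, 10))
labels = [f"x{i}" for i in range(1, 11)]
fig, axes = simple_corner(fake, labels)
# fig.savefig("corner.png")  # uncomment to write the figure to disk
```

With a pandas DataFrame you could call it as `simple_corner(data.to_numpy(), list(data.columns))`.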