Using scipy.stats.gaussian_kde with 2 dimensional data

jet picture jet · Nov 8, 2010 · Viewed 11.1k times · Source

I'm trying to use the scipy.stats.gaussian_kde class to smooth out some discrete data collected with latitude and longitude information, so it shows up as somewhat similar to a contour map in the end, where the high densities are the peak and low densities are the valley.

I'm having a hard time putting a two-dimensional dataset into the gaussian_kde class. I've played around to figure out how it works with 1 dimensional data, so I thought 2 dimensional would be something along the lines of:

from scipy import stats
from numpy import array
data = array([[1.1, 1.1],
              [1.2, 1.2],
              [1.3, 1.3]])
kde = stats.gaussian_kde(data)
kde.evaluate([1,2,3],[1,2,3])

which is saying that I have 3 points at [1.1, 1.1], [1.2, 1.2], [1.3, 1.3]. and I want to have the kernel density estimation using from 1 to 3 using width of 1 on x and y axis.

When creating the gaussian_kde, it keeps giving me this error:

raise LinAlgError("singular matrix")
numpy.linalg.linalg.LinAlgError: singular matrix

Looking into the source code of gaussian_kde, I realize that the way I'm thinking about what dataset means is completely different from how the dimensionality is calculate, but I could not find any sample code showing how multi-dimension data works with the module. Could someone help me with some sample ways to use gaussian_kde with multi-dimensional data?

Answer

endolith picture endolith · May 25, 2011

This example seems to be what you're looking for:

import numpy as np
import scipy.stats as stats
from matplotlib.pyplot import imshow

# Create some dummy data
rvs = np.append(stats.norm.rvs(loc=2,scale=1,size=(2000,1)),
                stats.norm.rvs(loc=0,scale=3,size=(2000,1)),
                axis=1)

kde = stats.kde.gaussian_kde(rvs.T)

# Regular grid to evaluate kde upon
x_flat = np.r_[rvs[:,0].min():rvs[:,0].max():128j]
y_flat = np.r_[rvs[:,1].min():rvs[:,1].max():128j]
x,y = np.meshgrid(x_flat,y_flat)
grid_coords = np.append(x.reshape(-1,1),y.reshape(-1,1),axis=1)

z = kde(grid_coords.T)
z = z.reshape(128,128)

imshow(z,aspect=x_flat.ptp()/y_flat.ptp())

enter image description here

Axes need fixing, obviously.

You can also do a scatter plot of the data with

scatter(rvs[:,0],rvs[:,1])

enter image description here