Weighted Gaussian kernel density estimation in `python`

Till Hoffmann · Dec 23, 2014

Update: Weighted samples are now supported by scipy.stats.gaussian_kde through its weights argument.
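For reference, here is a minimal sketch of how the built-in support might be used. The sample data, the weights and the evaluation grid are made up for illustration, and it assumes a SciPy version recent enough to accept the `weights` keyword.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical data: 500 draws from a standard normal with arbitrary positive weights.
rng = np.random.default_rng(0)
samples = rng.normal(size=500)
weights = rng.uniform(0.1, 1.0, size=500)

# The weights do not need to be normalised; gaussian_kde rescales them internally.
kde = gaussian_kde(samples, weights=weights)

# Evaluate the estimated density on a regular grid.
grid = np.linspace(-4, 4, 201)
density = kde(grid)
```

The resulting `density` array can be plotted against `grid` like any other curve.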

It is currently not possible to use scipy.stats.gaussian_kde to estimate the density of a random variable based on weighted samples. What methods are available to estimate densities of continuous random variables based on weighted samples?

Answer

Till Hoffmann · Dec 23, 2014

Neither sklearn.neighbors.KernelDensity nor statsmodels.nonparametric seems to support weighted samples. I modified scipy.stats.gaussian_kde to allow for heterogeneous sampling weights and thought the results might be useful to others. An example is shown below.

[Figure: example]

An IPython notebook can be found here: http://nbviewer.ipython.org/gist/tillahoffmann/f844bce2ec264c1c8cb5

Implementation details

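For samples $x_i$ with weights $w_i$, the weighted arithmetic mean is

$$\mu = \frac{\sum_i w_i x_i}{\sum_i w_i}.$$

With the weights normalised such that $\sum_i w_i = 1$, the unbiased data covariance matrix is then given by

$$\Sigma = \frac{1}{1 - \sum_i w_i^2} \sum_i w_i \left(x_i - \mu\right)\left(x_i - \mu\right)^T.$$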
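The weighted mean and unbiased covariance described above can be sketched in NumPy as follows. This is an illustration and not necessarily the exact code in the notebook: the function name is hypothetical, the data layout (one column per sample) follows the scipy convention, and the unbiased correction is assumed to be the usual 1 / (1 - sum of squared normalised weights).

```python
import numpy as np

def weighted_mean_and_cov(data, weights):
    """Weighted mean and unbiased weighted covariance.

    data:    array of shape (d, n), one column per sample
    weights: array of shape (n,), non-negative and not all zero
    """
    data = np.atleast_2d(np.asarray(data, dtype=float))
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                      # normalise so the weights sum to one

    mean = data @ w                      # weighted arithmetic mean, shape (d,)
    residual = data - mean[:, None]      # deviations from the weighted mean
    cov = (residual * w) @ residual.T    # weighted sum of outer products
    cov /= 1.0 - np.sum(w ** 2)          # bias correction (assumed form)
    return mean, cov
```

With equal weights this reduces to the ordinary sample mean and the usual covariance with the n - 1 denominator.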

The bandwidth can be chosen using Scott's or Silverman's rule, as in scipy. However, the number of samples used to calculate the bandwidth is replaced by Kish's approximation for the effective sample size, $n_\mathrm{eff} = \left(\sum_i w_i\right)^2 / \sum_i w_i^2$.
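As a rough sketch of the bandwidth calculation, the snippet below computes Kish's effective sample size and plugs it into the Scott and Silverman factors in the form scipy uses for unweighted data. The function names are hypothetical and `d` denotes the number of dimensions.

```python
import numpy as np

def kish_effective_sample_size(weights):
    """Kish's approximation: (sum of weights)^2 / (sum of squared weights)."""
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / np.sum(w ** 2)

def scott_factor(neff, d):
    """Scott's rule with the sample count replaced by the effective sample size."""
    return neff ** (-1.0 / (d + 4))

def silverman_factor(neff, d):
    """Silverman's rule with the sample count replaced by the effective sample size."""
    return (neff * (d + 2) / 4.0) ** (-1.0 / (d + 4))

# Hypothetical weights for a one-dimensional data set.
weights = np.array([0.5, 1.0, 1.0, 2.0, 0.25])
neff = kish_effective_sample_size(weights)
bandwidth_factor = scott_factor(neff, d=1)
```

For equal weights the effective sample size equals the number of samples, so both rules fall back to their unweighted form. As in scipy, the kernel covariance would then be the data covariance scaled by the square of this factor.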