Update: Weighted samples are now supported by scipy.stats.gaussian_kde. See here and here for details.
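As a rough illustration of the built-in support, the sketch below passes per-sample weights to scipy.stats.gaussian_kde through its weights argument. It assumes SciPy 1.2 or newer; the data, the weights, and the evaluation grid are made up for the example.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
samples = rng.normal(size=500)              # illustrative 1-D data
weights = rng.uniform(0.1, 1.0, size=500)   # illustrative non-negative weights

# Weights are normalized internally; bw_method defaults to Scott's rule.
kde = gaussian_kde(samples, weights=weights)

grid = np.linspace(-5.0, 5.0, 201)
density = kde(grid)                         # estimated density on the grid

# The effective sample size used for the bandwidth is exposed as kde.neff.
print(kde.neff)
```

With equal weights the result should coincide with the unweighted estimate, which is a quick sanity check when adapting this to real data.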
It is currently not possible to use scipy.stats.gaussian_kde to estimate the density of a random variable based on weighted samples. What methods are available to estimate densities of continuous random variables based on weighted samples?
Neither sklearn.neighbors.KernelDensity nor statsmodels.nonparametric seems to support weighted samples. I modified scipy.stats.gaussian_kde to allow for heterogeneous sampling weights and thought the results might be useful for others. An example is shown below.
An IPython notebook can be found here: http://nbviewer.ipython.org/gist/tillahoffmann/f844bce2ec264c1c8cb5
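The weighted arithmetic mean, for samples $x_i$ with weights $w_i$ normalized so that $\sum_i w_i = 1$, is

$$\mu = \sum_{i=1}^{N} w_i x_i.$$

The unbiased data covariance matrix is then given by

$$\Sigma = \frac{1}{1 - \sum_{i=1}^{N} w_i^2} \sum_{i=1}^{N} w_i \, (x_i - \mu)(x_i - \mu)^T.$$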
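A minimal NumPy sketch of these two quantities follows. It assumes the weights are rescaled to sum to one and uses the standard correction factor 1 / (1 - sum of squared weights) for the unbiased covariance; the helper name weighted_mean_and_cov and the random data are hypothetical and not taken from the modified gaussian_kde.

```python
import numpy as np

def weighted_mean_and_cov(x, w):
    """Weighted mean and unbiased weighted covariance (illustrative helper).

    x: array of shape (N, d), one sample per row.
    w: array of shape (N,), non-negative weights (normalized here).
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    w = w / w.sum()                       # normalize so the weights sum to one
    mu = w @ x                            # weighted arithmetic mean, shape (d,)
    centered = x - mu
    # Weighted outer products, rescaled by 1 / (1 - sum of squared weights).
    cov = (centered * w[:, None]).T @ centered / (1.0 - np.sum(w ** 2))
    return mu, cov

rng = np.random.default_rng(1)
x = rng.normal(size=(200, 2))             # made-up 2-D samples
w = rng.uniform(0.1, 1.0, size=200)       # made-up weights
mu, cov = weighted_mean_and_cov(x, w)
```

For equal weights the factor reduces to N / (N - 1), so the result matches the usual unbiased sample covariance. The same numbers can be cross-checked with np.cov(x, rowvar=False, aweights=w).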
The bandwidth can be chosen by the scott or silverman rules, as in scipy. However, the number of samples used to calculate the bandwidth is Kish's approximation for the effective sample size, $n_\mathrm{eff} = \left(\sum_i w_i\right)^2 / \sum_i w_i^2$.
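The bandwidth rule can be sketched as follows. The function names are hypothetical; the factor formulas are the usual Scott and Silverman expressions with the sample count replaced by Kish's effective sample size, and d denotes the data dimension.

```python
import numpy as np

def kish_neff(w):
    """Kish's effective sample size: (sum w)^2 / sum(w^2)."""
    w = np.asarray(w, dtype=float)
    return w.sum() ** 2 / np.sum(w ** 2)

def scott_factor(neff, d):
    """Scott's rule with the effective sample size in place of N."""
    return neff ** (-1.0 / (d + 4))

def silverman_factor(neff, d):
    """Silverman's rule with the effective sample size in place of N."""
    return (neff * (d + 2) / 4.0) ** (-1.0 / (d + 4))

equal = np.ones(100)                        # equal weights: neff equals N
skewed = np.array([10.0] + [1.0] * 99)      # one dominant sample lowers neff

for name, w in [("equal", equal), ("skewed", skewed)]:
    neff = kish_neff(w)
    print(name, neff, scott_factor(neff, d=1), silverman_factor(neff, d=1))
```

Uneven weights reduce the effective sample size, which in turn widens the kernel: a few heavily weighted samples are treated as if fewer observations were available.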