Fitting data to multimodal distributions with scipy, matplotlib

Rosh · Oct 15, 2015 · Viewed 10.1k times

I have a dataset that I would like to fit to a known probability distribution. The intention is to use the fitted PDF in a data generator, so that I can sample data from the known (fitted) PDF. The data will be used for simulation purposes. At the moment I am just sampling from a normal distribution, which is inconsistent with the real data, so the simulation results are not accurate.

I first wanted to use the following method: Fitting empirical distribution to theoretical ones with Scipy (Python)?

My first thought was to fit it to a Weibull distribution, but the data is actually multimodal (picture attached). So I guess I need to combine multiple distributions and then fit the data to the resulting distribution, is that right? Maybe combine a Gaussian and a Weibull distribution?

How can I use the scipy fit() function with a mixed/multimodal distribution?

I would also like to do this in Python (i.e. scipy/numpy/matplotlib), as the data generator is written in Python.

Many thanks!

histogram of data

Answer

Elad Joseph · Oct 19, 2015

I would suggest kernel density estimation (KDE). It gives you the estimated PDF as a mixture of kernel PDFs, one centered on each data point.
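To make the "mixture" idea concrete, here is the standard textbook form of a one-dimensional kernel density estimate. The symbols (n data points x_i, kernel K, bandwidth h) are notation introduced here for illustration, not something from the question:

```latex
\hat{f}_h(x) = \frac{1}{n h} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right),
\qquad
K(u) = \frac{1}{\sqrt{2\pi}} \, e^{-u^2/2} \quad \text{(Gaussian kernel)}
```

With a Gaussian kernel this is an equal-weight mixture of n normal densities, each with standard deviation h, so a multimodal histogram is handled without choosing the number of modes in advance.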

SciPy provides only a Gaussian kernel (which looks fine for your specific histogram), but you can find other kernels in the statsmodels or scikit-learn packages.

For reference, these are the relevant imports:

from sklearn.neighbors import KernelDensity
from scipy.stats import gaussian_kde
from statsmodels.nonparametric.kde import KDEUnivariate
from statsmodels.nonparametric.kernel_density import KDEMultivariate
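As a rough sketch of how this could feed the data generator, the snippet below fits scipy's gaussian_kde and draws new samples from it. The bimodal `data` array is synthetic and only stands in for the real dataset, and the default bandwidth (Scott's rule) is an assumption that may need tuning for your data:

```python
import numpy as np
from scipy.stats import gaussian_kde

# Synthetic bimodal sample standing in for the real dataset (an assumption).
rng = np.random.default_rng(0)
data = np.concatenate([
    rng.normal(loc=2.0, scale=0.5, size=600),
    rng.normal(loc=6.0, scale=1.0, size=400),
])

# Fit the KDE; with no bw_method given, scipy uses Scott's rule.
kde = gaussian_kde(data)

# Evaluate the estimated PDF on a grid, e.g. to overlay on a histogram.
grid = np.linspace(data.min() - 1.0, data.max() + 1.0, 500)
pdf = kde(grid)

# Draw new samples from the fitted density for the data generator.
# resample returns an array of shape (n_dims, size), so flatten it for 1-D data.
samples = kde.resample(size=1000, seed=1).ravel()
```

To check the fit visually, you could plot `grid` against `pdf` with matplotlib over a histogram of `data` drawn with `density=True`. If the estimate looks too smooth or too spiky, passing a different `bw_method` to gaussian_kde changes the bandwidth.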

A great resource for KDE in Python is here.