I have a dataset that I would like to fit to a known probability distribution. The intention is to use the fitted PDF in a data generator, so that I can sample data from the known (fitted) PDF. The data will be used for simulation purposes. At the moment I am just sampling from a normal distribution, which is inconsistent with the real data, so the simulation results are not accurate.
I first wanted to use the method described in: Fitting empirical distribution to theoretical ones with Scipy (Python)?
My first thought was to fit it to a Weibull distribution, but the data is actually multimodal (picture attached). So I guess I need to combine multiple distributions and then fit the data to the resulting distribution, is that right? Maybe combine a Gaussian and a Weibull distribution?
How can I use the scipy fit() function with a mixed/multimodal distribution?
I would also like to do this in Python (i.e. scipy/numpy/matplotlib), as the data generator is written in Python.
Many thanks!
I would suggest Kernel Density Estimation (KDE). It gives you the estimated density as a mixture of PDFs, with one kernel centered on each data point.
SciPy has only a Gaussian kernel (which looks fine for your specific histogram), but you can find other kernels in the statsmodels or scikit-learn packages.
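As a minimal sketch of the SciPy route, assuming a 1-D dataset: the `data` array below is a made-up bimodal stand-in (a Gaussian bump plus a shifted Weibull bump), not your real data, and the default bandwidth (Scott's rule) may need tuning for your histogram.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Hypothetical bimodal stand-in for the real dataset (assumption, not from the question)
data = np.concatenate([
    rng.normal(2.0, 0.5, 600),          # first mode: Gaussian
    rng.weibull(1.5, 400) * 3.0 + 6.0,  # second mode: shifted, scaled Weibull
])

# Fit the KDE; the bandwidth defaults to Scott's rule
kde = gaussian_kde(data)

# Evaluate the estimated PDF on a grid, e.g. to overlay it on the histogram
grid = np.linspace(data.min(), data.max(), 200)
density = kde(grid)

# Draw new samples from the fitted density for the data generator.
# resample() returns an array of shape (n_dims, size), hence the [0].
samples = kde.resample(1000, seed=1)[0]
```

If the estimate looks too smooth or too spiky, `gaussian_kde(data, bw_method=...)` accepts a scalar factor or a callable to change the bandwidth.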
For reference, these are the relevant classes:
from sklearn.neighbors import KernelDensity
from scipy.stats import gaussian_kde
from statsmodels.nonparametric.kde import KDEUnivariate
from statsmodels.nonparametric.kernel_density import KDEMultivariate
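If you want a non-Gaussian kernel, a rough sketch with scikit-learn could look like the following. The `data` array and both bandwidth values are illustrative assumptions, so pick the bandwidth by eye or by cross-validation for your own data. Note that, as far as I know, `KernelDensity.sample()` is only implemented for the `gaussian` and `tophat` kernels, so the sampling step below uses a Gaussian kernel.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)

# Hypothetical bimodal stand-in for the real dataset (assumption)
data = np.concatenate([rng.normal(2.0, 0.5, 600), rng.normal(7.0, 1.0, 400)])

# scikit-learn expects a 2-D array of shape (n_samples, n_features)
X = data.reshape(-1, 1)

# Density estimate with an Epanechnikov kernel; bandwidth is an illustrative guess
kd = KernelDensity(kernel="epanechnikov", bandwidth=0.5).fit(X)

# score_samples() returns the log-density, so exponentiate to get the PDF
grid = np.linspace(data.min(), data.max(), 200).reshape(-1, 1)
density = np.exp(kd.score_samples(grid))

# Sampling: use a Gaussian kernel here, since sample() does not support
# the Epanechnikov kernel
kd_gauss = KernelDensity(kernel="gaussian", bandwidth=0.3).fit(X)
samples = kd_gauss.sample(1000, random_state=1).ravel()
```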
A great resource for KDE in Python is here.