Which distribution fits my data better?

neymar · Mar 16, 2014

I use fitdistr in R to select which distribution fits my data best.

I've tried Cauchy, Weibull, normal, and Gamma distributions.

The log-likelihoods were: -329.8492 for Cauchy, -277.4931 for Gamma, -327.7622 for Normal, -279.0352 for Weibull.

Which one is the best: the one with the largest value (i.e., Gamma) or the one with the largest absolute value (i.e., Cauchy)?

Answer

Ben · Mar 16, 2014

Voting to close, but a simple test will answer your question:

# fitdistr() lives in the MASS package
library(MASS)

set.seed(1)
# we know these data are normally distributed...
dat <- rnorm(500, 10, 1)

# let's compute some fits...
fits <- list(
  no = fitdistr(dat, "normal"),
  lo = fitdistr(dat, "logistic"),
  ca = fitdistr(dat, "cauchy"),
  we = fitdistr(dat, "weibull")
)

# get the logliks for each model...
sapply(fits, function(i) i$loglik)

       no        lo        ca        we 
-718.3558 -722.1342 -806.2398 -741.2754
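When the candidate families differ in their number of parameters, a higher log-likelihood can partly reflect extra flexibility, so a common alternative is AIC (-2 * logLik + 2 * k, smaller is better). The sketch below is not part of the original answer: it assumes that MASS supplies a logLik method for "fitdistr" objects (which is what lets the generic AIC() work on them), and it adds a one-parameter exponential fit purely for contrast.

```r
# Sketch: rank fitdistr() fits by AIC rather than raw log-likelihood.
# Assumption: MASS provides logLik() for "fitdistr" objects, so AIC() works.
library(MASS)

set.seed(1)
dat <- rnorm(500, 10, 1)   # same simulated normal data as above

fits <- list(
  no = fitdistr(dat, "normal"),
  lo = fitdistr(dat, "logistic"),
  ca = fitdistr(dat, "cauchy"),
  we = fitdistr(dat, "weibull"),
  ex = fitdistr(dat, "exponential")  # added for contrast: only 1 parameter
)

# Number of estimated parameters in each fit
k <- sapply(fits, function(f) length(f$estimate))

# AIC via the generic function; smaller is better
aics <- sapply(fits, AIC)

# The same quantity computed by hand from the stored log-likelihoods
aics_manual <- sapply(fits, function(f) -2 * f$loglik + 2 * length(f$estimate))

sort(aics)
```

Here the normal fit should still come out on top; AIC only changes the ranking when a family with fewer parameters comes close enough in log-likelihood to offset the penalty.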

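So the largest (least negative) log-likelihood indicates the best fit, not the one with the largest absolute value: we put in normally distributed data, and the normal fit has the largest log-likelihood. Comparing raw log-likelihoods is fair here because all of these distributions have two parameters, and the same holds for your four candidates, which makes Gamma (-277.4931) the best of them.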
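Applied to the numbers in the question, the ranking is a one-liner in R. The values below are copied from the question; the AIC line assumes each of the four families was fitted with its usual two parameters, so the penalty is the same for all of them and cannot change the order.

```r
# Log-likelihoods reported in the question
ll <- c(cauchy = -329.8492, gamma = -277.4931,
        normal = -327.7622, weibull = -279.0352)

# Higher (less negative) log-likelihood = better fit
ranked <- sort(ll, decreasing = TRUE)
ranked
names(ranked)[1]          # "gamma"

# AIC with k = 2 for every family; smaller is better, same winner
aic <- -2 * ll + 2 * 2
names(which.min(aic))     # "gamma"
```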

You might also find the figure in this paper useful: http://people.stern.nyu.edu/adamodar/pdfiles/papers/probabilistic.pdf
