I am having trouble plotting a histogram as a PDF (probability density).
I want the total area of all the bins to equal one so that it's easier to compare across datasets. For some reason, whenever I specify the breaks (the default of 4 or whatever is terrible), it no longer plots the bins as a probability and instead plots them as a frequency count.
hist(data[,1], freq = FALSE, xlim = c(-1,1), breaks = 800)
What should I change this line to? I need a probability distribution and a large number of bins. (I have 6 million data points)
This is in the R help, but I don't know how to override it:
freq logical; if TRUE, the histogram graphic is a representation of frequencies, the counts component of the result; if FALSE, probability densities, component density, are plotted (so that the histogram has a total area of one). Defaults to TRUE if and only if breaks are equidistant (and probability is not specified).
Thanks
edit: details
Hmm, so my plot goes above 1, which is quite confusing if it's a probability. I see now that it has to do with the bin width. I more or less want to make every bin worth 1 point while still having a lot of bins. In other words, no bin height should be above 1.0 unless it is directly at 1.0 and all the other bins are 0.0. As it stands now, I have bins that form a hump around 15.0.
edit: height by % of points in bin. @Dwin: So how do I plot the probability? I realize taking the integral will still give me 1.0 due to the units on the x axis, but this isn't what I want. Say I have 100 points and 5 of them fall into the first bin; then that bin should be at 0.05 height. This is what I want. Am I doing it wrong, and is there another way this is done?
I know how many points I have. Is there a way to divide each bin count in the frequency histogram by this number?
To answer the request to plot probabilities rather than densities:
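h <- hist(vec, breaks = 100, plot = FALSE)  # vec is your data, e.g. data[,1]; compute bins without drawing
h$counts <- h$counts / sum(h$counts)        # replace counts with the fraction of points in each bin
plot(h, ylab = "Proportion")                # bar heights now sum to 1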
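To see why the density version can rise well above 1 while the rescaled counts cannot, here is a minimal sketch on made-up data. The names x, widths and props are hypothetical, and the uniform sample only stands in for data[,1]. It assumes explicit equal-width breaks on [-1, 1], which is one way to get exactly 800 bins, since a single number passed to breaks is only treated as a suggestion.

```r
set.seed(1)
x <- runif(10000, min = -1, max = 1)   # hypothetical stand-in for data[,1]

# 801 break points give exactly 800 equal-width bins on [-1, 1]
h <- hist(x, breaks = seq(-1, 1, length.out = 801), plot = FALSE)

widths <- diff(h$breaks)               # width of each bin
props  <- h$counts / sum(h$counts)     # fraction of points in each bin

# Density is the proportion divided by the bin width,
# so narrow bins push the density above 1
same_heights <- all.equal(h$density, props / widths)

area_density <- sum(h$density * widths)  # area under the density histogram
total_props  <- sum(props)               # sum of the proportion bar heights
```

Both area_density and total_props should come out as 1 up to floating-point error. The two plots differ only in which quantity is normalized: the area for freq = FALSE, the bar heights for the rescaled counts.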