My question is very similar to Normalizing y-axis in histograms in R ggplot to proportion, except that I have two groups of data of different size, and I would like that each proportion is relative to its group size instead of the total size.
To make it clearer, let's say I have two sets of data in a data frame:
dataA<-rnorm(100,3,sd=2)
dataB<-rnorm(400,5,sd=3)
all<-data.frame(dataset=c(rep('A',length(dataA)),rep('B',length(dataB))),value=c(dataA,dataB))
I can plot the two distributions together with:
ggplot(all,aes(x=value,fill=dataset))+geom_histogram(alpha=0.5,position='identity',binwidth=0.5)
and instead of the frequency on the Y axis I can have the proportion with:
ggplot(all,aes(x=value,fill=dataset))+geom_histogram(aes(y=..count../sum(..count..)),alpha=0.5,position='identity',binwidth=0.5)
But this gives the proportion relative to the total data size (500 points here): is it possible to have it relative to each group size?
My goal here is to make it possible to compare visually the proportion of values in a given bin between A and B, independently from their respective size. Ideas which differ from my original one are also welcome!
Thanks!
Like this? [edited based on OP's comment]
ggplot(all,aes(x=value,fill=dataset))+
geom_histogram(aes(y=0.5*..density..),
alpha=0.5,position='identity',binwidth=0.5)
Using y=..density..
scales the histograms so the area under each is 1, or sum(binwidth*y)=1.
As a result, you would use y = binwidth*..density..
to have y represent the fraction of the total in each bin. In your case, binwidth=0.5
.
IMO this is a little easier to interpret:
ggplot(all,aes(x=value,fill=dataset))+
geom_histogram(aes(y=0.5*..density..),binwidth=0.5)+
facet_wrap(~dataset,nrow=2)