I am new to dplyr and trying to do the following transformation without any luck. I've searched across the internet and I have found examples to do the same in ddply but I'd like to use dplyr.
I have the following data:
month type count
1 Feb-14 bbb 341
2 Feb-14 ccc 527
3 Feb-14 aaa 2674
4 Mar-14 bbb 811
5 Mar-14 ccc 1045
6 Mar-14 aaa 4417
7 Apr-14 bbb 1178
8 Apr-14 ccc 1192
9 Apr-14 aaa 4793
10 May-14 bbb 916
.. ... ... ...
I want to use dplyr to calculate the percentage of each type (aaa, bbb, ccc) at a month level i.e.
month type count per
1 Feb-14 bbb 341 9.6%
2 Feb-14 ccc 527 14.87%
3 Feb-14 aaa 2674 ..
.. ... ... ...
I've tried
data %>%
group_by(month, type) %>%
summarise(count / sum(count))
This gives a 1 as each value. How do I make the sum(count) sum across all the types in the month?
Try
library(dplyr)
data %>%
group_by(month) %>%
mutate(countT= sum(count)) %>%
group_by(type, add=TRUE) %>%
mutate(per=paste0(round(100*count/countT,2),'%'))
We could also use left_join
after summarising the sum(count)
by 'month'
Or an option using data.table
.
library(data.table)
setkey(setDT(data), month)[data[, list(count=sum(count)), month],
per:= paste0(round(100*count/i.count,2), '%')][]