I have a dataset whose headers look like so:
PID Time Site Rep Count
I want to sum Count by Rep for each PID x Time x Site combo. Then, on the resulting data.frame, I want the mean value of Count for each PID x Time x Site combo.
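To make the two steps concrete, here is a small hypothetical example (the toy values are made up, not taken from the real data). One PID x Time x Site combo has two Reps, each observed on two rows, so the result is the mean of the two per-Rep sums. That differs from a plain mean over all rows.

```r
# Hypothetical toy data: one PID/Time/Site combo, two Reps, two rows per Rep
toy <- data.frame(PID = 1, Time = 1, Site = "a",
                  Rep = c(1, 1, 2, 2), Count = c(2, 3, 4, 6))

# Step 1: sum Count within each PID x Time x Site x Rep cell
step1 <- aggregate(Count ~ PID + Time + Site + Rep, data = toy, FUN = sum)
# Rep 1: 2 + 3 = 5, Rep 2: 4 + 6 = 10

# Step 2: average those per-Rep sums within each PID x Time x Site combo
step2 <- aggregate(Count ~ PID + Time + Site, data = step1, FUN = mean)
step2$Count     # 7.5

# For contrast: a single mean over the raw rows is a different quantity
mean(toy$Count) # 3.75
```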
Current function is as follows:
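dummy <- function(data) {
  # Step 1: sum Count within each PID x Time x Site x Rep cell
  A <- aggregate(Count ~ PID + Time + Site + Rep, data = data, function(x) sum(na.omit(x)))
  # Step 2: mean of those per-Rep sums for each PID x Time x Site combo
  B <- aggregate(Count ~ PID + Time + Site, data = A, mean)
  return(B)
}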
This is painfully slow (the original data.frame is 510000 x 20). Is there a way to speed this up with plyr?
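If adding a package is not an option, one base-R possibility is to swap aggregate() for rowsum(), which performs the grouped sum in compiled code and skips the formula machinery. This is a different technique from plyr, and it has not been benchmarked here, so treat it as a sketch. `dummy_rowsum` is a hypothetical name; the sketch assumes the column names from the question and that key values do not contain the "." separator that interaction() uses to build group labels.

```r
# Hypothetical base-R variant of dummy(); assumes columns PID, Time, Site, Rep, Count.
# Assumes key values do not contain ".", which interaction() uses as its separator.
dummy_rowsum <- function(data) {
  keys <- c("PID", "Time", "Site")
  # Drop incomplete rows, as aggregate()'s formula method does by default
  data <- data[complete.cases(data[c(keys, "Rep", "Count")]), ]
  # One factor level per PID x Time x Site combo, and per combo x Rep cell
  combo <- interaction(data[keys], drop = TRUE)
  cell  <- interaction(combo, data$Rep, drop = TRUE)
  # Step 1: sum Count per cell; rows come out in order of first appearance
  cell_sum <- rowsum(data$Count, cell, reorder = FALSE)[, 1]
  # The combo each cell belongs to, in that same first-appearance order
  cell_combo <- combo[!duplicated(cell)]
  # Step 2: mean of the per-Rep sums within each combo (ordered by level)
  combo_mean <- tapply(cell_sum, cell_combo, mean)
  # Key columns from the first row of each combo, matching the level order
  out <- data[match(levels(combo), combo), keys]
  out$Count <- as.vector(combo_mean)
  rownames(out) <- NULL
  out
}
```

It should return the same numbers as dummy(), though possibly with rows in a different order.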
You should look at the data.table package for faster aggregation on large data frames. For your problem, the solution would look like this (sum per Rep first, then average those sums):
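library(data.table)
data_t <- as.data.table(data)
# Step 1: sum Count per PID x Time x Site x Rep, dropping NA rows as aggregate() does
A <- data_t[!is.na(Count), list(Count = sum(Count)), by = list(PID, Time, Site, Rep)]
# Step 2: mean of those per-Rep sums for each PID x Time x Site combo
ans <- A[, list(Count = mean(Count)), by = list(PID, Time, Site)]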
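As a quick sanity check on hypothetical toy data, the sketch below chains the two grouped steps in one data.table expression and compares the result with a single grouped mean. Only the two-step version reproduces what dummy() computes; the one-step mean averages raw rows instead of per-Rep sums.

```r
library(data.table)

# Hypothetical toy data: one PID/Time/Site combo, two Reps, two rows per Rep
toy <- data.table(PID = 1, Time = 1, Site = "a",
                  Rep = c(1, 1, 2, 2), Count = c(2, 3, 4, 6))

# Chained: sum per Rep, then mean of those sums per combo
two_step <- toy[, .(Count = sum(Count, na.rm = TRUE)), by = .(PID, Time, Site, Rep)][
  , .(Count = mean(Count)), by = .(PID, Time, Site)]

# A single grouped mean averages the raw rows instead
one_step <- toy[, .(Count = mean(Count)), by = .(PID, Time, Site)]

two_step$Count # 7.5
one_step$Count # 3.75
```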