How does one aggregate and summarize data quickly?

r plyr data.table

Maiasaura · Oct 11, 2011 · Viewed 9.7k times · Source

I have a dataset whose headers look like so:

PID Time Site Rep Count

I want sum the Count by Rep for each PID x Time x Site combo

on the resulting data.frame, I want to get the mean value of Count for PID x Time x Site combo.

Current function is as follows:

dummy <- function (data)
{
A<-aggregate(Count~PID+Time+Site+Rep,data=data,function(x){sum(na.omit(x))})
B<-aggregate(Count~PID+Time+Site,data=A,mean)
return (B)
}

This is painfully slow (original data.frame is 510000 20). Is there a way to speed this up with plyr?

Answer

You should look at the package data.table for faster aggregation operations on large data frames. For your problem, the solution would look like:

library(data.table)
data_t = data.table(data_tab)
ans = data_t[,list(A = sum(count), B = mean(count)), by = 'PID,Time,Site']

How does one aggregate and summarize data quickly?

Answer

Related questions