What is the difference of tapply and aggregate in R?

Neo XU picture Neo XU · Sep 22, 2014 · Viewed 7.8k times · Source
Aaa <- data.frame(amount=c(1,2,1,2,1,1,2,2,1,1,1,2,2,2,1), 
                  card=c("a","b","c","a","c","b","a","c","b","a","b","c","a","c","a"))

aggregate(x=Aaa$amount, by=list(Aaa$card), FUN=mean)

##   Group.1    x
## 1       a 1.50
## 2       b 1.25
## 3       c 1.60

tapply(Aaa$amount, Aaa$card, mean)

##    a    b    c 
## 1.50 1.25 1.60 

Above is an example code.

It seems that aggregate and tapply both are very handy and perform similar functionality.

Can someone explain or give examples on their differences?

Answer

IRTFM picture IRTFM · Sep 22, 2014

aggregate is designed to work on multiple columns with one function and returns a dataframe with one row for each category, while tapply is designed to work on a single vector with results returned as a matrix or array. Only using a two-column matrix does not really allow the capacities of either function (or their salient differences) to be demonstrated. aggregate also has a formula method, which tapply does not.

> Aaa <- data.frame(amount=c(1,2,1,2,1,1,2,2,1,1,1,2,2,2,1), cat=sample(letters[21:24], 15,rep=TRUE),
+                   card=c("a","b","c","a","c","b","a","c","b","a","b","c","a","c","a"))
> with( Aaa, tapply(amount, INDEX=list(cat,card), mean) )
    a   b   c
u 1.5 1.5  NA
v 2.0 1.0 2.0
w 1.0  NA 1.5
x 1.5  NA 1.5

>  aggregate(amount~cat+card, data=Aaa, FUN= mean) 
  cat card amount
1   u    a    1.5
2   v    a    2.0
3   w    a    1.0
4   x    a    1.5
5   u    b    1.5
6   v    b    1.0
7   v    c    2.0
8   w    c    1.5
9   x    c    1.5

The xtabs function also delivers an R "table" and it has a formula interface. R tables are matrices that typically have integer values because they are designed to be "contingency tables" holding counts of items in cross-classifications of the marginal categories.