Aaa <- data.frame(amount=c(1,2,1,2,1,1,2,2,1,1,1,2,2,2,1),
card=c("a","b","c","a","c","b","a","c","b","a","b","c","a","c","a"))
aggregate(x=Aaa$amount, by=list(Aaa$card), FUN=mean)
## Group.1 x
## 1 a 1.50
## 2 b 1.25
## 3 c 1.60
tapply(Aaa$amount, Aaa$card, mean)
## a b c
## 1.50 1.25 1.60
Above is some example code. It seems that aggregate and tapply are both very handy and perform similar functions. Can someone explain their differences or give examples?
aggregate is designed to apply one function to multiple columns and returns a data frame with one row for each category, while tapply is designed to work on a single vector and returns its results as a matrix or array. A data frame with only two columns does not really show the capabilities of either function (or their salient differences). aggregate also has a formula method, which tapply does not.
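A minimal sketch of the "multiple columns, one function" point and of the different return types. It assumes a hypothetical extra numeric column, price, that is not in the question's data; it exists only so there is more than one column to summarise.

```r
# Question's data plus a hypothetical numeric column 'price' (illustration only)
Aaa <- data.frame(amount = c(1,2,1,2,1,1,2,2,1,1,1,2,2,2,1),
                  card   = c("a","b","c","a","c","b","a","c","b","a","b","c","a","c","a"))
Aaa$price <- Aaa$amount * 10

# aggregate: several columns, one function, result is a data frame
agg <- aggregate(Aaa[, c("amount", "price")], by = list(card = Aaa$card), FUN = mean)
print(agg)
class(agg)     # "data.frame"

# The formula method gives the same result; cbind() selects several response columns
agg2 <- aggregate(cbind(amount, price) ~ card, data = Aaa, FUN = mean)

# tapply: a single vector, result is a (one-dimensional) array
tap <- tapply(Aaa$amount, Aaa$card, mean)
print(tap)
is.array(tap)  # TRUE
```

Passing a second column to tapply would require a second call (or a wrapper such as sapply over columns), which is the kind of work aggregate does for you.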
> # 'cat' is drawn at random, so the numbers below will differ from run to run
> Aaa <- data.frame(amount=c(1,2,1,2,1,1,2,2,1,1,1,2,2,2,1), cat=sample(letters[21:24], 15, replace=TRUE),
+ card=c("a","b","c","a","c","b","a","c","b","a","b","c","a","c","a"))
> with( Aaa, tapply(amount, INDEX=list(cat,card), mean) )
a b c
u 1.5 1.5 NA
v 2.0 1.0 2.0
w 1.0 NA 1.5
x 1.5 NA 1.5
> aggregate(amount~cat+card, data=Aaa, FUN= mean)
cat card amount
1 u a 1.5
2 v a 2.0
3 w a 1.0
4 x a 1.5
5 u b 1.5
6 v b 1.0
7 v c 2.0
8 w c 1.5
9 x c 1.5
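Because the transcript above uses sample(), its output cannot be reproduced exactly. The sketch below swaps in a hypothetical fixed cat column (values chosen only for illustration) so that one combination, u with b, never occurs. It shows the structural difference: tapply returns the full cat-by-card matrix and fills the empty cell with NA, while aggregate returns only the combinations that actually appear.

```r
# Hypothetical fixed grouping column instead of sample(), so the output is reproducible
Aaa <- data.frame(
  amount = c(1,2,1,2,1,1,2,2,1,1,1,2,2,2,1),
  cat    = c("u","v","u","u","u","v","v","v","v","v","v","v","v","v","v"),
  card   = c("a","b","c","a","c","b","a","c","b","a","b","c","a","c","a")
)

# tapply: a 2 x 3 matrix over every cat/card pair; the empty u/b cell is NA
tab <- with(Aaa, tapply(amount, INDEX = list(cat, card), mean))
print(tab)

# aggregate: a long data frame with one row per combination that occurs (5 rows)
agg <- aggregate(amount ~ cat + card, data = Aaa, FUN = mean)
print(agg)
```

Which shape is more convenient depends on the next step: the matrix is handy for printing or for arithmetic across margins, the long data frame for merging or plotting.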
The xtabs function also has a formula interface, and it returns an R "table". R tables are arrays (most often matrices) that typically hold integer values, because they are designed to be "contingency tables": counts of items in the cross-classifications of the marginal categories.
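A short sketch of xtabs on the question's original data. With nothing on the left-hand side of the formula it counts rows per category; with amount on the left it sums amount within each cell instead. xtabs has no FUN argument, so a mean has to be recovered by dividing sums by counts, which is one way to see why it is a tabulation tool rather than a general replacement for aggregate or tapply.

```r
Aaa <- data.frame(amount = c(1,2,1,2,1,1,2,2,1,1,1,2,2,2,1),
                  card   = c("a","b","c","a","c","b","a","c","b","a","b","c","a","c","a"))

counts <- xtabs(~ card, data = Aaa)          # number of rows per card
sums   <- xtabs(amount ~ card, data = Aaa)   # sum of amount per card
class(counts)                                # "xtabs" "table"

# Dividing cell-wise gives the same per-card means as tapply(..., mean)
means <- sums / counts
print(means)
```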