R: Count unique values by category

Klaus Louis picture Klaus Louis · Apr 23, 2013 · Viewed 41k times · Source

I have data in R that looks like this:

 Cnty   Yr   Plt       Spp  DBH Ht Age
 1  185 1999 20001 Bitternut  8.0 54  47
 2  185 1999 20001 Bitternut  7.2 55  50
 3   31 1999 20001    Pignut  7.4 71  60
 4   31 1999 20001    Pignut 11.4 85 114
 5  189 1999 20001        WO 14.5 80  82
 6  189 1999 20001        WO 12.1 72  79

I would like to know the quantity of unique species (Spp) in each county (Cnty). "unique(dfname$Spp)" gives me a total count of unique species in the data frame, but I would like it by county.

Any help is appreciated! Sorry for the weird formatting, this is my first ever question on SO.

Thanks.

Answer

A5C1D2H2I1M1N2O1R2T1 picture A5C1D2H2I1M1N2O1R2T1 · Apr 23, 2013

I've tried to make your sample data a little bit more interesting. Your sample data presently has just one unique "Spp" per "Cnty".

set.seed(1)
mydf <- data.frame(
  Cnty = rep(c("185", "31", "189"), times = c(5, 3, 2)),
  Yr = c(rep(c("1999", "2000"), times = c(3, 2)), 
         "1999", "1999", "2000", "2000", "2000"),
  Plt = "20001",
  Spp = sample(c("Bitternut", "Pignut", "WO"), 10, replace = TRUE),
  DBH = runif(10, 0, 15)
)
mydf
#    Cnty   Yr   Plt       Spp       DBH
# 1   185 1999 20001 Bitternut  3.089619
# 2   185 1999 20001    Pignut  2.648351
# 3   185 1999 20001    Pignut 10.305343
# 4   185 2000 20001        WO  5.761556
# 5   185 2000 20001 Bitternut 11.547621
# 6    31 1999 20001        WO  7.465489
# 7    31 1999 20001        WO 10.764278
# 8    31 2000 20001    Pignut 14.878591
# 9   189 2000 20001    Pignut  5.700528
# 10  189 2000 20001 Bitternut 11.661678

Next, as suggested, tapply is a good candidate here. Combine unique and length to get the data you are looking for.

with(mydf, tapply(Spp, Cnty, FUN = function(x) length(unique(x))))
# 185 189  31 
#   3   2   2 
with(mydf, tapply(Spp, list(Cnty, Yr), FUN = function(x) length(unique(x))))
#     1999 2000
# 185    2    2
# 189   NA    2
# 31     1    1

If you're interested in simple tabulation (not of unique values), then you can explore table and ftable:

with(mydf, table(Spp, Cnty))
#            Cnty
# Spp         185 189 31
#   Bitternut   2   1  0
#   Pignut      2   1  1
#   WO          1   0  2
ftable(mydf, row.vars="Spp", col.vars=c("Cnty", "Yr"))
#           Cnty  185       189        31     
#           Yr   1999 2000 1999 2000 1999 2000
# Spp                                         
# Bitternut         1    1    0    1    0    0
# Pignut            2    0    0    1    0    1
# WO                0    1    0    0    2    0