I would like to 'summarise' a factor variable in R, so that for each record I know what factor levels are present.
Here is a simplified example dataframe:
df <- data.frame(record = c("a","a","b","c","c","c"),
                 species = c("COD", "SCE", "COD", "COD","SCE","QSC"),
                 stringsAsFactors = TRUE)  # needed in R >= 4.0 for the columns to be factors
record species
a COD
a SCE
b COD
c COD
c SCE
c QSC
And this is what I am trying to achieve:
data.frame(record = c("a","b","c"), species = c("COD, SCE", "COD", "COD, SCE, QSC"))
record species
a COD, SCE
b COD
c COD, SCE, QSC
This is the closest I have been able to get, but it puts ALL levels of the factor with each record, rather than just the ones that should be present for each record.
summarise(group_by(df, record),
species = (paste(levels(species), collapse="")))
record species
<fctr> <chr>
a CODQSCSCE <- this should be CODSCE
b CODQSCSCE <- this should just be COD
c CODQSCSCE <- this is correct as CODQSCSCE as it has all levels
tapply returns the same issue:
tapply(df$species, df$record, function(x) paste(levels(x), collapse=""))
a b c
"CODQSCSCE" "CODQSCSCE" "CODQSCSCE"
I need to find a way to tell which combinations of species factors are present for each record.
Use unique():
library(dplyr)
df %>%
group_by(record) %>%
summarise(species = paste(unique(species), collapse = ', '))
# A tibble: 3 x 2
record species
<fctr> <chr>
1 a COD, SCE
2 b COD
3 c COD, SCE, QSC
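The reason levels() did not work is that a factor keeps its full set of levels even after it is subset, whereas unique() only returns the values that actually occur in the group. Below is a minimal base R sketch of the same idea, assuming the columns are factors as in the question (hence stringsAsFactors = TRUE); the names b_only and res are just illustrative.

```r
df <- data.frame(record  = c("a", "a", "b", "c", "c", "c"),
                 species = c("COD", "SCE", "COD", "COD", "SCE", "QSC"),
                 stringsAsFactors = TRUE)

# Subsetting a factor keeps every level, even those no longer present
b_only <- df$species[df$record == "b"]
levels(b_only)                # "COD" "QSC" "SCE"
as.character(unique(b_only))  # "COD"

# Base R equivalent of the dplyr pipeline: unique() inside tapply()
res <- tapply(df$species, df$record,
              function(x) paste(unique(x), collapse = ", "))
res
```

If you only ever need the levels that survive in a subset, droplevels() on the subset is another option, but unique() is simpler here because paste() converts the result to character anyway.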