R - dplyr Summarize and Retain Other Columns

atclaus picture atclaus · Aug 23, 2016 · Viewed 23.3k times · Source

I am grouping data and then summarizing it, but would also like to retain another column. I do not need to do any evaluations of that column's content as it will always be the same as the group_by column. I can add it to the group_by statement but that does not seem "right". I want to retain State.Full.Name after grouping by State. Thanks

TDAAtest <- data.frame(State=sample(state.abb,1000,replace=TRUE))
TDAAtest$State.Full.Name <- state.name[match(TDAAtest$State,state.abb)]


TDAA.states <- TDAAtest %>%
  filter(!is.na(State)) %>%
  group_by(State) %>%
  summarize(n=n()) %>%
  ungroup() %>%
  arrange(State)

Answer

akrun picture akrun · Aug 23, 2016

Perhaps we need

TDAAtest %>% 
     filter(!is.na(State)) %>%
     group_by(State) %>% 
     summarise(State.Full.Name = first(State.Full.Name), n = n())

Or use mutate to create the column and then do the distinct

TDAAtest %>% f
     filter(!is.na(State)) %>%
     group_by(State) %>% 
     mutate(n= n()) %>% 
     distinct(State, .keep_all=TRUE)