How can I combine rows within the same data frame in R (based on duplicate values under a specific column)?

poeticpersimmon · Mar 13, 2015 · Viewed 7.3k times

A sample of two (made-up) rows in df:

userid  facultyid  courseid  schoolid
167     265        NA        1678
167     71111      301       NA

Suppose I have a couple hundred duplicated userid values like the one in the example above, while the vast majority of userid values are unique.

How can I combine the rows that share a userid so that each column keeps the value from the first of the two rows, unless that value is NA, in which case it is filled in with the value from the second row?

In essence, drawing from the above example, my ideal output would contain:

userid  facultyid  courseid  schoolid
167     265        301       1678

Answer

bergant · Mar 13, 2015
aggregate(x = df1, by = list(df1$userid), FUN = function(x) na.omit(x)[1])[,-1]
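
As a quick check, the one-liner above can be run on the question's two sample rows. This is only a minimal sketch: df1 is the answer's name for the question's df, and the third row (userid 200) is a made-up addition, included to show that non-duplicated users pass through unchanged.

```r
# Sample data: the two rows from the question plus one hypothetical
# non-duplicated user (userid 200), added for illustration only.
df1 <- data.frame(
  userid    = c(167, 167, 200),
  facultyid = c(265, 71111, 88),
  courseid  = c(NA, 301, 12),
  schoolid  = c(1678, NA, NA)
)

# For each userid, keep the first non-NA value in every column.
# aggregate() prepends the grouping column (Group.1), which [, -1] drops.
combined <- aggregate(
  x   = df1,
  by  = list(df1$userid),
  FUN = function(x) na.omit(x)[1]
)[, -1]

print(combined)
```

Because na.omit(x)[1] is applied column by column within each userid group, a value from the first row wins whenever it is present, and a column that is NA in every row of a group stays NA.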

Or use the dplyr package:

library(dplyr)

df1 %>%
  group_by(userid) %>%
  summarise_each(funs(first(na.omit(.))))
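
Note that summarise_each() and funs() have since been deprecated in dplyr. Below is a hedged sketch of the same idea written with across(), assuming dplyr 1.0.0 or later. The sample data (including the userid 200 row) is made up for illustration, and the base-R subsetting .x[!is.na(.x)][1] is swapped in here for first(na.omit(.)); it is not part of the original answer.

```r
library(dplyr)

# Sample data: the two rows from the question plus one hypothetical
# non-duplicated user (userid 200), added for illustration only.
df1 <- data.frame(
  userid    = c(167, 167, 200),
  facultyid = c(265, 71111, 88),
  courseid  = c(NA, 301, 12),
  schoolid  = c(1678, NA, NA)
)

# Within each userid group, take the first non-NA value of every
# non-grouping column; an all-NA column yields NA.
combined_dplyr <- df1 %>%
  group_by(userid) %>%
  summarise(across(everything(), ~ .x[!is.na(.x)][1]))

print(combined_dplyr)
```

Inside a grouped summarise(), everything() selects only the non-grouping columns, so userid is kept as the key and is not passed through the function.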