Select unique values with 'select' function in 'dplyr' library

nodm · Aug 29, 2014 · Viewed 111.5k times

Is it possible to select all unique values from a column of a data.frame using the select function in the dplyr library? Something like "SELECT DISTINCT field1 FROM table1" in SQL.

Thanks!

Answer

Ron Gejman · Oct 23, 2014

As of dplyr 0.3, this can be done easily with the distinct() function.

Here is an example:

distinct_df = df %>% distinct(field1)

You can get a vector of the distinct values with:

distinct_vector = distinct_df$field1
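To make the two steps above concrete, here is a minimal sketch on a toy data frame. The values in df are made up for illustration; only the names df and field1 come from the question, and field2 is a hypothetical extra column. The base R unique() call is included for comparison and is not part of the original answer.

```r
library(dplyr)

# Toy data: the values are invented, field2 is a hypothetical extra column.
df <- data.frame(
  field1 = c("a", "b", "a", "c", "b"),
  field2 = 1:5,
  stringsAsFactors = FALSE
)

# One row per distinct value of field1, in order of first appearance.
distinct_df <- df %>% distinct(field1)

# Plain character vector of the distinct values.
distinct_vector <- distinct_df$field1

# Base R equivalent, shown only for comparison.
base_vector <- unique(df$field1)
```

Both distinct_vector and base_vector should hold the same values in the same order, so for a single column the choice is mostly about whether you want to stay inside a dplyr pipeline.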

You can also select a subset of columns in the same pipeline as the distinct() call, which gives cleaner output when you inspect the data frame with head(), tail(), or glimpse():

distinct_df = df %>% distinct(field1) %>% select(field1)
distinct_vector = distinct_df$field1
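A note for readers on newer dplyr releases, sketched under the assumption that dplyr 0.7 or later is installed: since dplyr 0.5, distinct(field1) returns only the named column unless .keep_all = TRUE is passed, and pull() (added in 0.7) extracts a column as a vector inside the pipeline. The data frame below is made up for illustration, and field2 is a hypothetical column.

```r
library(dplyr)

# Toy data: invented values, field2 is a hypothetical extra column.
df <- data.frame(
  field1 = c("a", "b", "a"),
  field2 = c(10, 20, 30),
  stringsAsFactors = FALSE
)

# dplyr >= 0.5: only the column named in distinct() is kept.
only_field1 <- df %>% distinct(field1)

# .keep_all = TRUE keeps the other columns, using the first row
# found for each distinct value of field1.
with_others <- df %>% distinct(field1, .keep_all = TRUE)

# pull() (dplyr >= 0.7) returns the column as a plain vector.
distinct_vector <- df %>% distinct(field1) %>% pull(field1)
```

On those versions the trailing select(field1) is harmless but no longer needed, and pull() avoids the intermediate data frame when only the vector is wanted.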