Is it possible to select all unique values from a column of a data.frame
using select
function in dplyr
library?
Something like "SELECT DISTINCT field1 FROM table1
" in SQL
notation.
Thanks!
In dplyr 0.3 this can be easily achieved using the distinct()
method.
Here is an example:
distinct_df = df %>% distinct(field1)
You can get a vector of the distinct values with:
distinct_vector = distinct_df$field1
You can also select a subset of columns at the same time as you perform the distinct()
call, which can be cleaner to look at if you examine the data frame using head/tail/glimpse.:
distinct_df = df %>% distinct(field1) %>% select(field1)
distinct_vector = distinct_df$field1