Finding ALL duplicate rows, including "elements with smaller subscripts"

Lauren Samuels · Oct 21, 2011 · Viewed 40k times

R's duplicated returns a vector showing whether each element of a vector or data frame is a duplicate of an element with a smaller subscript. So if rows 3, 4, and 5 of a 5-row data frame are the same, duplicated will give me the vector

FALSE, FALSE, FALSE, TRUE, TRUE

But in this case I actually want to get

FALSE, FALSE, TRUE, TRUE, TRUE

that is, I want to know whether a row is duplicated by a row with a larger subscript too.

Answer

Joshua Ulrich · Oct 21, 2011

duplicated has a fromLast argument. The "Examples" section of ?duplicated shows how to use it. Call duplicated twice, once with fromLast=FALSE and once with fromLast=TRUE, and take the rows where either result is TRUE.


Late edit: You didn't provide a reproducible example, so here's an illustration kindly contributed by @jbaums:

vec <- c("a", "b", "c", "c", "c")
vec[duplicated(vec) | duplicated(vec, fromLast = TRUE)]
## [1] "c" "c" "c"

Edit: And an example for the case of a data frame:

df <- data.frame(rbind(c("a", "a"), c("b", "b"), c("c", "c"), c("c", "c")))
df[duplicated(df) | duplicated(df, fromLast = TRUE), ]
##   X1 X2
## 3  c  c
## 4  c  c
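
To tie this back to the 5-row case in the question, here is a minimal sketch that returns the logical vector itself rather than the matching rows. The helper name all_duplicated and the data frame df5 are made up for illustration; they are not part of base R or of the original question.

```r
# Hypothetical helper (not part of base R): TRUE for every element or row
# of x that has at least one duplicate anywhere in x, before or after it.
all_duplicated <- function(x) {
  duplicated(x) | duplicated(x, fromLast = TRUE)
}

# Illustrative 5-row data frame in which rows 3, 4, and 5 are identical
df5 <- data.frame(x = c("a", "b", "c", "c", "c"),
                  y = c(1, 2, 3, 3, 3))

duplicated(df5)
## [1] FALSE FALSE FALSE  TRUE  TRUE
all_duplicated(df5)
## [1] FALSE FALSE  TRUE  TRUE  TRUE
```

Wrapping the two calls in a function is only a convenience; the one-line expression from the answer does the same thing.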