Find complement of a data frame (anti-join)

oercim · Feb 24, 2015

I have two data frames (df and df1). df1 is a subset of df. I want to get a data frame that is the complement of df1 in df, i.e. the rows of the first data set that are not matched in the second. For example, let

data frame df:

heads
row1
row2
row3
row4
row5

data frame df1:

heads
row3
row5

Then the desired output df2 is:

heads
row1
row2
row4
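
To make the example reproducible, the two data frames can be built as below. This is a minimal sketch that assumes heads is a character column; the post does not state the column type.

```r
# Assumption: heads is stored as character, not factor
df <- data.frame(
  heads = c("row1", "row2", "row3", "row4", "row5"),
  stringsAsFactors = FALSE
)

# df1 holds a subset of the rows of df
df1 <- data.frame(
  heads = c("row3", "row5"),
  stringsAsFactors = FALSE
)
```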

Answer

David Arenburg · Feb 24, 2015

You could also do an anti-join using data.table's binary join:

library(data.table)
setkey(setDT(df), heads)[!df1]
#    heads
# 1:  row1
# 2:  row2
# 3:  row4

EDIT: Starting with data.table v1.9.6, we can join data.tables without setting keys by using the on argument:

setDT(df)[!df1, on = "heads"]

EDIT2: data.table v1.9.8 introduced fsetdiff, which is basically a variation of the solution above, just applied over all the column names of the x data.table, e.g. x[!y, on = names(x)]. If all is set to FALSE (the default), only unique rows of x are returned. When each data.table has only one column, the following is equivalent to the previous solutions:

# fsetdiff requires both inputs to be data.tables, so convert df1 as well
fsetdiff(setDT(df), setDT(df1), all = TRUE)
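
The effect of the all argument only shows up when x contains duplicate rows. The sketch below uses two hypothetical tables, x and y, which are not part of the original question, to compare both settings with the anti-join over all columns.

```r
library(data.table)

# Hypothetical data: "row1" appears twice in x
x <- data.table(heads = c("row1", "row1", "row2", "row3"))
y <- data.table(heads = "row3")

# all = FALSE (the default): duplicate rows in the result are collapsed
res_unique <- fsetdiff(x, y)

# all = TRUE: duplicate rows of x are kept
res_all <- fsetdiff(x, y, all = TRUE)

# Anti-join over all columns of x, for comparison
res_join <- x[!y, on = names(x)]

print(res_unique)
print(res_all)
print(res_join)
```

Here res_unique has two rows, while res_all and res_join each have three, because the repeated "row1" is retained.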