No applicable method for 'anti_join' applied to an object of class "factor"

Prradep · Jun 4, 2015

I want to identify the rows present in dataframe1 that are not present in dataframe2, based on a particular column. I used the code below to get this information.

diffId <- anti_join(dat$ID,datwe$ID)

Unfortunately, I encountered an error:

Error in UseMethod("anti_join") :
no applicable method for 'anti_join' applied to an object of class "factor"

I checked the class of the relevant column in both data frames, and it turned out to be factor. I also tried extracting the column into a separate variable, on the assumption that it might solve the issue, but no luck!

fac1 <- datwe$ID
fac2 <- dat$ID
diffId <- anti_join(fac2,fac1)

Could you please share your thoughts?

Thanks

Answer

zero323 · Jun 4, 2015

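Almost all dplyr functions operate on tbls (depending on the context, that can be a data.frame, a data.table, a database connection, and so on). Passing dat$ID hands anti_join a bare factor vector, for which it has no method, hence the error. What you really want is to pass the data frames themselves and name the column with by, something like this: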

> library(dplyr)
> dat <- data.frame(ID=c(1, 3, 6, 4), x=runif(4))
> datwe <- data.frame(ID=c(3, 5, 8), y=runif(3))
> anti_join(dat, datwe, by='ID') %>% select(ID)
  ID
1  4
2  6
3  1

Note that the row order of dat is not preserved in the output above.

If the ID columns are factors with different levels (unlike the numerics in the example above), a conversion between factor and character is involved when the columns are matched.
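One way to sidestep that implicit conversion is to convert the key columns to character yourself before joining. The following is only a sketch: the letter IDs are made-up sample data, dat_chr and datwe_chr are names introduced here for illustration, and the exact coercion behaviour (and any warning) may differ by dplyr version.

```r
library(dplyr)

# Made-up sample data: ID is a factor with different levels in each frame
dat   <- data.frame(ID = factor(c("a", "c", "f", "d")), x = runif(4))
datwe <- data.frame(ID = factor(c("c", "e", "h")), y = runif(3))

# Convert the key columns to character explicitly, so the join compares
# plain strings rather than factors with mismatched levels
dat_chr   <- mutate(dat, ID = as.character(ID))
datwe_chr <- mutate(datwe, ID = as.character(ID))

# Rows of dat whose ID has no match in datwe
only_in_dat <- anti_join(dat_chr, datwe_chr, by = "ID")

# Sort the IDs so the result does not depend on row order
sort(only_in_dat$ID)
```

Sorting the IDs at the end avoids relying on whatever row order the join returns.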

If you want to operate on vectors, you can use setdiff (available in both base R and dplyr):

> setdiff(dat$ID, datwe$ID)
[1] 1 6 4
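Since the columns in the question are factors, here is a base R sketch of the same vector approach applied to factor columns. The letter IDs are made-up sample data, and missing_ids and dat_only are names introduced here for illustration. The factors are converted with as.character first so the result does not depend on how setdiff treats factors.

```r
# Made-up sample data: ID is a factor in both data frames
dat   <- data.frame(ID = factor(c("a", "c", "f", "d")), x = 1:4)
datwe <- data.frame(ID = factor(c("c", "e", "h")), y = 1:3)

# IDs present in dat but not in datwe, compared as plain strings
missing_ids <- setdiff(as.character(dat$ID), as.character(datwe$ID))

# The corresponding full rows of dat; %in% matches factors by their labels
dat_only <- dat[!(dat$ID %in% datwe$ID), , drop = FALSE]

print(missing_ids)
print(dat_only)
```

The %in% subset keeps the original row order of dat, which may be useful if the order matters.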