I have the following 2 data.frames:
a1 <- data.frame(a = 1:5, b=letters[1:5])
a2 <- data.frame(a = 1:3, b=letters[1:3])
I want to find the row a1 has that a2 doesn't.
Is there a built in function for this type of operation?
(p.s: I did write a solution for it, I am simply curious if someone already made a more crafted code)
Here is my solution:
a1 <- data.frame(a = 1:5, b=letters[1:5])
a2 <- data.frame(a = 1:3, b=letters[1:3])
rows.in.a1.that.are.not.in.a2 <- function(a1,a2)
{
a1.vec <- apply(a1, 1, paste, collapse = "")
a2.vec <- apply(a2, 1, paste, collapse = "")
a1.without.a2.rows <- a1[!a1.vec %in% a2.vec,]
return(a1.without.a2.rows)
}
rows.in.a1.that.are.not.in.a2(a1,a2)
SQLDF
provides a nice solution
a1 <- data.frame(a = 1:5, b=letters[1:5])
a2 <- data.frame(a = 1:3, b=letters[1:3])
require(sqldf)
a1NotIna2 <- sqldf('SELECT * FROM a1 EXCEPT SELECT * FROM a2')
And the rows which are in both data frames:
a1Ina2 <- sqldf('SELECT * FROM a1 INTERSECT SELECT * FROM a2')
The new version of dplyr
has a function, anti_join
, for exactly these kinds of comparisons
require(dplyr)
anti_join(a1,a2)
And semi_join
to filter rows in a1
that are also in a2
semi_join(a1,a2)