I'm trying to solve a tricky R problem that I haven't been able to solve via Googling keywords. Specifically, I'm trying to take a subset one data frame whose values don't appear in another. Here is an example:
> test
number fruit ID1 ID2
item1 "number1" "apples" "22" "33"
item2 "number2" "oranges" "13" "33"
item3 "number3" "peaches" "44" "25"
item4 "number4" "apples" "12" "13"
> test2
number fruit ID1 ID2
item1 "number1" "papayas" "22" "33"
item2 "number2" "oranges" "13" "33"
item3 "number3" "peaches" "441" "25"
item4 "number4" "apples" "123" "13"
item5 "number3" "peaches" "44" "25"
item6 "number4" "apples" "12" "13"
item7 "number1" "apples" "22" "33"
I have two data frames, test and test2, and the goal is to select all entire rows in test2 that don't appear in test, even though some of the values may be the same.
The output I want would look like:
item1 "number1" "papayas" "22" "33"
item2 "number3" "peaches" "441" "25"
item3 "number4" "apples" "123" "13"
There may be an arbitrary amount of rows or columns, but in my specific case, one data frame is a direct subset of the other.
I've used the R subset(), merge() and which() functions extensively, but couldn't figure out how to use these in combination, if it's possible at all, to get what I want.
edit: Here is the R code I used to generate these two tables.
test <- data.frame(c("number1", "apples", 22, 33), c("number2", "oranges", 13, 33),
c("number3", "peaches", 44, 25), c("number4", "apples", 12, 13))
test <- t(test)
rownames(test) = c("item1", "item2", "item3", "item4")
colnames(test) = c("number", "fruit", "ID1", "ID2")
test2 <- data.frame(data.frame(c("number1", "papayas", 22, 33), c("number2", "oranges", 13, 33),
c("number3", "peaches", 441, 25), c("number4", "apples", 123, 13),c("number3", "peaches", 44, 25), c("number4", "apples", 12, 13) ))
test2 <- t(test2)
rownames(test2) = c("item1", "item2", "item3", "item4", "item5", "item6")
colnames(test2) = c("number", "fruit", "ID1", "ID2")
Thanks in advance!
Here's another way:
x <- rbind(test2, test)
x[! duplicated(x, fromLast=TRUE) & seq(nrow(x)) <= nrow(test2), ]
# number fruit ID1 ID2
# item1 number1 papayas 22 33
# item3 number3 peaches 441 25
# item4 number4 apples 123 13
Edit: modified to preserve row names.