I have a data.frame where I'd like to remove entire groups if any of their members meets a condition.
In this first example, where the values are numbers and the condition is NA, the code below works.
library(plyr)

df <- structure(list(world = c(1, 2, 3, 3, 2, NA, 1, 2, 3, 2),
                     place = c(1, 1, 2, 2, 3, 3, 1, 2, 3, 1),
                     group = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3)),
                .Names = c("world", "place", "group"),
                row.names = c(NA, -10L), class = "data.frame")

# Per-group mean of world; it is NA for any group that contains an NA
ans <- ddply(df, .(group), summarize, code = mean(world))
ans$code[is.na(ans$code)] <- 0
# Attach each group's code to its rows (merge joins on the shared "group" column)
ans2 <- merge(df, ans)
final.ans <- ans2[ans2$code != 0, ]
However, this ddply maneuver only works because mean() propagates NA values, so it will not work if the condition is something other than NA, or if the values are non-numeric.
For example, if I wanted to remove any group that has a member with a world value of "AF" (as in the data.frame below), this ddply trick would not work.
df2 <- structure(list(world = structure(c(1L, 2L, 3L, 3L, 3L, 5L, 1L, 4L, 2L, 4L),
                                        .Label = c("AB", "AC", "AD", "AE", "AF"),
                                        class = "factor"),
                      place = c(1, 1, 2, 2, 3, 3, 1, 2, 3, 1),
                      group = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3)),
                 .Names = c("world", "place", "group"),
                 row.names = c(NA, -10L), class = "data.frame")
I can envision a for-loop where, for each group, the value of each member is checked; if the condition is met, a code column could be populated, and a subset could then be made based on that code.
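A minimal sketch of the loop described above, for illustration only. It rebuilds df2 with data.frame() (same values as the structure() call) and uses a hypothetical helper column named code, which is dropped before subsetting:

```r
# Same values as df2 above, built with data.frame() for readability
df2 <- data.frame(
  world = factor(c("AB", "AC", "AD", "AD", "AD", "AF", "AB", "AE", "AC", "AE")),
  place = c(1, 1, 2, 2, 3, 3, 1, 2, 3, 1),
  group = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3)
)

# Hypothetical flag column: TRUE for every row whose group contains an "AF" member
df2$code <- FALSE
for (g in unique(df2$group)) {
  in_group <- df2$group == g
  if (any(df2$world[in_group] == "AF")) {
    df2$code[in_group] <- TRUE
  }
}

# Keep rows from unflagged groups and drop the helper column
kept <- df2[!df2$code, c("world", "place", "group")]
kept
```

This works, but it touches the data once per group; the grouped any() idea in the answer does the same check without an explicit loop.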
But perhaps there is a vectorized, idiomatic R way to do this?
Try
library(dplyr)
df2 %>%
  group_by(group) %>%
  filter(!any(world == "AF"))  # keep a group only if none of its rows is "AF"
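For comparison, a base R sketch of the same idea (not part of the original answer): find the groups that contain "AF", then drop every row of those groups. The second variant uses ave() to broadcast any() back to each row of its group. The names bad_groups, res_in, has_af, and res_ave are illustrative, and df2 is rebuilt with data.frame() so the block runs on its own:

```r
df2 <- data.frame(
  world = factor(c("AB", "AC", "AD", "AD", "AD", "AF", "AB", "AE", "AC", "AE")),
  place = c(1, 1, 2, 2, 3, 3, 1, 2, 3, 1),
  group = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3)
)

# Variant 1: groups with at least one "AF" member, then drop all their rows
bad_groups <- unique(df2$group[df2$world == "AF"])
res_in <- df2[!df2$group %in% bad_groups, ]

# Variant 2: ave() returns one logical per row, TRUE if that row's group has an "AF"
has_af <- ave(df2$world == "AF", df2$group, FUN = any)
res_ave <- df2[!has_af, ]

res_in
```

Both variants keep the original row names (1, 2, 3, 7, 8, 9, 10). If world could contain NA, the comparison would yield NA, so the condition may need an explicit `!is.na(world) &` guard.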
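Or, as mentioned by @akrun, using data.table:
library(data.table)
setDT(df2)[, if (!any(world == "AF")) .SD, by = group]
Or
setDT(df2)[, if (all(world != "AF")) .SD, by = group]
The dplyr version gives (the data.table versions return the same rows, with group as the first column):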
#Source: local data frame [7 x 3]
#Groups: group
#
# world place group
#1 AB 1 1
#2 AC 1 1
#3 AD 2 1
#4 AB 1 3
#5 AE 2 3
#6 AC 3 3
#7 AE 1 3
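Because only the condition changes, the same grouped any() pattern also covers the NA case from the first example and replaces the ddply/merge trick. A base R sketch, assuming the df from the question (rebuilt here with data.frame()); the name no_na is illustrative:

```r
df <- data.frame(
  world = c(1, 2, 3, 3, 2, NA, 1, 2, 3, 2),
  place = c(1, 1, 2, 2, 3, 3, 1, 2, 3, 1),
  group = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3)
)

# dplyr equivalent: df %>% group_by(group) %>% filter(!any(is.na(world)))
# ave() marks every row whose group contains at least one NA in world
no_na <- df[!ave(is.na(df$world), df$group, FUN = any), ]
no_na
```

Unlike the mean-based trick, this does not depend on the column being numeric, and it cannot accidentally drop a group whose mean happens to be 0.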