I have two sets of data: pre-survey and post-survey responses. Respondents have unique IDs, and I want to create a subset which includes only those who responded to both surveys. Example dataset:
pre.data <- data.frame(ID = c(1:10), Y = sample(c("yes", "no"), 10, replace = TRUE),
Survey = 1)
post.data <- data.frame(ID = c(1:3,6:10), Y = sample(c("yes", "no"), 8, replace = TRUE),
Survey = 2)
all.data <- rbind(pre.data, post.data)
I have the following function:
match <- function(dat1, dat2, dat3) {
  # dat1 is the whole dataset (both surveys stitched together)
  # dat2 is the pre dataset; dat3 is the post dataset
  selectedRows <- (dat1$ID %in% dat2$ID &
                   dat1$ID %in% dat3$ID)
  matchdata <- dat1[selectedRows, ]
  return(matchdata)
}
prepost.match.data <- match(all.data, pre.data, post.data)
I think there must be a better way of doing this than my function, but I cannot think of one. It works and does what I want, but it feels a bit messy.
My apologies if this has already been asked in a similar way but I was unable to find it - in which case please do point me towards a relevant answer.
Note: Arun posted the same answer in a comment a bit earlier than I did.
You can use intersect like this:
all.data[all.data$ID %in% intersect(pre.data$ID, post.data$ID),]
Which gives:
ID Y Survey
1 1 yes 1
2 2 no 1
3 3 no 1
6 6 yes 1
7 7 yes 1
8 8 yes 1
9 9 no 1
10 10 yes 1
11 1 no 2
12 2 yes 2
13 3 no 2
14 6 no 2
15 7 yes 2
16 8 yes 2
17 9 no 2
18 10 yes 2
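To check the result yourself, here is a minimal self-contained sketch of the same approach. The `set.seed()` call and the `both.ids` name are my additions for illustration (they are not in the question); the seed only makes the random `Y` values reproducible, so your `Y` column will likely differ from the output above.

```r
set.seed(1)  # assumption: added only so the sampled answers are reproducible

# Example data from the question
pre.data <- data.frame(ID = 1:10,
                       Y = sample(c("yes", "no"), 10, replace = TRUE),
                       Survey = 1)
post.data <- data.frame(ID = c(1:3, 6:10),
                        Y = sample(c("yes", "no"), 8, replace = TRUE),
                        Survey = 2)
all.data <- rbind(pre.data, post.data)

# IDs present in both surveys (both.ids is a hypothetical helper name)
both.ids <- intersect(pre.data$ID, post.data$ID)

# Keep only the rows of the stacked data whose ID is in both surveys
prepost.match.data <- all.data[all.data$ID %in% both.ids, ]

# Sanity check: every retained ID should appear exactly twice
table(prepost.match.data$ID)
```

Storing the intersection in its own variable is optional, but it makes it easy to inspect which respondents were kept before subsetting.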