This is a seemingly simple R question, but I don't see an exact answer here. I have a data frame (alldata) that looks like this:
Case zip market
1 44485 NA
2 44488 NA
3 43210 NA
There are over 3.5 million records.
Then, I have a second data frame, 'zipcodes'.
market zip
1 44485
1 44486
1 44488
... ... (100 zips in market 1)
2 43210
2 43211
... ... (100 zips in market 2, etc.)
I want to find the correct value of alldata$market for each case by matching alldata$zip against the zip column in the zipcodes data frame. I'm just looking for the right syntax, and assistance is much appreciated, as usual.
Since you don't care about the existing market column in alldata, you can drop it first and then merge alldata and zipcodes on the zip column using merge:
merge(alldata[, c("Case", "zip")], zipcodes, by="zip")
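Here is a minimal sketch of that call in action. It uses small made-up versions of alldata and zipcodes (zip 99999 is an invented code with no market). merge defaults to an inner join, so cases whose zip is not in zipcodes are dropped and the result is sorted by zip. all.x = TRUE keeps those cases with market = NA. If you would rather fill alldata$market in place and keep the original row order, match() does a direct lookup:

```r
# Toy stand-ins for the real data frames (zips from the question;
# 99999 is a made-up zip with no market, to show unmatched rows)
alldata <- data.frame(Case = 1:4,
                      zip = c(44485, 44488, 43210, 99999),
                      market = NA)
zipcodes <- data.frame(market = c(1, 1, 1, 2, 2),
                       zip = c(44485, 44486, 44488, 43210, 43211))

# Inner join on zip: case 4 is dropped because 99999 is not in zipcodes,
# and the rows come back sorted by zip rather than by Case
joined <- merge(alldata[, c("Case", "zip")], zipcodes, by = "zip")

# Left join: keeps every case, with market = NA where no zip matched;
# reorder by Case afterwards to restore the original order
joined_all <- merge(alldata[, c("Case", "zip")], zipcodes,
                    by = "zip", all.x = TRUE)
joined_all <- joined_all[order(joined_all$Case), ]

# Alternative: look up each zip's position in zipcodes and fill
# alldata$market in place (unmatched zips get NA)
alldata$market <- zipcodes$market[match(alldata$zip, zipcodes$zip)]
```

match() returns the position of the first match. If a zip appeared in more than one market, only the first row would be used, whereas merge would duplicate the case.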
The by parameter specifies the key column(s) to join on, so if you have a compound key, you could do something like by=c("zip", "otherfield").
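A sketch of the compound-key case, assuming hypothetical data in which the same zip can belong to different markets depending on a second column. The column keeps the placeholder name otherfield; the data frames cases and lookup are invented for illustration:

```r
# Hypothetical data: zip 44485 maps to market 1 or 3 depending on otherfield
cases <- data.frame(Case = 1:3,
                    zip = c(44485, 44485, 43210),
                    otherfield = c("a", "b", "a"))
lookup <- data.frame(zip = c(44485, 44485, 43210),
                     otherfield = c("a", "b", "a"),
                     market = c(1, 3, 2))

# A row pairs up only when both zip and otherfield agree
merged <- merge(cases, lookup, by = c("zip", "otherfield"))
merged <- merged[order(merged$Case), ]  # restore the original case order
```

If the key columns have different names in the two data frames, merge also accepts by.x and by.y instead of by.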