This question is somewhat related to issues Efficiently merging two data frames on a non-trivial criteria and Checking if date is between two dates in r. And the one I have posted here requesting if the feature exist: GitHub issue
I am looking to join two dataframes using dplyr::left_join()
. The condition I use to join is less-than, greater-than i.e, <=
and >
. Does dplyr::left_join()
support this feature? or do the keys only take =
operator between them. This is straightforward to run from SQL (assuming I have the dataframe in the database)
Here is a MWE: I have two datasets one firm-year (fdata
), while second is sort of survey data that happens once every five years. So for all years in the fdata
that are in between two survey years, I join the corresponding survey year data.
id <- c(1,1,1,1,
2,2,2,2,2,2,
3,3,3,3,3,3,
5,5,5,5,
8,8,8,8,
13,13,13)
fyear <- c(1998,1999,2000,2001,1998,1999,2000,2001,2002,2003,
1998,1999,2000,2001,2002,2003,1998,1999,2000,2001,
1998,1999,2000,2001,1998,1999,2000)
byear <- c(1990,1995,2000,2005)
eyear <- c(1995,2000,2005,2010)
val <- c(3,1,5,6)
sdata <- tbl_df(data.frame(byear, eyear, val))
fdata <- tbl_df(data.frame(id, fyear))
test1 <- left_join(fdata, sdata, by = c("fyear" >= "byear","fyear" < "eyear"))
I get
Error: cannot join on columns 'TRUE' x 'TRUE': index out of bounds
Unless if left_join
can handle the condition, but my syntax is missing something?
data.table
adds non-equi joins starting from v 1.9.8
library(data.table) #v>=1.9.8
setDT(sdata); setDT(fdata) # converting to data.table in place
fdata[sdata, on = .(fyear >= byear, fyear < eyear), nomatch = 0,
.(id, x.fyear, byear, eyear, val)]
# id x.fyear byear eyear val
# 1: 1 1998 1995 2000 1
# 2: 2 1998 1995 2000 1
# 3: 3 1998 1995 2000 1
# 4: 5 1998 1995 2000 1
# 5: 8 1998 1995 2000 1
# 6: 13 1998 1995 2000 1
# 7: 1 1999 1995 2000 1
# 8: 2 1999 1995 2000 1
# 9: 3 1999 1995 2000 1
#10: 5 1999 1995 2000 1
#11: 8 1999 1995 2000 1
#12: 13 1999 1995 2000 1
#13: 1 2000 2000 2005 5
#14: 2 2000 2000 2005 5
#15: 3 2000 2000 2005 5
#16: 5 2000 2000 2005 5
#17: 8 2000 2000 2005 5
#18: 13 2000 2000 2005 5
#19: 1 2001 2000 2005 5
#20: 2 2001 2000 2005 5
#21: 3 2001 2000 2005 5
#22: 5 2001 2000 2005 5
#23: 8 2001 2000 2005 5
#24: 2 2002 2000 2005 5
#25: 3 2002 2000 2005 5
#26: 2 2003 2000 2005 5
#27: 3 2003 2000 2005 5
# id x.fyear byear eyear val
You can also get this to work with foverlaps
in 1.9.6 with a little more effort.