SQLDF Left Join in R

Bryan picture Bryan · May 30, 2013 · Viewed 17.6k times · Source

My goal is to take 'matr', sort it by column c1, and keep unique(c1) where c2 = 1. For example, from this code...

c1 = c("a",'a','a','b','b','b','b','c','c')
c2 = c(1,1,0,1,1,0,1,0,0)
matr = as.data.frame(cbind(c1,c2))    
one = sqldf('select distinct(c1),c2 from matr where c2 = 1')    
vs = sqldf('select distinct(c1),c0,c2 from matr group by c1')
sqldf('select matr.*,one.* from matr 
  left outer join one 
  where one.c1 = matr.c1')

arrive at:

c1 c2
a  1
b  1
c  0

For some reason, i lose row c in the left join. I can only arrive at (through other joins)

c1 c2
a  1
b  1

Answer

G. Grothendieck picture G. Grothendieck · May 31, 2013

You want on rather than where in the last sql statement. With where it is performing the join and then afterwards applies the where clause whereas with on it does the join relative to the on condition.

> sqldf('select matr.*, one.* from matr left outer join one on one.c1 = matr.c1')
  c1 c2   c1   c2
1  a  1    a    1
2  a  1    a    1
3  a  0    a    1
4  b  1    b    1
5  b  1    b    1
6  b  0    b    1
7  b  1    b    1
8  c  0 <NA> <NA>
9  c  0 <NA> <NA>