how to do a full outer join that will populate with their joing criteria regardless if they don't match up

Mike L picture Mike L · Jul 16, 2013 · Viewed 7.3k times · Source

I have two tables I'm doing a full outer join on. They are joined on variable "accsnnum", which they both share. But I'm having an issue. The accsnnum only prints out when it is present in both sets, but I need it to print it out regardless of if its in both because I'm comparing counts of a variable grouped by accsnum.

To illustrate, I have a labtable that looks like this (there are only these 2 variables):

accsnum   lab_lbcount
  A12         3
  A11         5
  A14         7
  A13         5

And a rslttable that looks like this (there are only these 2 variables):

accsnum    rslt_lbcount
  A15          5
  A12          2
  A14          7
  A16          3

I combine these two using the following code:

proc sql;
create table both as
select Labtable.accsnnum label="Lab_accsnnum", Rslttable.accsnnum 
  label="Rslt_accsnnum", Labtable.Lab_lbcount label="Lab_lbtestcd_count", 
  Rslttable.Rslt_lbcount label="Rslt_lbtestcd_count", 
  Lab_lbcount-Rslt_lbcount as difference
from work.Labtable
full outer join work.Rslttable 
on Labtable.accsnnum = Rslttable.accsnnum;
quit;

This produces the following table:

accsnum   lab_lbcount   rslt_lbcount   difference
               5             .             .
  A12          3             2             1
               5             .             .
  A14          7             7             0
               .             5             .
               .             3             .

Note that it only prints out the accsnum for where the two tables join, but includes counts for "lb_count" and "rslt_count". I need to either have it print out the accsnum in that variable even if its not in both, or I need to figure a way to join by accsnum but still print out the accsum from each of the two tables. Basically, I want the output to look something like this:

accsnum   lab_lbcount   rslt_lbcount    difference
  A11          5             .             .
  A12          3             2             1
  A13          5             .             .
  A14          7             7             0
  A15          .             5             .
  A16          .             3             .     

Or even something like this would be fine:

accsnum    lab_accsum     rslt_accsum   lab_lbcount   rslt_lbcount  difference
              A11            .               5             .             .
  A12         A12           A12              3             2             1
              A13            .               5             .             .
  A14         A14           A14              7             7             0
               .            A15              .             5             .
               .            A16              .             3             .

Any help is greatly appreciated.

Edit: I'm doing SQL code inside SAS environment, in case that wasn't clear. But that should have little bearing on the code. Thanks again!

Answer

Joe picture Joe · Jul 16, 2013

COALESCE the two accsnnums. This function takes the first nonmissing value, so if labtable.accsnnum is available it takes that; otherwise it takes rslttable.accsnnum.

proc sql;
create table both as
select coalesce(labtable.accsnnum,rslttable.accsnnum) as accsnnum label="Access Number", Labtable.Lab_lbcount label="Lab_lbtestcd_count", 
  Rslttable.Rslt_lbcount label="Rslt_lbtestcd_count", 
  Lab_lbcount-Rslt_lbcount as difference
from work.Labtable
full outer join work.Rslttable 
on Labtable.accsnnum = Rslttable.accsnnum;
quit;