SAS data step/ proc sql insert rows from another table with auto increment primary key

Luke Henz picture Luke Henz · Mar 21, 2013 · Viewed 13.2k times · Source

I have 2 datasets as below

id name status 
1  A    a
2  B    b
3  C    c

Another dataset

name status new
C    c      0
D    d      1
E    e      1
F    f      1

How do I insert all rows from 2nd table to 1st table? The situation is that the first table is permanent. The 2nd table is updated monthly, so I would like to add all rows from the monthly updated table to the permanent table, so that it would look like this

id name status
1  A    a
2  B    b
3  C    c
4  D    d
5  E    e
6  F    f

The problem I'm facing is that I cannot increment the id from dataset 1. As far as I searched, the dataset in SAS does not have auto increment property. The auto increment can be done with using data step, but I don't know if data step could be use in the case with 2 tables like this. The usual sql would be

Insert into table1 (name, status) 
select name, status from table2 where new = 1;

But since the sas dataset not support auto increment column hence the problem I'm facing. I could solve it by using SAS data step as below after the above proc sql

data table1;
set table1;
if _n_ > 3 then id = _n_;
run;

This would increase the value of id column, but the code is kinda ugly, and also the id is a primary key, and being used as a foreign key in other table, so I don't want to mess up the ids of old rows.

I'm in the process of both learning and working with SAS so help is really appreciated. Thanks in advance.

Extra question: If the 2nd table does not have the new column, is there any way to complete what I want (add new row from monthly table (2nd) to permanent table (1st)) with data step? Currently, I use this ugly proc sql/data step to create new column

proc sql; //create a temp table from table2
create t2temp as select t2.*, 
(case when t2.name = t1.name and t2.status = t1.status then 0 else 1) as new
from table2 as t2 
left join table1 as t1
on t2.name = t1.name and t2.status = t1.status;
drop table t2; //drop the old table2 with no column "new"
quit;
data table2;  //rename the t2temp as table2
set t2temp;
run;

Answer

Joe picture Joe · Mar 21, 2013

You can do it in the datastep. BTW, if you were creating it entirely anew, you could just use

id+1;

to create an autonumbered field (assuming your data step wasn't too complicated). This will keep track of the current highest ID number and assign one higher to each row as you go if it is in the new dataset.

data have;
input id name $ status $;
datalines;
2  A    a
3  B    b
1  C    c
;;;;
run;

data addon;
input name $ status $ new;
datalines;
C    c      0
D    d      1
E    e      1
F    f      1
;;;;
run;

data want;
retain _maxID;                    *keep the value of _maxID from one row to the next, 
                                   do not reset it;
set have(in=old) addon(in=add);   *in= creates a temporary variable indicating which 
                                   dataset a row came from;
if (old) or (add and new);        *in SAS like in c/etc., 0/missing(null) is 
                                   false negative/positive numbers are true;
if add then ID = _maxID+1;        *assigns ID to the new records;
_maxID = max(id,_maxID);          *determines the new maximum ID - 
                                   this structure guarantees it works 
                                   even if the old DS is not sorted;
put id= name=;
drop _maxID;
run;

Response to second question:

Yes, you can still do that. One of the easiest ways is, if you have the datasets sorted by NAME:

data want;
retain _maxID;
set have(in=old) addon(in=add);
by name;
if (old) or (add and first.name);
if add then ID = _maxID+1;
_maxID = max(id,_maxID);
put id= name=;
run;

first.name will be true for the first record with the same value of name; so if HAVE has a value of that name, then ADDON will not be permitted to add a new record.

This does require name to be unique in HAVE, or you might delete some records. If that is not true then you have a more complicated solution.