I am trying to use PROC SQL to query a DB2 table with hundreds of millions of records. During the development stage, I want to run my query on an arbitrarily small subset of those records (say, 1000). I've tried using INOBS to limit the observations, but I believe that this parameter is simply limiting the number of records which SAS is processing. I want SAS to only fetch an arbitrary number of records from the database (and then process all of them).
If I were writing a SQL query myself, I would simply use SELECT * FROM x FETCH FIRST 1000 ROWS ONLY ...
(the equivalent of SELECT TOP 1000 * FROM x
in SQL Server). But PROC SQL doesn't seem to have any option like this. It's taking an extremely long time to fetch the records.
The question: How can I instruct SAS to arbitrarily limit the number of records to return from the database.
I've read that PROC SQL uses ANSI SQL, which doesn't have any specification for a row limiting keyword. Perhaps SAS didn't feel like making the effort to translate its SQL syntax to vendor-specific keywords? Is there no work around?
Have you tried using the outobs
option in your proc sql
?
For example,
proc sql outobs=10; create table test
as
select * from schema.HUGE_TABLE
order by n;
quit;
Alternatively, you can use SQL passthrough to write a query using DB2 syntax (FETCH FIRST 10 ROWS ONLY
), although this requires you to store all your data in the database, at least temporarily.
Passthrough looks something like this:
proc sql;
connect to db2 (user=&userid. password=&userpw. database=MY_DB);
create table test as
select * from connection to db2 (
select * from schema.HUGE_TABLE
order by n
FETCH FIRST 10 ROWS ONLY
);
quit;
It requires more syntax and can't access your sas datasets, so if outobs
works for you, I would recommend that.