How to sort data using Data step in SAS

Saran picture Saran · Dec 18, 2017 · Viewed 8.1k times · Source

I want to sort data in SAS data step. What exactly I mean is: the work of proc sort should be done in data step. Is there any solution?

Answer

Stu Sztukowski picture Stu Sztukowski · Dec 18, 2017

If you're looking for a data-step only solution, you can do the work of PROC SORT with a hash table. The caveat is that you need enough memory to do it.

If you want to do a simple sort, you would load the hash table with the ordered:'yes' option and output it to a new table. By default, ordered:yes will sort the data in ascending order. You can specify descending as well.

Simple Sort

data _null_;

    /* Sets up PDV without loading the table */
    if(0) then set sashelp.class;

    /* Load sashelp.class into memory ordered by Height. Do not remove duplicates. */
    dcl hash sortit(dataset:'sashelp.class', ordered:'yes', multidata:'yes');

        sortit.defineKey('Height');     * Order by height;
        sortit.defineData(all:'yes');   * Keep all variables in the output dataset;

    sortit.defineDone();

    /* Output to a dataset called class_sorted */
    sortit.Output(dataset:'class_sorted');
run;

De-duping

To remove duplicates, do the exact same operation, except remove the multidata option. In the below table, observations (8, 9) and (15, 16) are duplicates of each other. Observations 9 and 16 will be eliminated.

data _null_;

    /* Sets up PDV without loading the table */
    if(0) then set sashelp.class;

    /* Load sashelp.class into memory ordered by Height. Do not keep duplicates. */
    dcl hash sortit(dataset:'sashelp.class', ordered:'yes');

        sortit.defineKey('Height');     * Order by height;
        sortit.defineData(all:'yes');   * Keep all variables in the output dataset;
    sortit.defineDone();

    /* Output to a dataset called class_sorted */
    sortit.Output(dataset:'class_sorted');
run;