Dynamically call macro from sas data step

JustinJDavies picture JustinJDavies · Aug 28, 2014 · Viewed 8.5k times · Source

This code executes fine when Run as a SAS program:

%MyMacro(foo_val, bar_val, bat_val);

I have created a table using:

DATA analyses;
   input title : $32. weight : $32. response : $32.;
   datalines;
foo1 bar1 bat1
foo2 bar2 bat2
;

I want to execute MyMacro once for each row of the analyses table.

The following code appears to only pass the string values title, weight and response (rather than the data values foo1 etc.) to my macro (tested with calls to the %put command) :

DATA _NULL_ ;
    set analyses;
    %MyMacro(title, weight, response);

RUN;

How can I invoke the macro once per record of the analyses table whilst passing data values as arguments to the macro? The intention is to actually run this for a very large number of analyses so the solution must scale appropriately to many more records in the analyses table.

Answer

Joe picture Joe · Aug 28, 2014

This in part depends on what your macro is doing. If we assume that your macro is doing something that is intended to be run outside of a data step (ie, it's not just assigning a data step variable), then you have several options.

CALL EXECUTE has already been explained, and is a good option for some cases. It has some downsides, however, particularly with macro timing, that requires some extra care to protect in some cases - particularly when you are creating macro variables inside your macro. Quentin in his comments shows a way to get around this (adding %NRSTR to the call), but I find that I prefer to only use CALL EXECUTE when there's an advantage to doing so over the other methods - particularly, if I want to use SAS data step techniques (such as FIRST or LAST, for example, or some form of looping) in creating my macro calls, or when I have to do things in a data step anyway and can avoid the overhead of reading the file another time. If I'm just writing a data step like yours above - data something, set something, call execute, run - I wouldn't use it.


PROC SQL SELECT INTO is typically what I use for list processing (which is largely what this is). I like SQL's simplicity a bit better when doing things that aren't too complicated; for example, you can get just one version of each macro call easily with DISTINCT without having to explicitly write a proc sort nodupkey or use first/last processing. It also has the advantage for debugging that you can write all of your macro calls to your results window (if you don't add noprint), which is a bit easier to read than the log for me if I'm trying to see why my calls didn't get generated properly (and doesn't take any extra PUT statements).

proc sql;
  select catx(',','%macro(',arg1,arg2,arg3)||')' 
    into :mvarlist separated by ' '
    from dataset;
quit;

&mvarlist.

That runs them quite simply, and has no timing issues (As you're just writing a bunch of macro calls out).

The main downside to this method is that you have a maximum of 64k characters in a macro variable, so if you're writing a huge number of these you'll run into that. In that case use CALL EXECUTE or %INCLUDE files.


%INCLUDE files are largely useful either as replacement for SELECT INTO when the call is over the character limit, or if you find it useful to have a text file to look at with your calls (if you're running this in batch mode for example, this could be easier to get to and/or parse than log or listing output). You just write your calls out to a file, and then %INCLUDE that file.

filename myfile temp; *or a real file if you want to look at it.;
data _null_;
 set dataset;
 file myfile;
 length str $200;
 str=catx(',','%macro(',arg1,arg2,arg3)||')';
 put str;
run;

%include myfile;

I don't really use this much anymore, but it's a common technique used particularly by older SAS programmers so good to know.


DOSUBL is a relatively new method, and to some extent can be used to replace CALL EXECUTE as its default behavior is typically closer to what you expect intuitively than CALL EXECUTE's. The doc page has really the best example for how this works differently; basically, it fixes the timing issue by letting each separate call look import and export the macro variables from/to the calling environment, meaning that each iteration of DOSUBL is run at a distinct time versus CALL EXECUTE where everything is run in one bunch and the macro environment is 'fixed' (ie, any reference to a macro variable is fixed at run time, unless you escape it messily with %NRSTR).


One more thing worth mentioning is RUN_MACRO, a part of the FCMP language. That allows you to completely run a macro and import its contents back to the data step, which is an interesting option in some cases (for example, you could wrap a call around a PROC SQL that selected a count of something, and then import that to the dataset as a variable, all in one datastep). It's applicable if you're doing this for the purpose of calling a macro to assign a data step variable, not to run a process that does things that don't need to be imported into the data step, but it's something worth considering if you do want that data back all in the dataset that called the process.