How to use multiple arguments for awk with a shebang (i.e. #!)?

Hans-Peter Störr · Nov 29, 2010 · Viewed 38.4k times

I'd like to execute a gawk script with --re-interval using a shebang. The "naive" approach of

#!/usr/bin/gawk --re-interval -f
... awk script goes here

does not work, since gawk is called with the single first argument "--re-interval -f" (not split at the whitespace), which it does not understand. Is there a workaround for that?

Of course, you could avoid calling gawk directly and instead wrap it in a shell script that splits the first argument, or write a shell script that calls gawk and keep the awk program in a separate file, but I was wondering whether there is a way to do this within one file.
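One way to keep the wrapper idea in a single file is to make /bin/sh the interpreter and hand the awk program to gawk inline, so the shebang line itself carries no options. This is a minimal sketch, assuming gawk is on PATH; the script name `count-aaa` and its regex are hypothetical, chosen only to exercise an interval expression.

```shell
#!/bin/sh
# Sketch: a one-file sh wrapper whose awk program is passed inline to gawk.
# The file name "count-aaa" is hypothetical.
dir=$(mktemp -d)
cat > "$dir/count-aaa" <<'EOF'
#!/bin/sh
# The kernel starts /bin/sh here; sh, not the shebang, passes gawk its options.
exec gawk --re-interval '
  /a{3}/ { n++ }          # interval expression: older gawk needs --re-interval
  END    { print n + 0 }  # number of input lines containing "aaa"
' "$@"
EOF
chmod +x "$dir/count-aaa"

# Feed three lines; "aaa" and "baaab" match /a{3}/, "ab" does not.
result=$(printf 'aaa\nab\nbaaab\n' | "$dir/count-aaa")
echo "$result"
rm -rf "$dir"
```

The trade-off of this layout is that the awk program sits inside single quotes, so it cannot itself contain a single quote without shell-style escaping; `"$@"` forwards any file arguments to gawk unchanged.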

The behaviour of shebang lines differs from system to system; at least in Cygwin it does not split the arguments at whitespace. I only care about how to do it on a system that behaves like that; the script is not meant to be portable.

Answer

Jörg W Mittag · Nov 29, 2010

The shebang line has never been specified as part of POSIX, SUS, LSB or any other specification. AFAIK, it hasn't even been properly documented.

There is a rough consensus about what it does: take everything between the ! and the \n and exec it. The assumption is that everything between the ! and the \n is a full absolute path to the interpreter. There is no consensus about what happens if it contains whitespace.

  1. Some operating systems simply treat the entire thing as the path. After all, in most operating systems, whitespace or dashes are legal in a path.
  2. Some operating systems split at whitespace and treat the first part as the path to the interpreter and the rest as individual arguments.
  3. Some operating systems split at the first whitespace and treat the front part as the path to the interpreter and the rest as a single argument (which is what you are seeing).
  4. Some don't support shebang lines at all.
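To make strategies 1 to 3 concrete, here is a small simulation of how each would read the shebang line from the question. It is an illustration only, not any kernel's actual parsing code; strategy 4 has nothing to simulate, and a real kernel would also append the script's own path as a final argument, which is left out here.

```shell
#!/bin/sh
# Simulation only: how strategies 1-3 would interpret one shebang line.
shebang='#!/usr/bin/gawk --re-interval -f'

# Strip the leading "#!" and any blanks that follow it.
line=${shebang#"#!"}
line=${line#"${line%%[![:blank:]]*}"}

# Strategy 1: the whole remainder is taken as the interpreter path.
printf 'strategy 1: path=[%s]\n' "$line"

# Strategy 2: split at every run of blanks; the first word is the path.
set -f              # no globbing while word-splitting $line
set -- $line
path=$1; shift
printf 'strategy 2: path=[%s]' "$path"
printf ' arg=[%s]' "$@"
printf '\n'
set +f

# Strategy 3: split only at the first blank; the rest is ONE argument.
path=${line%%[[:blank:]]*}
arg=${line#"$path"}
arg=${arg#"${arg%%[![:blank:]]*}"}
printf 'strategy 3: path=[%s] arg=[%s]\n' "$path" "$arg"
```

Strategy 3 yields the single argument `--re-interval -f`, which is exactly the string gawk rejects in the question.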

Thankfully, 1. and 4. seem to have died out, but 3. is pretty widespread, so you simply cannot rely on being able to pass more than one argument.
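If you want to see which strategy your own system uses, you can point the naive shebang at a tiny stand-in "interpreter" that just prints each argument it receives. This is a sketch assuming a Linux-like kernel that lets a script serve as an interpreter and a /tmp that allows execution; the helper name `showargs` is hypothetical.

```shell
#!/bin/sh
# Sketch: ask the running kernel how it splits a shebang line.
dir=$(mktemp -d)

# Hypothetical helper: prints every argument in brackets, one per line.
cat > "$dir/showargs" <<'EOF'
#!/bin/sh
for a in "$@"; do printf '[%s]\n' "$a"; done
EOF

# The probe uses the question's naive shebang, with showargs standing in for gawk.
printf '#!%s --re-interval -f\n' "$dir/showargs" > "$dir/probe"
chmod +x "$dir/showargs" "$dir/probe"

out=$("$dir/probe")
printf '%s\n' "$out"
rm -rf "$dir"
```

On a system following strategy 3 the first line printed is `[--re-interval -f]`, followed by the probe's own path; a strategy 2 system would instead print `[--re-interval]` and `[-f]` on separate lines.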

And since the location of commands is also not specified in POSIX or SUS, you generally use up that single argument by passing the executable's name to env so that it can determine the executable's location; e.g.:

#!/usr/bin/env gawk

[Obviously, this still assumes a particular path for env, but there are only very few systems where it lives in /bin, so this is generally safe. The location of env is a lot more standardized than the location of gawk or even worse something like python or ruby or spidermonkey.]

Which means that you cannot actually use any arguments at all, not even the -f that gawk needs in order to read its program from the script file.
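The following sketch shows the consequence when env and gawk's options are combined on one shebang line: the single argument env receives is `gawk --re-interval -f`, which it looks up as one program name and fails to find. It assumes env lives at /usr/bin/env, as the answer does; the file name `naive.awk` is hypothetical, and the exact error text differs between env implementations.

```shell
#!/bin/sh
# Sketch: env gets "gawk --re-interval -f" as ONE word and cannot find it.
dir=$(mktemp -d)
printf '#!/usr/bin/env gawk --re-interval -f\nBEGIN { print "ran" }\n' > "$dir/naive.awk"
chmod +x "$dir/naive.awk"

# Typically fails with a "No such file or directory" style message from env.
if "$dir/naive.awk" 2>/dev/null; then status=0; else status=$?; fi
echo "exit status: $status"
rm -rf "$dir"
```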