When writing a ("theoretical") grammar with a rule with an empty right-hand side, one always uses a symbol such as ε (or 1) to make this emptiness explicit:
A → ε | a A
Such a grammar in Yacc and others would then look like
a: | 'a' a
or "worse"
a: { $$ = new_list(); }
| a 'a' { $$ = $1; $$->append($2); }
;
The fact that in "real world grammars" (Yacc, Bison, etc.) this empty right-hand side is not explicitly marked as empty troubles me: it is easy to miss the fact that an rhs is empty or, worse, to forget to insert | and actually use a mid-rule action:
a: { $$ = new_list(); }
a 'a' { $$ = $1; $$->append($2); }
;
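To spell out why this is so easy to get wrong, here is a sketch of how I understand Bison to read that rule once the | is gone. The actions are abbreviated, and the positional numbering is the usual one for mid-rule actions (a mid-rule action counts as a component of the rule):

```yacc
/* One single alternative with four components; there is no empty
   alternative left, so nothing ever builds the initial list. */
a:
  { $$ = new_list(); }   /* mid-rule action; its value becomes $1 */
  a                      /* $2 */
  'a'                    /* $3 */
  { $$ = $1; }           /* final action: $1 is the mid-rule value, not the value of a */
;
```

In a grammar this small, Bison would probably flag a as useless, because it can no longer derive any finite string. In a larger grammar where a has other alternatives, I would not count on any diagnostic.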
1) I don't know of any tool that provides a means to make empty rhs explicit. Are there any?
Future versions of Bison might support a dedicated symbol, with errors when it is used in a non-empty rhs, and warnings when an implicitly empty rhs is left.
2) Do people consider this useful?
3) What would be the notation you'd suggest?
Currently, the candidate is $empty:
a: $empty { $$ = new_list(); }
| a 'a' { $$ = $1; $$->append($2); }
;
The chosen syntax is %empty:
a: %empty { $$ = new_list(); }
| a 'a' { $$ = $1; $$->append($2); }
;
Indeed $empty looks like a pseudo-symbol, such as $accept that Bison generates for the initial rule, or the $@n pseudo-symbols for mid-rule actions, or $eof for, well, end-of-file. But it's definitely not a symbol, it is precisely the absence of symbols.
On the other hand % clearly denotes a directive (some kind of attribute/metadata), like %prec.
So it's a minor difference of syntax, but it's more consistent with the overall syntax. Credit goes to Joel E. Denny.
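For anyone who wants to try it, here is a minimal self-contained sketch. It assumes Bison 3.0 or later (earlier versions do not know %empty). The int semantic value, the rule names and the hand-written yylex are my own simplifications for illustration: it counts the 'a' characters on a line instead of building a list.

```yacc
/* Minimal sketch: counts the 'a' characters on one line of stdin. */
%code top {
#include <stdio.h>
}
%code {
int yylex(void);
void yyerror(const char *msg);
}
%define api.value.type {int}

%%
input:
  as '\n'      { printf("%d\n", $1); }
;

as:
  %empty       { $$ = 0; }        /* explicitly empty right-hand side */
| as 'a'       { $$ = $1 + 1; }   /* $1 is the count so far */
;
%%

/* Trivial scanner: every character is its own token, EOF ends the input. */
int yylex(void)
{
  int c = getchar();
  return c == EOF ? 0 : c;
}

void yyerror(const char *msg)
{
  fprintf(stderr, "%s\n", msg);
}

int main(void)
{
  return yyparse();
}
```

Saved as, say, count.y (the file name is arbitrary), something like "bison -o count.c count.y && cc -o count count.c" should build it, and feeding it the line "aaa" should print 3. As far as I know, putting %empty in front of a non-empty rhs is reported as an error, and -Wempty-rule warns about empty rules that lack %empty.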
I usually just use a comment:
a: /*epsilon*/ { $$ = new_list(); }
| a 'a' { $$ = $1; $$->append($2); }
;
Works fine with no changes and makes the intent clear.
IMO, this comes under the heading "If it ain't broke, don't fix it"