I've used lex and yacc (more usually bison) in the past for various projects, usually translators (such as a subset of EDIF streamed into an EDA app). Additionally, I've had to support code based on lex/yacc grammars dating back decades. So I know my way around the tools, though I'm no expert.
I've seen positive comments about Antlr in various fora in the past, and I'm curious as to what I may be missing. So if you've used both, please tell me what's better or more advanced in Antlr. My current constraints are that I work in a C++ shop, and any product we ship will not include Java, so the resulting parsers would have to follow that rule.
One major difference is that ANTLR generates an LL(*) parser, whereas YACC and Bison both generate parsers that are LALR. This is an important distinction for a number of applications, the most obvious being operators:
expr ::= expr '+' expr
| expr '-' expr
| '(' expr ')'
| NUM ;
ANTLR is entirely incapable of handling this grammar as-is. To use ANTLR (or any other LL parser generator), you would need to convert this grammar to something that is not left-recursive. However, Bison has no problem with grammars of this form. You would need to declare '+' and '-' as left-associative operators, but that is not strictly required for left recursion. A better example might be dispatch:
expr ::= expr '.' ID '(' actuals ')' ;
actuals ::= actuals ',' expr | expr ;
Notice that both the expr
and the actuals
rules are left-recursive. This produces a much more efficient AST when it comes time for code generation because it avoids the need for multiple registers and unnecessary spilling (a left-leaning tree can be collapsed whereas a right-leaning tree cannot).
In terms of personal taste, I think that LALR grammars are a lot easier to construct and debug. The downside is you have to deal with somewhat cryptic errors like shift-reduce and (the dreaded) reduce-reduce. These are errors that Bison catches when generating the parser, so it doesn't affect the end-user experience, but it can make the development process a bit more interesting. ANTLR is generally considered to be easier to use than YACC/Bison for precisely this reason.