Advantages of Antlr (versus say, lex/yacc/bison)

Don Wakefield picture Don Wakefield · Oct 17, 2008 · Viewed 42.2k times · Source

I've used lex and yacc (more usually bison) in the past for various projects, usually translators (such as a subset of EDIF streamed into an EDA app). Additionally, I've had to support code based on lex/yacc grammars dating back decades. So I know my way around the tools, though I'm no expert.

I've seen positive comments about Antlr in various fora in the past, and I'm curious as to what I may be missing. So if you've used both, please tell me what's better or more advanced in Antlr. My current constraints are that I work in a C++ shop, and any product we ship will not include Java, so the resulting parsers would have to follow that rule.

Answer

Daniel Spiewak picture Daniel Spiewak · Oct 17, 2008

Update/warning: This answer may be out of date!


One major difference is that ANTLR generates an LL(*) parser, whereas YACC and Bison both generate parsers that are LALR. This is an important distinction for a number of applications, the most obvious being operators:

expr ::= expr '+' expr
       | expr '-' expr
       | '(' expr ')'
       | NUM ;

ANTLR is entirely incapable of handling this grammar as-is. To use ANTLR (or any other LL parser generator), you would need to convert this grammar to something that is not left-recursive. However, Bison has no problem with grammars of this form. You would need to declare '+' and '-' as left-associative operators, but that is not strictly required for left recursion. A better example might be dispatch:

expr ::= expr '.' ID '(' actuals ')' ;

actuals ::= actuals ',' expr | expr ;

Notice that both the expr and the actuals rules are left-recursive. This produces a much more efficient AST when it comes time for code generation because it avoids the need for multiple registers and unnecessary spilling (a left-leaning tree can be collapsed whereas a right-leaning tree cannot).

In terms of personal taste, I think that LALR grammars are a lot easier to construct and debug. The downside is you have to deal with somewhat cryptic errors like shift-reduce and (the dreaded) reduce-reduce. These are errors that Bison catches when generating the parser, so it doesn't affect the end-user experience, but it can make the development process a bit more interesting. ANTLR is generally considered to be easier to use than YACC/Bison for precisely this reason.