When is EOF needed in ANTLR 4?

DaoWen · Jul 24, 2013

The TestDriver in ANTLRWorks2 seems kind of finicky about when it will accept a grammar without an explicit EOF and when it will not. The Hello grammar in the ANTLR4 Getting Started Guide doesn't use EOF anywhere, so I inferred that it's better to avoid an explicit EOF if possible.

What is the best practice for using EOF? When do you actually need it?

Answer

Sam Harwell · Jul 25, 2013

You should include an explicit EOF at the end of your entry rule any time you are trying to parse an entire input file. If you do not include the EOF, it means you are not trying to parse the entire input, and it's acceptable to parse only a portion of the input if it means avoiding a syntax error.

For example, consider the following rule:

file : item*;

This rule means "Parse as many item elements as possible, and then stop." In other words, this rule will never attempt to recover from a syntax error because it will always assume that the syntax error is part of some syntactic construct that's beyond the scope of the file rule. Syntax errors will not even be reported, because the parser will simply stop.

If instead I had the following rule:

file : item* EOF;

It means "A file consists exactly of a sequence of zero-or-more item elements." If a syntax error is reached while parsing an item element, this rule will attempt to recover from (and report) the syntax error and continue, because the EOF is required and has not yet been reached.
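The difference between the two rules can be illustrated with a small toy model. This is only a sketch under stated assumptions: it is a hand-written recognizer in Python, not ANTLR's actual prediction or error-recovery algorithm, and the names tokenize, is_item, parse_file_without_eof, and parse_file_with_eof are hypothetical. An "item" is assumed to be a single lowercase word, and recovery is modeled as simply skipping the offending token.

```python
# Toy model only: NOT ANTLR's real algorithm. All names here are hypothetical.
# Assumption: an "item" is a single lowercase word; anything else is an error.

def tokenize(text):
    # Split on whitespace and append an explicit end-of-input marker.
    return text.split() + ["<EOF>"]

def is_item(tok):
    return tok != "<EOF>" and tok.isalpha() and tok.islower()

def parse_file_without_eof(tokens):
    """Models `file : item*;`: match items greedily, then stop silently."""
    pos = 0
    while is_item(tokens[pos]):
        pos += 1
    # Nothing after the last matched item is examined, so no error is reported.
    return pos, []

def parse_file_with_eof(tokens):
    """Models `file : item* EOF;`: keep going until EOF, reporting bad tokens."""
    pos, items, errors = 0, 0, []
    while tokens[pos] != "<EOF>":
        if is_item(tokens[pos]):
            items += 1
        else:
            # Simplified recovery: report the token, skip it, and continue.
            errors.append(f"unexpected token {tokens[pos]!r} at index {pos}")
        pos += 1
    return items, errors

tokens = tokenize("foo bar 42 baz")
print(parse_file_without_eof(tokens))  # (2, [])
print(parse_file_with_eof(tokens))     # (3, ["unexpected token '42' at index 2"])
```

In this model, the version without EOF stops at 42 and reports nothing, leaving baz unread, while the version with EOF reports the bad token and still reaches baz.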


For rules where you are only trying to parse a portion of the input, ANTLR 4 often works, but not always. The following issue describes a technical problem where ANTLR 4 does not always make the correct decision if the EOF is omitted.

https://github.com/antlr/antlr4/issues/118

Unfortunately, the performance impact of fixing this is substantial, so until that issue is resolved there will be edge cases that do not behave as you expect.