What is the equivalent for epsilon in ANTLR BNF grammar notation?

Amirh picture Amirh · Apr 2, 2011 · Viewed 10.1k times · Source

During taking advantage of ANTLR 3.3, I'm changing the current grammar to support inputs without parenthesis too. Here's the first version of my grammar :

grammar PropLogic;

        NOT : '!' ;
        OR  : '+' ;
        AND : '.' ;
        IMPLIES : '->' ;
        SYMBOLS : ('a'..'z') | '~' ;
        OP : '(' ;
        CP : ')' ;

    prog    : formula EOF ;


    formula : NOT formula
        | OP formula( AND formula CP | OR formula CP | IMPLIES formula CP)
        | SYMBOLS ;


    WHITESPACE : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+    { $channel = HIDDEN; } ;

Then I changed it this way to support the appropriate features :

grammar PropLogic;

    NOT : '!' ;
    OR  : '+' ;
    AND : '.' ;
    IMPLIES : '->' ;
    SYMBOL : ('a'..'z') | '~' ;
    OP : '(' ;
    CP : ')' ;
    EM : '' ;

prog    : formula EOF ;


formula : OP formula( AND formula CP | OR formula CP | IMPLIES formula CP)
    | ( NOT formula | SYMBOL )( AND formula | OR formula | IMPLIES formula | EM ) ;


WHITESPACE : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+    { $channel = HIDDEN; } ;

But I've been faced with following error :

error<100>:  syntax error: invalid char literal: ''
error<100>:  syntax error: invalid char literal: ''

Does anybody know that how can I overcome this error?

Answer

Bart Kiers picture Bart Kiers · Apr 2, 2011

Your EM token:

EM : '' ;

is invalid: you can't match an empty string in lexer rules.

To match epsilon (nothing), you should do:

rule 
  :  A 
  |  B 
  |  /* epsilon */ 
  ;

Of course, the comment /* epsilon */ can safely be removed.

Note that when you do it like that in your current grammar, ANTLR will complain that there can be rules matched using multiple alternatives. This is because your grammar is ambiguous.