ANTLR : no viable alternative error

Yaroslav Skudarnov picture Yaroslav Skudarnov · May 31, 2013 · Viewed 21.5k times · Source

I have a task to write simple parser-generator, so I wrote ANTLR-like grammar and tried to parse simple file like "foo:bar;", but got the following output:

[@0,0:2='foo',<1>,1:0]
[@1,3:3=':',<16>,1:3]
[@2,4:6='bar',<1>,1:4]
[@3,7:7=';',<18>,1:7]
[@4,8:7='<EOF>',<-1>,1:8]
line 1:0 no viable alternative at input 'foo'
(rule foo : bar ;)

My grammar looks like

grammar parsGen;

gram : rule SEMICOLON (NEWLINE+ rule SEMICOLON)* ;

rule : lRule | pRule ;

lRule : LRULEID COLON lRule1 ;
lRule1 : (((LRULEID | STRING | SET) | LBRACE lRule1 PIPE lRule1 RBRACE) modificator? SPACE+)+ ;

pRule : PRULEID COLON pRule1 ;
pRule1 : (((LRULEID | PRULEID) | LBRACE lRule1 PIPE lRule1 RBRACE) modificator? SPACE+)+ ;

modificator : PLUS | ASTERISK | QUESTION ;

ID : LRULEID | PRULEID ;

LRULEID : UPPERLETTER (UPPERLETTER | LOWERLETTER | DIGIT)* ;
PRULEID : LOWERLETTER (UPPERLETTER | LOWERLETTER | DIGIT)* ;

STRING : ('\''.*?'\'') ;
SET : '\''.*?'\'..\''.*?'\'' ;

UPPERLETTER : [A-Z] ;
LOWERLETTER : [a-z] ;
DIGIT : [0-9] ;

NEWLINE : '\r\n'|'\n'|'\r' ;

PLUS : '+' ;
ASTERISK : '*' ;
QUESTION : '?' ;

LBRACE : '(' ;
RBRACE : ')' ;

SPACE : ' ' ;

COLON : ':' ;

PIPE : '|' ;

SEMICOLON : ';' ;

So where could I make a mistake? I tried to search everywhere (google, SO etc.) error "no viable alternative", but it didn't really help me.

Answer

Sam Harwell picture Sam Harwell · May 31, 2013

ANTLR lexers fully assign unambiguous token types before the parser is ever used. When multiple token types can match a token, the first one appearing in the grammar is the one that is used. For your grammar, a token cannot have the type ID and the type LRULEID at the same time. Since the input foo matches both of these lexer rules, the first appearing in the grammar is used so your tokens are: ID, COLON, ID, SEMICOLON, <EOF>.

Since the ID token is never actually referenced in the parser, I suggest one of the following changes. Either of these options will resolve the problem you have described, so the choice is entirely your preference for how the final grammar looks.

Foreword

You need to change the space references from SPACE+ to SPACE*, or the rule will require at least one space character between bar and ;.

Option 1

Remove the ID lexer rule altogether.

Option 2

  1. Change ID to a parser rule so it's not trying to assign token type ID to all of your identifiers.

    id : LRULEID | PRULEID;
    
  2. Update pRule1 rule by referencing id.

    pRule1 : ((id | LBRACE lRule1 PIPE lRule1 RBRACE) modificator? SPACE+)+ ;
    

Unrelated Side Note

You grammar might be easier to read if you remove the outermost + closure inside the lRule and pRule1 rules, and instead add them to the rule references themselves, like this. Note that I changed the SPACE references as described in the foreword.

lRule : LRULEID COLON lRule1+ ;
lRule1 : ((LRULEID | STRING | SET) | LBRACE lRule1 PIPE lRule1 RBRACE) modificator? SPACE* ;

pRule : PRULEID COLON pRule1+ ;
pRule1 : ((LRULEID | PRULEID) | LBRACE lRule1 PIPE lRule1 RBRACE) modificator? SPACE* ;