I have a task to write simple parser-generator, so I wrote ANTLR-like grammar and tried to parse simple file like "foo:bar;", but got the following output:
[@0,0:2='foo',<1>,1:0]
[@1,3:3=':',<16>,1:3]
[@2,4:6='bar',<1>,1:4]
[@3,7:7=';',<18>,1:7]
[@4,8:7='<EOF>',<-1>,1:8]
line 1:0 no viable alternative at input 'foo'
(rule foo : bar ;)
My grammar looks like
grammar parsGen;
gram : rule SEMICOLON (NEWLINE+ rule SEMICOLON)* ;
rule : lRule | pRule ;
lRule : LRULEID COLON lRule1 ;
lRule1 : (((LRULEID | STRING | SET) | LBRACE lRule1 PIPE lRule1 RBRACE) modificator? SPACE+)+ ;
pRule : PRULEID COLON pRule1 ;
pRule1 : (((LRULEID | PRULEID) | LBRACE lRule1 PIPE lRule1 RBRACE) modificator? SPACE+)+ ;
modificator : PLUS | ASTERISK | QUESTION ;
ID : LRULEID | PRULEID ;
LRULEID : UPPERLETTER (UPPERLETTER | LOWERLETTER | DIGIT)* ;
PRULEID : LOWERLETTER (UPPERLETTER | LOWERLETTER | DIGIT)* ;
STRING : ('\''.*?'\'') ;
SET : '\''.*?'\'..\''.*?'\'' ;
UPPERLETTER : [A-Z] ;
LOWERLETTER : [a-z] ;
DIGIT : [0-9] ;
NEWLINE : '\r\n'|'\n'|'\r' ;
PLUS : '+' ;
ASTERISK : '*' ;
QUESTION : '?' ;
LBRACE : '(' ;
RBRACE : ')' ;
SPACE : ' ' ;
COLON : ':' ;
PIPE : '|' ;
SEMICOLON : ';' ;
So where could I make a mistake? I tried to search everywhere (google, SO etc.) error "no viable alternative", but it didn't really help me.
ANTLR lexers fully assign unambiguous token types before the parser is ever used. When multiple token types can match a token, the first one appearing in the grammar is the one that is used. For your grammar, a token cannot have the type ID
and the type LRULEID
at the same time. Since the input foo
matches both of these lexer rules, the first appearing in the grammar is used so your tokens are: ID
, COLON
, ID
, SEMICOLON
, <EOF>
.
Since the ID
token is never actually referenced in the parser, I suggest one of the following changes. Either of these options will resolve the problem you have described, so the choice is entirely your preference for how the final grammar looks.
Foreword
You need to change the space references from SPACE+
to SPACE*
, or the rule will require at least one space character between bar
and ;
.
Option 1
Remove the ID
lexer rule altogether.
Option 2
Change ID
to a parser rule so it's not trying to assign token type ID
to all of your identifiers.
id : LRULEID | PRULEID;
Update pRule1
rule by referencing id
.
pRule1 : ((id | LBRACE lRule1 PIPE lRule1 RBRACE) modificator? SPACE+)+ ;
Unrelated Side Note
You grammar might be easier to read if you remove the outermost +
closure inside the lRule
and pRule1
rules, and instead add them to the rule references themselves, like this. Note that I changed the SPACE
references as described in the foreword.
lRule : LRULEID COLON lRule1+ ;
lRule1 : ((LRULEID | STRING | SET) | LBRACE lRule1 PIPE lRule1 RBRACE) modificator? SPACE* ;
pRule : PRULEID COLON pRule1+ ;
pRule1 : ((LRULEID | PRULEID) | LBRACE lRule1 PIPE lRule1 RBRACE) modificator? SPACE* ;