I have seen many ANTLR grammars that use whitespace handling like this:
WS: [ \n\t\r]+ -> skip;
// or
WS: [ \n\t\r]+ -> channel(HIDDEN);
So whitespace is either thrown away or sent to the hidden channel.
With a grammar like this:
grammar Not;
start: expression;
expression: NOT expression
| (TRUE | FALSE);
NOT: 'not';
TRUE: 'true';
FALSE: 'false';
WS: [ \n\t\r]+ -> skip;
Valid inputs are 'not true' and 'not false', but also 'nottrue', which is not the desired result. Changing the grammar to:
grammar Not;
start: expression;
expression: NOT WS+ expression
| (TRUE | FALSE);
NOT: 'not';
TRUE: 'true';
FALSE: 'false';
WS: [ \n\t\r];
fixes the problem, but I do not want to handle whitespace manually in each rule.
Generally, I want to require whitespace between tokens, with some exceptions (e.g. '!true' does not need whitespace in between).
Is there a simple way of doing this?
Add an IDENTIFIER lexer rule to handle words which are not keywords.
IDENTIFIER : [a-zA-Z]+;
Because the lexer always prefers the longest match, the text 'nottrue' is now a single IDENTIFIER token, which your parser will not accept in place of the two distinct keyword tokens in 'not true'.
Make sure IDENTIFIER is defined after your keywords. The lexer will find that both NOT and IDENTIFIER match the text 'not', and will assign the token type to the first one that appears in the grammar.
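Putting this together, the grammar from the question might look roughly like the sketch below. The '!' alternative on NOT is an assumption added only to illustrate the '!true' case mentioned in the question; it is not part of the original grammar.

```antlr
grammar Not;

start: expression;

expression
    : NOT expression
    | (TRUE | FALSE)
    ;

// Keywords are listed first so they win ties against IDENTIFIER.
// The '!' alternative is hypothetical, added for the '!true' case.
NOT: 'not' | '!';
TRUE: 'true';
FALSE: 'false';

// Any other run of letters, e.g. 'nottrue', becomes one IDENTIFIER token.
// No parser rule uses IDENTIFIER, so such input is a syntax error.
IDENTIFIER: [a-zA-Z]+;

WS: [ \n\t\r]+ -> skip;
```

With this layout 'not true' lexes as NOT TRUE, while 'nottrue' lexes as a single IDENTIFIER and is rejected by the parser. Since IDENTIFIER only matches letters, '!true' should still split into NOT and TRUE without any whitespace in between.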