Given the following basic grammar, I want to understand how to handle comment lines. What is missing is the handling of the <CR><LF>
that usually terminates a comment line. The only exception is a final comment line directly before EOF, e.g.:
# comment
abcd := 12 ;
# comment eof without <CR><LF>
grammar CommentLine1a;
//==========================================================
// Options
//==========================================================
//==========================================================
// Lexer Rules
//==========================================================
Int
: Digit+
;
fragment Digit
: '0'..'9'
;
ID_NoDigitStart
: ( 'a'..'z' | 'A'..'Z' ) ('a'..'z' | 'A'..'Z' | Digit )*
;
Whitespace
: ( ' ' | '\t' | '\r' | '\n' )+ { $channel = HIDDEN ; }
;
//==========================================================
// Parser Rules
//==========================================================
code
: ( assignment | comment )+
;
assignment
: id_NoDigitStart ':=' id_DigitStart ';'
;
id_NoDigitStart
: ID_NoDigitStart
;
id_DigitStart
: ( ID_NoDigitStart | Int )+
;
comment
: '#' ~( '\r' | '\n' )*
;
Unless you have a very compelling reason to put the comment inside the parser (which I'd like to hear), you should put it in the lexer:
Comment
: '#' ~( '\r' | '\n' )*
;
And since you already account for line breaks in your Whitespace rule, there's no problem with input like # comment eof without <CR><LF>: the comment simply ends where the input ends.
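A minimal sketch of how the lexer-level comment could be wired into the rest of the grammar, assuming ANTLR v3 (which the $channel = HIDDEN action in the question suggests). The changed code rule and the commented-out hidden-channel variant are my assumptions about what is wanted, not something the question states:

```
// Lexer: a comment runs from '#' up to, but not including, the line break.
// The line break (if there is one) is then consumed by Whitespace.
Comment
 : '#' ~( '\r' | '\n' )*
 ;

// Parser: reference the Comment token instead of the old comment parser rule.
code
 : ( assignment | Comment )+
 ;

// Alternative (assumption: the parser never needs to see comments):
// hide them the same way Whitespace is hidden and drop Comment from code.
// Comment
//  : '#' ~( '\r' | '\n' )* { $channel = HIDDEN ; }
//  ;
```

With the first variant the parser still sees every comment as a single token. With the hidden-channel variant, code would only need to match assignments.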
Also, if you use literal tokens inside parser rules, ANTLR automatically creates lexer rules for them behind the scenes. So in your case:
comment
: '#' ~( '\r' | '\n' )*
;
would match a '#' followed by zero or more tokens other than '\r' and '\n', and not zero or more characters other than '\r' and '\n'.
For future reference:
Inside parser rules:
~ negates tokens.
. matches any token.
Inside lexer rules:
~ negates characters.
. matches any character in the range 0x0000 ... 0xFFFF.
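To make the difference between the two contexts concrete, here is a small sketch in ANTLR v3 syntax. The rule names BlockComment, LineComment, notAnInt and anyToken are hypothetical and only serve as illustration; Int is the token from the question's grammar:

```
// Lexer rules: '~' and '.' work on characters.
LineComment
 : '//' ~( '\r' | '\n' )*                          // any character except a line break
 ;
BlockComment
 : '/*' ( options { greedy = false ; } : . )* '*/' // '.' is any single character
 ;

// Parser rules: '~' and '.' work on whole tokens.
notAnInt
 : ~( Int )                                        // any single token except Int
 ;
anyToken
 : .                                               // any single token
 ;
```

The same operator therefore means something different depending on whether the rule name starts with an upper-case letter (lexer) or a lower-case letter (parser).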