I am having trouble writing a regexp that matches all the whitespace in my input.
I have tried \s+ and [ \t\t\r]+ but neither works.
I need this because I am writing a scanner using flex, and I am stuck at matching whitespace. The whitespace should just be matched and not removed.
Example input:
program
3.3 5 7
{ comment }
string
panic: cant happen
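flex uses (approximately) the POSIX "Extended Regular Expression" syntax. \s doesn't work because it's a Perl extension; flex treats an unrecognized escape like \s as the literal character, so \s+ matches a run of the letter s.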
Is [ \t\t\r]+ a typo? I think you'll want a \n in there.
Something like [ \n\t\r]+ certainly should work. For example, this lexer (which I've saved as lexer.l):
%{
#include <stdio.h>
%}
%option noyywrap
%%
[ \n\t\r]+ { printf("Whitespace: '%s'\n", yytext); }
[^ \n\t\r]+ { printf("Non-whitespace: '%s'\n", yytext); }
%%
int main(void)
{
    yylex();
    return 0;
}
...successfully matches the whitespace in your example input (which I've saved as input.txt):
$ flex lexer.l
$ gcc -o test lex.yy.c
$ ./test < input.txt
Non-whitespace: 'program'
Whitespace: '
'
Non-whitespace: '3.3'
Whitespace: ' '
Non-whitespace: '5'
Whitespace: ' '
Non-whitespace: '7'
Whitespace: '
'
Non-whitespace: '{'
Whitespace: ' '
Non-whitespace: 'comment'
Whitespace: ' '
Non-whitespace: '}'
Whitespace: '
'
Non-whitespace: 'string'
Whitespace: '
'
Non-whitespace: 'panic:'
Whitespace: ' '
Non-whitespace: 'cant'
Whitespace: ' '
Non-whitespace: 'happen'
Whitespace: '
'
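If you want to sanity-check the bracket expression outside flex, here is a minimal sketch that runs the same pattern through the POSIX regcomp()/regexec() functions in extended mode. This is not how flex works internally (flex compiles its rules into a state machine at build time), and count_ws_runs is just a hypothetical helper name I made up for the illustration.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: counts runs of whitespace in `text` using a POSIX
   extended regular expression. Returns -1 if the pattern fails to compile. */
int count_ws_runs(const char *text)
{
    regex_t re;
    regmatch_t m;
    int count = 0;

    /* Same bracket expression as in the flex rule. The C compiler turns the
       \n, \t and \r escapes into real characters before regcomp() sees them. */
    if (regcomp(&re, "[ \n\t\r]+", REG_EXTENDED) != 0)
        return -1;

    /* Every match is non-empty, so advancing by rm_eo always makes progress. */
    while (regexec(&re, text, 1, &m, 0) == 0) {
        count++;
        text += m.rm_eo;
    }

    regfree(&re);
    return count;
}
```

Calling count_ws_runs("3.3 5 7\n") should find three runs: the two single spaces and the trailing newline, which lines up with the "Whitespace:" lines printed for that input line above.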