I'm having trouble finding the right regex for decimal numbers that include the comma separator.
I did find a few other questions about this issue in general, but none of the answers really worked when I tested them.
The best I got so far is:
[0-9]{1,3}(,([0-9]{3}))*(.[0-9]+)?
Two main problems so far:
1) It matches numbers separated by a space, such as "3001 1", as a single match instead of splitting them into two matches, "3001" and "1". I don't see where I allowed a space in the regex.
2) I have a general problem with the beginning and end of the regex.
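A likely cause of problem 1: in (.[0-9]+) the . is unescaped, and an unescaped . matches any character except a newline, including a space. The sketch below uses Python's re module as a stand-in for whatever engine you are on, a made-up input, and a hypothetical helper matches(). It compares the pattern as written with a version whose dot is escaped as \.:

```python
import re

# The pattern from the question: the "." in (.[0-9]+) is unescaped,
# so it matches ANY character except a newline, including a space.
original = re.compile(r"[0-9]{1,3}(,([0-9]{3}))*(.[0-9]+)?")

# Same pattern with the dot escaped, so it only matches a literal ".".
escaped = re.compile(r"[0-9]{1,3}(,([0-9]{3}))*(\.[0-9]+)?")


def matches(pattern, text):
    """Hypothetical helper: full text of every non-overlapping match."""
    return [m.group(0) for m in pattern.finditer(text)]


text = "3,001 1"  # made-up input: two numbers separated by a space
print(matches(original, text))  # ['3,001 1']  (the space was eaten by ".")
print(matches(escaped, text))   # ['3,001', '1']
```

Escaping the dot fixes the space problem, but on its own it does nothing about problem 2 (where a number is allowed to start and stop).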
The regex should match:
3,001
1
32,012,111.2131
But not:
32,012,11.2131
1132,012,111.2131
32,0112,111.2131
32131
In addition, I'd like it to match:
1. (with no digits after the point)
1, (with no digits after the comma)
as 1 (a comma or point at the end of the number should be ignored).
Many thanks!
This is a very long and convoluted regular expression that fits all your requirements. It will work if your regex engine is based on PCRE (hopefully you're using PHP, Delphi or R...).
(?<=[^\d,.]|^)\d{1,3}(,(\d{3}))*((?=[,.](\s|$))|(\.\d+)?(?=[^\d,.]|$))
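Python's re module is not PCRE, and it rejects the lookbehind (?<=[^\d,.]|^) because its two alternatives have different widths. Below is a sketch of a Python port, assuming the negative lookbehind (?<![\d,.]) ("not preceded by a digit, comma or point", which also holds at the start of the string) is an acceptable stand-in; the rest of the pattern is unchanged, and find_numbers() is a hypothetical helper. It runs the examples from the question:

```python
import re

# Port of the PCRE pattern above. Python's re needs a fixed-width lookbehind,
# so (?<=[^\d,.]|^) is rewritten as the equivalent negative lookbehind
# (?<![\d,.]): "not preceded by a digit, comma or point".
NUMBER = re.compile(
    r"(?<![\d,.])"             # must not continue a previous number
    r"\d{1,3}(,(\d{3}))*"      # 1-3 leading digits, then ,ddd groups
    r"("
    r"(?=[,.](\s|$))"          # trailing , or . then space/end: stop here
    r"|(\.\d+)?(?=[^\d,.]|$)"  # otherwise optional decimals, then a boundary
    r")"
)


def find_numbers(text):
    """Hypothetical helper: full text of every number the pattern finds."""
    return [m.group(0) for m in NUMBER.finditer(text)]


should_match = ["3,001", "1", "32,012,111.2131"]
should_not_match = ["32,012,11.2131", "1132,012,111.2131",
                    "32,0112,111.2131", "32131"]

for s in should_match:
    print(s, "->", find_numbers(s))  # e.g. 3,001 -> ['3,001']
for s in should_not_match:
    print(s, "->", find_numbers(s))  # e.g. 32131 -> []
print(find_numbers("1."), find_numbers("1,"))  # ['1'] ['1']
```

finditer() is used instead of findall() because the pattern has capturing groups, and findall() would return the groups rather than the whole match.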
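The things that make it so long:
Not matching part of a longer number (for example the 132,012,111.2131 inside 1132,012,111.2131) requires the lookbehind (?<=[^\d,.]|^) and the final lookahead (?=[^\d,.]|$).
Matching a number that ends in . or , without including the . or , in the match requires another lookahead:
(?=[,.](\s|$))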
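To see why that lookahead is needed, here is a simplified comparison in Python. Both patterns are hypothetical reductions of the full regex (no lookbehind, no decimal part): consuming the trailing separator puts it into the match, while the lookahead only checks that it is there.

```python
import re

# Consuming version: the trailing , or . becomes part of the match.
consuming = re.compile(r"\d{1,3}(?:,\d{3})*[,.](?=\s|$)")
# Lookahead version: the , or . is checked but left out of the match.
lookahead = re.compile(r"\d{1,3}(?:,\d{3})*(?=[,.](?:\s|$))")

text = "1, and 1,000."
print(consuming.findall(text))  # ['1,', '1,000.']
print(lookahead.findall(text))  # ['1', '1,000']
```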
Explanation
When writing this explanation I realised the \s needs to be (\s|$) to match 1, at the very end of a string.
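The effect of that change can be checked in Python. This sketch builds the pattern twice with a hypothetical build() helper, once with \s alone and once with (\s|$), and runs both on the string "1,". It assumes (?<![\d,.]) in place of the PCRE lookbehind, since Python's re only accepts fixed-width lookbehinds:

```python
import re


def build(trailing):
    """Hypothetical helper: the answer's pattern with a given check after a
    trailing , or . (lookbehind rewritten as (?<![\d,.]) for Python)."""
    return re.compile(
        r"(?<![\d,.])\d{1,3}(,(\d{3}))*"
        r"((?=[,.]" + trailing + r")|(\.\d+)?(?=[^\d,.]|$))"
    )


only_space = build(r"\s")        # first version: needs whitespace after , or .
space_or_end = build(r"(\s|$)")  # fixed version: whitespace or end of input

for pattern in (only_space, space_or_end):
    print([m.group(0) for m in pattern.finditer("1,")])
# prints [] for the first pattern and ['1'] for the second
```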
This part of the regex, (?=[,.](\s|$)), is for matching the 1 in 1, or the 1,000 in 1,000.
So let's say our number is 1,000. (with the . on the end).
Up to this point the regex has matched 1,000; it can't find another , to repeat the thousands group, so it moves on to our (?=[,.](\s|$)) lookahead.
(?=...) means it's a lookahead: from the point the regex has matched up to, look at what's coming next, but don't add it to the match.
So it checks whether there is a , or a . and, if there is, whether it's immediately followed by whitespace or the end of the input. In this case it is, so the match stays as 1,000
Had the lookahead not matched, it would have moved on to the other alternative, (\.\d+)?(?=[^\d,.]|$), and tried to match decimal places.
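The walkthrough above can be traced in Python by looking at where each match ends and whether the decimal group took part. A sketch, again assuming (?<![\d,.]) as a substitute for the PCRE lookbehind: for 1,000. the lookahead branch succeeds and the match stops before the point, while for 1,000.5 that lookahead fails and the decimal branch takes over.

```python
import re

# Answer's pattern, with the lookbehind rewritten as (?<![\d,.])
# because Python's re only allows fixed-width lookbehinds.
NUMBER = re.compile(
    r"(?<![\d,.])\d{1,3}(,(\d{3}))*((?=[,.](\s|$))|(\.\d+)?(?=[^\d,.]|$))"
)

for text in ("1,000.", "1,000.5"):
    m = NUMBER.search(text)
    # group(5) is the (\.\d+) decimals group; it stays None when the
    # trailing-separator lookahead branch matched instead.
    print(f"{text!r}: match={m.group(0)!r} span={m.span()} decimals={m.group(5)!r}")
# '1,000.': match='1,000' span=(0, 5) decimals=None
# '1,000.5': match='1,000.5' span=(0, 7) decimals='.5'
```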