Regex for a decimal number with comma separators

LiranBo · Nov 22, 2013

I'm having trouble finding the right regex for decimal numbers that include the comma thousands separator.

I did find a few other questions about this issue in general, but none of the answers really worked when I tested them.

The best I got so far is:

[0-9]{1,3}(,([0-9]{3}))*(.[0-9]+)?

Two main problems so far:

1) It matches numbers with spaces between them, e.g. "3001 1", as one match instead of splitting them into two matches, "3001" and "1". I don't see where I allowed a space in the regex.

2) I have a general problem with the beginning/end of the regex.
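Both problems can be reproduced with a short sketch, assuming Python's `re` module (used here only for illustration; the variable names are made up). The space comes from the unescaped `.` in the decimal part, which matches any character, including a space. Escaping it as `\.` removes the space from the matches, but because nothing checks what surrounds a match, "3001" is then split into "300" and "1".

```python
import re

# The question's pattern, with (...) changed to (?:...) so that re.findall
# returns whole matches instead of the contents of capturing groups.
unescaped = re.compile(r"[0-9]{1,3}(?:,[0-9]{3})*(?:.[0-9]+)?")
# The same pattern with the decimal point escaped.
escaped = re.compile(r"[0-9]{1,3}(?:,[0-9]{3})*(?:\.[0-9]+)?")

text = "3001 1"
print(unescaped.findall(text))  # ['300', '1 1']: the bare '.' consumed the space
print(escaped.findall(text))    # ['300', '1', '1']: no edge checks, so '3001' is split
```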

The regex should match:

3,001
1
32,012,111.2131 

But not:

32,012,11.2131
1132,012,111.2131
32,0112,111.2131
32131

In addition, I'd like it to match each of these as 1:

1. (with no digits after the point)
1, (with no digits after the comma)

(A comma or point at the end of the number should be ignored.)

Many thanks!

Answer

OGHaza · Nov 22, 2013

This is a very long and convoluted regular expression that fits all your requirements. It will work if your regex engine is based on PCRE (hopefully you're using PHP, Delphi or R).

(?<=[^\d,.]|^)\d{1,3}(,(\d{3}))*((?=[,.](\s|$))|(\.\d+)?(?=[^\d,.]|$))

DEMO on RegExr
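For anyone on Python rather than a PCRE engine, here is a minimal port, assuming Python's `re` module. Python rejects the lookbehind `(?<=[^\d,.]|^)` because its two alternatives have different widths, so this sketch swaps in the equivalent negative lookarounds `(?<![\d,.])` and `(?![\d,.])`; the rest is the pattern above with non-capturing groups. The name `NUMBER` is only illustrative, and the checks use the cases from the question.

```python
import re

# Port of the PCRE pattern to Python's re (an illustrative assumption).
# (?<![\d,.]) / (?![\d,.]) mean "not preceded / not followed by a digit,
# comma or point", which also holds at the start / end of the string.
NUMBER = re.compile(
    r"(?<![\d,.])"            # left edge: no digit, comma or point before
    r"\d{1,3}(?:,\d{3})*"     # 1-3 digits, then groups of exactly 3 after commas
    r"(?:"
    r"(?=[,.](?:\s|$))"       # trailing , or . before whitespace/end: stop here
    r"|(?:\.\d+)?(?![\d,.])"  # otherwise optional decimals, then a clean right edge
    r")"
)

# Cases taken from the question.
for s in ["3,001", "1", "32,012,111.2131"]:
    assert NUMBER.findall(s) == [s]    # whole string matches
for s in ["32,012,11.2131", "1132,012,111.2131", "32,0112,111.2131", "32131"]:
    assert NUMBER.findall(s) == []     # no match, not even a partial one
for s in ["1.", "1,"]:
    assert NUMBER.findall(s) == ["1"]  # trailing separator left out

# Several numbers on one line, separated by single spaces.
print(NUMBER.findall("3,001 1 32,012,111.2131 1. 1,"))
# ['3,001', '1', '32,012,111.2131', '1', '1']
```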

The things that make it so long:

  1. Matching multiple numbers on the same line separated by only one character (a space), whilst not allowing partial matches, requires a lookahead and a lookbehind.
  2. Matching numbers ending with . and , without including the . or , in the match requires another lookahead.
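To see why point 1 matters, here is a sketch (again assuming Python's `re`, with an illustrative name `bare`) of the same number pattern with all lookarounds removed. Instead of rejecting the invalid inputs from the question, it splits them into valid-looking pieces; the full pattern finds no match at all in these strings.

```python
import re

# The number pattern from the answer with every lookaround removed
# (non-capturing groups so findall returns whole matches).
bare = re.compile(r"\d{1,3}(?:,\d{3})*(?:\.\d+)?")

print(bare.findall("32,012,11.2131"))     # ['32,012', '11.2131']
print(bare.findall("1132,012,111.2131"))  # ['113', '2,012,111.2131']
print(bare.findall("32131"))              # ['321', '31']
```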

(?=[,.](\s|$)) Explanation

While writing this explanation, I realised the \s needs to be (\s|$) so that 1, is matched at the very end of a string.
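The difference is easy to check with a small sketch, assuming Python's `re` (the pattern names are illustrative, and the edges use the negative lookarounds a Python port needs). With `\s` alone, a number followed by a comma at the very end of the string is not matched at all, because that comma also fails the decimal alternative's right-edge check.

```python
import re

# The two patterns differ only in the trailing-separator lookahead.
ws_only = re.compile(
    r"(?<![\d,.])\d{1,3}(?:,\d{3})*(?:(?=[,.]\s)|(?:\.\d+)?(?![\d,.]))"
)
ws_or_end = re.compile(
    r"(?<![\d,.])\d{1,3}(?:,\d{3})*(?:(?=[,.](?:\s|$))|(?:\.\d+)?(?![\d,.]))"
)

text = "1, 2,"
print(ws_only.findall(text))    # ['1']: the final "2," is missed
print(ws_or_end.findall(text))  # ['1', '2']
```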

This part of the regex matches the 1 in "1," or the 1,000 in "1,000.". So let's say our number is 1,000. (with the . on the end).

Up to this point, the regex has matched 1,000. It can't find another , to repeat the thousands group, so it moves on to (?=[,.](\s|$)).

(?=...) means it's a lookahead: from the point matched so far, it looks at what's coming next but doesn't add it to the match.

So it checks whether there is a , or a . and, if there is, whether it's immediately followed by whitespace or the end of input. In this case it is, so the match is left as 1,000.

Had the lookahead not matched, the regex would have moved on to the other alternative, (\.\d+)?(?=[^\d,.]|$), and tried to match decimal places.
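Both outcomes can be seen in a short sketch, assuming a Python `re` port of the pattern (the name `NUMBER` is illustrative). The match span shows that for "1,000." the trailing point is only inspected by the lookahead, never consumed, while for "1,000.5" the lookahead fails and the decimal alternative takes over.

```python
import re

# Python port of the answer's pattern: negative lookarounds replace the
# mixed-width lookbehind that Python's re rejects.
NUMBER = re.compile(
    r"(?<![\d,.])\d{1,3}(?:,\d{3})*"
    r"(?:(?=[,.](?:\s|$))|(?:\.\d+)?(?![\d,.]))"
)

for text in ["1,000.", "1,000.5"]:
    m = NUMBER.search(text)
    print(repr(text), "->", m.group(), m.span())
# '1,000.' -> 1,000 (0, 5)
# '1,000.5' -> 1,000.5 (0, 7)
```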