Goal: Given a number (it may be very long and it is greater than 0), I'd like to get the five least meaningful digits dropping any 0 at the end of that number.
I tried to solve this with regex, Helped by RegexBuddy I came to this one:
[\d]+([\d]{0,4}+[1-9])0*
But python can't compile that.
>>> import re
>>> re.compile(r"[\d]+([\d]{0,4}+[1-9])0*")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.5/re.py", line 188, in compile
return _compile(pattern, flags)
File "/usr/lib/python2.5/re.py", line 241, in _compile
raise error, v # invalid expression
sre_constants.error: multiple repeat
The problem is the "+" after "{0,4}", it seems it doesn't work in python (even in 2.6)
How can I write a working regex?
PS:
I know you can start dividing by 10 and then using the remainder n%100000... but this is a problem about regex.
That regular expression is very superfluous. Try this:
>>> import re
>>> re.compile(r"(\d{0,4}[1-9])0*$")
The above regular expression assumes that the number is valid (it will also match "abc0123450", for example.) If you really need the validation that there are no non-number characters, you may use this:
>>> import re
>>> re.compile(r"^\d*?(\d{0,4}[1-9])0*$")
Anyways, the \d
does not need to be in a character class, and the quantifier {0,4}
does not need to be forced to be greedy (as the additional +
specifies, although apparently Python does not recognize that.)
Also, in the second regular expression, the \d
is non-greedy, as I believe this will improve the performance and accuracy. I also made it "zero or more" as I assume that is what you want.
I also added anchors as this ensures that your regular expression won't match anything in the middle of a string. If this is what you desired though (maybe you're scanning a long text?), remove the anchors.