Python isdigit() returns True for the non-digit character u'\u2466'

lxyu · May 15, 2012

I came across a strange problem with Python's isdigit() method.

For example:

>>> a = u'\u2466'
>>> a.isdigit()
True
>>> a.isnumeric()
True

Why is this character a digit?

Is there any way to make this return False instead? Thanks.


Edit: If I don't want to treat it as a digit, how can I filter it out?

For example, when I try to convert it to an int:

>>> int(u'\u2466')

I get a UnicodeEncodeError.
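The exact exception depends on the Python version: the Python 2 session above reports UnicodeEncodeError, while Python 3 raises ValueError for the same call. Because UnicodeEncodeError is a subclass of ValueError, catching ValueError covers both. A minimal sketch in Python 3; `to_int_or_none` is a hypothetical helper name, not a standard function:

```python
def to_int_or_none(s):
    """Hypothetical helper: return int(s), or None if s is not a valid integer literal."""
    try:
        return int(s)
    except ValueError:  # also catches UnicodeEncodeError, a ValueError subclass
        return None

print(to_int_or_none(u'42'))                         # 42
print(to_int_or_none(u'\u2466'))                     # None
print(issubclass(UnicodeEncodeError, ValueError))    # True
```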

Answer

NPE · May 15, 2012

U+2466 is the CIRCLED DIGIT SEVEN (⑦), so yes, it's a digit.
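The standard-library `unicodedata` module shows where Python gets this from. The sketch below (Python 3) inspects the character's Unicode properties: U+2466 has a digit value of 7 but no decimal value. In Python 3, isdigit() accepts both digit and decimal characters, so it returns True, while isdecimal() accepts only decimal characters and returns False.

```python
import unicodedata

ch = u'\u2466'

# Unicode name and general category ("No" = Number, other)
print(unicodedata.name(ch))             # CIRCLED DIGIT SEVEN
print(unicodedata.category(ch))         # No

# The character has a digit value but no decimal value
print(unicodedata.digit(ch))            # 7
print(unicodedata.decimal(ch, None))    # None

# The str predicates follow those properties
print(ch.isdigit(), ch.isnumeric(), ch.isdecimal())  # True True False
```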

If your definition of what a digit is differs from that of the Unicode Consortium, you might have to write your own isdigit() function.
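A custom check could be as small as the following sketch, assuming "digit" should mean only the ASCII characters 0 to 9. `is_ascii_digits` is a hypothetical name; like str.isdigit(), it returns False for the empty string.

```python
def is_ascii_digits(s):
    """Hypothetical replacement for str.isdigit(): True only if s is
    non-empty and every character is one of the ASCII digits 0-9."""
    return len(s) > 0 and all('0' <= c <= '9' for c in s)

print(is_ascii_digits(u'123'))       # True
print(is_ascii_digits(u'\u2466'))    # False
print(is_ascii_digits(u''))          # False, matching ''.isdigit()
```

If decimal digits from other scripts (for example ARABIC-INDIC DIGIT THREE, U+0663) should also count, str.isdecimal() is a closer fit than a hand-written check: it returns True for those characters but False for U+2466.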

Edit: If I don't want to treat it as a digit, how can I filter it out?

If you are just interested in the ASCII digits 0...9, you could do something like:

In [4]: s = u'abc 12434 \u2466 5 def'

In [5]: u''.join(c for c in s if '0' <= c <= '9')
Out[5]: u'124345'
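The same filtering can be done with the standard `re` module, after which int() works on the result. This is a sketch in Python 3; the explicit class `[0-9]` is used rather than `\d`, because in Python 3 `\d` on str patterns also matches decimal digits from other scripts.

```python
import re

s = u'abc 12434 \u2466 5 def'

# Keep only ASCII digits 0-9 (same result as the generator expression above)
digits = re.sub(r'[^0-9]', '', s)
print(digits)        # 124345

# Only ASCII digits remain, so int() succeeds as long as at least one was found
print(int(digits))   # 124345

# Alternatively, convert each run of ASCII digits to a separate integer
print([int(m) for m in re.findall(r'[0-9]+', s)])  # [12434, 5]
```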