Match any unicode letter?

python regex character-properties

mpen · Jun 11, 2011 · Viewed 8.5k times · Source

In .net you can use \p{L} to match any letter, how can I do the same in Python? Namely, I want to match any uppercase, lowercase, and accented letters.

Answer

Python's re module doesn't support Unicode properties yet. But you can compile your regex using the re.UNICODE flag, and then the character class shorthand \w will match Unicode letters, too.

Since \w will also match digits, you need to then subtract those from your character class, along with the underscore:

[^\W\d_]

will match any Unicode letter.

>>> import re
>>> r = re.compile(r'[^\W\d_]', re.U)
>>> r.match('x')
<_sre.SRE_Match object at 0x0000000001DBCF38>
>>> r.match(u'é')
<_sre.SRE_Match object at 0x0000000002253030>

Match any unicode letter?

Answer

Related questions