Python regex matching Unicode properties

ThomasH picture ThomasH · Dec 2, 2009 · Viewed 16.2k times · Source

Perl and some other current regex engines support Unicode properties, such as the category, in a regex. E.g. in Perl you can use \p{Ll} to match an arbitrary lower-case letter, or p{Zs} for any space separator. I don't see support for this in either the 2.x nor 3.x lines of Python (with due regrets). Is anybody aware of a good strategy to get a similar effect? Homegrown solutions are welcome.

Answer

ronnix picture ronnix · Nov 30, 2010

The regex module (an alternative to the standard re module) supports Unicode codepoint properties with the \p{} syntax.