Use regular expression to match ANY Chinese character in utf-8 encoding

xiaohan2012 · Mar 6, 2012 · Viewed 56.2k times

For example, if I want to match a string consisting of m to n Chinese characters, I could use:

[single Chinese character regular expression]{m,n}

Is there a regular expression for a single Chinese character, one that matches any Chinese character that exists?

Answer

tchrist · Mar 6, 2012

The regex to match a Chinese (well, CJK) character is

\p{script=Han}

which can be abbreviated to simply

\p{Han}

This assumes that your regex compiler meets requirement RL1.2 Properties from UTS#18 Unicode Regular Expressions. Perl and Java 7 both meet that spec, but many others do not.
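As a rough illustration of the m-to-n case from the question, here is a minimal sketch in Java, assuming Java 7 or later. The class name HanRun and the helpers hanRun and isHanRun are hypothetical names chosen for this example. It uses the long form \p{script=Han}; Java's documented short forms are \p{sc=Han} and \p{IsHan}. The test strings are written as Unicode escapes so the result does not depend on the source file's encoding.

```java
import java.util.regex.Pattern;

class HanRun {
    // Builds a pattern for a run of between min and max Han-script characters.
    // (hanRun is a hypothetical helper, not part of any library.)
    static Pattern hanRun(int min, int max) {
        return Pattern.compile("\\p{script=Han}{" + min + "," + max + "}");
    }

    // True only if the WHOLE string is a run of min..max Han characters.
    static boolean isHanRun(String s, int min, int max) {
        return hanRun(min, max).matcher(s).matches();
    }

    public static void main(String[] args) {
        String twoHan = "\u6c49\u5b57";      // two Han characters (U+6C49, U+5B57)
        String kana = "\u304b\u306a";        // two Hiragana characters, not Han

        System.out.println(isHanRun(twoHan, 1, 3));          // true
        System.out.println(isHanRun(twoHan + "abc", 1, 5));  // false: Latin letters follow
        System.out.println(isHanRun(twoHan + twoHan, 1, 3)); // false: four characters
        System.out.println(isHanRun(kana, 1, 3));            // false: script is Hiragana
    }
}
```

Because matches() requires the entire input to match, no explicit anchors are needed here; to locate Han runs inside a longer string, Matcher.find() could be used with the same pattern instead.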