To match the letters A to Z, we can use the regex:
[A-Za-z]
How can a regex be made to match UTF-8 characters entered by a user, for example Chinese words like 环保部?
What you are looking for are Unicode properties.
For example, \p{L} matches any kind of letter from any language.
So a regex to match such a Chinese word could be something like
\p{L}+
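As a minimal sketch of how that pattern could be used from Java (the class name UnicodeLetterDemo and the helper isAllLetters are made up for illustration, they are not from the question):

```java
import java.util.regex.Pattern;

public class UnicodeLetterDemo {
    // \p{L} matches any code point in the Unicode "Letter" general category.
    // The backslash is doubled because this is a Java string literal.
    private static final Pattern LETTERS = Pattern.compile("\\p{L}+");

    // Hypothetical helper: true when the whole input is one or more letters.
    static boolean isAllLetters(String input) {
        return LETTERS.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAllLetters("环保部"));  // true
        System.out.println(isAllLetters("Hello"));   // true
        System.out.println(isAllLetters("abc123"));  // false, digits are not letters
    }
}
```

Note that matches() requires the entire input to match; use find() instead if you only need to locate a run of letters inside a longer string.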
There are many such properties; for more details, see regular-expressions.info.
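To give an idea of what some of the other properties look like in Java, here is a small sketch. The class name and the helper matchesFully are hypothetical; the script syntax \p{IsHan} assumes Java 7 or later.

```java
import java.util.regex.Pattern;

public class UnicodePropertyDemo {
    // Hypothetical helper: true when the whole input matches the regex.
    static boolean matchesFully(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Script property: Han ideographs only.
        System.out.println(matchesFully("\\p{IsHan}+", "环保部")); // true
        System.out.println(matchesFully("\\p{IsHan}+", "abc"));    // false
        // General category Lu: uppercase letters.
        System.out.println(matchesFully("\\p{Lu}+", "ABC"));       // true
        // General category N: any kind of numeric character.
        System.out.println(matchesFully("\\p{N}+", "123"));        // true
    }
}
```

A script property such as \p{IsHan} is the narrower choice if you only want to accept Chinese characters rather than letters from every language.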
Another option is the flag
Pattern.UNICODE_CHARACTER_CLASS
which was introduced in Java 7. It enables the Unicode version of the predefined character classes and POSIX character classes (see my answer here for some more details and links).
You could do something like this:
Pattern p = Pattern.compile("\\w+", Pattern.UNICODE_CHARACTER_CLASS);
With this flag, \w matches letters and digits from any language, as well as combining marks and connector punctuation such as _.
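To see the difference the flag makes, here is a minimal sketch comparing the two behaviours. The class name, the helper methods and the sample string are made up for illustration.

```java
import java.util.regex.Pattern;

public class UnicodeWordDemo {
    // Default: \w is ASCII-only, i.e. [a-zA-Z_0-9].
    private static final Pattern ASCII_WORD = Pattern.compile("\\w+");
    // With the flag: \w follows the Unicode definition of a word character.
    private static final Pattern UNICODE_WORD =
            Pattern.compile("\\w+", Pattern.UNICODE_CHARACTER_CLASS);

    // Hypothetical helpers: true when the whole input is a single "word".
    static boolean isAsciiWord(String input) {
        return ASCII_WORD.matcher(input).matches();
    }

    static boolean isUnicodeWord(String input) {
        return UNICODE_WORD.matcher(input).matches();
    }

    public static void main(String[] args) {
        String word = "环保部_2024";
        System.out.println(isAsciiWord(word));   // false
        System.out.println(isUnicodeWord(word)); // true
    }
}
```

The same flag can also be switched on inside the pattern with the embedded flag expression (?U), for example "(?U)\\w+".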