I want to disallow certain UTF-8 input (server-side), e.g. eastern languages, where example input might be " 伊 ".
However, I do want to continue supporting other Latin or "Latin-like" characters, such as the Welsh ŵ and ŷ, so checking against Latin-1 is not possible.
What are my options? (if language specific, PHP preferred)
Thanks very much.
Reasoning: browser support for many non-Western characters is often missing (e.g. on a different browser I just see a box in the question above). For things like display names it is sometimes appropriate to restrict input, even if that would not be appropriate for message bodies.
Just do
preg_match('/[^\\p{Common}\\p{Latin}]/u', $string)
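where $string is a UTF-8 string. \p{Latin} matches characters in the Latin script, and \p{Common} matches characters shared across scripts, such as digits, punctuation and spaces. The call returns 1 if the string contains any character outside those two sets and 0 otherwise (or false on error, for example if the input is not valid UTF-8).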
Example:
var_dump(preg_match('/[^\\p{Common}\\p{Latin}]/u', 'sf..ŷaás??')); //int(0)
var_dump(preg_match('/[^\\p{Common}\\p{Latin}]/u', 'sf..ŷݤaás??')); //int(1)
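For validating something like a display name, the check can be wrapped in a small function. This is only a sketch: the name isLatinOnly is made up for illustration, and rejecting input when preg_match fails is an assumption about what you would want, not part of the pattern itself.

```php
<?php
// Hypothetical helper (name chosen for illustration): returns true when
// $name contains only characters from the Latin or Common scripts.
function isLatinOnly(string $name): bool
{
    // preg_match returns 1 on a match, 0 on no match, and false on error
    // (for example, when $name is not valid UTF-8 under the /u modifier).
    $result = preg_match('/[^\p{Common}\p{Latin}]/u', $name);

    // Assumption: treat an error as a rejection instead of accepting bad input.
    return $result === 0;
}

var_dump(isLatinOnly('Gŵyn ap Nudd')); // bool(true)
var_dump(isLatinOnly('伊'));            // bool(false)
var_dump(isLatinOnly("\xFF"));         // bool(false), not valid UTF-8
```

Comparing with === 0 matters here, because a loose check such as !preg_match(...) would treat the false returned for invalid UTF-8 the same as "no forbidden characters found".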