Are the PHP preg_functions multibyte safe?

Spoonface · Nov 19, 2009 · Viewed 17.9k times

There are no multibyte 'preg' functions available in PHP, so does that mean the default preg_* functions are all multibyte safe? I couldn't find any mention of this in the PHP documentation.

Answer

user187291 · Nov 19, 2009

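By default the preg functions treat the pattern and the subject as single-byte strings, but PCRE supports UTF-8 out of the box: see the documentation for the 'u' (PCRE_UTF8) modifier.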

Illustration ("\xC3\xA4" is the UTF-8 encoding of the German letter "ä"):

  echo preg_replace('~\w~', '@', "a\xC3\xA4b");

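This echoes "@@¤@" because "\xC3" and "\xA4" were treated as two distinct symbols. The exact output of this byte-wise match can vary with the active locale, since the locale decides which single bytes count as \w.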

  echo preg_replace('~\w~u', '@', "a\xC3\xA4b");

With the 'u' modifier, this prints "@@@" because "\xC3\xA4" was treated as a single letter.
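
The same byte-versus-character difference shows up in the other preg functions, not only in preg_replace. Here is a minimal sketch that reuses the string from above; the variable names are only for illustration, and it assumes a PHP build whose PCRE has UTF-8 support (the usual case).

```php
<?php
// "ä" encoded as UTF-8 occupies two bytes: 0xC3 0xA4.
$s = "a\xC3\xA4b";

// Without 'u', '.' consumes one byte per match: a, 0xC3, 0xA4, b.
$byteCount = preg_match_all('~.~', $s, $byteMatches);

// With 'u', '.' consumes one UTF-8 encoded character per match: a, ä, b.
$charCount = preg_match_all('~.~u', $s, $charMatches);

// Splitting on the empty pattern with 'u' yields whole characters.
$chars = preg_split('~~u', $s, -1, PREG_SPLIT_NO_EMPTY);

// With 'u', a subject that is not valid UTF-8 makes the call fail
// instead of matching; preg_last_error() reports the reason.
$bad = preg_match('~.~u', "\xC3");
$err = preg_last_error();

var_dump($byteCount, $charCount, $chars, $bad, $err === PREG_BAD_UTF8_ERROR);
```

So the preg functions are multibyte safe for UTF-8 as long as every pattern carries the 'u' modifier and the input is valid UTF-8. It is worth checking for a false return value, because invalid input fails the whole call rather than being matched byte by byte.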