I am trying to get the length of this Unicode string:
$text = 'نام سلطان م';
$length = strlen($text);
echo $length;
Output:
20
How does it determine the length of a Unicode string?
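strlen() does not handle multibyte characters the way you expect: it counts bytes, effectively assuming that 1 character equals 1 byte, which does not hold for Unicode text. In UTF-8 each Arabic letter in your string takes 2 bytes and each space takes 1, so 9 letters and 2 spaces give 20. This behavior is clearly documented: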
strlen() returns the number of bytes rather than the number of characters in a string.
The solution is to use the mb_strlen() function instead (mb stands for multi-byte); see the mb_strlen() docs.
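As a minimal sketch of the difference, assuming the source file is saved as UTF-8 and the mbstring extension is loaded, the two functions can be compared side by side. The variable names $byteLength and $charLength are only illustrative:

```php
<?php
// Assumption: this file is saved as UTF-8 and mbstring is available.
$text = 'نام سلطان م';

// strlen() counts bytes: 9 Arabic letters at 2 bytes each, plus 2 ASCII spaces.
$byteLength = strlen($text);

// mb_strlen() counts characters in the encoding passed as the second argument.
$charLength = mb_strlen($text, 'UTF-8');

echo $byteLength, "\n"; // 20
echo $charLength, "\n"; // 11
```

Passing the encoding explicitly avoids depending on the configured internal encoding.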
EDIT
If for any reason changing the code is not possible, you can instead configure PHP so that the string functions are automatically overloaded by their multibyte counterparts:
To use function overloading, set mbstring.func_overload in php.ini to a positive value that represents a combination of bitmasks specifying the categories of functions to be overloaded. It should be set to 1 to overload the mail() function. 2 for string functions, 4 for regular expression functions. For example, if it is set to 7, mail, strings and regular expression functions will be overloaded.
This is supported by PHP and documented here (note that this feature is deprecated as of PHP 7.2 and was removed in PHP 8.0, so it is not an option on current versions).
Please note that you may also need to edit your php.ini to ensure the mbstring extension is enabled. Available settings are documented here.
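To check whether the extension is actually available before relying on mb_strlen(), something like the following may help. This is only a sketch; mbstring_status() is a hypothetical helper name, not part of PHP:

```php
<?php
// Hypothetical helper (not a PHP built-in): reports whether mbstring is loaded.
function mbstring_status(): string
{
    if (extension_loaded('mbstring')) {
        // Called without arguments, mb_internal_encoding() returns the current encoding name.
        return 'mbstring is enabled, internal encoding: ' . mb_internal_encoding();
    }
    return 'mbstring is not loaded; enable it in php.ini or install it for your platform';
}

echo mbstring_status(), "\n";
```

Running php -m on the command line and looking for mbstring in the list gives the same information.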