PHP: mb_strtoupper not working

Alasdair · Feb 24, 2013

I have a problem with UTF-8 and mb_strtoupper.

mb_internal_encoding('UTF-8');
$guesstitlestring='Le Courrier de Sáint-Hyácinthe';

$encoding=mb_detect_encoding($guesstitlestring);
if ($encoding!=='UTF-8') $guesstitlestring=mb_convert_encoding($guesstitlestring,'UTF-8',$encoding);

echo "DEBUG1 $guesstitlestring\n";
$guesstitlestring=mb_strtoupper($guesstitlestring);
echo "DEBUG2 $guesstitlestring\n";

Result:

DEBUG1 Le Courrier de Sáint-Hyácinthe
DEBUG2 LE COURRIER DE S?INT-HY?CINTHE

I don't understand why this is happening. I'm trying to be as careful as I can with the encoding: the string is supplied as UTF-8, then verified and, if necessary, converted to UTF-8 again. It's a nightmare!
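When a string "looks" right on screen but misbehaves in mbstring, it can help to inspect the raw bytes that actually reached the script. A minimal sketch with a hypothetical helper, describeBytes(): in UTF-8, "á" is the two bytes c3 a1, while in ISO-8859-1 (used here only as an example of a legacy encoding) it is the single byte e1, which is not valid UTF-8 on its own.

```php
<?php
// Hypothetical helper: returns the bytes of $s as space-separated hex pairs.
function describeBytes(string $s): string
{
    return implode(' ', str_split(bin2hex($s), 2));
}

$utf8   = "S\u{E1}int"; // "Sáint" as UTF-8 (PHP 7+ escape syntax)
$latin1 = "S\xE1int";   // the same word as ISO-8859-1 bytes

echo describeBytes($utf8), "\n";   // 53 c3 a1 69 6e 74
echo describeBytes($latin1), "\n"; // 53 e1 69 6e 74

// mb_check_encoding() tells you whether the bytes form valid UTF-8.
var_dump(mb_check_encoding($utf8, 'UTF-8'));   // bool(true)
var_dump(mb_check_encoding($latin1, 'UTF-8')); // bool(false)
```

Running the same check on the real input, at the point where it enters the script, shows whether the bytes were already wrong before mb_strtoupper() ever saw them.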

UPDATE

So I've figured out that this was caused by a combination of entering the arguments via the console and reading them back out of the console: the text was garbled both on the way in and on the way out. The solution is to neither enter the arguments through the console nor read them back out through it.
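The symptom can be reproduced without a console. A minimal sketch, assuming (this is a guess, the update does not say which encoding the console used) that the console delivered ISO-8859-1 bytes, where "á" is the single byte 0xE1:

```php
<?php
mb_internal_encoding('UTF-8');

// Same title in two byte encodings. The second is an ASSUMPTION about what a
// misconfigured console might hand to PHP: ISO-8859-1, where "á" is 0xE1.
$asUtf8   = "Le Courrier de S\u{E1}int-Hy\u{E1}cinthe";
$asLatin1 = "Le Courrier de S\xE1int-Hy\xE1cinthe";

// Valid UTF-8 input uppercases as expected.
echo mb_strtoupper($asUtf8), "\n"; // LE COURRIER DE SÁINT-HYÁCINTHE

// Here each 0xE1 is an invalid UTF-8 sequence, so mbstring emits its
// substitute character ('?' by default) instead of "Á", as in DEBUG2.
echo mb_strtoupper($asLatin1), "\n";
```

The point is that mb_strtoupper() itself is not at fault: it can only uppercase text whose bytes match the encoding it is told to use.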

Thank you everyone for your help in resolving this issue!

Answer

powtac · Feb 24, 2013

Instead of strtoupper()/mb_strtoupper(), use mb_convert_case(), since uppercase conversion is tricky across different encodings. Also make sure your string really IS UTF-8.

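$content = 'Le Courrier de Sáint-Hyácinthe';

mb_internal_encoding('UTF-8');

// Re-encode if the string is not valid UTF-8, or if a UTF-8 -> UTF-32 -> UTF-8
// round trip does not reproduce it byte for byte.
if (!mb_check_encoding($content, 'UTF-8')
    || $content !== mb_convert_encoding(mb_convert_encoding($content, 'UTF-32', 'UTF-8'), 'UTF-8', 'UTF-32')) {

    // No source encoding is passed, so mb_convert_encoding() treats $content
    // as the internal encoding (UTF-8, set above).
    $content = mb_convert_encoding($content, 'UTF-8');
}

// LE COURRIER DE SÁINT-HYÁCINTHE
echo mb_convert_case($content, MB_CASE_UPPER, "UTF-8");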

Working example: http://3v4l.org/enEfm#v443
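The same advice can be packaged as a small reusable function. This is a sketch, not the answer's own code: toUpperUtf8() is a hypothetical name, it keeps only the mb_check_encoding() test (no UTF-32 round trip), and it assumes that input failing that test is ISO-8859-1, which is only a guess. Pass whatever encoding your input source really uses.

```php
<?php
/**
 * Hypothetical helper: ensure the text is UTF-8, then uppercase it.
 * ASSUMPTION: non-UTF-8 input is in $fallbackEncoding (ISO-8859-1 by default).
 */
function toUpperUtf8(string $text, string $fallbackEncoding = 'ISO-8859-1'): string
{
    if (!mb_check_encoding($text, 'UTF-8')) {
        // Name the source encoding explicitly instead of relying on the
        // internal encoding, so legacy bytes are actually converted.
        $text = mb_convert_encoding($text, 'UTF-8', $fallbackEncoding);
    }
    return mb_convert_case($text, MB_CASE_UPPER, 'UTF-8');
}

echo toUpperUtf8('Le Courrier de Sáint-Hyácinthe'), "\n";
echo toUpperUtf8("Le Courrier de S\xE1int-Hy\xE1cinthe"), "\n";
```

Both calls should print LE COURRIER DE SÁINT-HYÁCINTHE. A wrong fallback guess still yields valid UTF-8, just with the wrong characters, so it is worth fixing the input source (as in the update above) rather than relying on the fallback.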

See also my comment at the PHP website about the converter: http://www.php.net/manual/function.utf8-encode.php#102382