Let's say (for simplicity's sake) that I have a multibyte, UTF-8 encoded string variable with 3 letters (consisting of 4 bytes):
$original = 'Fön';
Since it's UTF-8, the bytes' hex values are (excluding the BOM):
46 C3 B6 6E
As the $original variable is user-defined, I will need to handle two things:
I would tend to use strlen() to handle "1.", and access the $original variable's bytes with a simple $original[$byteposition], like this:
<?php
header('Content-Type: text/html; charset=UTF-8');
$original = 'Fön';
$totalbytes = strlen($original);
for ($byteposition = 0; $byteposition < $totalbytes; $byteposition++)
{
    $currentbyte = $original[$byteposition];
    /*
    Doesn't work since var_dump shows 3 bytes.
    */
    var_dump($currentbyte);
    /*
    Fails too since "ord" only works on ASCII chars.
    It returns "46 F6 6E"
    */
    printf("%02X", ord($currentbyte));
    echo('<br>');
}
exit();
?>
This proves my initial idea is not working:
How can I get the single bytes from a multibyte PHP string variable in a binary-safe way?
What I am looking for is a binary-safe way to convert UTF-8 string(s) into byte-array(s).
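You can get a byte array by unpacking the UTF-8 encoded string $a. Note that utf8_encode() converts ISO-8859-1 to UTF-8, so use it only if the input really is ISO-8859-1 (the "46 F6 6E" output in the question suggests the file was saved that way); it is deprecated as of PHP 8.2.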
$a = utf8_encode('Fön');
$b = unpack('C*', $a);
var_dump($b);
The format C* means "unsigned char", repeated for every remaining byte. Note that unpack() returns an array indexed from 1.
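If the string is already UTF-8, no conversion step is needed before unpacking. The following is a minimal sketch under that assumption; it writes the string with explicit byte escapes so the result does not depend on how the source file was saved, and the variable names $bytes and $hex are illustrative only:

```php
<?php
// Assumption: the input is already UTF-8. Explicit escapes are used so the
// bytes do not depend on the encoding of this source file.
$original = "F\xC3\xB6n"; // "Fön" in UTF-8

// unpack('C*') yields a 1-indexed array of unsigned byte values;
// array_values() re-indexes it from 0.
$bytes = array_values(unpack('C*', $original));

// Format each byte as two uppercase hex digits.
$hex = implode(' ', array_map(function ($b) {
    return sprintf('%02X', $b);
}, $bytes));

echo $hex, PHP_EOL; // 46 C3 B6 6E
```

If the loop in the question prints "46 F6 6E" for the same literal, the likely cause is that the script file itself was saved as ISO-8859-1 rather than UTF-8, since ord() returns the value of any single byte, not only ASCII.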