PHP: converting from CP1251 to UTF-8

Pigalev Pavel · Nov 22, 2012

I have a problem converting a string from CP1251 to UTF-8.

I need to read some names from a database, and those names are stored in CP1251. (I didn't create the database, so I can't change it, but I know for sure that the names are CP1251.)

The name in the database is "Р?нтернет РІ цифрах". I'm converting it to UTF-8 using the iconv function like this:

iconv("UTF-8", "CP1251//IGNORE", $name)

The result is "�?нтернет в цифрах" (it's Russian), but the first two symbols are wrong: it should be "Интернет в цифрах".

So the last thing I need to do is somehow change those two symbols, "�?", into the Russian letter "И", and I really don't know how to do that. I've tried preg_replace, but it doesn't work, or I'm not using it correctly.

Sorry for the Russian letters; it's really hard to explain what I need without showing them.

Answer

Joni · Nov 25, 2012

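The first letter comes out wrong because the UTF-8 encoding of И is the two bytes 0xD0 0x98, and 0x98 is not assigned to any character in CP1251. If the database has replaced the 0x98 byte with a question mark (0x3F), you have to change it back. In the flow described in the question, the damaged pair only shows up as the bytes D0 3F once iconv has undone the double encoding, so do the replacement on the converted string:

// Undo the double encoding: each character becomes its single CP1251 byte again
$name = iconv("UTF-8", "CP1251//IGNORE", $name);
// Restore the lost second byte of "И" (D0 3F -> D0 98)
echo str_replace("\xD0\x3F", "\xD0\x98", $name);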
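
The repair can also be sketched as a small function. This is only a sketch under one assumption: $name reaches PHP as double-encoded UTF-8 (the "Р?нтернет" form shown in the question). In that case the D0 3F pair only exists after iconv has run, so the replacement is applied to the converted bytes. The function name fix_double_encoded_name and the sample data are hypothetical, and the file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper (not part of the original answer): undo the double
// encoding, then restore the byte that was stored as a question mark.
function fix_double_encoded_name(string $name): string
{
    // Step 1: map every character back to its single CP1251 byte. The result
    // is the original UTF-8 byte stream, except where a byte was lost.
    $bytes = iconv("UTF-8", "CP1251//IGNORE", $name);
    if ($bytes === false) {
        return $name; // conversion failed; leave the input untouched
    }

    // Step 2: the lead byte 0xD0 followed by "?" (0x3F) is never valid UTF-8,
    // so treat it as a damaged "И" (0xD0 0x98) and repair it.
    return str_replace("\xD0\x3F", "\xD0\x98", $bytes);
}

// Build sample input the way the damage is assumed to have happened:
// the UTF-8 bytes of the name were read as CP1251, and the unmapped
// 0x98 byte of "И" was replaced by "?".
$expected = "Интернет";
$tail     = substr($expected, 2);                     // bytes after the 2-byte "И"
$stored   = "Р?" . iconv("CP1251", "UTF-8", $tail);   // double-encoded form

echo fix_double_encoded_name($stored), "\n";
```

Only "И" is repaired here because it is the one Russian letter whose UTF-8 encoding contains 0x98. If the data may contain other alphabets, D1 3F could need the same treatment, but that goes beyond what the question shows.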