iconv: Convert from CP1252 to UTF-8

Question 1

iconv: Convert from CP1252 to UTF-8

iconv

Somebody · Mar 15, 2013 · Viewed 29.9k times · Source

Answer

Answer

When you convert CP1252 encoded string Çàïèñêè ýêñïåäèòîðà to UTF-8 with command iconv.exe -f CP1252 -t UTF-8 test.txt >testout.txt then the source file test.txt (Hex view:

enter image description here

) will be converted into target file testout.txt (Hex view:

enter image description here

) which is UTF-8 code for Çàïèñêè ýêñïåäèòîðà.

Same garbage you put in will come the other end out. iconv's behavior is correct and as expected.

What you are perplexed by is that you don't see what you expect and that is because your input 8bit string is actually encoded in Windows-1251 (Cyrillic) Codepage.

→ So the correct code page is not CP1252 but CP1251 ←

enter image description here

Command iconv.exe -f CP1251 -t UTF-8 test.txt >testout2.txt converts the source file test.txt into target file testout2.txt (Hex view:

enter image description here

) which is UTF-8 code for Записки экспедитора which is what your user's expect to see

Question 2

I'm trying to convert the CP1252 encoded string Çàïèñêè ýêñïåäèòîðà to UTF-8. I have tried this command:

iconv -c -f=WINDOWS-1252 -t=UTF-8 test.txt

No luck, getting some weird results:

ÃŠÃ€Ã‡Ã€ÃÃœ ÃÃŽÃ‚Ã›Ã‰ Ã‚Ã…ÃŠ

I tried entering the same string (Çàïèñêè ýêñïåäèòîðà) here, and they are able to convert it without problems: http://www.artlebedev.ru/tools/decoder/

What is going wrong?

iconv: Convert from CP1252 to UTF-8

Answer

Related questions