I'm trying to convert the CP1252 encoded string Çàïèñêè ýêñïåäèòîðà to UTF-8. I have tried this command:
iconv -c -f=WINDOWS-1252 -t=UTF-8 test.txt
No luck, getting some weird results:
ÊÀÇÀÃÃœ ÃÎÂÛÉ ÂÅÊ
I tried entering the same string (Çàïèñêè ýêñïåäèòîðà) here, and they are able to convert it without problems: http://www.artlebedev.ru/tools/decoder/
What is going wrong?
When you convert CP1252 encoded string Çàïèñêè ýêñïåäèòîðà
to UTF-8 with command iconv.exe -f CP1252 -t UTF-8 test.txt >testout.txt
then the source file test.txt
(Hex view:
) will be converted into target file testout.txt
(Hex view:
) which is UTF-8 code for Çàïèñêè ýêñïåäèòîðà
.
Same garbage you put in will come the other end out. iconv's behavior is correct and as expected.
What you are perplexed by is that you don't see what you expect and that is because your input 8bit string is actually encoded in Windows-1251 (Cyrillic) Codepage.
→ So the correct code page is not CP1252 but CP1251 ←
Command iconv.exe -f CP1251 -t UTF-8 test.txt >testout2.txt
converts the source file test.txt
into target file testout2.txt
(Hex view:
) which is UTF-8 code for Записки экспедитора
which is what your user's expect to see