UnicodeDecodeError: 'charmap' codec can't decode byte X in position Y: character maps to <undefined>

Eden Crow picture Eden Crow · Feb 10, 2012 · Viewed 773k times · Source

I'm trying to get a Python 3 program to do some manipulations with a text file filled with information. However, when trying to read the file I get the following error:

 Traceback (most recent call last):  
     File "SCRIPT LOCATION", line NUMBER, in <module>  
     `text = file.read()`  
     File "C:\Python31\lib\encodings\cp1252.py", line 23, in decode  
     `return codecs.charmap_decode(input,self.errors,decoding_table)[0]`  
     UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 2907500: character maps to `<undefined>`  

Answer

Lennart Regebro picture Lennart Regebro · Feb 10, 2012

The file in question is not using the CP1252 encoding. It's using another encoding. Which one you have to figure out yourself. Common ones are Latin-1 and UTF-8. Since 0x90 doesn't actually mean anything in Latin-1, UTF-8 (where 0x90 is a continuation byte) is more likely.

You specify the encoding when you open the file:

file = open(filename, encoding="utf8")