UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 55: character maps to <undefined>

user6655908 · Sep 28, 2016 · Viewed 11.2k times

I am new to Python and am hoping that someone could please explain to me what the error message means.

To be specific, I have some combined Python and SPSS code, saved in Atom, that was written by a former colleague. Since that colleague is no longer here, I now need to run the code myself. I ran the code below from SPSS 22.

    begin program.
    import spss,spssaux,imp
    abcvalid = imp.load_source('abcvalid', "I:/VALIDITY CHECK/Python Library/2016/abcvalid2016.py") 
    import abcvalid
    abcvalid.fullprocess("9_26_2016","M:/Users/Yli\2016 SURVEY/DOWNLOADS/9_26_2016/","M:/Users/Yli/2016 SURVEY/Legacy15.sav")
    end program.

Then I got the following from the output.

    Traceback (most recent call last):
      File "<string>", line 5, in <module>
      File "I:/VALIDITY CHECK/Python Library/2016/abcvalid2016.py", line 2067, in fullprocess
        dataprep(date,filepath,legacypath)
      File "I:/VALIDITY CHECK/Python Library/2016/abcvalid2016.py", line 2006, in dataprep
        emailslower(date,filepath)
      File "I:/VALIDITY CHECK/Python Library/2016/abcvalid2016.py", line 1635, in emailslower
        DATASET ACTIVATE comment_data.""".format(date,filepath))
      File "C:\PROGRA~1\IBM\SPSS\STATIS~1\22\Python\Lib\site-packages\spss\spss.py", line 1494, in Submit
        cmdList = spssutil.CheckStr(cmdList)
      File "C:\PROGRA~1\IBM\SPSS\STATIS~1\22\Python\Lib\site-packages\spss\spssutil.py", line 166, in CheckStr
        s1 = unicode(mystr,locale.getlocale(locale.LC_CTYPE)[1])
      File "C:\Program Files\IBM\SPSS\Statistics\22\Python\lib\encodings\cp1252.py", line 15, in decode
        return codecs.charmap_decode(input,errors,decoding_table)
    UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 55: character maps to <undefined>

I know there are similar questions on this site, but the questions and answers were too hard for me to comprehend. If someone could please help me, I'd really appreciate it!

Thank you in advance!

Answer

bers · Mar 15, 2019

First, here is a minimal example reproducing your error on Windows:

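    import subprocess

    # With text=True and no explicit encoding, stdout is decoded using the
    # locale's preferred encoding (cp1252 on many Windows installations).
    with subprocess.Popen("cmd /c echo ü", stdout=subprocess.PIPE, text=True) as Process:
        for Line in Process.stdout:
            print(Line)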

To my understanding, the problem is as follows. (I pieced this together from information and examples I found, and I am not certain everything is correct. I welcome corrections.)

  • The ü character is code point 252 = 0xfc in Unicode (see https://unicode-table.com/en/00FC/).
  • Python correctly passes the ü character to the console, as you can test using this example (be sure to save the file as UTF-8):

    import subprocess

    print(ord('ü'))
    subprocess.call("cmd /c echo ü")

I am not sure why this is working in the first place. (This answer may be why: https://stackoverflow.com/a/32176732/880783)

  • The console uses something other than Unicode internally. For example, in code pages 437 and 850 (often loosely called "extended ASCII"), the ü character is at position 129 = 0x81 (sound familiar?).
  • So when the console returns that byte, Python tries to decode it with its default encoding (cp1252 in the traceback above), in which 0x81 is not defined. Hence the error.
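
The byte-level mismatch described in these points can be checked without a console. This is only a minimal sketch, assuming the console encodes its output as cp850 and Python decodes it as cp1252; both are assumptions about a typical Western European Windows setup:

```python
# Assumption: the console writes cp850 bytes and Python reads them as cp1252.
console_bytes = "ü".encode("cp850")    # the bytes the console would emit for "ü"
print(console_bytes)

message = None
try:
    console_bytes.decode("cp1252")     # what Python attempts by default
except UnicodeDecodeError as error:
    message = str(error)               # keep the error text for inspection
    print(message)

# Decoding with the matching code page recovers the original character.
recovered = console_bytes.decode("cp850")
print(recovered == "ü")
```

If the assumption holds, the error text printed here should closely resemble the one in the question.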

The key is to make Python understand how the output it gets from the process is encoded. In my example (Windows console), I tried a couple of encodings (see the list of standard encodings in the Python documentation) like this:

    import subprocess

    Encoding = 'cp850'
    with subprocess.Popen("cmd /c echo ü", stdout=subprocess.PIPE, text=True, encoding=Encoding) as Process:
        for Line in Process.stdout:
            print(Line)

  • 'ascii' fails with an ordinal not in range(128) error (ASCII only covers bytes 0 to 127).
  • 'cp1252' fails with character maps to <undefined>.
  • 'latin_1' works without an error, but outputs a box character on my debug console in VS Code (in Latin-1, byte 0x81 decodes to the control character U+0081).
  • 'cp850' seems to work, outputting a ü character.
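
The same comparison can be run directly on the raw byte, without starting a subprocess. This is a rough sketch rather than a definitive test: the helper name `try_encodings` is made up for illustration, and it assumes 0x81 is the byte the console produced for ü:

```python
def try_encodings(raw, candidates):
    """Decode raw bytes with each candidate encoding; None marks a failure."""
    results = {}
    for name in candidates:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results

# Assumption: 0x81 is the byte the console emitted for "ü".
results = try_encodings(b"\x81", ["ascii", "cp1252", "latin_1", "cp850"])
for name, text in results.items():
    # ascii() keeps the printout readable even for control characters.
    print(name, "failed" if text is None else ascii(text))
```

An encoding that decodes without an error is not necessarily the right one, as the 'latin_1' case shows, so it is still worth checking that the decoded text looks as expected.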

So I will stick with 'cp850' for now and see how it goes.
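
Separately, and as an assumption rather than something the traceback confirms: the call in the question contains `Yli\2016` with a backslash, and in a regular Python string literal `\201` is an octal escape for character 0x81. That may be another place the 0x81 byte could come from. A small sketch, using a shortened path for illustration only:

```python
# Assumption: the backslash in the question's path is unintended.
with_backslash = "M:/Users/Yli\2016 SURVEY"   # "\201" is read as an octal escape
with_slash = "M:/Users/Yli/2016 SURVEY"       # forward slash, no escape

print(ascii(with_backslash))            # shows the escaped character explicitly
print("\x81" in with_backslash)         # is character 0x81 present?
print("\x81" in with_slash)
```

If that is the cause here, replacing the backslash with a forward slash (or using a raw string) would be worth trying before changing any encoding.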