I am new to Python and hoping someone can explain what this error message means.
Specifically, I have some code that combines Python and SPSS, saved in Atom, which was written by a former colleague. Since that colleague is no longer here, I now need to run the code myself. I ran the code below from SPSS 22.
begin program.
import spss,spssaux,imp
abcvalid = imp.load_source('abcvalid', "I:/VALIDITY CHECK/Python Library/2016/abcvalid2016.py")
import abcvalid
abcvalid.fullprocess("9_26_2016","M:/Users/Yli\2016 SURVEY/DOWNLOADS/9_26_2016/","M:/Users/Yli/2016 SURVEY/Legacy15.sav")
end program.
Then I got the following from the output.
Traceback (most recent call last):
File "<string>", line 5, in <module>
File "I:/VALIDITY CHECK/Python Library/2016/abcvalid2016.py", line 2067, in fullprocess
dataprep(date,filepath,legacypath)
File "I:/VALIDITY CHECK/Python Library/2016/abcvalid2016.py", line 2006, in dataprep
emailslower(date,filepath)
File "I:/VALIDITY CHECK/Python Library/2016/abcvalid2016.py", line 1635, in emailslower
DATASET ACTIVATE comment_data.""".format(date,filepath))
File "C:\PROGRA~1\IBM\SPSS\STATIS~1\22\Python\Lib\site-packages\spss\spss.py", line 1494, in Submit
cmdList = spssutil.CheckStr(cmdList)
File "C:\PROGRA~1\IBM\SPSS\STATIS~1\22\Python\Lib\site-packages\spss\spssutil.py", line 166, in CheckStr
s1 = unicode(mystr,locale.getlocale(locale.LC_CTYPE)[1])
File "C:\Program Files\IBM\SPSS\Statistics\22\Python\lib\encodings\cp1252.py", line 15, in decode
return codecs.charmap_decode(input,errors,decoding_table)
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 55: character maps to <undefined>
I know there are similar questions on this site, but the questions and answers were too hard for me to comprehend. If someone could please help me, I'd really appreciate it!
Thank you in advance!
First, here is a minimal example reproducing your error on Windows:
import subprocess
with subprocess.Popen("cmd /c echo ü", stdout=subprocess.PIPE, text=True) as Process:
    for Line in Process.stdout:
        print(Line)
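The same exception can be sketched without a subprocess or a Windows console. This assumes that the console encodes ü with code page 850 and that Python decodes the pipe with cp1252; both are assumptions about a typical Western European Windows setup, and `decode_console_bytes` is a hypothetical helper name used only for illustration.

```python
# Sketch: reproduce the decode failure with plain bytes (no subprocess needed).
# Assumption: the console emits cp850 bytes and Python decodes them as cp1252.

def decode_console_bytes(raw, encoding):
    """Return the decoded text, or the error message if decoding fails."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as error:
        return "failed: {}".format(error)

raw = "\u00fc".encode("cp850")   # "\u00fc" is ü; cp850 stores it as one byte
print(hex(raw[0]))               # the byte value that travels through the pipe

cp1252_result = decode_console_bytes(raw, "cp1252")  # 0x81 is undefined in cp1252
cp850_result = decode_console_bytes(raw, "cp850")    # same code page on both sides

print(ascii(cp1252_result))      # the error message, printed ASCII-safe
print(cp850_result == "\u00fc")  # whether the round trip gave ü back
```

The point of the sketch is only that the same byte decodes differently, or not at all, depending on which code page the reader assumes.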
To my understanding, the problem is as follows. (I have put together some information and examples that I found, but I am not certain everything is correct. I welcome corrections.)
The ü character is code point 252 = 0xfc in Unicode (https://unicode-table.com/en/00FC/).
Python can send the ü character to the console, as you can test using this example (be sure to save the file as UTF-8):
import subprocess
print(ord('ü'))
subprocess.call("cmd /c echo ü")
I am not sure why this is working in the first place. (This answer may be why: https://stackoverflow.com/a/32176732/880783)
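On the console side, the ü character is at position 129 = 0x81 in code page 850 (sounds familiar?). In cp1252, the encoding named in the traceback, byte 0x81 is undefined.
The key is to tell Python how the output it gets from the process is encoded. In my example (Windows console), I tried a couple of encodings (see the list of standard encodings in the Python documentation) like this: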
import subprocess
Encoding = 'cp850'
with subprocess.Popen("cmd /c echo ü", stdout=subprocess.PIPE, text=True, encoding=Encoding) as Process:
    for Line in Process.stdout:
        print(Line)
'ascii' fails with an "ordinal not in range(128)" error (ASCII only covers bytes 0 to 127).
'cp1252' fails with "character maps to <undefined>".
'latin_1' works, but outputs a box character on my debug console in VS Code.
'cp850' seems to work, outputting a ü character.
So I will stick with 'cp850' for now and see how it goes.
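Separately, for the SPSS call in the question, it may be worth checking the path string itself. This is an observation about the posted code, not something I have verified against the actual files. The first path contains a backslash, `Yli\2016`, and in a normal Python string literal `\201` is an octal escape for the value 0x81, which is the byte named in the traceback. A minimal sketch of the effect, using the path from the question (the `fixed` variant assumes a forward slash was intended, as in the second path):

```python
# Sketch: a backslash followed by digits is read as an octal escape.
# "\201" is octal 201 = 129 = 0x81, so the literal silently gains that value.
broken = "M:/Users/Yli\2016 SURVEY/DOWNLOADS/9_26_2016/"   # as posted in the question
fixed = "M:/Users/Yli/2016 SURVEY/DOWNLOADS/9_26_2016/"    # assumption: "/" was intended

print(ascii(broken))             # shows \x81 where "\201" was written
print("\x81" in broken)          # True: the escape produced character 0x81
print("\x81" in fixed)           # False: no escape, the path is intact
```

If this is the cause, changing the backslash to a forward slash (or using a raw string) in the `abcvalid.fullprocess(...)` call should keep the stray 0x81 out of the command that `spss.Submit` later tries to decode as cp1252.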