I have never dealt with encoding and decoding strings, so I am quite the newbie on this front. I am receiving a UnicodeEncodeError when I try to write the contents I read from another file to a temporary file using file.write in Python. I get the following error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 41333: ordinal not in range(128)
Here is what I am doing in my code. I am reading an XML file and getting the text from the "mydata" tag. I then iterate through the matching elements to look for CDATA:
import os
import tempfile
from lxml import etree

parser = etree.XMLParser(strip_cdata=False)
root = etree.parse("myfile.xml", parser)
myData = root.findall('./mydata')
# iterate through the list to find text (Lua code) in elements containing CDATA
for item in myData:
    myCode = item.text
    # Write myCode to a temporary file.
    tempDirectory = tempfile.mkdtemp(suffix="", prefix="TEST_THIS_")
    file = open(tempDirectory + os.path.sep + "myCode.lua", "w")
    file.write(myCode + "\n")
    file.close()
It fails with the UnicodeEncodeError when I hit the following line:
file.write(myCode + "\n")
How should I properly encode and decode this?
Python 2.7's open function does not transparently handle unicode characters the way Python 3's does. There is extensive documentation on this, but if you want to write unicode strings directly without encoding them yourself first, you can try this:
>>> import codecs
>>> f = codecs.open(filename, 'w', encoding='utf8')
>>> f.write(u'\u201c')
For comparison, this is how the error happens:
>>> f = open(filename, 'w')
>>> f.write(u'\u201c')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 0: ordinal not in range(128)
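Applied to the code in the question, the same idea might look like the sketch below. It uses io.open rather than codecs.open, which is a swap on my part: io.open also takes an encoding argument and behaves the same on Python 2.7 and Python 3. The helper names write_unicode and read_unicode and the sample Lua string are made up for illustration, and UTF-8 is assumed to be the encoding you want for the .lua file.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def write_unicode(path, text):
    """Write a unicode string to path as UTF-8 (hypothetical helper)."""
    # io.open accepts an encoding argument on Python 2.7 and Python 3,
    # so the text is encoded as UTF-8 on write instead of as ASCII.
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(text)


def read_unicode(path):
    """Read the file back as a unicode string (hypothetical helper)."""
    with io.open(path, "r", encoding="utf-8") as f:
        return f.read()


# Stand-in for item.text from the question: Lua code containing the
# curly quote u'\u201c' that triggered the UnicodeEncodeError.
my_code = u"print(\u201chello\u201d)"

temp_directory = tempfile.mkdtemp(suffix="", prefix="TEST_THIS_")
lua_path = os.path.join(temp_directory, "myCode.lua")

# Note the u"\n": a file opened with io.open expects unicode text.
write_unicode(lua_path, my_code + u"\n")
round_trip = read_unicode(lua_path)
```

In the original loop this would replace the open/write/close lines. If you would rather keep the plain open, writing myCode.encode("utf-8") instead of myCode should also avoid the implicit ASCII encoding on Python 2.7.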