UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 7: ordinal not in range(128)

Question 1

python xml unicode encoding python-unicode

speedyrazor · Nov 7, 2013 · Viewed 143.2k times · Source

Answer

You need to encode Unicode explicitly before writing it to a file; otherwise Python 2 encodes it for you with the default ASCII codec, which fails on any non-ASCII character such as é.
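To see where "position 7" in the error comes from, here is a minimal sketch that performs the ASCII encode explicitly, so it fails the same way on Python 2 and Python 3. The title is the one from the question; the variable names are only for illustration.

```python
# -*- coding: utf-8 -*-
# Sketch: reproduce the failure by encoding with ASCII explicitly.
# The title is taken from the question; variable names are illustrative.
title = u'Identit\xe9 secr\xe8te (Abduction) [VF]'

try:
    title.encode('ascii')
except UnicodeEncodeError as exc:
    failed_codec = exc.encoding     # name of the codec that gave up
    failed_position = exc.start     # index of the first unencodable character
    failed_char = title[exc.start]  # the character at that index

# UTF-8 has a byte sequence for every character in this title,
# so the explicit encode succeeds.
utf8_bytes = title.encode('utf8')
```

The first é sits at index 7 of the title, which is the position the traceback reports.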

Pick an encoding and stick with it:

f.write(printinfo.encode('utf8') + '\n')

or use io.open() to create a file object that'll encode for you as you write to the file:

import io

f = io.open(filename, 'w', encoding='utf8')
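Both options can be sketched side by side as follows. This is an illustration under assumptions, not the asker's code: the vendor and Apple IDs, the file names and the temporary directory are made up, and `b'\n'` / `u'\n'` are used so the snippet also runs on Python 3.

```python
import io
import os
import shutil
import tempfile

# Title from the question; the two IDs are hypothetical placeholders.
printinfo = u'Identit\xe9 secr\xe8te (Abduction) [VF]\tvendor123\t456'

tmpdir = tempfile.mkdtemp()  # scratch directory, assumed for the demo

# Option 1: encode explicitly and write bytes to a file opened in binary mode.
path_bytes = os.path.join(tmpdir, 'explicit.csv')
with open(path_bytes, 'wb') as f:
    f.write(printinfo.encode('utf8') + b'\n')

# Option 2: let io.open() do the encoding; write unicode text only.
# newline='\n' keeps line endings identical on every platform.
path_text = os.path.join(tmpdir, 'io_open.csv')
with io.open(path_text, 'w', encoding='utf8', newline='\n') as f:
    f.write(printinfo + u'\n')

# Read both files back as raw bytes to compare what landed on disk.
with open(path_bytes, 'rb') as f:
    explicit_bytes = f.read()
with open(path_text, 'rb') as f:
    io_open_bytes = f.read()

# Decode on the way back in to recover the original text.
with io.open(path_text, 'r', encoding='utf8') as f:
    round_trip = f.read()

shutil.rmtree(tmpdir)  # remove the scratch directory
```

Either way the same UTF-8 bytes end up in the file. With `io.open()` you must pass unicode strings to `write()`, not already-encoded byte strings.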

You may want to read:

The Python Unicode HOWTO
Pragmatic Unicode by Ned Batchelder
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky

before continuing.

Question 2

I have this code:

    printinfo = title + "\t" + old_vendor_id + "\t" + apple_id + '\n'
    # Write file
    f.write (printinfo + '\n')

But I get this error when running it:

    f.write(printinfo + '\n')
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 7: ordinal not in range(128)

It's having trouble writing out this:

Identité secrète (Abduction) [VF]

Any ideas, please? I'm not sure how to fix it.

Cheers.

UPDATE: This is the bulk of my code, so you can see what I am doing:

def runLookupEdit(self, event):
    newpath1 = pathindir + "/"
    errorFileOut = newpath1 + "REPORT.csv"
    f = open(errorFileOut, 'w')

    global old_vendor_id

    for old_vendor_id in vendorIdsIn.splitlines():
        writeErrorFile = 0
        from lxml import etree
        parser = etree.XMLParser(remove_blank_text=True) # makes pretty print work

        path1 = os.path.join(pathindir, old_vendor_id)
        path2 = path1 + ".itmsp"
        path3 = os.path.join(path2, 'metadata.xml')

        # Open and parse the xml file
        cantFindError = 0
        try:
            with open(path3): pass
        except IOError:
            cantFindError = 1
            errorMessage = old_vendor_id
            self.Error(errorMessage)
            break
        tree = etree.parse(path3, parser)
        root = tree.getroot()

        for element in tree.xpath('//video/title'):
            title = element.text
            while '\n' in title:
                title = title.replace('\n', ' ')
            while '\t' in title:
                title = title.replace('\t', ' ')
            while '  ' in title:
                title = title.replace('  ', ' ')
            title = title.strip()
            element.text = title
        print title

        #########################################
        ######## REMOVE UNWANTED TAGS ########
        #########################################

        # Remove the comment tags
        comments = tree.xpath('//comment()')
        q = 1
        for c in comments:
            p = c.getparent()
            if q == 3:
                apple_id = c.text
            p.remove(c)
            q = q+1

        apple_id = apple_id.split(':',1)[1]
        apple_id = apple_id.strip()
        printinfo = title + "\t" + old_vendor_id + "\t" + apple_id

        # Write file
        # f.write (printinfo + '\n')
        f.write(printinfo.encode('utf8') + '\n')
    f.close()
