UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 6: ordinal not in range(128)

dtrinh picture dtrinh · Nov 15, 2016 · Viewed 29.8k times · Source

I am trying to pull a list of 500 restaurants in Amsterdam from TripAdvisor; however after the 308th restaurant I get the following error:

Traceback (most recent call last):
  File "C:/Users/dtrinh/PycharmProjects/TripAdvisorData/LinkPull-HK.py", line 43, in <module>
    writer.writerow(rest_array)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 6: ordinal not in range(128)

I tried several things I found on StackOverflow, but nothing is working as of right now. I was wondering if someone could take a look at my code and see any potential solutions that would be great.

        for item in soup2.findAll('div', attrs={'class', 'title'}):
            if 'Cuisine' in item.text:
                item.text.strip()
                content = item.findNext('div', attrs=('class', 'content'))
                cuisine_type = content.text.encode('utf8', 'ignore').strip().split(r'\xa0')
        rest_array = [account_name, rest_address, postcode, phonenumber, cuisine_type]
        #print rest_array
        with open('ListingsPull-Amsterdam.csv', 'a') as file:
                writer = csv.writer(file)
                writer.writerow(rest_array)
    break

Answer

Laurent LAPORTE picture Laurent LAPORTE · Nov 15, 2016

The rest_array contains unicode strings. When you use csv.writer to write rows, you need to serialise bytes strings (you are on Python 2.7).

I suggest you to use "utf8" encoding:

with open('ListingsPull-Amsterdam.csv', mode='a') as fd:
    writer = csv.writer(fd)
    rest_array = [text.encode("utf8") for text in rest_array]
    writer.writerow(rest_array)

note: please, don't use file as variable because you shadow the built-in function file() (an alias of open() function).

If you want to open this CSV file with Microsoft Excel, you may consider using another encoding, for instance "cp1252" (it allows u"\u2019" character).