UnicodeEncodeError: 'charmap' codec can't encode characters

SstrykerR picture SstrykerR · Nov 23, 2014 · Viewed 458.9k times · Source

I'm trying to scrape a website, but it gives me an error.

I'm using the following code:

import urllib.request
from bs4 import BeautifulSoup

get = urllib.request.urlopen("https://www.website.com/")
html = get.read()

soup = BeautifulSoup(html)

print(soup)

And I'm getting the following error:

File "C:\Python34\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 70924-70950: character maps to <undefined>

What can I do to fix this?

Answer

twasbrillig picture twasbrillig · Feb 27, 2017

I was getting the same UnicodeEncodeError when saving scraped web content to a file. To fix it I replaced this code:

with open(fname, "w") as f:
    f.write(html)

with this:

import io
with io.open(fname, "w", encoding="utf-8") as f:
    f.write(html)

Using io gives you backward compatibility with Python 2.

If you only need to support Python 3 you can use the builtin open function instead:

with open(fname, "w", encoding="utf-8") as f:
    f.write(html)