I need to remove the character "»" from a string, but I keep getting an error. This is the code I use:
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
# other code
soup = BeautifulSoup(data, 'lxml')
mystring = soup.find('a').text.replace(' »','')
UnicodeEncodeError: 'ascii' codec can't encode character u'\xbb' in position 13: ordinal not in range(128)
But if I test it with this other script:
# -*- coding: utf-8 -*-
a = "hi »"
b = a.replace('»','')
it works. Why is that?
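In Python 2, BeautifulSoup's .text returns a unicode object, while a literal such as ' »' in a UTF-8 source file is a byte string. When the two are mixed, Python converts between them implicitly with the ASCII codec, which cannot handle "»". Your second script works because both strings are byte strings.
To replace text in a byte string with str.replace(), first decode the string, then do the replacement, and finally encode the result back to the original encoding: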
>>> a = "hi »"
>>> a.decode('utf-8').replace("»".decode('utf-8'), "").encode('utf-8')
'hi '
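For the BeautifulSoup case the text is already unicode, so only the replacement character needs to be a unicode literal. The following is a minimal sketch of both paths, not a definitive implementation; strip_marker is a hypothetical helper name, and UTF-8 is assumed as the encoding of any byte string passed in:

```python
# -*- coding: utf-8 -*-

def strip_marker(value, marker=u"\u00bb", encoding="utf-8"):
    """Remove `marker` (default: the "»" character) from bytes or text.

    Hypothetical helper for illustration; `encoding` is an assumption
    about how byte strings were encoded.
    """
    if isinstance(value, bytes):
        # Byte string: decode to text, replace, then encode back.
        return value.decode(encoding).replace(marker, u"").encode(encoding)
    # Already text (as returned by BeautifulSoup's .text): replace directly.
    return value.replace(marker, u"")
```

With this, strip_marker(soup.find('a').text) would operate on the unicode text directly, and no implicit ASCII conversion takes place.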
You may also use the following regex to remove all non-ASCII characters from the string:
>>> import re
>>> re.sub(r'[^\x00-\x7f]',r'', 'hi »')
'hi '
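Keep in mind that this regex drops every non-ASCII character, not only "»". A small sketch applied to unicode text makes the side effect visible; strip_non_ascii is a hypothetical name used only for illustration:

```python
# -*- coding: utf-8 -*-
import re


def strip_non_ascii(text):
    # Hypothetical helper: removes every character outside the ASCII
    # range, so accented letters disappear together with the "»" marker.
    return re.sub(r"[^\x00-\x7f]", u"", text)
```

For example, strip_non_ascii(u"caf\u00e9 \u00bb") returns u"caf ", losing the "é" as well, so a targeted replace is likely the safer choice when the text may contain other non-ASCII characters.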