Although there are similar questions, I can't seem to find a working solution for my case.
I'm encountering some annoying hex characters in strings, e.g.:
'\xe2\x80\x9chttp://www.google.com\xe2\x80\x9d blah blah#%#@$^blah'
What I need is to remove these hex \xHH characters, and them alone, to get the following result:
'http://www.google.com blah blah#%#@$^blah'
Decoding doesn't help:
s.decode('utf8') # u'\u201chttp://www.google.com\u201d blah blah#%#@$^blah'
How can I achieve that?
Just remove all non-ASCII characters:
>>> s.decode('utf8').encode('ascii', errors='ignore')
'http://www.google.com blah blah#%#@$^blah'
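On Python 3 the same idea needs one extra step, because `encode` returns `bytes` rather than `str`. A minimal sketch, assuming the data arrives as UTF-8 `bytes` (the names `raw` and `ascii_only` are illustrative, not from the question):

```python
# Assumption: Python 3, with the input held as UTF-8 encoded bytes.
raw = b'\xe2\x80\x9chttp://www.google.com\xe2\x80\x9d blah blah#%#@$^blah'

# Decode to text, drop everything that has no ASCII encoding,
# then decode the remaining ASCII bytes back into a str.
text = raw.decode('utf8')
ascii_only = text.encode('ascii', errors='ignore').decode('ascii')

print(ascii_only)
```

If the input is already a `str`, the first `decode` call can be skipped.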
Another possible solution (this relies on Python 2, where filter on a str returns a str):
>>> import string
>>> s = '\xe2\x80\x9chttp://www.google.com\xe2\x80\x9d blah blah#%#@$^blah'
>>> printable = set(string.printable)
>>> filter(lambda x: x in printable, s)
'http://www.google.com blah blah#%#@$^blah'
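On Python 3, `filter` returns a lazy iterator instead of a string, so the kept characters have to be joined back together. A sketch under that assumption, with `cleaned` as an illustrative name; note that `string.printable` also drops ASCII control characters other than whitespace, which may or may not be what you want:

```python
import string

# Assumption: Python 3, with the input already decoded to str.
s = '\u201chttp://www.google.com\u201d blah blah#%#@$^blah'

# Keep only characters found in string.printable
# (ASCII digits, letters, punctuation, and whitespace).
printable = set(string.printable)
cleaned = ''.join(ch for ch in s if ch in printable)

print(cleaned)
```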
Or use regular expressions:
>>> import re
>>> re.sub(r'[^\x00-\x7f]', r'', s)
'http://www.google.com blah blah#%#@$^blah'
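The same pattern should carry over to Python 3 for both `str` and `bytes`, as long as the pattern type matches the input type. A sketch, with `NON_ASCII_TEXT` and `NON_ASCII_BYTES` as illustrative names that are not part of the original answer:

```python
import re

# Assumption: Python 3. A str pattern matches code points above 0x7f,
# while a bytes pattern matches individual bytes above 0x7f.
NON_ASCII_TEXT = re.compile(r'[^\x00-\x7f]')
NON_ASCII_BYTES = re.compile(rb'[^\x00-\x7f]')

text = '\u201chttp://www.google.com\u201d blah blah#%#@$^blah'
raw = text.encode('utf8')

clean_text = NON_ASCII_TEXT.sub('', text)
clean_bytes = NON_ASCII_BYTES.sub(b'', raw)

print(clean_text)
print(clean_bytes)
```

Mixing the two (a `str` pattern on `bytes`, or the reverse) raises a `TypeError`, so it helps to decide up front which type the data is in.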
Pick your favorite one.