Python TypeError on regex

kamikaze_pilot · Mar 3, 2011 · Viewed 45.3k times

So, I have this code:

import re
import urllib.request

url = 'http://google.com'
linkregex = re.compile('<a\s*href=[\'|"](.*?)[\'"].*?>')
m = urllib.request.urlopen(url)
msg = m.read()
links = linkregex.findall(msg)

But then Python raises this error:

links = linkregex.findall(msg)
TypeError: can't use a string pattern on a bytes-like object

What did I do wrong?

Answer

Lennart Regebro · Mar 3, 2011

> TypeError: can't use a string pattern on a bytes-like object
>
> What did I do wrong?

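You used a string pattern on a bytes object: in Python 3, m.read() returns bytes, and re requires the pattern and the searched data to be the same type. Use a bytes pattern instead: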

linkregex = re.compile(b'<a\s*href=[\'|"](.*?)[\'"].*?>')
                       ^
            Add the b there; it turns the pattern into a bytes object
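
The other common way out is to decode the response into a str and keep a string pattern. Here is a minimal sketch of both options. It uses a hard-coded bytes value (sample_html, a made-up stand-in for m.read()) so it runs without network access, and it assumes UTF-8, which a real page may not use. The patterns are written as raw literals, and the stray | inside the character class is dropped, since inside [...] it only matches a literal pipe.

```python
import re

# Hypothetical stand-in for m.read(); urlopen(...).read() returns bytes in Python 3.
sample_html = b'<a href="http://example.com/a">A</a> <a href=\'/b\' class="x">B</a>'

# Option 1: bytes pattern on bytes data. Matches come back as bytes.
bytes_regex = re.compile(rb'<a\s*href=[\'"](.*?)[\'"].*?>')
bytes_links = bytes_regex.findall(sample_html)

# Option 2: decode first, then use a str pattern. Matches come back as str.
# UTF-8 is an assumption here; a real page may declare another encoding.
str_regex = re.compile(r'<a\s*href=[\'"](.*?)[\'"].*?>')
str_links = str_regex.findall(sample_html.decode('utf-8'))

print(bytes_links)
print(str_links)
```

Which one to pick mostly depends on what you do with the links afterwards: if they end up in string operations or get printed, decoding once up front tends to be less awkward than carrying bytes around.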

(ps:

 >>> from disclaimer include dont_use_regexp_on_html
 "Use BeautifulSoup or lxml instead."

)
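
BeautifulSoup and lxml are third-party packages, so as a dependency-free illustration of the same idea (let a parser find the tags instead of a regex), here is a rough sketch using the standard library's html.parser instead. This is a swapped-in stand-in, not what BeautifulSoup does internally; LinkCollector and sample_html are made-up names, and UTF-8 is again assumed.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags (hypothetical helper name)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag and attribute
        # names are lowercased by the parser before they arrive here.
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value is not None:
                    self.links.append(value)


# Made-up sample; HTMLParser.feed() expects str, so decode the bytes first.
sample_html = b'<a class="x" href="/first">1</a><A HREF=\'/second\'>2</A>'
collector = LinkCollector()
collector.feed(sample_html.decode('utf-8'))
print(collector.links)
```

Unlike the regex, this does not care about attribute order, quoting style, or tag case, which is the main reason a parser is usually suggested for HTML.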