Beautiful Soup cannot find a CSS class if the object has other classes, too

endolith · Aug 7, 2009 · Viewed 15.1k times

If a page has <div class="class1"> and <p class="class1">, then soup.findAll(True, 'class1') will find them both.

If the page has <p class="class1 class2">, though, that element is not found. How do I find all elements with a certain class, regardless of whether they also have other classes?

Answer

endolith · Aug 7, 2009

Unfortunately, BeautifulSoup treats this as a single class with a space in it, 'class1 class2', rather than as two classes, ['class1', 'class2']. A workaround is to search for the class with a regular expression instead of a string.

This works:

import re
soup.findAll(True, {'class': re.compile(r'\bclass1\b')})
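
To see why the word-boundary pattern works, it can help to run it against raw class-attribute strings using only the standard re module. The sketch below is illustrative: the sample strings are made up, and it assumes the compiled pattern is applied to the whole attribute value with search semantics. It also shows one caveat: \b treats a hyphen as a boundary, so a class such as 'class1-wide' matches too. The stricter whole_token pattern is my own suggestion, not part of the original answer.

```python
import re

# Hypothetical class-attribute values, as they might appear in the HTML.
samples = [
    "class1",
    "class1 class2",
    "class2 class1",
    "class10",
    "myclass1",
    "class1-wide",
]

# The pattern from the answer: class1 delimited by word boundaries.
word_boundary = re.compile(r'\bclass1\b')

# Stricter alternative (an assumption, not from the original answer):
# class1 must be delimited by whitespace or by the ends of the string.
whole_token = re.compile(r'(?:^|\s)class1(?:\s|$)')

boundary_hits = [s for s in samples if word_boundary.search(s)]
token_hits = [s for s in samples if whole_token.search(s)]

print(boundary_hits)
print(token_hits)
```

If hyphenated class names such as 'class1-wide' occur on the page, the stricter pattern could be passed to findAll in place of the word-boundary one.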