Beautiful Soup Using Regex to Find Tags?

user3314418 · Jul 15, 2014 · Viewed 63.2k times

I'd like Beautiful Soup to match any one of several tag names, as in the call below. I know attribute filters accept a regex, but is there anything in Beautiful Soup that lets you do the same for tag names?

soup.findAll("(a|div)")

Desired output:

<a> ASDFS
<div> asdfasdf
<a> asdfsdf

My goal is to create a scraper that can grab tables from sites. Tags are sometimes named inconsistently, so I'd like to be able to pass in a list of tag names that identify the 'data' part of a table.

Answer

Manu CJ · Nov 3, 2017

Note that you can also use regular expressions to search in attributes of tags. For example:

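import re
from bs4 import BeautifulSoup

# Sample markup so the snippet runs on its own (an assumption, not part of the original answer)
soup = BeautifulSoup('<a href="https://www.crummy.com/software/">link</a>', 'html.parser')
# Keep only <a> tags whose href matches the regex anywhere in its value
soup.find_all('a', {'href': re.compile(r'crummy\.com/')})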

This example finds all <a> tags whose href attribute contains the substring 'crummy.com/'.
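
For the tag names themselves, which is what the question asks about, find_all also accepts a list of names or a compiled regex as its first argument. Below is a minimal sketch of both; the sample HTML and the variable names by_list and by_regex are made up for illustration. Beautiful Soup applies the regex with search(), so the pattern is anchored with ^ and $ here; otherwise "a" would also match names such as "table".

```python
import re
from bs4 import BeautifulSoup

# Made-up markup mirroring the output sketched in the question
html = (
    "<p>intro</p>"
    "<a href='#'>ASDFS</a>"
    "<div>asdfasdf</div>"
    "<a href='#'>asdfsdf</a>"
)
soup = BeautifulSoup(html, "html.parser")

# A list matches any tag whose name appears in the list
by_list = soup.find_all(["a", "div"])

# A compiled regex is tested against each tag name; the anchors
# restrict the match to exactly "a" or "div"
by_regex = soup.find_all(re.compile(r"^(a|div)$"))

for tag in by_list:
    print(tag.name, tag.get_text())
```

Both calls return the matching tags in document order, so either should fit a scraper that takes a user-supplied list of tag names. The list form is probably the simpler choice when the names are known exactly, and the regex form when they only follow a pattern.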