I'd really like Beautiful Soup to match any tag from a list of tags, like so. I know attrs accepts a regex, but is there anything in Beautiful Soup that lets you do the same for tag names?
soup.findAll("(a|div)")
Desired output:
<a> ASDFS
<div> asdfasdf
<a> asdfsdf
My goal is to create a scraper that can grab tables from sites. Sometimes tags are named inconsistently, and I'd like to be able to input a list of tags to name the 'data' part of a table.
Note that you can also use regular expressions to search in attributes of tags. For example:
import re
from bs4 import BeautifulSoup
# `soup` is assumed to be an existing BeautifulSoup object built from your page
links = soup.find_all('a', {'href': re.compile(r'crummy\.com/')})
This example finds all <a> tags whose href contains the substring 'crummy.com/'.
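As for matching several tag names at once, find_all also accepts a list of names or a compiled regex as its first argument. Below is a minimal sketch of both; the HTML string is made up for illustration and loosely mirrors the output in the question, and the variable names are my own.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup, invented for this example.
html = "<a>ASDFS</a><div>asdfasdf</div><p>skipped</p><a>asdfsdf</a>"
soup = BeautifulSoup(html, "html.parser")

# A list of names matches any tag whose name appears in the list.
by_list = soup.find_all(["a", "div"])

# A compiled regex is tested against each tag name with search(),
# so anchor it to avoid also matching names like "abbr" or "address".
by_regex = soup.find_all(re.compile(r"^(a|div)$"))

names = [tag.name for tag in by_list]
texts = [tag.get_text() for tag in by_list]
```

Both calls return the matching tags in document order. The list form is probably the better fit for the scraper described in the question, since the tag names can be passed in as plain configuration; the regex form is handy when the names follow a pattern.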