Remove a tag using BeautifulSoup but keep its contents

Jason Christa · Nov 19, 2009

Currently I have code that does something like this:

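from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3 import

# VALID_TAGS is a whitelist of allowed tag names, e.g. ['p', 'a', 'em'];
# value is the HTML string being cleaned.
soup = BeautifulSoup(value)

for tag in soup.findAll(True):  # True matches every tag in the document
    if tag.name not in VALID_TAGS:
        tag.extract()  # removes the tag *and* everything inside it
soup.renderContents()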

Except I don't want to throw away the contents inside the invalid tag. How do I get rid of the tag but keep the contents inside when calling soup.renderContents()?

Answer

slacy · Dec 9, 2011

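Current versions of the BeautifulSoup library (BeautifulSoup 3 when this was written) have an undocumented method on Tag objects called replaceWithChildren(), which replaces a tag with its own children. BeautifulSoup 4 documents the same operation as unwrap(). So, you could do something like this: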

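from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3 import

html = "<p>Good, <b>bad</b>, and <i>ug<b>l</b><u>y</u></i></p>"
invalid_tags = ['b', 'i', 'u']
soup = BeautifulSoup(html)
for tag in invalid_tags:
    for match in soup.findAll(tag):
        # Swap the tag for its children so the inner text is kept
        match.replaceWithChildren()
print(soup)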

It looks like it behaves the way you want, and the code is fairly straightforward, although it makes one pass through the DOM per invalid tag name. That could easily be optimized, for example by passing the whole list to a single soup.findAll(invalid_tags) call.
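To see the single-pass idea without building a tree at all, here is a minimal sketch that swaps BeautifulSoup for Python 3's standard-library html.parser. This is a different technique, not BeautifulSoup's API, and the names TagUnwrapper and strip_tags are hypothetical. It assumes a whitelist of allowed tags, like VALID_TAGS in the question: allowed tags are re-emitted as written, and disallowed tags are dropped while their text is kept, which mirrors what replaceWithChildren() does.

```python
from html import escape
from html.parser import HTMLParser


class TagUnwrapper(HTMLParser):
    """Hypothetical helper: rebuild HTML keeping only whitelisted tags.

    Disallowed tags are dropped but their text (and any allowed child
    tags) is kept, in one streaming pass. Comments, doctypes and
    processing instructions are discarded because their handlers are
    left as HTMLParser's no-op defaults.
    """

    def __init__(self, allowed_tags):
        super().__init__(convert_charrefs=True)
        self.allowed = set(allowed_tags)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in self.allowed:
            # Re-emit the original start tag text, attributes included.
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/>: keep verbatim, emit no end tag.
        if tag in self.allowed:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in self.allowed:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # convert_charrefs=True decoded entities like &amp;, so escape again.
        self.out.append(escape(data, quote=False))


def strip_tags(markup, allowed_tags):
    """Return markup with every tag not in allowed_tags unwrapped."""
    parser = TagUnwrapper(allowed_tags)
    parser.feed(markup)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)


sample = "<p>Good, <b>bad</b>, and <i>ug<b>l</b><u>y</u></i></p>"
print(strip_tags(sample, ["p"]))  # prints: <p>Good, bad, and ugly</p>
```

Because allowed start tags are copied verbatim, their attributes (including things like onclick) pass through untouched, and text inside a disallowed script or style element survives as escaped text; a real sanitizer would need to handle both.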