Extract content within a tag with BeautifulSoup

ready · May 14, 2011 · Viewed 66.6k times

I'd like to extract the content Hello world. Please note that there are multiple <table> elements and similar <td colspan="2"> cells on the page as well:

<table border="0" cellspacing="2" width="800">
  <tr>
    <td colspan="2"><b>Name: </b>Hello world</td>
  </tr>
  <tr>
...

I tried the following:

hello = soup.find(text='Name: ')
hello.findPreviousSiblings

But it returned nothing.

In addition, I'm also having a problem extracting My home address from the following:

<td><b>Address:</b></td>

<td>My home address</td>

I'm using the same method to search for text="Address: ", but how do I navigate down to the next line and extract the content of that <td>?

Answer

solvingPuzzles · Jan 9, 2013

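The .contents attribute works well for extracting text from <tag>text</tag>. It returns a list of the tag's children, so for a tag that holds only text, the text is the first element of that list.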


<td>My home address</td> example:

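from bs4 import BeautifulSoup  # assumes the bs4 package (Beautiful Soup 4)

s = '<td>My home address</td>'
soup = BeautifulSoup(s, 'html.parser')  # name the parser explicitly
td = soup.find('td')  # <td>My home address</td>
td.contents  # ['My home address'] (a list of child nodes)
td.contents[0]  # 'My home address'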

<td><b>Address:</b></td> example:

s = '<td><b>Address:</b></td>'
soup = BeautifulSoup(s, 'html.parser')  # name the parser explicitly
b = soup.find('td').find('b')  # <b>Address:</b>
b.contents  # ['Address:'] (a list of child nodes)
b.contents[0]  # 'Address:'
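
The examples above read the text inside a tag, but both cases in the question need one more step: moving from the label to the value next to it. Here is a minimal sketch of one way to do that, assuming Beautiful Soup 4 (the bs4 package, version 4.4 or later for the string argument) and the built-in html.parser. The variable names and the trimmed-down HTML fragments are illustrative only, not taken from the real page. Matching on the label text rather than on position should also help when the page has several similar tables and cells.

```python
from bs4 import BeautifulSoup

# Trimmed-down fragments based on the question's snippets (illustrative only).
name_html = '<td colspan="2"><b>Name: </b>Hello world</td>'
address_html = '<td><b>Address:</b></td>\n\n<td>My home address</td>'

# Case 1: the value is a text node that directly follows the <b> label.
soup = BeautifulSoup(name_html, 'html.parser')
name_label = soup.find('b', string='Name: ')  # <b>Name: </b>
name = name_label.next_sibling.strip()        # text right after the <b> tag

# Case 2: the value sits in the <td> after the one that holds the label.
soup = BeautifulSoup(address_html, 'html.parser')
address_label = soup.find('b', string='Address:')  # <b>Address:</b>
label_cell = address_label.find_parent('td')       # the <td> around the label
value_cell = label_cell.find_next_sibling('td')    # skips whitespace text nodes
address = value_cell.get_text(strip=True)

print(name)
print(address)
```

Note that the string passed to find has to match the label exactly, including any trailing space: 'Name: ' in the first fragment, but 'Address:' without a space in the second.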