I have am trying to parse a webpage that looks like this with Python->Beautiful Soup:
I am trying to extract the contents of the highlighted td div. Currently I can get all the divs by
alltd = soup.findAll('td')
for td in alltd:
print td
But I am trying to narrow the scope of that to search the tds in the class "tablebox" which still will probably return 30+ but is more managable a number than 300+.
How can I extract the contents of the highlighted td in picture above?
It is useful to know that whatever elements BeautifulSoup finds within one element still have the same type as that parent element - that is, various methods can be called.
So this is somewhat working code for your example:
soup = BeautifulSoup(html)
divTag = soup.find_all("div", {"class": "tablebox"})
for tag in divTag:
tdTags = tag.find_all("td", {"class": "align-right"})
for tag in tdTags:
print tag.text
This will print all the text of all the td
tags with the class of "align-right" that have a parent div
with the class of "tablebox".