How to get the content from web browser using python?

python web-scraping python-webbrowser

raghava.nitk · Jun 19, 2014 · Viewed 7.9k times · Source

I have a webpage, http://kff.org/womens-health-policy/state-indicator/ultrasound-requirements/#, and I need to extract the table from it.

Problem encountered: I have been using BeautifulSoup and requests to get the URL content. The problem with these methods is that they return the page content before the table has been generated.

So I get an empty table: <table><thead></thead><tbody></tbody></table>

My approach: I am now trying to open the URL in the browser using webbrowser.open_new_tab(url) and then get the content from the browser directly. This would give the server time to update the table, and then I would be able to get the content from the page.

Problem: I am not sure how to fetch information from the web browser directly.

Right now I am using Mozilla on a Windows system.

The closest thing I found is this website link, but it only shows which sites are open, not their content.

Is there any way to let the table load when using urllib2, or BeautifulSoup and requests? Or is there any way to get the loaded content directly from the webpage?

Thanks

Answer

The table isn't being filled because Python doesn't process the page it receives through urllib2 (or requests): there is no DOM and no JavaScript is run, so the script that would populate the table never executes.
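To see this concretely, here is a minimal sketch using only the standard library. The HTML string is a hypothetical stand-in for what the server sends before any script runs, and the `RowCounter` class is an illustrative helper, not part of any library.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for the raw HTML the server returns: the table
# skeleton is present, but the rows are added later by JavaScript.
RAW_HTML = """
<table><thead></thead><tbody></tbody></table>
<script>/* in a browser, a script like this would fill the table */</script>
"""


class RowCounter(HTMLParser):
    """Counts <tr> start tags in the markup exactly as received."""

    def __init__(self):
        super().__init__()
        self.rows = 0

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows += 1


counter = RowCounter()
counter.feed(RAW_HTML)
print(counter.rows)  # 0: the parser only reads markup and never runs the script
```

Any HTML parser, BeautifulSoup included, behaves the same way here: it reads the markup it is given and does not execute scripts.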

After reading through the source, it looks like the information you're looking for can be found at http://kff.org/datacenter.json?post_id=32781 in JSON format.
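A minimal sketch of fetching that endpoint directly, assuming Python 3, where `urllib.request` replaces `urllib2`. The helper name `fetch_json` and its `opener` parameter are illustrative, and the layout of the returned JSON is not known here, so inspect the payload before relying on any particular key.

```python
import json
import urllib.request

URL = "http://kff.org/datacenter.json?post_id=32781"


def fetch_json(url, opener=urllib.request.urlopen):
    """Download `url` and decode the response body as JSON.

    `opener` is a hypothetical hook so the function can be exercised
    without network access; by default it is urllib.request.urlopen.
    """
    with opener(url) as response:
        # Assumes the body is UTF-8 encoded JSON.
        return json.loads(response.read().decode("utf-8"))


# Usage (requires network access):
# data = fetch_json(URL)
# print(type(data))  # look at the structure before extracting the table
```

Requesting the JSON this way skips the browser entirely, so there is nothing to wait for and no HTML to parse.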
