How to get the content from web browser using python?

raghava.nitk · Jun 19, 2014 · Viewed 7.9k times

I need to extract the table from this webpage: http://kff.org/womens-health-policy/state-indicator/ultrasound-requirements/#

Problem encountered: I have been using BeautifulSoup and requests to fetch the page content. The problem is that both return the HTML as it is before the table has been generated.

So I get an empty table: <table><thead></thead><tbody></tbody></table>

My approach: I am now trying to open the URL in the browser using webbrowser.open_new_tab(url) and then get the content from the browser directly. This would give the table time to load, after which I should be able to get the content from the page.

Problem: I am not sure how to fetch information from the web browser directly.

I am currently using Mozilla Firefox on Windows.

The closest thing I found was this website link, but it only reports which sites are open, not their content.

Is there a way to let the table load when using urllib2, or BeautifulSoup and requests? Or is there a way to get the loaded content directly from the webpage?

Thanks

Answer

Santiclause · Jun 19, 2014

The table isn't being filled because Python doesn't process the page it receives with urllib2: there's no DOM, no JavaScript gets run, et cetera.
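To illustrate the point, here is a minimal sketch using only the Python 3 standard library. The page source below is a made-up stand-in, not the real KFF markup: it has an empty table plus a script that a browser would run to add a row. The names PAGE_SOURCE, RowCounter and count_rows are hypothetical helpers for this example.

```python
from html.parser import HTMLParser

# Hypothetical page source (an assumption, not the real KFF page):
# the table is empty and a script would fill it in a browser.
PAGE_SOURCE = """
<html><body>
<table id="data"><thead></thead><tbody></tbody></table>
<script>
  // A browser would execute this and add a row; an HTML parser never does.
  var body = document.querySelector("#data tbody");
  body.insertRow().insertCell().textContent = "example row";
</script>
</body></html>
"""


class RowCounter(HTMLParser):
    """Counts <tr> start tags that are present in the raw HTML."""

    def __init__(self):
        super().__init__()
        self.rows = 0

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows += 1


def count_rows(html):
    """Return the number of table rows found in the static markup."""
    parser = RowCounter()
    parser.feed(html)
    parser.close()
    return parser.rows


# The parser only sees the markup as sent; the script is treated as text.
print(count_rows(PAGE_SOURCE))
```

The same thing happens with BeautifulSoup: it parses the bytes that were downloaded, and anything the script would have added is simply absent.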

After reading through the source, it looks like the information you're looking for can be found at http://kff.org/datacenter.json?post_id=32781 in JSON format.
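A rough sketch of fetching that JSON directly might look like the following. It uses Python 3's urllib.request rather than urllib2 (an assumption about your environment), and since the structure of the response isn't described here, the hypothetical describe helper only reports the top-level shape instead of assuming any field names. Whether the URL still responds is also not something this sketch can guarantee.

```python
import json
from urllib.request import urlopen

# URL taken from the answer above.
DATA_URL = "http://kff.org/datacenter.json?post_id=32781"


def fetch_json(url, timeout=10):
    """Download a URL and decode the response body as JSON."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def describe(data):
    """Summarise the top level of decoded JSON without assuming a schema."""
    if isinstance(data, dict):
        return "object with keys: " + ", ".join(sorted(data))
    if isinstance(data, list):
        return "list of %d items" % len(data)
    return "scalar of type " + type(data).__name__
```

Calling describe(fetch_json(DATA_URL)) should show which keys or items the response contains, and from there you can pick out the table data. With requests, requests.get(DATA_URL).json() does the same download-and-decode step.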