Automate interaction with a webpage in python

nafe picture nafe · Dec 3, 2009 · Viewed 9k times · Source

I want to automate interaction with a webpage. I've been using pycurl up til now but eventually the webpage will use javascript so I'm looking for alternatives . A typical interaction is "open the page, search for some text, click on a link (which opens a form), fill out the form and submit".

We're deploying on Google App engine, if that makes a difference.

Clarification: we're deploying the webpage on appengine. But the interaction is run on a separate machine. So selenium seems like it's the best choice.

Answer

Alex Martelli picture Alex Martelli · Dec 3, 2009

Twill and mechanize don't do Javascript, and Qt and Selenium can't run on App Engine ((1)), which only supports pure Python code. I do not know of any pure-Python Javascript interpreter, which is what you'd need to deploy a JS-supporting scraper on App Engine:-(.

Maybe there's something in Java, which would at least allow you to deploy on (the Java version of) App Engine? App Engine app versions in Java and Python can use the same datastore, so you could keep some part of your app in Python... just not the part that needs to understand Javascript. Unfortunately I don't know enough about the Java / AE environment to suggest any specific package to try.

((1)): to clarify, since there seems to be a misunderstanding that has gotten so far as to get me downvoted: if you run Selenium or other scrapers on a different computer, you can of course target a site deployed in App Engine (it doesn't matter how the website you're targeting is deployed, what programming language[s] it uses, etc, etc, as long as it's a website you can access [[real website: flash, &c, may likely be different]]). How I read the question is, the OP is looking for ways to have the scraping run as part of an App Engine app -- that is the problematic part, not where you (or somebody else;-) runs the site being scraped!