Python: urllib/urllib2/httplib confusion

Ace · Nov 19, 2008

I'm trying to test the functionality of a web app by scripting a login sequence in Python, but I'm having some trouble.

Here's what I need to do:

  1. Do a POST with a few parameters and headers.
  2. Follow a redirect.
  3. Retrieve the HTML body.

Now, I'm relatively new to Python, and the two things I've tested so far haven't worked. First I used httplib, with putrequest() (passing the parameters within the URL) and putheader(). This didn't seem to follow the redirects.

Then I tried urllib and urllib2, passing both headers and parameters as dicts. This seems to return the login page instead of the page I'm trying to log in to. I guess it's because of a lack of cookies or something.

Am I missing something simple?

Thanks.

Answer

S.Lott · Nov 19, 2008

Focus on urllib2 for this; it works quite well. Don't mess with httplib; it's not the top-level API.

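What you're noting is most likely a missing session cookie rather than a missed redirect: urllib2's default opener already includes an HTTPRedirectHandler, but it keeps no cookies unless you add an HTTPCookieProcessor.

You need to fold in an HTTPCookieProcessor alongside the HTTPRedirectHandler, so that the cookie set by the login response is sent back when the redirect is followed.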

Further, you may want to subclass the default HTTPRedirectHandler to capture information that you'll then check as part of your unit testing.
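As a rough sketch of that subclassing idea, the handler below records every redirect it is asked to follow and then defers to the stock behaviour. The class name RecordingRedirectHandler and its redirects attribute are made up for illustration; the import shim is only there so the same code also runs on Python 3, where urllib2 became urllib.request.

```python
try:
    import urllib2 as urlrequest          # Python 2, as used in this answer
except ImportError:
    import urllib.request as urlrequest   # Python 3 home of the urllib2 API


class RecordingRedirectHandler(urlrequest.HTTPRedirectHandler):
    """Follows redirects as usual but remembers each one (hypothetical helper)."""

    def __init__(self):
        self.redirects = []  # list of (status_code, new_url) tuples

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Note the redirect so a test can assert on it later.
        self.redirects.append((code, newurl))
        # Let the stock handler decide how (and whether) to follow it.
        return urlrequest.HTTPRedirectHandler.redirect_request(
            self, req, fp, code, msg, headers, newurl)
```

A test could pass an instance of this class to build_opener in place of the plain HTTPRedirectHandler and, after the login request, check that handler.redirects contains the expected status code and target URL.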

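import cookielib
import urllib2
cookies = cookielib.CookieJar()  # inside a test case this could be self.cookies
cookie_handler = urllib2.HTTPCookieProcessor(cookies)  # stores and resends cookies
redirect_handler = urllib2.HTTPRedirectHandler()  # follows 3xx responses
opener = urllib2.build_opener(redirect_handler, cookie_handler)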

You can then use this opener object to POST and GET, handling redirects and cookies properly.
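To make the whole sequence concrete (POST with parameters and headers, follow the redirect, read the HTML), here is a minimal sketch. The helper names make_opener, make_login_request and fetch_after_login are invented for this example, and the URL, field names and header in the commented usage are placeholders, not anything from the question. The import shim lets the same code run on Python 3.

```python
try:  # Python 2, as used in this answer
    import cookielib as cookiejar
    import urllib2 as urlrequest
    from urllib import urlencode
except ImportError:  # Python 3 names for the same APIs
    import http.cookiejar as cookiejar
    import urllib.request as urlrequest
    from urllib.parse import urlencode


def make_opener():
    """Build an opener that keeps cookies and follows redirects."""
    jar = cookiejar.CookieJar()
    opener = urlrequest.build_opener(
        urlrequest.HTTPRedirectHandler(),
        urlrequest.HTTPCookieProcessor(jar))
    return opener, jar


def make_login_request(url, fields, headers):
    """Create the request; supplying a body is what turns it into a POST."""
    body = urlencode(fields).encode("ascii")
    return urlrequest.Request(url, body, headers)


def fetch_after_login(opener, request):
    """Send the request, follow any redirect, and return the final body."""
    response = opener.open(request)
    try:
        return response.read()
    finally:
        response.close()


# Hypothetical usage (needs a running server, so it is left commented out):
# opener, jar = make_opener()
# req = make_login_request("http://localhost:8000/login",
#                          [("username", "ace"), ("password", "secret")],
#                          {"User-Agent": "login-test"})
# html = fetch_after_login(opener, req)
```

Reusing the same opener for later GETs keeps sending the session cookie stored in the jar, which is usually what a scripted login test needs.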

You may also want to add your own subclass of HTTPHandler to capture and log various error codes.
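One possible way to do that, sketched under some assumptions: give the HTTPHandler subclass an http_response method, which the opener calls for each plain-HTTP response it receives. The class name LoggingHTTPHandler, the status_codes list and the logger name are all invented here, and the code relies on the default handler ordering so that it sees a response before error statuses are turned into exceptions.

```python
import logging

try:
    import urllib2 as urlrequest          # Python 2, as used in this answer
except ImportError:
    import urllib.request as urlrequest   # Python 3 home of the urllib2 API

log = logging.getLogger("login_test")  # hypothetical logger name


class LoggingHTTPHandler(urlrequest.HTTPHandler):
    """HTTPHandler that also records the status code of each response."""

    def __init__(self, *args, **kwargs):
        urlrequest.HTTPHandler.__init__(self, *args, **kwargs)
        self.status_codes = []  # every status code seen, in order

    def http_response(self, request, response):
        # Invoked by the opener as a response processor for http:// URLs.
        code = response.code
        self.status_codes.append(code)
        if code >= 400:
            log.warning("HTTP %s for %s", code, request.get_full_url())
        return response  # hand the response on unchanged
```

Passing an instance to build_opener along with the cookie and redirect handlers replaces the default HTTPHandler; a test can then inspect status_codes afterwards. An https:// login would need the same treatment on an HTTPSHandler subclass with an https_response method.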