How to download a file using Python in a 'smarter' way?

kender · May 14, 2009

I need to download several files via HTTP in Python.

The most obvious way to do it is just using urllib2:

import urllib2

# Fetch the resource and save the raw bytes; 'wb' keeps binary files (e.g. PDFs) intact.
u = urllib2.urlopen('http://server.com/file.html')
localFile = open('file.html', 'wb')
localFile.write(u.read())
localFile.close()

But I'll have to deal with URLs that are nasty in some way, such as this one: http://server.com/!Run.aspx/someoddtext/somemore?id=121&m=pdf. When downloaded via the browser, the file has a human-readable name, e.g. accounts.pdf.

Is there any way to handle that in Python, so I don't need to know the file names and hardcode them into my script?

Answer

Oli · May 14, 2009

Download scripts like that tend to push a header telling the user-agent what to name the file:

Content-Disposition: attachment; filename="the filename.ext"

If you can grab that header, you can get the proper filename.
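To make the idea concrete, here is a minimal sketch of pulling the filename out of a Content-Disposition value. It assumes Python 3 and leans on the standard library's email.message.Message to do the parameter parsing; the function name filename_from_content_disposition is made up for illustration and is not part of any library.

```python
from email.message import Message


def filename_from_content_disposition(header_value):
    """Return the filename parameter of a Content-Disposition value, or None.

    Hypothetical helper for illustration only.
    """
    if not header_value:
        return None
    msg = Message()
    msg["Content-Disposition"] = header_value
    # get_filename() looks up the "filename" parameter and removes the quotes.
    return msg.get_filename()


print(filename_from_content_disposition('attachment; filename="the filename.ext"'))
# prints: the filename.ext
```

If the server sends no filename parameter at all, the helper returns None, so the caller still needs some fallback name.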

There's another thread that offers a little bit of code for grabbing Content-Disposition:

import urllib2
remotefile = urllib2.urlopen('http://example.com/somefile.zip')
# Raises KeyError if the server did not send the header.
disposition = remotefile.info()['Content-Disposition']
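Putting the pieces together, a download that names the file from the header might look roughly like the sketch below. It is written for Python 3, where urllib2 became urllib.request and the response's headers object is an email.message.Message subclass that provides get_filename(). The helper names pick_filename and download, the "download" default name, and the fallback to the last URL path segment are all assumptions added for illustration, not something the snippet above prescribes.

```python
import os
import shutil
import urllib.request
from email.message import Message
from urllib.parse import unquote, urlsplit


def pick_filename(url, headers, default="download"):
    """Choose a local name: header filename first, then the URL path (hypothetical helper)."""
    name = headers.get_filename()
    if not name:
        # Fall back to the last segment of the URL path, ignoring the query string.
        name = unquote(urlsplit(url).path.rsplit("/", 1)[-1])
    # Keep only the final path component so the server cannot pick the directory.
    name = os.path.basename(name.replace("\\", "/"))
    return name or default


def download(url, directory="."):
    """Fetch url and save it under the chosen name (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        target = os.path.join(directory, pick_filename(url, response.headers))
        with open(target, "wb") as local_file:
            shutil.copyfileobj(response, local_file)
    return target


# Offline check with a stand-in for response.headers (no network needed).
demo_headers = Message()
demo_headers["Content-Disposition"] = 'attachment; filename="accounts.pdf"'
print(pick_filename("http://server.com/!Run.aspx/someoddtext/somemore?id=121&m=pdf", demo_headers))
# prints: accounts.pdf
```

Stripping the name down to its last path component is a precaution: the header is supplied by the server, so it should not be trusted to decide where on disk the file lands.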