Download file using partial download (HTTP)

Konstantin · Nov 25, 2009

Is there a way to download a huge and still-growing file over HTTP using the partial-download feature?

It seems that this code downloads the file from scratch every time it is executed:

import urllib
urllib.urlretrieve("http://www.example.com/huge-growing-file", "huge-growing-file")

I'd like:

  1. To fetch just the newly written data.
  2. To download from scratch only if the source file becomes smaller (for example, because it has been rotated).

Answer

Nadia Alramli · Nov 25, 2009

It is possible to do a partial download using the Range header. The following requests a selected range of bytes:

import urllib2

start, end = 0, 499  # example byte offsets; both ends are inclusive
req = urllib2.Request('http://www.python.org/')
req.headers['Range'] = 'bytes=%s-%s' % (start, end)
f = urllib2.urlopen(req)
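On Python 3, where urllib2 was merged into urllib.request, a roughly equivalent sketch looks like this. The helper names range_header and fetch_range are hypothetical. Leaving out the end offset produces an open-ended range ("bytes=N-"), which asks for everything from offset N to the current end of the file.

```python
import urllib.request


def range_header(start, end=None):
    """Build a Range header value; offsets are zero-based and inclusive."""
    if end is None:
        return "bytes=%d-" % start          # from `start` to the end of the file
    return "bytes=%d-%d" % (start, end)


def fetch_range(url, start, end=None):
    """Request a byte range and return (HTTP status, body bytes)."""
    req = urllib.request.Request(url, headers={"Range": range_header(start, end)})
    with urllib.request.urlopen(req) as resp:
        # 206 means the range was honored; 200 means the whole file was sent.
        return resp.status, resp.read()
```

For example, fetch_range("http://www.python.org/", 100, 150) would return the status code together with the requested slice of the page.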

For example:

>>> req = urllib2.Request('http://www.python.org/')
>>> req.headers['Range'] = 'bytes=%s-%s' % (100, 150)
>>> f = urllib2.urlopen(req)
>>> f.read()
'l1-transitional.dtd">\n\n\n<html xmlns="http://www.w3.'
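Range end offsets are inclusive, so bytes=100-150 returns 51 bytes, which matches the length of the string above. When the server honors the header it replies with 206 Partial Content and a Content-Range header such as "bytes 100-150/12345", where the number after the slash is the full size of the resource (or "*" if unknown); a 416 response uses the form "bytes */12345". A small parser for that header, sketched below with a hypothetical function name, can tell you the current size of a growing file. It assumes the "bytes" unit and does not handle multipart responses.

```python
import re

# Matches "bytes 100-150/12345", "bytes 100-150/*" and the 416 form "bytes */12345".
_CONTENT_RANGE = re.compile(r"^bytes\s+(?:(\d+)-(\d+)|\*)/(\d+|\*)$")


def parse_content_range(value):
    """Return (start, end, total); unknown parts are None. Raise ValueError if malformed."""
    m = _CONTENT_RANGE.match(value.strip())
    if m is None:
        raise ValueError("unrecognized Content-Range: %r" % value)
    start, end, total = m.groups()
    return (
        int(start) if start is not None else None,
        int(end) if end is not None else None,
        int(total) if total != "*" else None,
    )
```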

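Using this header you can resume partial downloads. In your case, all you have to do is keep track of how many bytes you have already downloaded and request the range starting at that offset; leaving out the end ('bytes=N-') asks for everything up to the current end of the file.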
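The bookkeeping described above, plus the "start over if the file shrank" rule from the question, could be sketched as follows. This is a minimal illustration rather than a robust tool: the function names are hypothetical, and it assumes the server reports Content-Length on a HEAD request, which some servers omit for dynamic content. It also cannot detect a file that was rotated and then grew past the local size again; catching that would need something like an ETag or checksum comparison.

```python
import os
import urllib.request


def plan(local_size, remote_size):
    """Decide what to do given the local and remote sizes in bytes."""
    if remote_size < local_size:
        return "restart"      # remote shrank, e.g. the file was rotated
    if remote_size == local_size:
        return "up_to_date"   # nothing new has been written
    return "resume"           # fetch only the bytes after local_size


def remote_size(url):
    """Return the Content-Length reported by a HEAD request (assumed present)."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        return int(resp.headers["Content-Length"])


def sync(url, path, chunk_size=64 * 1024):
    """Append newly written data from url to path; start over if the remote shrank."""
    local = os.path.getsize(path) if os.path.exists(path) else 0
    action = plan(local, remote_size(url))
    if action == "up_to_date":
        return action
    if action == "restart":
        local = 0
    req = urllib.request.Request(url, headers={"Range": "bytes=%d-" % local})
    with urllib.request.urlopen(req) as resp:
        # 206 with a non-zero offset: only the new tail was sent, so append it.
        # Otherwise (200, or starting from 0) the body is the whole file: overwrite.
        mode = "ab" if resp.status == 206 and local > 0 else "wb"
        with open(path, mode) as out:
            while True:
                chunk = resp.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)
    return action
```

Running sync("http://www.example.com/huge-growing-file", "huge-growing-file") periodically would then fetch only the new bytes on each run.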

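Keep in mind that the server needs to support this header for this to work. A server that honors it replies with 206 Partial Content; one that does not simply ignores it and sends the whole file with 200 OK.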
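One way to check in advance is the Accept-Ranges response header: "Accept-Ranges: bytes" advertises byte-range support, while "Accept-Ranges: none" says ranges are not supported. The header is optional, so its absence proves nothing, and the status code of the ranged request itself (206 versus 200) remains the reliable signal. The helper below is a hypothetical sketch that works on a plain dict or on the headers object returned by urllib.request.urlopen, matching the header name case-insensitively.

```python
def advertises_byte_ranges(headers):
    """True if an Accept-Ranges header lists the "bytes" unit.

    A False result does not prove ranges are unsupported; the header is optional.
    """
    for name, value in headers.items():
        if name.lower() == "accept-ranges":
            units = [u.strip().lower() for u in value.split(",")]
            if "bytes" in units:
                return True
    return False
```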