Python seek on remote file using HTTP

Marconi · Dec 28, 2009 · Viewed 7.3k times

How do I seek to a particular position on a remote (HTTP) file so I can download only that part?

Let's say the bytes on a remote file were: 1234567890

I want to seek to 4 and download 3 bytes from there, so I would have: 456

Also, how do I check whether a remote file exists? I tried os.path.isfile(), but it returns False when I pass it a remote file URL.

Answer

jbochi · Dec 28, 2009

If you are downloading the remote file through HTTP, you need to set the Range header, which tells the server which bytes of the file to send.

This example shows how it can be done. The key line looks like this:

myUrlclass.addheader("Range","bytes=%s-" % (existSize))
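For the concrete case in the question, here is a minimal sketch in Python 3, where urllib2 became urllib.request. The helper names (range_header, build_range_request, fetch_range) are hypothetical and not from any library. It assumes the server supports byte ranges; a server that ignores the header replies with 200 and the whole file instead of 206.

```python
import urllib.request


def range_header(offset, length):
    """Return a Range header value for `length` bytes starting at the 0-based `offset`."""
    if offset < 0 or length < 1:
        raise ValueError("offset must be >= 0 and length >= 1")
    # HTTP byte ranges are inclusive on both ends, hence the -1.
    return "bytes=%d-%d" % (offset, offset + length - 1)


def build_range_request(url, offset, length):
    """Build (but do not send) a request for one slice of a remote file."""
    req = urllib.request.Request(url)
    req.add_header("Range", range_header(offset, length))
    return req


def fetch_range(url, offset, length):
    """Download one slice; raise if the server ignored the Range header."""
    with urllib.request.urlopen(build_range_request(url, offset, length)) as resp:
        # 206 means the range was honoured; 200 means the whole file was sent.
        if resp.status != 206:
            raise IOError("server ignored Range header (status %d)" % resp.status)
        return resp.read()
```

Byte offsets are 0-based, so the "456" in the question (the 4th through 6th bytes of 1234567890) corresponds to offset 3 and length 3, which gives the header value bytes=3-5.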

EDIT: I just found a better implementation. This class is very simple to use, as can be seen in the docstring.

import urllib
import urllib2


class RangeError(IOError):
    """Error raised when an unsatisfiable range is requested."""


class HTTPRangeHandler(urllib2.BaseHandler):
    """Handler that enables HTTP Range headers.

    This was extremely simple. The Range header is an HTTP feature to
    begin with so all this class does is tell urllib2 that the
    "206 Partial Content" response from the HTTP server is what we
    expected.

    Example:
        import urllib2
        import byterange

        range_handler = byterange.HTTPRangeHandler()
        opener = urllib2.build_opener(range_handler)

        # install it
        urllib2.install_opener(opener)

        # create Request and set Range header
        req = urllib2.Request('http://www.python.org/')
        req.add_header('Range', 'bytes=30-50')
        f = urllib2.urlopen(req)
    """

    def http_error_206(self, req, fp, code, msg, hdrs):
        # 206 Partial Content Response: wrap it as a normal response object
        r = urllib.addinfourl(fp, hdrs, req.get_full_url())
        r.code = code
        r.msg = msg
        return r

    def http_error_416(self, req, fp, code, msg, hdrs):
        # HTTP's Range Not Satisfiable error
        raise RangeError('Requested Range Not Satisfiable')

Update: The "better implementation" has moved to GitHub: see the byterange.py file in excid3/urlgrabber.
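The question also asks how to check whether a remote file exists. os.path.isfile() only looks at the local filesystem, so one common approach is to send a HEAD request and inspect the status code. The following is a rough sketch in Python 3, not part of urlgrabber. The function name remote_exists and its opener parameter are hypothetical, and it assumes the server answers HEAD requests the same way it answers GET.

```python
import urllib.error
import urllib.request


def remote_exists(url, opener=urllib.request.urlopen, timeout=10):
    """Return True if a HEAD request for `url` succeeds, False on 404 or 410."""
    # HEAD asks for the headers only, so no file content is downloaded.
    req = urllib.request.Request(url, method="HEAD")
    try:
        with opener(req, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except urllib.error.HTTPError as err:
        if err.code in (404, 410):
            return False
        # Other errors (403, 500, ...) do not say whether the file exists.
        raise
```

The opener parameter exists only so the function can be exercised without a network connection; in normal use, calling remote_exists(url) is enough.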