When downloading a large file with Python, I want to put a time limit not only on the connection process, but also on the download itself.
I am trying the following Python code:
import requests
r = requests.get('http://ipv4.download.thinkbroadband.com/1GB.zip', timeout=0.5, stream=True)
print r.headers['content-length']
print len(r.raw.read())
This does not work (the download is not time-limited), and the docs confirm it: timeout is not a time limit on the entire response download, it only applies while waiting for the server to start responding. See https://requests.readthedocs.org/en/latest/user/quickstart/#timeouts
It would be great if this were possible:
r.raw.read(timeout = 10)
The question is: how do I put a time limit on the download?
And the answer is: do not use requests, as it is blocking. Use non-blocking network I/O, for example eventlet:
import eventlet
from eventlet.green import urllib2
from eventlet.timeout import Timeout
url5 = 'http://ipv4.download.thinkbroadband.com/5MB.zip'
url10 = 'http://ipv4.download.thinkbroadband.com/10MB.zip'
urls = [url5, url5, url10, url10, url10, url5, url5]
def fetch(url):
    # Start from an empty buffer so a timed-out download reports length 0.
    response = bytearray()
    # Timeout(60, False) aborts the read after 60 seconds *silently*:
    # instead of raising eventlet.timeout.Timeout, the with-block just exits.
    with Timeout(60, False):
        response = urllib2.urlopen(url).read()
    return url, len(response)

pool = eventlet.GreenPool()
for url, length in pool.imap(fetch, urls):
    if not length:
        print "%s: timeout!" % url
    else:
        print "%s: %s" % (url, length)
This produces the expected results:
http://ipv4.download.thinkbroadband.com/5MB.zip: 5242880
http://ipv4.download.thinkbroadband.com/5MB.zip: 5242880
http://ipv4.download.thinkbroadband.com/10MB.zip: timeout!
http://ipv4.download.thinkbroadband.com/10MB.zip: timeout!
http://ipv4.download.thinkbroadband.com/10MB.zip: timeout!
http://ipv4.download.thinkbroadband.com/5MB.zip: 5242880
http://ipv4.download.thinkbroadband.com/5MB.zip: 5242880
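One thing to note about fetch() above: because the Timeout simply abandons the read(), a timed-out download comes back empty, and any bytes that did arrive are thrown away. If partial data is worth keeping, the same eventlet idea works with chunked reads. The sketch below is a variant I am assuming would fit here, not part of the original answer; fetch_partial and the 64 KiB chunk size are illustrative choices.

import eventlet
from eventlet.green import urllib2
from eventlet.timeout import Timeout

def fetch_partial(url):
    # Hypothetical variant of fetch(): accumulate chunks so that a timeout
    # keeps whatever arrived before the deadline instead of discarding it.
    response = bytearray()
    with Timeout(60, False):  # silent timeout, same as in fetch()
        remote = urllib2.urlopen(url)
        while True:
            chunk = remote.read(65536)  # read 64 KiB at a time
            if not chunk:               # '' means EOF: the download completed
                break
            response.extend(chunk)
    return url, response

A caller can then compare len(response) against the Content-Length header to tell a completed download from a timed-out one.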