I'm using Python 3.5 and comparing the performance of the urllib module versus the requests module. I wrote two clients in Python: the first uses the urllib module and the second uses the requests module. Both generate binary data, which I send to a Flask-based server, and the Flask server returns binary data to the client. I found that sending the data from the client to the server takes about the same time with both modules (urllib, requests), but returning the data from the server to the client is more than twice as fast with urllib as with requests. I'm working on localhost.

My question is: why? What am I doing wrong with the requests module that makes it slower?

This is the server code:
    from flask import Flask, request
    app = Flask(__name__)
    from timeit import default_timer as timer
    import os

    @app.route('/onStringSend', methods=['GET', 'POST'])
    def onStringSend():
        return data

    if __name__ == '__main__':
        data_size = int(1e7)
        data = os.urandom(data_size)
        app.run(host="0.0.0.0", port=8080)
This is the client code based on urllib:
    import urllib.request as urllib2
    import urllib.parse
    from timeit import default_timer as timer
    import os

    data_size = int(1e7)
    num_of_runs = 20
    url = 'http://127.0.0.1:8080/onStringSend'

    def send_binary_data():
        data = os.urandom(data_size)
        headers = {
            'User-Agent': 'Mozilla/5.0 (compatible; Chrome/22.0.1229.94; Windows NT)',
            'Content-Length': '%d' % len(data),
            'Content-Type': 'application/octet-stream'
        }
        req = urllib2.Request(url, data, headers)
        round_trip_time_msec = [0] * num_of_runs
        for i in range(num_of_runs):
            t1 = timer()
            resp = urllib2.urlopen(req)
            response_data = resp.read()
            t2 = timer()
            round_trip_time_msec[i] = (t2 - t1) * 1000
        t_max = max(round_trip_time_msec)
        t_min = min(round_trip_time_msec)
        t_average = sum(round_trip_time_msec) / len(round_trip_time_msec)
        print('max round trip time [msec]: ', t_max)
        print('min round trip time [msec]: ', t_min)
        print('average round trip time [msec]: ', t_average)

    send_binary_data()
This is the client code based on requests:
    import requests
    import os
    from timeit import default_timer as timer

    url = 'http://127.0.0.1:8080/onStringSend'
    data_size = int(1e7)
    num_of_runs = 20

    def send_binary_data():
        data = os.urandom(data_size)
        s = requests.Session()
        s.headers['User-Agent'] = 'Mozilla/5.0 (compatible; Chrome/22.0.1229.94; Windows NT)'
        s.headers['Content-Type'] = 'application/octet-stream'
        s.headers['Content-Length'] = '%d' % len(data)
        round_trip_time_msec = [0] * num_of_runs
        for i in range(num_of_runs):
            t1 = timer()
            response_data = s.post(url=url, data=data, stream=False, verify=False)
            t2 = timer()
            round_trip_time_msec[i] = (t2 - t1) * 1000
        t_max = max(round_trip_time_msec)
        t_min = min(round_trip_time_msec)
        t_average = sum(round_trip_time_msec) / len(round_trip_time_msec)
        print('max round trip time [msec]: ', t_max)
        print('min round trip time [msec]: ', t_min)
        print('average round trip time [msec]: ', t_average)

    send_binary_data()
Thanks very much.
First of all, to reproduce the problem, I had to add the following line to your onStringSend function:

    request.get_data()
Otherwise, I was getting “connection reset by peer” errors because the server’s receive buffer kept filling up.
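With that line added, the handler looks like this:

    @app.route('/onStringSend', methods=['GET', 'POST'])
    def onStringSend():
        # Drain the request body so the server's receive buffer doesn't fill up
        request.get_data()
        return data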
Now, the immediate reason for this problem is that Response.content (which is called implicitly when stream=False) iterates over the response data in chunks of 10240 bytes:

    self._content = bytes().join(self.iter_content(CONTENT_CHUNK_SIZE)) or bytes()
Therefore, the easiest way to solve the problem is to use stream=True, thus telling Requests that you will be reading the data at your own pace:

    response_data = s.post(url=url, data=data, stream=True, verify=False).raw.read()
With this change, the performance of the Requests version becomes more or less the same as that of the urllib version.
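Applied to your client, the timing loop would look something like this (a sketch; I've kept your verify=False, although it has no effect on a plain-HTTP URL):

    for i in range(num_of_runs):
        t1 = timer()
        # stream=True defers the body read; raw.read() then pulls it in one pass
        response_data = s.post(url=url, data=data, stream=True, verify=False).raw.read()
        t2 = timer()
        round_trip_time_msec[i] = (t2 - t1) * 1000

Note that raw.read() returns the undecoded bytes, which is fine here since the body is application/octet-stream.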
Please also see the “Raw Response Content” section in the Requests docs for useful advice.
Now, the interesting question remains: why is Response.content iterating in such small chunks? After talking to Cory Benfield, a core developer of Requests, it looks like there may be no particular reason. I filed issue #3186 in Requests to look further into this.
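In the meantime, if you would rather let Requests manage the read but avoid the small default chunks, you can pass an explicit chunk_size to iter_content yourself (a sketch I haven't benchmarked, so treat the chunk size as a tunable assumption):

    # Read the body in 1 MiB chunks instead of the 10 KiB default
    resp = s.post(url=url, data=data, stream=True, verify=False)
    response_data = b''.join(resp.iter_content(chunk_size=1024 * 1024))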