How can I read exactly one response chunk with python's http.client?

Ben Burns · Jul 1, 2014 · Viewed 9.8k times

Using http.client in Python 3.3+ (or any other built-in Python HTTP client library), how can I read a chunked HTTP response exactly one HTTP chunk at a time?

I'm extending an existing test fixture (written in Python using http.client) for a server that writes its response using HTTP's chunked transfer encoding. For the sake of simplicity, let's say that I'd like to print a message whenever the client receives an HTTP chunk.

My code follows a fairly standard pattern for reading a large response:

import http.client

conn = http.client.HTTPConnection(...)
conn.request(...)
response = conn.getresponse()

resbody = []

# Read the body in fixed-size pieces until read() returns b"" at the end.
while True:
    chunk = response.read(1024)
    if not chunk:
        break
    resbody.append(chunk)

conn.close()

But this reads up to 1024 bytes per call, regardless of whether the server is sending 10-byte chunks or 10 MiB chunks.
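To make the mismatch concrete, here is a minimal sketch that feeds a hand-written chunked response to `http.client.HTTPResponse` through a stand-in socket. The `FakeSocket` class and the sample payload are made up for illustration, and calling `begin()` directly is an assumption about how to drive the response object by hand (normally `getresponse()` does it for you).

```python
import http.client
import io

# Hand-written HTTP/1.1 response whose body is sent as two chunks:
# b"hello" (5 bytes) and b" world" (6 bytes), then the terminating 0 chunk.
RAW_RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Transfer-Encoding: chunked\r\n"
    b"\r\n"
    b"5\r\nhello\r\n"
    b"6\r\n world\r\n"
    b"0\r\n\r\n"
)


class FakeSocket:
    """Hypothetical stand-in for a socket; HTTPResponse only calls makefile()."""

    def __init__(self, data):
        self._data = data

    def makefile(self, *args, **kwargs):
        return io.BytesIO(self._data)


resp = http.client.HTTPResponse(FakeSocket(RAW_RESPONSE))
resp.begin()  # parse the status line and headers

pieces = []
while True:
    piece = resp.read(8)  # fixed-size reads, like the loop above
    if not piece:
        break
    pieces.append(piece)

# The pieces follow the requested read size, not the 5- and 6-byte chunks.
print(pieces)
```

The first 8-byte read should span the boundary between the two chunks, which is exactly the information the fixed-size loop loses.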

What I'm looking for would be something like the following:

# readchunk() is hypothetical; http.client has no such method.
while True:
    chunk = response.readchunk()
    if len(chunk):
        resbody.append(chunk)
    else:
        break

If this is not possible with http.client, is it possible with another built-in HTTP client library? If it's not possible with a built-in client library, is it possible with a pip-installable module?

Answer

poida · Oct 31, 2014

Update:

The benefit of chunked transfer encoding is that it allows the transmission of dynamically generated content whose total length is not known in advance. Whether an HTTP library lets you read individual chunks or not is a separate issue (see RFC 2616 - Section 3.6.1).

I can see how what you are trying to do would be useful, but the standard Python HTTP client libraries don't do what you want without some hackery (see http.client and httplib).

What you are trying to do may be fine for use in your test fixture, but in the wild there are no guarantees. It is possible for the chunking of the data read by your client to be different from the chunking of the data sent by your server. E.g. the data could have been "re-chunked" by a proxy server before it arrived (see RFC 2616 - Section 3.2 - Framing Techniques).
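As a rough illustration of re-chunking, the sketch below frames the same payload in two different ways and shows that both decode to identical bytes. The helper names `encode_chunked` and `decode_chunked` are invented for this example, and the 4-byte split stands in for whatever a hypothetical proxy might do; chunk extensions and trailers are only minimally handled.

```python
def encode_chunked(parts):
    """Frame each non-empty bytes object in `parts` as one HTTP chunk."""
    framed = b""
    for part in parts:
        if part:
            size_line = format(len(part), "x").encode("ascii")  # hex size
            framed += size_line + b"\r\n" + part + b"\r\n"
    return framed + b"0\r\n\r\n"  # zero-size chunk ends the body, no trailers


def decode_chunked(raw):
    """Split a chunked body back into its chunks (trailers are not parsed)."""
    chunks, pos = [], 0
    while True:
        eol = raw.index(b"\r\n", pos)
        # Drop any chunk extension after ";" before parsing the hex size.
        size = int(raw[pos:eol].split(b";", 1)[0], 16)
        if size == 0:
            return chunks
        start = eol + 2
        chunks.append(raw[start:start + size])
        pos = start + size + 2  # skip the CRLF that follows the chunk data


payload = b"The quick brown fox"

# How the (imaginary) server framed the body.
as_sent = encode_chunked([b"The quick ", b"brown fox"])

# How a hypothetical intermediary might re-frame it: 4-byte chunks.
as_received = encode_chunked(
    [payload[i:i + 4] for i in range(0, len(payload), 4)]
)

sent_chunks = decode_chunked(as_sent)
received_chunks = decode_chunked(as_received)

print(sent_chunks)
print(received_chunks)
print(b"".join(sent_chunks) == b"".join(received_chunks))
```

The chunk lists differ while the joined body is the same, which is why chunk boundaries are not a reliable signal outside a controlled test setup.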


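The trick is to tell the response object that it isn't chunked (resp.chunked = False) so that read() returns the raw bytes, including the chunk-size lines. This allows you to parse the size and data of each chunk yourself as it is returned. Note that chunked does not appear in the http.client documentation, so treat this as relying on an implementation detail.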

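import http.client

conn = http.client.HTTPConnection("localhost")
conn.request("GET", "/")
resp = conn.getresponse()
resp.chunked = False  # hack: make read() return the raw, still-chunked bytes

def get_chunk_size():
    # Read the chunk-size line one byte at a time until its CRLF terminator.
    size_str = resp.read(2)
    while size_str[-2:] != b"\r\n":
        byte = resp.read(1)
        if not byte:
            raise EOFError("connection closed inside a chunk-size line")
        size_str += byte
    # Ignore any chunk extension after ";" and parse the hexadecimal size.
    return int(size_str[:-2].split(b";", 1)[0], 16)

def get_chunk_data(chunk_size):
    data = resp.read(chunk_size)
    resp.read(2)  # discard the CRLF that follows the chunk data
    return data

respbody = b""
while True:
    chunk_size = get_chunk_size()
    if chunk_size == 0:
        break  # a zero-size chunk marks the end of the body
    chunk_data = get_chunk_data(chunk_size)
    print("Chunk Received: " + chunk_data.decode(errors="replace"))
    respbody += chunk_data

conn.close()
# Decode once at the end so a multi-byte character split across chunks survives.
print(respbody.decode())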
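The same parsing logic can also be sketched as a generator over any binary file-like object, which makes it easy to try without a server. The name `iter_chunks` is hypothetical, and the example runs against an `io.BytesIO` holding a hand-written body; whether a response object with `chunked = False` behaves closely enough to such a stream is an assumption you would need to check in your own fixture.

```python
import io


def iter_chunks(stream):
    """Yield each chunk's data from a raw chunked body read from `stream`."""
    while True:
        size_line = stream.readline()
        if not size_line.endswith(b"\r\n"):
            raise ValueError("stream ended inside a chunk-size line")
        # Drop any chunk extension ("size;name=value") before parsing the size.
        size = int(size_line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            # Consume optional trailer fields up to the blank line (or EOF).
            while stream.readline() not in (b"\r\n", b""):
                pass
            return
        data = stream.read(size)
        if len(data) != size or stream.read(2) != b"\r\n":
            raise ValueError("truncated or malformed chunk")
        yield data


# Hand-written sample body: two chunks (one with an extension) and a trailer.
raw_body = (
    b"5\r\nhello\r\n"
    b"6;ext=1\r\n world\r\n"
    b"0\r\n"
    b"X-Trailer: v\r\n"
    b"\r\n"
)

received = []
for chunk in iter_chunks(io.BytesIO(raw_body)):
    print("Chunk Received:", chunk)
    received.append(chunk)
```

Keeping the chunks as bytes and joining them before decoding avoids errors when a multi-byte character is split across two chunks.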