How does the python socket.recv() method know that the end of the message has been reached?

Tommy picture Tommy · Dec 29, 2016 · Viewed 26.4k times · Source

Let's say I'm using 1024 as buffer size for my client socket:

recv(1024)

Let's assume the message the server wants to send to me consists of 2024 bytes. Only 1024 bytes can be received by my socket. What's happening to the other 1000 bytes?

  1. Will the recv-method wait for a certain amount of time (say 2 seconds) for more data to come and stop working after this time span? (I.e., if the rest of the data arrives after 3 seconds, the data will not be received by the socket any more?)

or

  1. Will the recv-method stop working immediately after having received 1024 bytes of data? (I.e. will the other 1000 bytes be discarded?)

In case that 1.) is correct ... is there a way for me to to determine the amount of time, the recv data should wait before returning or is it determined by the system? (I.e. could I tell the socket to wait for 5 seconds before stopping to wait for more data?)

UPDATE: Assume, I have the following code:

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((sys.argv[1], port))
    s.send('Hello, world')
    data = s.recv(1024)
    print("received: {}".format(data))
    s.close()

Assume that the server sends data of size > 1024 bytes. Can I be sure that the variable "data" will contain all the data (including those beyond the 1024th byte)? If I can't be sure about that, how would I have to change the code so that I can always be sure that the variable "data" will contain all the data sent (in one or many steps) from the server?

Answer

tdelaney picture tdelaney · Dec 29, 2016

It depends on the protocol. Some protocols like UDP send messages and exactly 1 message is returned per recv. Assuming you are talking about TCP specifically, there are several factors involved. TCP is stream oriented and because of things like the amount of currently outstanding send/recv data, lost/reordered packets on the wire, delayed acknowledgement of data, and the Nagle algorithm (which delays some small sends by a few hundred milliseconds), its behavior can change subtly as a conversation between client and server progresses.

All the receiver knows is that it is getting a stream of bytes. It could get anything from 1 to the fully requested buffer size on any recv. There is no one-to-one correlation between the send call on one side and the recv call on the other.

If you need to figure out message boundaries its up to the higher level protocols to figure that out. Take HTTP for example. It starts with a \r\n delimited header and then has a count of the remaining bytes the client should expect to receive. The client knows how to read the header because of the \r\n then knows exactly how many bytes are coming next. Part of the charm of RESTful protocols is that they are HTTP based and somebody else already figured this stuff out!

Some protocols use NUL to delimit messages. Others may have a fixed length binary header that includes a count of any variable data to come. I like zeromq which has a robust messaging system on top of TCP.

More details on what happens with receive...

When you do recv(1024), there are 6 possibilities

  1. There is no receive data. recv will wait until there is receive data. You can change that by setting a timeout.

  2. There is partial receive data. You'll get that part right away. The rest is either buffered or hasn't been sent yet and you just do another recv to get more (and the same rules apply).

  3. There is more than 1024 bytes available. You'll get 1024 of that data and the rest is buffered in the kernel waiting for another receive.

  4. The other side has shut down the socket. You'll get 0 bytes of data. 0 means you will never get more data on that socket. But if you keep asking for data, you'll keep getting 0 bytes.

  5. The other side has reset the socket. You'll get an exception.

  6. Some other strange thing has gone on and you'll get an exception for that.