Receiving variable-size data over TCP sockets

brainydexter · Sep 12, 2014

I ran into a little issue with transferring data over (TCP) sockets. Small background on what I am doing:

I am sending data from side A to side B. Messages vary in length, with an assumed maximum size of 1096 bytes.

A) send(clientFd, buffer, size, 0)

On B, since I don't know what size to expect, I always try to receive 1096 bytes:

B) int receivedBytes = receive(fd, msgBuff, 1096, 0)

However, when I did this, I realized A was sending small chunks of data, say around 80-90 bytes each. After a few bursts of sending, B was clubbing them together in a single call, so that receivedBytes came back as 1096. The message boundaries were lost, which obviously corrupted the data, and all hell broke loose.

To fix this, I broke my data in two parts: header and data.

struct IpcMsg
{
   long msgType;
   int devId;
   uint32_t senderId;
   uint16_t size; 
   uint8_t value[IPC_VALUES_SIZE]; 
};

On the A side:

A) send(clientFd, buffer, size, 0)

On B, I first receive the header to determine the size of the payload, and then receive the rest of the payload:

B) int receivedBytes = receive(fd, msgBuff, sizeof(IpcMsg) - sizeof( ((IpcMsg*)0)->value ), 0);
int sizeToPoll = ((IpcMsg*)msgBuff)->size;
printf("Size to poll: %d\n", sizeToPoll);

if (sizeToPoll != 0)
{
        /* Read the payload into the space directly after the header bytes. */
        bytesRead = recv(fd, msgBuff + receivedBytes, sizeToPoll, 0);
}

So, for every send that has a payload, I end up calling receive twice. This works for me, but I was wondering whether there is a better way of doing this?

Answer

Andy Brown · Sep 12, 2014

You're on the right lines with the idea of sending a header that contains basic information about the following data, followed by the data itself. However, this won't always work:

int receivedBytes = receive(fd, msgBuff, sizeof(IpcMsg) - sizeof( ((IpcMsg*)0)->value ), 0);
int sizeToPoll = ((IpcMsg*)msgBuff)->size;

The reason is that TCP is a byte stream with no notion of message boundaries. It is free to split your header into as many chunks as it sees fit, based on its own assessment of the underlying network conditions and its congestion control strategy, so a single receive call may return fewer bytes than you asked for. On a LAN you'll pretty much always get your header in one piece, but try it across the world through the internet and you may get a much smaller number of bytes at a time.

The answer is to not call TCP's 'receive' (usually recv) directly but to abstract it away into a small utility function that takes the number of bytes you really must receive and a buffer to put them into. Inside that function, loop, receiving and appending bytes, until all the data has arrived or an error occurs.
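As a rough illustration of that utility function, here is a minimal sketch in C for blocking POSIX sockets. The names recv_all and recv_msg are hypothetical, and the value chosen for IPC_VALUES_SIZE is an assumption (the question does not give it). It also assumes both sides share the same struct layout and byte order, as the question's code implies.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>

#define IPC_VALUES_SIZE 1078   /* assumed; not stated in the question */

struct IpcMsg {
    long     msgType;
    int      devId;
    uint32_t senderId;
    uint16_t size;                     /* number of valid bytes in value[] */
    uint8_t  value[IPC_VALUES_SIZE];
};

/* Hypothetical helper: read exactly len bytes from fd into buf.
 * Returns 0 on success, -1 on error or if the peer closed the
 * connection before len bytes arrived. */
int recv_all(int fd, void *buf, size_t len)
{
    unsigned char *p = buf;
    size_t got = 0;

    while (got < len) {
        ssize_t n = recv(fd, p + got, len - got, 0);
        if (n > 0) {
            got += (size_t)n;          /* partial read: keep going */
        } else if (n == 0) {
            return -1;                 /* peer closed mid-message */
        } else if (errno != EINTR) {
            return -1;                 /* real error; errno is set */
        }
        /* n < 0 with EINTR: interrupted by a signal, so retry */
    }
    return 0;
}

/* Hypothetical helper: read one header-plus-payload message. */
int recv_msg(int fd, struct IpcMsg *msg)
{
    const size_t headerLen = offsetof(struct IpcMsg, value);

    if (recv_all(fd, msg, headerLen) != 0)
        return -1;
    if (msg->size > IPC_VALUES_SIZE)
        return -1;                     /* reject a length that cannot fit */
    return recv_all(fd, msg->value, msg->size);
}
```

With a helper like this you still make two logical reads per message (header, then payload), but each one is guaranteed to be complete before the next starts, however the bytes happen to arrive.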

If you need to go asynchronous and serve multiple clients simultaneously, then the same principle applies, but you need to investigate the 'select' call, which lets you be notified when data arrives.
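To make the select-based variant a little more concrete, here is a small sketch under stated assumptions. The struct Client and the functions wait_readable and on_readable are hypothetical names, and MAX_MSG simply reuses the 1096-byte maximum from the question. The idea is that each client keeps its own partial buffer, and each time select reports a socket as readable you do a single recv rather than a blocking loop.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

#define MAX_MSG 1096   /* maximum message size mentioned in the question */

/* Hypothetical per-client state: bytes collected so far and bytes wanted. */
struct Client {
    int fd;
    unsigned char buf[MAX_MSG];
    size_t have;   /* bytes already stored in buf */
    size_t need;   /* bytes required for a complete unit (1..MAX_MSG) */
};

/* Wait up to timeout_ms for any client socket to become readable.
 * Sets ready[i] to 1 or 0 for each client; returns select()'s result
 * (number of ready descriptors, 0 on timeout, -1 on error). */
int wait_readable(const struct Client *clients, int count, int *ready,
                  int timeout_ms)
{
    fd_set readSet;
    struct timeval tv;
    int maxFd = -1;

    FD_ZERO(&readSet);
    for (int i = 0; i < count; i++) {
        FD_SET(clients[i].fd, &readSet);   /* fd must be below FD_SETSIZE */
        if (clients[i].fd > maxFd)
            maxFd = clients[i].fd;
    }
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    int n = select(maxFd + 1, &readSet, NULL, NULL, &tv);
    for (int i = 0; i < count; i++)
        ready[i] = (n > 0) && FD_ISSET(clients[i].fd, &readSet);
    return n;
}

/* Call once per readable client: a single recv(), never a blocking loop.
 * Returns 1 when `need` bytes have arrived, 0 if more are still due,
 * -1 on error or if the peer closed the connection. */
int on_readable(struct Client *c)
{
    ssize_t n = recv(c->fd, c->buf + c->have, c->need - c->have, 0);
    if (n <= 0)
        return -1;
    c->have += (size_t)n;
    return c->have == c->need;
}
```

A caller would set need to the header length first, and once on_readable returns 1, raise need by the payload size taken from the header and carry on. Note that "readable" only means at least one byte (or end of stream) is available, not that a whole message has arrived.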