Google protobuf and large binary blobs

jan · Mar 10, 2014

I'm building software to remotely control radio hardware that is attached to another PC.

I plan to use ZeroMQ for the transport, with an RPC-like request-reply pattern on top of it, using different messages to represent the operations.

While most of my messages will carry just control and status information, there should also be an option to send a blob of data to transmit, or to request a blob of received data. These blobs will usually be in the range of 5-10MB, but it should also be possible to use larger blobs of up to several hundred megabytes.

For the message format, I found Google Protocol Buffers very appealing, because I could define one message type on the transport link with optional elements for all the commands and responses. However, the protobuf FAQ states that such large messages will negatively impact performance.
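To make the intended layout concrete, the "one message type with optional elements" idea might look like the following sketch; the message and field names here are invented for illustration, not taken from any real schema (proto2 syntax, which was current at the time):

```protobuf
// Hypothetical envelope: one request message with an optional
// sub-message per operation; only one is set at a time.
message Request {
  optional SetFrequency set_frequency = 1;
  optional GetStatus    get_status    = 2;
  optional TransmitData transmit_data = 3;  // the large-blob case
}

message SetFrequency {
  optional uint64 hz = 1;
}

message GetStatus {
}

message TransmitData {
  // Usually 5-10MB, potentially several hundred MB.
  optional bytes samples = 1;
}
```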

So the question is: how bad would it actually be? What negative effects should I expect? I don't want to base the whole communication layer on protobuf only to find out that it doesn't work.

Answer

Kenton Varda · Mar 12, 2014

Frankly, it's not so much performance per se as that the library is not designed the way you might want it to be for dealing with large messages. For example, you have to parse a message all at once and serialize it all at once. So if you have a message containing a 100MB blob, you can't read any part of the message without reading in the entire 100MB, blocking the calling thread while it parses. Also problematic is the fact that the 100MB blob will be allocated as one gigantic flat byte array. On 64-bit systems this may be fine, but on 32-bit systems you may run into address-space fragmentation issues. Finally, there is a hard message size limit of 2GB.

If you are OK with these sorts of issues, then you can pretty much just do it. You will have to manually override the message size limit, which for security purposes defaults to 64MB. To do this, construct a CodedInputStream manually and call SetTotalBytesLimit() on it before parsing the message from it.
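In C++ that looks roughly like the sketch below. `MyMessage` is a placeholder for your generated message class, and the limits chosen are arbitrary; note also that the two-argument form of SetTotalBytesLimit() shown here was later reduced to a single argument in newer protobuf versions:

```cpp
#include <google/protobuf/io/coded_stream.h>
#include <google/protobuf/io/zero_copy_stream_impl.h>

// Parse a potentially large message from a file descriptor.
// "MyMessage" stands in for your generated protobuf class.
bool ParseLargeMessage(int fd, MyMessage* message) {
  google::protobuf::io::FileInputStream raw_input(fd);
  google::protobuf::io::CodedInputStream coded_input(&raw_input);

  // Raise the default 64MB limit: here 512MB, warning above 256MB.
  coded_input.SetTotalBytesLimit(512 << 20, 256 << 20);

  return message->ParseFromCodedStream(&coded_input);
}
```

The CodedInputStream must outlive the parse call but should be destroyed (or recreated) between messages, since the byte limit counts cumulative bytes read through that stream.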

But personally I'd recommend trying to design your system such that big blobs can be split up into small chunks.
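The chunking approach amounts to giving each piece a sequence number and a total count so the receiver can reassemble the blob. A minimal sketch of that idea, independent of protobuf (the function names here are illustrative, not from any library):

```python
# Split a blob into fixed-size chunks that each fit comfortably
# inside one message, then reassemble them on the receiving side.

CHUNK_SIZE = 1 << 20  # 1MB per chunk, far below the 64MB default limit


def split_blob(blob, chunk_size=CHUNK_SIZE):
    """Yield (index, total, payload) tuples covering the whole blob."""
    total = (len(blob) + chunk_size - 1) // chunk_size
    for i in range(total):
        yield i, total, blob[i * chunk_size:(i + 1) * chunk_size]


def join_chunks(chunks):
    """Reassemble chunks (received in any order) into the original blob."""
    ordered = sorted(chunks, key=lambda chunk: chunk[0])
    return b"".join(payload for _, _, payload in ordered)
```

Each `(index, total, payload)` tuple would map naturally onto a small fixed-schema message, so the receiver can also report progress or resume a partial transfer.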