BinaryReader ReadString specifying length?

Corey Ogburn · Oct 31, 2013

I'm working on a parser to receive UDP information, parse it, and store it. To do so I'm using a BinaryReader, since it will mostly be binary information. Some of it will be strings, though. MSDN says this about the ReadString() method:

Reads a string from the current stream. The string is prefixed with the length, encoded as an integer seven bits at a time.

I completely understand it up until "seven bits at a time", which I tried to ignore until I started testing. I'm building my own byte array, wrapping it in a MemoryStream, and attempting to read it with a BinaryReader. Here's what I first thought would work:

// Assumed layout: 4-byte little-endian int length, then the characters
byte[] data = new byte[] { 3, 0, 0, 0, (byte)'C', (byte)'a', (byte)'t' };
BinaryReader reader = new BinaryReader(new MemoryStream(data));
String str = reader.ReadString();

Knowing an int is 4 bytes (and having toyed around long enough to find out that BinaryReader is little-endian), I pass it a length of 3 and the corresponding letters. However, str ends up holding "\0\0\0" (three null characters). If I remove the three zero bytes and just have

byte[] data = new byte[] { 3, (byte)'C', (byte)'a', (byte)'t' };

then it reads and stores "Cat" correctly. To me this conflicts with the documentation saying that the length is an integer. Now I'm beginning to think they simply mean a whole number, not the data type int. Does this mean that a BinaryReader can never read a string longer than 127 characters (since 127 is 01111111, the largest value that fits in the 7 bits the documentation mentions)?
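One way to check the 127 question empirically is to let BinaryWriter.Write(string), the counterpart of ReadString(), produce the bytes and inspect them. The sketch below does that; the class and method names are hypothetical. Note that the prefix counts encoded bytes (UTF-8 here), not characters. Under those assumptions, "Cat" comes out as { 3, 'C', 'a', 't' }, a 200-character ASCII string gets the two-byte prefix 0xC8 0x01, and ReadString() reads it back intact, which suggests 127 is not a hard limit.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper: round-trips strings through BinaryWriter/BinaryReader
// so the length prefix expected by ReadString() can be inspected.
public static class LengthPrefixDemo
{
    public static byte[] WriteWithPrefix(string s)
    {
        using (var stream = new MemoryStream())
        {
            // leaveOpen = true keeps the MemoryStream usable after the writer is disposed
            using (var writer = new BinaryWriter(stream, Encoding.UTF8, true))
            {
                writer.Write(s); // 7-bit-encoded byte count, then the UTF-8 bytes
            }
            return stream.ToArray();
        }
    }

    public static string ReadBack(byte[] data)
    {
        using (var reader = new BinaryReader(new MemoryStream(data), Encoding.UTF8))
        {
            return reader.ReadString(); // decodes the prefix, then reads that many bytes
        }
    }
}
```

For example, `LengthPrefixDemo.WriteWithPrefix(new string('x', 200))` returns 202 bytes whose first two are 0xC8 and 0x01.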

I'm writing up a protocol and need to completely understand what I'm getting into before I pass our documentation along to our clients.
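If the protocol documentation has to be implemented by clients that are not on .NET, one alternative (a design choice, not what ReadString() does) is to keep the fixed 4-byte little-endian length that the first byte array assumed, and read it with ReadInt32() plus ReadBytes(). The helper below is a hypothetical sketch that assumes UTF-8 text and a length measured in bytes.

```csharp
using System.IO;
using System.Text;

// Hypothetical helpers for a protocol that uses a fixed 4-byte little-endian
// length prefix instead of ReadString()'s 7-bit encoded one.
public static class Int32PrefixedString
{
    public static string Read(BinaryReader reader)
    {
        int length = reader.ReadInt32(); // byte count, always 4 bytes, little-endian
        if (length < 0) throw new InvalidDataException("Negative string length.");
        byte[] bytes = reader.ReadBytes(length);
        if (bytes.Length != length) throw new EndOfStreamException();
        return Encoding.UTF8.GetString(bytes);
    }

    public static void Write(BinaryWriter writer, string value)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(value);
        writer.Write(bytes.Length); // Write(int) emits 4 little-endian bytes
        writer.Write(bytes);        // Write(byte[]) writes the raw bytes, no prefix
    }
}
```

With this layout, the original array { 3, 0, 0, 0, 'C', 'a', 't' } reads as "Cat". The trade-off is a constant 4 bytes of overhead per string in exchange for a format that is trivial to describe to clients.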

Answer

Corey Ogburn · Oct 31, 2013

I found the source code for BinaryReader. ReadString() uses a method called Read7BitEncodedInt(), and after looking up its documentation and the documentation for Write7BitEncodedInt(), I found this:

The integer of the value parameter is written out seven bits at a time, starting with the seven least-significant bits. The high bit of a byte indicates whether there are more bytes to be written after this one. If value will fit in seven bits, it takes only one byte of space. If value will not fit in seven bits, the high bit is set on the first byte and written out. value is then shifted by seven bits and the next byte is written. This process is repeated until the entire integer has been written.
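The quoted description translates into a short encoder/decoder pair. This is a sketch written from the description, not the framework's source: SevenBitInt, Encode and Decode are hypothetical names, and Encode rejects negative values on the assumption that only string lengths are being encoded. It shows why 127 is not a ceiling: 127 fits in one byte, 128 becomes 0x80 0x01, and 200 becomes 0xC8 0x01.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical re-implementation of the 7-bit length encoding described above.
public static class SevenBitInt
{
    // Low seven bits first; the high bit is set on every byte except the last.
    public static byte[] Encode(int value)
    {
        if (value < 0) throw new ArgumentOutOfRangeException(nameof(value));
        var bytes = new List<byte>();
        uint v = (uint)value;
        while (v >= 0x80)
        {
            bytes.Add((byte)(v | 0x80)); // seven payload bits plus "more bytes follow" flag
            v >>= 7;
        }
        bytes.Add((byte)v); // final byte, high bit clear
        return bytes.ToArray();
    }

    // Reads bytes until one has its high bit clear, reassembling 7 bits at a time.
    public static int Decode(Stream stream)
    {
        int result = 0;
        int shift = 0;
        while (true)
        {
            int b = stream.ReadByte();
            if (b < 0) throw new EndOfStreamException();
            if (shift == 35) throw new FormatException("More than 5 bytes for an Int32.");
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
            shift += 7;
        }
    }
}
```

For the protocol document, the rule to state is: each length byte carries seven payload bits, least-significant group first, a set high bit means another byte follows, and a 32-bit length never needs more than five bytes.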

Also, Ralf found a link that illustrates what's going on more clearly.