Compare JSON and BSON

Ronald · Sep 26, 2012 · Viewed 28.3k times

I am comparing JSON and BSON for serializing objects. These objects contain several arrays, each holding a large number of integers. In my test, the object I serialize contains about 12,000 integers in total. I am only interested in how the sizes of the serialized results compare. I am using JSON.NET as the serialization library, and I am using JSON because I also want to be able to work with the data in JavaScript.

The JSON string is about 43 KB and the BSON result is 161 KB, a difference of about a factor of 4. This is not what I expected: I looked at BSON because I thought it stored data more efficiently.

So my question is: why is BSON not efficient here, and can it be made more efficient? Or is there another way to serialize data with arrays containing a large number of integers that can be handled easily in JavaScript?

Below is the code I use to test the JSON/BSON serialization.

        // Read the file that contains the JSON string
        string _jsonString = ReadFile();
        object _object = Newtonsoft.Json.JsonConvert.DeserializeObject(_jsonString);
        // Wrap the stream in a using block too: with CloseOutput = false
        // the writer does not close it, so it must be disposed here.
        using (FileStream _fs = File.OpenWrite("BsonFileName"))
        using (Newtonsoft.Json.Bson.BsonWriter _bsonWriter = new BsonWriter(_fs)
               { CloseOutput = false })
        {
            Newtonsoft.Json.JsonSerializer _jsonSerializer = new JsonSerializer();
            _jsonSerializer.Serialize(_bsonWriter, _object);
            _bsonWriter.Flush();
        }

Edit:

Here are the resulting files https://skydrive.live.com/redir?resid=9A6F31F60861DD2C!362&authkey=!AKU-ZZp8C_0gcR0

Answer

saml · Sep 27, 2012

The efficiency of JSON vs BSON depends on the size of the integers you're storing. There's an interesting point where ASCII takes fewer bytes than a binary integer type. 64-bit integers, which is how your BSON document appears to store them, take up 8 bytes each. Your numbers are all less than 10,000, which means you could store each one in ASCII in 4 bytes (one byte per character, up through 9999). In fact, most of your data looks like it's less than 1000, meaning it can be stored in 3 or fewer bytes. Of course, deserializing that text takes time and isn't cheap, but it saves space. Furthermore, JavaScript uses 64-bit values to represent all numbers, so if you wrote the data to BSON after converting each integer to a more appropriate data format, your BSON file could be much smaller.
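The arithmetic above can be sketched in JavaScript. This is a rough estimate, not a measurement of JSON.NET's output. It assumes the array layout described in the BSON specification, where an array is stored as a document whose keys are the decimal indices "0", "1", and so on, so each element costs one type byte, the key plus a NUL terminator, and the value itself. The function names and the sample data are made up for illustration.

```javascript
// Rough size estimates for an array of integers; all names are illustrative.

// JSON: one byte per ASCII character, including brackets and commas.
function jsonArrayBytes(values) {
  return JSON.stringify(values).length;
}

// BSON array per the spec: an int32 total length, then for each element
// a type byte, the index as a NUL-terminated key, and the value,
// followed by one terminating 0x00 byte.
function bsonArrayBytes(count, valueBytes) {
  let total = 4 + 1; // length prefix + terminator
  for (let i = 0; i < count; i++) {
    total += 1 + String(i).length + 1 + valueBytes;
  }
  return total;
}

// Hypothetical data: 12,000 integers below 1,000, similar in scale to the question.
const values = Array.from({ length: 12000 }, (_, i) => i % 1000);

console.log("JSON bytes:        ", jsonArrayBytes(values));
console.log("BSON bytes (int64):", bsonArrayBytes(values.length, 8));
console.log("BSON bytes (int32):", bsonArrayBytes(values.length, 4));
```

For a single 12,000-element array the int64 estimate comes to roughly 169 KB, in the same range as the 161 KB reported in the question. That suggests the per-element keys and 8-byte values account for most of the gap, and that 32-bit values would shrink the file but still leave it larger than the JSON text.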

According to the spec, BSON contains a lot of metadata that JSON doesn't. This metadata is mostly length prefixes so that you can skip through data you aren't interested in. For example, take the following data:

["hello there, this is an necessarily long string.  It's especially long, but you don't care about it. You're just trying to get to the next element. But I keep going on and on.",
 "oh man. here's another string you still don't care about.  You really just want the third element in the array.  How long are the first two elements? JSON won't tell you",
 "data_you_care_about"]

With JSON, you have to scan through all of the first two strings to find out where the third one starts. With BSON, you get something more like the following (not the actual format; I'm making this markup up for the sake of example):

[175 "hello there, this is an necessarily long string.  It's especially long, but you don't care about it. You're just trying to get to the next element. But I keep going on and on.",
 169 "oh man. here's another string you still don't care about.  You really just want the third element in the array.  How long are the first two elements? JSON won't tell you",
 19 "data_you_care_about"]

So now, you can read '175', know to skip forward 175 bytes, then read '169', skip forward 169 bytes, and then read '19' and copy the next 19 bytes to your string. That way you don't even have to parse the strings for delimiters.
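The skipping idea can be sketched in JavaScript for Node.js. Like the markup above, this is not real BSON: the layout (a 4-byte little-endian length followed by the UTF-8 bytes) and the function names are assumptions chosen to keep the example short.

```javascript
// Length-prefixed strings, loosely modeled on the made-up markup above.
// NOT real BSON: the layout (uint32 length + UTF-8 bytes) is an assumption.

function encodeStrings(strings) {
  const parts = [];
  for (const s of strings) {
    const body = Buffer.from(s, "utf8");
    const prefix = Buffer.alloc(4);
    prefix.writeUInt32LE(body.length, 0);
    parts.push(prefix, body);
  }
  return Buffer.concat(parts);
}

// Return the string at `index`, jumping over earlier strings without decoding them.
function readStringAt(buffer, index) {
  let offset = 0;
  for (let i = 0; i < index; i++) {
    offset += 4 + buffer.readUInt32LE(offset); // skip prefix and body
  }
  const length = buffer.readUInt32LE(offset);
  return buffer.toString("utf8", offset + 4, offset + 4 + length);
}

const encoded = encodeStrings([
  "a long string you do not care about",
  "another one",
  "data_you_care_about",
]);
console.log(readStringAt(encoded, 2)); // prints "data_you_care_about"
```

The reader touches only two length fields before reaching the third string. A JSON parser would have to examine every character of the first two strings to find their closing quotes.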

Which one to use depends heavily on your needs. If you're going to be storing enormous documents that you have all the time in the world to parse, but your disk space is limited, use JSON, because for data like this it's more compact. If you're going to be storing documents, but reducing wait time (perhaps in a server context) matters more to you than saving some disk space, use BSON.

Another thing to consider in your choice is human readability. If you need to debug a crash report that contains BSON, you'll probably need a utility to decipher it. You probably can't read raw BSON, but you can read JSON directly.
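To make the readability point concrete, here is a small Node.js sketch that hand-encodes the document {"a": 1} following the document layout in the BSON specification (total length, typed elements, terminator) and prints it next to the JSON text. The bytes are written out by hand for illustration; no BSON library is involved.

```javascript
// Hand-encode {"a": 1} following the BSON spec's document layout.
const bson = Buffer.from([
  0x0c, 0x00, 0x00, 0x00, // int32 total document length: 12 bytes
  0x10,                   // element type 0x10: 32-bit integer
  0x61, 0x00,             // key "a" as a NUL-terminated string
  0x01, 0x00, 0x00, 0x00, // value 1, little-endian
  0x00,                   // document terminator
]);
const json = JSON.stringify({ a: 1 });

console.log(json);                 // {"a":1}
console.log(bson.toString("hex")); // 0c0000001061000100000000
```

The JSON line can be read at a glance, while the hex dump only makes sense once you know the field layout.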
