How to serialize object + compress it and then decompress + deserialize without third-party library?

Marek · Aug 23, 2012 · Viewed 20.7k times

I have a large object in memory that I want to save as a blob in a database. I want to compress it before saving because the database server is usually not local.

This is what I have at the moment:

using (var memoryStream = new MemoryStream())
{
  using (var gZipStream = new GZipStream(memoryStream, CompressionMode.Compress))
  {
    BinaryFormatter binaryFormatter = new BinaryFormatter();
    binaryFormatter.Serialize(gZipStream, obj);

    return memoryStream.ToArray();
  }
}

However, when I zip the same bytes with Total Commander, it always cuts the size by at least 50%. With the above code, 58MB compresses to only 48MB, and anything smaller than 15MB gets even bigger.

Should I use a third-party zip library, or is there a better way of doing this in .NET 3.5? Are there any other alternatives for my problem?

EDIT:

I just found a bug in the code above. Angelo, thanks for your fix.

GZipStream compression is still not great: I get 35% compression on average with GZipStream, compared to 48% with Total Commander.

I have no idea what kind of bytes I was getting out of the previous version :)

EDIT2:

I have found how to improve compression from 20% to 47%. I had to use two memory streams instead of one! Can anyone explain why this is the case?

Here is the code with two memory streams, which compresses a lot better:

using (MemoryStream msCompressed = new MemoryStream())
using (GZipStream gZipStream = new GZipStream(msCompressed, CompressionMode.Compress))
using (MemoryStream msDecompressed = new MemoryStream())
{
  new BinaryFormatter().Serialize(msDecompressed, obj);
  byte[] byteArray = msDecompressed.ToArray();

  gZipStream.Write(byteArray, 0, byteArray.Length);
  gZipStream.Close();
  return msCompressed.ToArray();
}

Answer

João Angelo · Aug 23, 2012

You have a bug in your code, and the explanation is too long for a comment, so I present it as an answer even though it does not answer your real question.

You need to call memoryStream.ToArray() only after closing the GZipStream; otherwise you are creating compressed data that you will not be able to deserialize.

Fixed code follows:

using (var memoryStream = new System.IO.MemoryStream())
{
  using (var gZipStream = new GZipStream(memoryStream, CompressionMode.Compress))
  {
    BinaryFormatter binaryFormatter = new BinaryFormatter();
    binaryFormatter.Serialize(gZipStream, obj);
  }
  return memoryStream.ToArray();
}

The GZipStream writes to the underlying stream in chunks and also appends a footer to the end of the stream. The last chunk and the footer are only written at the moment you close the stream.

You can easily prove this by running the following code sample:

byte[] compressed;
int[] integers = new int[] { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };

var mem1 = new MemoryStream();
using (var compressor = new GZipStream(mem1, CompressionMode.Compress))
{
    new BinaryFormatter().Serialize(compressor, integers);
    compressed = mem1.ToArray();
}

var mem2 = new MemoryStream(compressed);
using (var decompressor = new GZipStream(mem2, CompressionMode.Decompress))
{
    // The next line will throw SerializationException
    integers = (int[])new BinaryFormatter().Deserialize(decompressor);
}
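
The question title also asks about the decompress + deserialize direction, so here is a minimal round-trip sketch built on the fixed code above. The class name CompressedSerializer, its method names, and the 64 KB buffer size are made up for illustration. The BufferedStream is an assumption on my part, not something measured here: EDIT2 in the question suggests GZipStream compresses better when it receives one large write instead of the many small writes BinaryFormatter makes, and a buffer in between is one way to get larger writes without copying the whole serialized object into a second array first.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Runtime.Serialization.Formatters.Binary;

// Hypothetical helper; the class and method names are illustrative only.
public static class CompressedSerializer
{
    // Serializes obj with BinaryFormatter and gzips the result.
    public static byte[] Serialize(object obj)
    {
        using (var compressed = new MemoryStream())
        {
            using (var gzip = new GZipStream(compressed, CompressionMode.Compress))
            using (var buffered = new BufferedStream(gzip, 64 * 1024)) // assumed buffer size
            {
                // The formatter's small writes are collected by the buffer
                // and reach the GZipStream in larger chunks.
                new BinaryFormatter().Serialize(buffered, obj);
            }
            // Both streams are closed here, so the gzip footer has been written.
            // MemoryStream.ToArray() still works on a closed MemoryStream.
            return compressed.ToArray();
        }
    }

    // Reverses Serialize: gunzips the data and deserializes it as T.
    public static T Deserialize<T>(byte[] data)
    {
        using (var compressed = new MemoryStream(data))
        using (var gzip = new GZipStream(compressed, CompressionMode.Decompress))
        {
            return (T)new BinaryFormatter().Deserialize(gzip);
        }
    }
}
```

Usage would be along the lines of byte[] blob = CompressedSerializer.Serialize(obj); followed later by CompressedSerializer.Deserialize<MyType>(blob), where MyType stands for whatever serializable type was stored. Whether the buffer actually improves the ratio for your data is worth checking against the two-stream version from the question.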