GZipStream on large data

Skintkingle · May 16, 2012

I am attempting to compress a large amount of data, sometimes in the region of 100 GB. When I run the routine I have written, the output file comes out exactly the same size as the input. Has anyone else had this issue with GZipStream?

My code is as follows:

        // Write the original (uncompressed) size as an 8-byte header so the
        // reader knows how big the file should be after decompression.
        byte[] buffer = BitConverter.GetBytes(StreamSize);
        FileStream LocalUnCompressedFS = File.OpenWrite(ldiFileName);
        LocalUnCompressedFS.Write(buffer, 0, buffer.Length);

        // Wrap the file stream in a GZipStream and copy the source across
        // in WriteBlock-sized pieces.
        GZipStream LocalFS = new GZipStream(LocalUnCompressedFS, CompressionMode.Compress);
        buffer = new byte[WriteBlock];
        UInt64 WrittenBytes = 0;
        while (WrittenBytes + WriteBlock < StreamSize)
        {
            fromStream.Read(buffer, 0, (int)WriteBlock);
            LocalFS.Write(buffer, 0, (int)WriteBlock);
            WrittenBytes += WriteBlock;
            OnLDIFileProgress(WrittenBytes, StreamSize);
            if (Cancel)
                break;
        }
        // Copy whatever is left over (less than one full WriteBlock).
        if (!Cancel)
        {
            double bytesleft = StreamSize - WrittenBytes;
            fromStream.Read(buffer, 0, (int)bytesleft);
            LocalFS.Write(buffer, 0, (int)bytesleft);
            WrittenBytes += (uint)bytesleft;
            OnLDIFileProgress(WrittenBytes, StreamSize);
        }
        LocalFS.Close();
        fromStream.Close();

StreamSize is an 8-byte UInt64 value that holds the size of the original file. I write these 8 bytes raw to the start of the output file so I know the original file size. WriteBlock has the value of 32 KB (32,768 bytes). fromStream is the stream to take data from; in this instance, a FileStream. Are the 8 bytes in front of the compressed data going to cause an issue?
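
For reference, the read side would look roughly like this, assuming the layout above ([8-byte UInt64 original size][gzip data]); outputFileName is a placeholder, not part of the original code:

    using (FileStream raw = File.OpenRead(ldiFileName))
    {
        // Consume the 8-byte size header before constructing the GZipStream;
        // GZipStream begins reading at the stream's current position, so the
        // header doesn't interfere with decompression once it has been skipped.
        byte[] header = new byte[8];
        raw.Read(header, 0, header.Length);
        UInt64 originalSize = BitConverter.ToUInt64(header, 0);
        // originalSize can drive progress reporting during decompression.

        using (GZipStream unzip = new GZipStream(raw, CompressionMode.Decompress))
        using (FileStream outFile = File.OpenWrite(outputFileName))
        {
            byte[] buf = new byte[32768];
            int n;
            while ((n = unzip.Read(buf, 0, buf.Length)) > 0)
                outFile.Write(buf, 0, n);
        }
    }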

Answer

Austin Salonen · May 16, 2012

I ran a test using the following code for compression, and it ran without issue on a 7 GB and a 12 GB file (both known beforehand to compress "well"). Does this version work for you?

const string toCompress = @"input.file";
var buffer = new byte[1024 * 1024 * 64];

using (var compressing = new GZipStream(File.OpenWrite(@"output.gz"), CompressionMode.Compress))
using (var file = File.OpenRead(toCompress))
{
    int bytesRead;
    // Stream.Read may return fewer bytes than requested, so loop until it
    // returns 0 (end of file) and write only the bytes actually read.
    while ((bytesRead = file.Read(buffer, 0, buffer.Length)) > 0)
    {
        compressing.Write(buffer, 0, bytesRead);
    }
}

Have you checked out the documentation?

The GZipStream class cannot decompress data that results in over 8 GB of uncompressed data.

You probably need to find a different library that supports your needs, or break your data up into <= 8 GB chunks that can safely be "sewn" back together; a rough sketch of that chunking approach follows.
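
Something along these lines, for illustration only (the ChunkedGzip/CompressInChunks names, the part-file naming scheme, and the 4 GB chunk size are all assumptions, not anything prescribed by GZipStream):

using System;
using System.IO;
using System.IO.Compression;

static class ChunkedGzip
{
    // Stay well under the documented per-stream limit; the exact chunk
    // size is an arbitrary choice for this sketch.
    const long ChunkSize = 4L * 1024 * 1024 * 1024;

    public static void CompressInChunks(string inputPath, string outputPrefix)
    {
        var buffer = new byte[32768];
        using (var input = File.OpenRead(inputPath))
        {
            int part = 0;
            while (input.Position < input.Length)
            {
                // Each chunk goes to its own part file:
                // "data.part000.gz", "data.part001.gz", ...
                string partPath = string.Format("{0}.part{1:D3}.gz", outputPrefix, part);
                using (var gz = new GZipStream(File.OpenWrite(partPath), CompressionMode.Compress))
                {
                    long remaining = ChunkSize;
                    int bytesRead;
                    while (remaining > 0 &&
                           (bytesRead = input.Read(buffer, 0,
                               (int)Math.Min(buffer.Length, remaining))) > 0)
                    {
                        gz.Write(buffer, 0, bytesRead);
                        remaining -= bytesRead;
                    }
                }
                part++;
            }
        }
    }
}

Decompressing the part files in order and concatenating their output reconstructs the original file.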