I am attempting to compress a large amount of data, sometimes in the region of 100 GB. When I run the routine I have written, the output file appears to come out exactly the same size as the input. Has anyone else had this issue with GZipStream?
My code is as follows:
byte[] buffer = BitConverter.GetBytes(StreamSize);
FileStream LocalUnCompressedFS = File.OpenWrite(ldiFileName);
LocalUnCompressedFS.Write(buffer, 0, buffer.Length);
GZipStream LocalFS = new GZipStream(LocalUnCompressedFS, CompressionMode.Compress);
buffer = new byte[WriteBlock];
UInt64 WrittenBytes = 0;
while (WrittenBytes + WriteBlock < StreamSize)
{
    fromStream.Read(buffer, 0, (int)WriteBlock);
    LocalFS.Write(buffer, 0, (int)WriteBlock);
    WrittenBytes += WriteBlock;
    OnLDIFileProgress(WrittenBytes, StreamSize);
    if (Cancel)
        break;
}
if (!Cancel)
{
    double bytesleft = StreamSize - WrittenBytes;
    fromStream.Read(buffer, 0, (int)bytesleft);
    LocalFS.Write(buffer, 0, (int)bytesleft);
    WrittenBytes += (uint)bytesleft;
    OnLDIFileProgress(WrittenBytes, StreamSize);
}
LocalFS.Close();
fromStream.Close();
StreamSize is an 8-byte UInt64 value that holds the size of the original file; I write these 8 bytes raw to the start of the output file so I know the original file size when decompressing. WriteBlock has the value of 32 KB (32768 bytes). fromStream is the stream to take data from, in this instance a FileStream. Are the 8 bytes in front of the compressed data going to cause an issue?
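For reference, here is roughly how I intend to read the file back, consuming the 8-byte prefix from the raw stream before GZipStream ever sees the data (just a sketch of the format described above, not tested code):

byte[] header = new byte[8];
using (FileStream raw = File.OpenRead(ldiFileName))
{
    // Consume the 8-byte length prefix from the raw stream first...
    raw.Read(header, 0, header.Length);
    UInt64 originalSize = BitConverter.ToUInt64(header, 0);

    // ...then hand the remainder of the stream to GZipStream.
    using (GZipStream decompress = new GZipStream(raw, CompressionMode.Decompress))
    {
        // read up to originalSize bytes of decompressed data here
    }
}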
I ran a test using the following code for compression, and it ran without issue on a 7 GB file and a 12 GB file (both known beforehand to compress "well"). Does this version work for you?
const string toCompress = @"input.file";
var buffer = new byte[1024 * 1024 * 64];
using (var compressing = new GZipStream(File.OpenWrite(@"output.gz"), CompressionMode.Compress))
using (var file = File.OpenRead(toCompress))
{
    int bytesRead;
    // Read returns 0 at end of stream; write only the bytes actually read,
    // since the final read will usually come back short.
    while ((bytesRead = file.Read(buffer, 0, buffer.Length)) > 0)
    {
        compressing.Write(buffer, 0, bytesRead);
    }
}
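As a sanity check you can decompress the result and compare it against the original; assuming .NET 4 or later for Stream.CopyTo, something like:

using (var decompressing = new GZipStream(File.OpenRead(@"output.gz"), CompressionMode.Decompress))
using (var verify = File.Create(@"verify.file"))
{
    // Stream the decompressed bytes back out to disk.
    decompressing.CopyTo(verify);
}
// verify.file should now match input.file byte for byte.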
Have you checked out the documentation?
The GZipStream class cannot decompress data that results in over 8 GB of uncompressed data.
You probably need to find a different library that supports your needs, or attempt to break your data up into chunks of 8 GB or less that can safely be "sewn" back together.
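A minimal sketch of that chunking idea, assuming the 8 GB figure from the documentation quoted above and hypothetical file names (untested):

const long ChunkSize = 8L * 1024 * 1024 * 1024; // stay at or below the documented limit
var buffer = new byte[1024 * 1024];
using (var input = File.OpenRead(@"input.file"))
{
    for (int chunk = 0; input.Position < input.Length; chunk++)
    {
        long remaining = ChunkSize;
        // Each chunk gets its own .gz file so it can be decompressed independently.
        using (var gz = new GZipStream(File.Create(string.Format("output.{0}.gz", chunk)), CompressionMode.Compress))
        {
            int read;
            while (remaining > 0 &&
                   (read = input.Read(buffer, 0, (int)Math.Min(buffer.Length, remaining))) > 0)
            {
                gz.Write(buffer, 0, read);
                remaining -= read;
            }
        }
    }
}

Decompressing is then a matter of inflating output.0.gz, output.1.gz, ... in order and appending the results to a single file.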