I need to calculate checksums of quite large files (gigabytes). This can be accomplished using the following method:
private byte[] calcHash(string file)
{
System.Security.Cryptography.HashAlgorithm ha = System.Security.Cryptography.MD5.Create();
FileStream fs = new FileStream(file, FileMode.Open, FileAccess.Read);
byte[] hash = ha.ComputeHash(fs);
fs.Close();
return hash;
}
However, the files are normally written just beforehand in a buffered manner (say writing 32mb's at a time). I am so convinced that I saw an override of a hash function that allowed me to calculate a MD5 (or other) hash at the same time as writing, ie: calculating the hash of one buffer, then feeding that resulting hash into the next iteration.
Something like this: (pseudocode-ish)
byte [] hash = new byte [] { 0,0,0,0,0,0,0,0 };
while(!eof)
{
buffer = readFromSourceFile();
writefile(buffer);
hash = calchash(buffer, hash);
}
hash is now sililar to what would be accomplished by running the calcHash function on the entire file.
Now, I can't find any overrides like that in the.Net 3.5 Framework, am I dreaming ? Has it never existed, or am I just lousy at searching ? The reason for doing both writing and checksum calculation at once is because it makes sense due to the large files.
You use the TransformBlock
and TransformFinalBlock
methods to process the data in chunks.
// Init
MD5 md5 = MD5.Create();
int offset = 0;
// For each block:
offset += md5.TransformBlock(block, 0, block.Length, block, 0);
// For last block:
md5.TransformFinalBlock(block, 0, block.Length);
// Get the has code
byte[] hash = md5.Hash;
Note: It works (at least with the MD5 provider) to send all blocks to TransformBlock
and then send an empty block to TransformFinalBlock
to finalise the process.