I have to sync large files across several machines. The files can be up to 6 GB in size. The sync will be done manually every few weeks. I can't take the filename into consideration because it can change at any time.
My plan is to create checksums on the destination PC and on the source PC, and then copy every file whose checksum is not already present at the destination. My first attempt was something like this:
using System;
using System.IO;
using System.Security.Cryptography;

private static string GetChecksum(string file)
{
    using (FileStream stream = File.OpenRead(file))
    {
        SHA256Managed sha = new SHA256Managed();
        byte[] checksum = sha.ComputeHash(stream);
        return BitConverter.ToString(checksum).Replace("-", String.Empty);
    }
}
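For the comparison step, something like this sketch is what I have in mind (sourceDir and destDir are placeholder paths; requires System.Linq and System.Collections.Generic):

    // Build the set of checksums already present at the destination,
    // then copy any source file whose checksum is missing.
    var destChecksums = new HashSet<string>(
        Directory.GetFiles(destDir).Select(GetChecksum));

    foreach (string sourceFile in Directory.GetFiles(sourceDir))
    {
        if (!destChecksums.Contains(GetChecksum(sourceFile)))
        {
            File.Copy(sourceFile, Path.Combine(destDir, Path.GetFileName(sourceFile)));
        }
    }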
The problem was the runtime:
- SHA256 on a 1.6 GB file -> 20 minutes
- MD5 on a 1.6 GB file -> 6.15 minutes
Is there a better, faster way to get the checksum (maybe with a better hash function)?
The problem here is that SHA256Managed reads 4096 bytes at a time (inherit from FileStream and override Read(byte[], int, int) to see how much it reads from the FileStream), which is too small a buffer for disk IO.
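To see this for yourself, a quick diagnostic subclass (a sketch; LoggingFileStream is just an illustrative name) prints the chunk size that ComputeHash requests:

    using System;
    using System.IO;

    // Logs every read request so you can observe the 4096-byte chunks
    // that SHA256Managed.ComputeHash asks for.
    class LoggingFileStream : FileStream
    {
        public LoggingFileStream(string path)
            : base(path, FileMode.Open, FileAccess.Read) { }

        public override int Read(byte[] buffer, int offset, int count)
        {
            Console.WriteLine("Read requested: " + count + " bytes");
            return base.Read(buffer, offset, count);
        }
    }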
To speed things up (2 minutes for hashing a 2 GB file on my machine with SHA256, 1 minute for MD5), wrap the FileStream in a BufferedStream and set a reasonably-sized buffer (I tried with a ~1 MB buffer):
// Disposing the BufferedStream also disposes the underlying FileStream,
// so wrapping it in a using block is correct.
using (var stream = new BufferedStream(File.OpenRead(filePath), 1200000))
{
    // The rest remains the same
}
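Putting the two pieces together, the whole method becomes something like this sketch (SHA256.Create() is the modern factory method; the original SHA256Managed works too, but is marked obsolete in current .NET):

    using System;
    using System.IO;
    using System.Security.Cryptography;

    private static string GetChecksum(string file)
    {
        // BufferedStream batches the hash's small reads into ~1 MB disk reads.
        using (var stream = new BufferedStream(File.OpenRead(file), 1024 * 1024))
        using (var sha = SHA256.Create())
        {
            byte[] checksum = sha.ComputeHash(stream);
            return BitConverter.ToString(checksum).Replace("-", String.Empty);
        }
    }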