What is the fastest way to create a checksum for large files in C#?

crono · Jul 24, 2009 · Viewed 104.5k times

I have to sync large files across several machines. The files can be up to 6 GB in size. The sync will be done manually every few weeks. I can't take the filenames into consideration because they can change at any time.

My plan is to create checksums on the source PC and on the destination PC, and then copy to the destination every file whose checksum is not already present there. My first attempt was something like this:

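using System;
using System.IO;
using System.Security.Cryptography;

private static string GetChecksum(string file)
{
    using (FileStream stream = File.OpenRead(file))
    using (SHA256Managed sha = new SHA256Managed())
    {
        // ComputeHash reads the stream to the end and returns the 32-byte digest.
        byte[] checksum = sha.ComputeHash(stream);
        // BitConverter emits "AB-CD-..."; strip the dashes to get plain hex.
        return BitConverter.ToString(checksum).Replace("-", String.Empty);
    }
}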

The problem was the runtime:
- SHA256 on a 1.6 GB file: 20 minutes
- MD5 on a 1.6 GB file: 6.15 minutes

Is there a better - faster - way to get the checksum (maybe with a better hash function)?

Answer

Anton Gogolev · Jul 24, 2009

The problem here is that SHA256Managed reads 4096 bytes at a time, which is too small a buffer for disk I/O. (To check this yourself, inherit from FileStream and override Read(byte[], int, int) to see how many bytes each call requests from the file stream.)
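
The check described above could be sketched roughly as follows. The names LoggingFileStream and ReadSizeProbe are hypothetical, invented for this illustration, and SHA256.Create() is used in place of SHA256Managed as an assumption so that the sketch also compiles on newer runtimes. The request size you observe may differ between runtime versions, so treat the number as something to measure rather than a guarantee.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper: a FileStream that records how large each Read request is.
class LoggingFileStream : FileStream
{
    public int ReadCalls { get; private set; }
    public int MaxRequested { get; private set; }

    public LoggingFileStream(string path)
        : base(path, FileMode.Open, FileAccess.Read, FileShare.Read) { }

    public override int Read(byte[] array, int offset, int count)
    {
        // 'count' is the number of bytes the caller (the hash algorithm) asks for.
        ReadCalls++;
        if (count > MaxRequested) MaxRequested = count;
        return base.Read(array, offset, count);
    }
}

static class ReadSizeProbe
{
    // Hashes the file and returns the largest single read request that was seen.
    public static int LargestReadRequest(string path)
    {
        using (var stream = new LoggingFileStream(path))
        using (var sha = SHA256.Create())
        {
            sha.ComputeHash(stream);
            return stream.MaxRequested;
        }
    }
}
```

Calling ReadSizeProbe.LargestReadRequest(somePath) on any non-trivial file and printing the result shows the chunk size the hash algorithm uses when it pulls data from the stream.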

To speed things up (2 minutes to hash a 2 GB file on my machine with SHA256, 1 minute with MD5), wrap the FileStream in a BufferedStream and set a reasonably sized buffer (I tried a buffer of about 1 MB):

// Disposing the BufferedStream also closes the underlying FileStream,
// so a single using block is enough here.
using (var stream = new BufferedStream(File.OpenRead(filePath), 1200000))
{
    // The rest remains the same
}
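
Putting the two snippets together, a complete version might look like the sketch below. The class name ChecksumSketch and the method name GetChecksumBuffered are made up for this example, the buffer size simply reuses the 1200000 bytes from the answer, and SHA256.Create() is an assumed substitute for SHA256Managed. The timings quoted above were measured on the answerer's machine, so it is worth benchmarking on your own hardware before settling on a buffer size.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

static class ChecksumSketch
{
    // Buffer size taken from the answer (~1 MB); an assumption to tune, not a recommendation.
    private const int BufferSize = 1200000;

    public static string GetChecksumBuffered(string filePath)
    {
        // BufferedStream pulls large chunks from disk and serves the hash
        // algorithm's small reads from memory.
        using (var stream = new BufferedStream(File.OpenRead(filePath), BufferSize))
        using (var sha = SHA256.Create())
        {
            byte[] checksum = sha.ComputeHash(stream);
            // Uppercase hex without the dashes BitConverter inserts.
            return BitConverter.ToString(checksum).Replace("-", String.Empty);
        }
    }
}
```

Because only the stream wrapping changes, the returned string is the same SHA256 digest the unbuffered version would produce, so checksums computed either way remain comparable across machines.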