What is the fastest hash algorithm to check if two files are equal?

eflles picture eflles · Nov 19, 2009 · Viewed 68k times · Source

What is the fastest way to create a hash function which will be used to check if two files are equal?

Security is not very important.

Edit: I am sending a file over a network connection, and will be sure that the file on both sides are equal

Answer

Aaron Digulla picture Aaron Digulla · Nov 19, 2009

Unless you're using a really complicated and/or slow hash, loading the data from the disk is going to take much longer than computing the hash (unless you use RAM disks or top-end SSDs).

So to compare two files, use this algorithm:

  • Compare sizes
  • Compare dates (be careful here: this can give you the wrong answer; you must test whether this is the case for you or not)
  • Compare the hashes

This allows for a fast fail (if the sizes are different, you know that the files are different).

To make things even faster, you can compute the hash once and save it along with the file. Also save the file date and size into this extra file, so you know quickly when you have to recompute the hash or delete the hash file when the main file changes.