Combining MD5 hash values

channel72 · Feb 6, 2010 · Viewed 12.2k times

When calculating a single MD5 checksum on a large file, what technique is generally used to combine the various MD5 values into a single value? Do you just add them together? I'm not really interested in any particular language, library or API which will do this; rather I'm just interested in the technique behind it. Can someone explain how it is done?

Given the following algorithm in pseudo-code:

MD5Digest X
for each file segment F
   MD5Digest Y = CalculateMD5(F)
   Combine(X,Y)

But what exactly would Combine do? Does it add the two MD5 digests together, or what?

Answer

AndiDog · Feb 6, 2010

In order to calculate MD5 values for files which are too large to fit in memory, you read the file in blocks and feed each block to the hash in turn.

With that in mind, you don't want to "combine" two MD5 hashes. With any MD5 implementation, you have an object that keeps the current checksum state, so you can extract the MD5 checksum at any time, which is very handy when hashing two files that share the same beginning. For big files, you just keep feeding in data. It makes no difference whether you hash the file all at once or in blocks, because the state is carried over from one block to the next. In both cases you will get the same hash.
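
As a rough illustration of the point above, here is a minimal sketch in Python using the standard hashlib module. The helper name md5_in_blocks, the block sizes, and the sample data are assumptions made up for this example, and an in-memory io.BytesIO stream stands in for a large file. It feeds one hash object block by block and compares the result with hashing everything at once.

```python
import hashlib
import io


def md5_in_blocks(stream, block_size=64 * 1024):
    """Hash a binary stream by feeding one MD5 object block by block.

    md5_in_blocks and block_size are illustrative names, not a library API.
    """
    digest = hashlib.md5()  # keeps the running state between update() calls
    while True:
        block = stream.read(block_size)
        if not block:  # empty bytes object means end of stream
            break
        digest.update(block)
    return digest.hexdigest()


# Sample data standing in for a file that would not fit in memory.
data = b"example payload " * 10_000

whole = hashlib.md5(data).hexdigest()                       # hashed in one call
blocked = md5_in_blocks(io.BytesIO(data), block_size=4096)  # hashed in blocks
print(whole == blocked)  # prints True

# Two inputs that share the same beginning: hash the common prefix once,
# then copy the state and continue separately for each input.
prefix = hashlib.md5(b"shared header ")
first = prefix.copy()
first.update(b"body of file A")
second = prefix.copy()
second.update(b"body of file B")
print(first.hexdigest() == hashlib.md5(b"shared header body of file A").hexdigest())  # prints True
```

Nothing here combines two finished digests. The only thing carried forward is the internal state of the hash object, which is why the block size has no effect on the result.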