One-way hash (not for crypto/security): use SHA-256 (not MD5 or SHA-1)?

charley · Jun 26, 2011 · Viewed 9k times

On a new system, we require a one-way hash to compute a signature (digest) from binary input (e.g., a kilobyte of text, or larger text-and-binary files). The need is similar to how SCons (build system) hashes command lines and source files, and how Git (version control system) hashes files to compute a signature for storage/synchronization.

Recall that SCons uses MD5, and Git uses SHA-1.
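To make the comparison concrete, here is a minimal Python sketch (Python is my choice; nothing in the thread uses it) of the two content-hashing styles mentioned above. `git_blob_id` follows Git's loose-object format, where the SHA-1 covers a `blob <size>` header and a NUL byte before the file bytes, so it is not equal to a plain SHA-1 of the file. `md5_signature` is a hypothetical helper that only approximates an MD5 content signature; SCons's actual signature details are not reproduced here.

```python
import hashlib


def md5_signature(data: bytes) -> str:
    """Hypothetical stand-in for an MD5 content signature: MD5 of the raw bytes."""
    return hashlib.md5(data).hexdigest()


def git_blob_id(data: bytes) -> str:
    """SHA-1 object id that Git assigns to a file's contents.

    Git hashes the header b"blob <size>" plus a NUL byte, followed by the
    content, so the result differs from a plain SHA-1 of the file.
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + data).hexdigest()


if __name__ == "__main__":
    empty = b""
    print(md5_signature(empty))  # d41d8cd98f00b204e9800998ecf8427e
    print(git_blob_id(empty))    # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (Git's empty blob)
```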

While MD5 and SHA-1 have been "broken", neither SCons nor Git uses its hash specifically for security (e.g., not for storing passwords), so general practice still considers those algorithms acceptable for that usage. (Of course, this is partly a rationalization due to legacy adoption.)

QUESTION: Would you use SHA-256 (rather than MD5 or SHA-1) for a (non-crypto/security) one-way hash in a new system?

The concerns are:

  1. MD5 and SHA-1 have a long history of adoption
  2. SHA-256 is relatively new (less history), but it seems to be the current recommendation for new work (although "stronger" algorithm strength is not specifically required for my application)
  3. SHA-256 is more expensive to compute
  4. SHA256 produces a longer key (these will be used as dir/file names, and stored within index files), but I suppose I could truncate the produced key (hash is less strong, but should be sufficient), or just assume storage is cheap and file systems can handle it.
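On point 4, here is a minimal sketch of using a SHA-256 digest as a file name, assuming a Git-like layout where the first two hex characters become a subdirectory. The names `content_key` and `object_path`, the `store` root, and the truncation parameter are all hypothetical. Each hex character carries 4 bits, so keeping 32 characters leaves a 128-bit key: shorter names, at the cost of weaker collision resistance.

```python
import hashlib
from pathlib import PurePosixPath


def content_key(data: bytes, hex_chars: int = 64) -> str:
    """SHA-256 hex digest, optionally truncated to its first `hex_chars` characters.

    Each hex character encodes 4 bits, so hex_chars=32 keeps 128 bits.
    """
    if not 4 <= hex_chars <= 64:  # arbitrary lower bound for this sketch
        raise ValueError("hex_chars must be between 4 and 64")
    return hashlib.sha256(data).hexdigest()[:hex_chars]


def object_path(root: str, data: bytes, hex_chars: int = 64) -> PurePosixPath:
    """Git-style fan-out: the first two hex characters name a subdirectory."""
    key = content_key(data, hex_chars)
    return PurePosixPath(root) / key[:2] / key[2:]


if __name__ == "__main__":
    print(object_path("store", b"abc"))      # full 64-character key
    print(object_path("store", b"abc", 32))  # truncated to 128 bits
```

For example, `object_path("store", b"abc", 32)` gives `store/ba/7816bf8f01cfea414140de5dae2223`.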

I'd be particularly interested in an answer consistent with the SCons or Git communities saying either "We'll keep ours forever!" or "We want to move to a new hash as soon as practical!" (I'm not sure what their plans are.)

Answer

vcsjones · Jun 26, 2011

Yes, I would use SHA-256. SHA-256 was designed with a lot more than security purposes in mind; in fact, one of the reasons SHA-1 needed to be replaced is the very property you need from a hash function: avoiding collisions. A hash algorithm produces a fixed-size output from an input of unbounded length, so collisions must eventually exist. The larger the output, the less likely a collision (when using a well-designed hash algorithm).
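The "larger output, fewer collisions" point can be quantified with the standard birthday approximation, which treats the hash as a uniformly random function (an idealization that says nothing about deliberate attacks). The function name `collision_probability` is just for illustration.

```python
import math


def collision_probability(num_items: int, digest_bits: int) -> float:
    """Birthday-bound estimate that at least two of `num_items` digests collide.

    p ~= 1 - exp(-n(n-1) / 2^(bits+1)), assuming the hash output is
    uniformly random. expm1 keeps precision when p is tiny.
    """
    exponent = num_items * (num_items - 1) / 2.0 ** (digest_bits + 1)
    return -math.expm1(-exponent)


if __name__ == "__main__":
    for bits in (32, 128, 160, 256):
        print(bits, collision_probability(10**9, bits))
```

With a billion objects, a 160-bit digest gives a probability on the order of 1e-31 and a 256-bit digest around 1e-60, while a 32-bit digest already reaches about 39% at only 65,536 items.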

Git went with SHA-1 because it uses the hashes as file names, and its developers wanted them to be small and compact. SHA-256 produces a larger digest (32 bytes versus SHA-1's 20), which consumes more disk space and more data to transmit over the wire. This question specifically addresses what would happen if Git were to encounter collisions.
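For a rough sense of the size difference, this sketch computes per-identifier and total storage for SHA-1 and SHA-256 IDs, stored either as raw bytes or as hex text (as in file names). `id_overhead` is a hypothetical helper, and the totals ignore filesystem and index overhead.

```python
import hashlib


def id_overhead(algorithm: str, num_ids: int) -> dict:
    """Storage for `num_ids` identifiers, as raw digests and as hex text."""
    size = hashlib.new(algorithm).digest_size  # bytes per raw digest
    return {
        "digest_bytes": size,
        "hex_chars": 2 * size,  # two hex characters per byte
        "raw_total": size * num_ids,
        "hex_total": 2 * size * num_ids,
    }


if __name__ == "__main__":
    for name in ("sha1", "sha256"):
        print(name, id_overhead(name, 1_000_000))
```

For one million IDs stored as hex, that works out to 40,000,000 bytes for SHA-1 versus 64,000,000 bytes for SHA-256.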

To look at your points:

  1. SHA-256 has been in the wild long enough that if there were problems, we should have seen them by now.
  2. It isn't "stronger" per se; it's less likely to produce a collision (if that is your criterion for stronger, then yes, it is stronger).
  3. SHA-256 is slower, yes. Much slower? That depends on your needs. For 95% of people, its performance is acceptable, assuming you're using a proper implementation.
  4. In general, truncating a SHA-2 hash is an okay thing to do.
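Since "much slower?" depends on hardware and library, it is worth measuring on your own machines. Here is a minimal sketch using Python's `hashlib` (typically backed by OpenSSL). `throughput_mb_per_s` and the 8 MiB payload are arbitrary choices, and the numbers will vary by CPU, build, and input size, so treat them as a rough comparison only.

```python
import hashlib
import time


def throughput_mb_per_s(algorithm: str, payload: bytes, rounds: int = 10) -> float:
    """Rough single-threaded hashing throughput in MB/s on this machine."""
    start = time.perf_counter()
    for _ in range(rounds):
        hashlib.new(algorithm, payload).digest()
    elapsed = time.perf_counter() - start
    return len(payload) * rounds / elapsed / 1e6


if __name__ == "__main__":
    data = bytes(8 * 1024 * 1024)  # 8 MiB of zero bytes as sample input
    for name in ("md5", "sha1", "sha256"):
        print(f"{name:>6}: {throughput_mb_per_s(name, data):8.1f} MB/s")
```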