How do one-way hash functions work? (Edited)

user94154 picture user94154 · Jan 21, 2010 · Viewed 40.4k times · Source

I read the Wikipedia article about md5 hashes but I still can't understand how a hash can't be "reconstituted" back to the original text.

Could someone explain to someone who knows very little about cryptography how this works? What part of the function makes it one-way?

Answer

Pascal Cuoq picture Pascal Cuoq · Jan 21, 2010

Since everyone until now has simply defined what a hash function was, I will bite.

A one-way function is not just a hash function -- a function that loses information -- but a function f for which, given an image y ("SE" or 294 in existing answers), it is difficult to find a pre-image x such that f(x)=y.

This is why they are called one-way: you can compute an image but you can't find a pre-image for a given image.

None of the ordinary hash function proposed until now in existing answers have this property. None of them are one-way cryptographic hash functions. For instance, given "SE", you can easily pick up the input "SXXXE", an input with the property that X-encode("SXXXE")=SE.

There are no "simple" one-way functions. They have to mix their inputs so well that not only you don't recognize the input at all in the output, but you don't recognize another input either.

SHA-1 and MD5 used to be popular one-way functions but they are both nearly broken (specialist know how to create pre-images for given images, or are nearly able to do so). There is a contest underway to choose a new standard one, which will be named SHA-3.

An obvious approach to invert a one-way function would be to compute many images and keep them in a table associating to each image the pre-image that produced it. To make this impossible in practice, all one-way function have a large output, at least 64 bits but possibly much larger (up to, say, 512 bits).

EDIT: How do most cryptographic hash functions work?

Usually they have at their core a single function that does complicated transformations on a block of bits (a block cipher). The function should be nearly bijective (it shouldn't map too many sequences to the same image, because that would cause weaknesses later) but it doesn't have to be exactly bijective. And this function is iterated a fixed number of times, enough to make the input (or any possible input) impossible to recognize.

Take the example of Skein, one of the strong candidates for the SHA-3 context. Its core function is iterated 72 times. The only number of iterations for which the creators of the function know how to sometimes relate the outputs to some inputs is 25. They say it has a "safety factor" of 2.9.