Why is MD5'ing a UUID not a good idea?

ryeguy picture ryeguy · Aug 18, 2009 · Viewed 16.8k times · Source

PHP has a uniqid() function which generates a UUID of sorts.

In the usage examples, it shows the following:

$token = md5(uniqid());

But in the comments, someone says this:

Generating an MD5 from a unique ID is naive and reduces much of the value of unique IDs, as well as providing significant (attackable) stricture on the MD5 domain. That's a deeply broken thing to do. The correct approach is to use the unique ID on its own; it's already geared for non-collision.

Why is this true, if so? If an MD5 hash is (almost) unique for a unique ID, then what is wrong from md5'ing a uniqid?

Answer

ConcernedOfTunbridgeWells picture ConcernedOfTunbridgeWells · Aug 18, 2009

A UUID is 128 bits wide and has uniqueness inherent to the way it is generated. A MD5 hash is 128 bits wide and doesn't guarantee uniquess, only a low probablity of collision. The MD5 hash is no smaller than the UUID so it doesn't help with storage.

If you know the hash is from a UUID it is much easier to attack because the domain of valid UUIDs is actually fairly predictable if you know anything about the machine geneerating them.

If you needed to provide a secure token then you would need to use a cryptographically secure random number generator.(1) UUIDs are not designed to be cryptographically secure, only guaranteed unique. A monotonically increasing sequence bounded by unique machine identifiers (typically a MAC) and time is still a perfectly valid UUID but highly predictable if you can reverse engineer a single UUID from the sequence of tokens.

  1. The defining characteristic of a cryptographically secure PRNG is that the result of a given iteration does not contain enough information to infer the value of the next iteration - i.e. there is some hidden state in the generator that is not revealed in the number and cannot be inferred by examining a sequence of numbers from the PRNG.

    If you get into number theory you can find ways to guess the internal state of some PRNGs from a sequence of generated values. Mersenne Twister is an example of such a generator. It has hidden state that it used to get its long period but it is not cryptographically secure - you can take a fairly small sequence of numbers and use that to infer the internal state. Once you've done this you can use it to attack a cryptographic mechanism that depends on keeping that sequence a secret.