Why does crypt/blowfish generate the same hash with two different salts?

Dereleased picture Dereleased · Feb 9, 2010 · Viewed 13.9k times · Source

This question has to do with PHP's implementation of crypt(). For this question, the first 7 characters of the salt are not counted, so a salt '$2a$07$a' would be said to have a length of 1, as it is only 1 character of salt and seven characters of meta-data.

When using salt strings longer than 22 characters, there is no change in the hash generated (i.e., truncation), and when using strings shorter than 21 characters the salt will automatically be padded (with '$' characters, apparently); this is fairly straightforward. However, if given a salt 20 characters and a salt 21 characters, where the two are identical except for the final character of the 21-length salt, both hashed strings will be identical. A salt 22 characters long, which is identical to the 21 length salt except for the final character, the hash will be different again.

Example In Code:

$foo = 'bar';
$salt_xx = '$2a$07$';
$salt_19 = $salt_xx . 'b1b2ee48991281a439d';
$salt_20 = $salt_19 . 'a';
$salt_21 = $salt_20 . '2';
$salt_22 = $salt_21 . 'b';

var_dump(
    crypt($foo, $salt_19), 
    crypt($foo, $salt_20), 
    crypt($foo, $salt_21), 
    crypt($foo, $salt_22)
);

Will produce:

string(60) "$2a$07$b1b2ee48991281a439d$$.dEUdhUoQXVqUieLTCp0cFVolhFcbuNi"
string(60) "$2a$07$b1b2ee48991281a439da$.UxGYN739wLkV5PGoR1XA4EvNVPjwylG"
string(60) "$2a$07$b1b2ee48991281a439da2.UxGYN739wLkV5PGoR1XA4EvNVPjwylG"
string(60) "$2a$07$b1b2ee48991281a439da2O4AH0.y/AsOuzMpI.f4sBs8E2hQjPUQq"

Why is this?

EDIT:

Some users are noting that there is a difference in the overall string, which is true. In salt_20, offset (28, 4) is da$., while in salt_21, offset (28, 4) is da2.; however, it is important to note that the string generated includes the hash, the salt, as well as instructions to generate the salt (i.e. $2a$07$); the part in which the difference occurs is, in fact, still the salt. The actual hash is unchanged as UxGYN739wLkV5PGoR1XA4EvNVPjwylG.

Thus, this is in fact not a difference in the hash produced, but a difference in the salt used to store the hash, which is precisely the problem at hand: two salts are generating the same hash.

Rembmer: the output will be in the following format:

"$2a$##$saltsaltsaltsaltsaltsaHASHhashHASHhashHASHhashHASHhash"
//                            ^ Hash Starts Here, offset 28,32

where ## is the log-base-2 determining the number of iterations the algorithm runs for

Edit 2:

In the comments, it was requested that I post some additional info, as the user could not reproduce my output. Execution of the following code:

var_dump(
    PHP_VERSION, 
    PHP_OS, 
    CRYPT_SALT_LENGTH, 
    CRYPT_STD_DES, 
    CRYPT_EXT_DES, 
    CRYPT_MD5, 
    CRYPT_BLOWFISH
);

Produces the following output:

string(5) "5.3.0"
string(5) "WINNT"
int(60)
int(1)
int(1)
int(1)
int(1)

Hope this helps.

Answer

Dereleased picture Dereleased · Feb 10, 2010

After some experimentation, I have come to the conclusion that this is due to the way the salt is treated. The salt is not considered to be literal text, but rather to be a base64 encoded string, such that 22 bytes of salt data would actually represent a 16 byte string (floor(22 * 24 / 32) == 16) of salt. The "Gotcha!" with this implementation, though, is that, like Unix crypt, it uses a "non-standard" base64 alphabet. To be exact, it uses this alphabet:

./ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789$

The 65th character, '$', is the padding character.

Now, the crypt() function appears to be capable of taking a salt of any length less than or equal to its maximum, and silently handling any inconsistencies in the base64 by discarding any data that doesn't make up another full byte. The crypt function will fail completely if you pass it characters in the salt that are not part of its base64 alphabet, which just confirms this theory of its operation.

Take an imaginary salt '1234'. This is perfectly base64 consistent in that it represents 24 bits of data, so 3 bytes, and does not carry any data that needs to be discarded. This is a salt whose Len Mod 4 is zero. Append any character to that salt, and it becomes a 5 character salt, and Len Mod 4 is now 1. However, this additional character represents only six bits of data, and therefore cannot be transformed into another full byte, so it is discarded.

Thus, for any two salts A and B, where

   Len A Mod 4 == 0 
&& Len B Mod 4 == 1  // these two lines mean the same thing
&& Len B = Len A + 1 // but are semantically important separately
&& A == substr B, 0, Len A

The actual salt used by crypt() to calculate the hash will, in fact, be identical. As proof, I'm including some example PHP code that can be used to show this. The salt constantly rotates in a seminon-random way (based on a randomish segment of the whirlpool hash of the current time to the microsecond), and the data to be hashed (herein called $seed) is simply the current Unix-Epoch time.

$salt = substr(hash('whirlpool',microtime()),rand(0,105),22);
$seed = time();
for ($i = 0, $j = strlen($salt); $i <= $j; ++$i) {
    printf('%02d = %s%s%c',
        $i,
        crypt($seed,'$2a$07$' . substr($salt, 0, $i)),
        $i%4 == 0 || $i % 4 == 1 ? ' <-' : '',
        0x0A
    );
}

And this produces output similar to the following

00 = $2a$07$$$$$$$$$$$$$$$$$$$$$$.rBxL4x0LvuUp8rhGfnEKSOevBKB5V2. <-
01 = $2a$07$e$$$$$$$$$$$$$$$$$$$$.rBxL4x0LvuUp8rhGfnEKSOevBKB5V2. <-
02 = $2a$07$e8$$$$$$$$$$$$$$$$$$$.WEimjvvOvQ.lGh/V6HFkts7Rq5rpXZG
03 = $2a$07$e89$$$$$$$$$$$$$$$$$$.Ww5p352lsfQCWarRIWWGGbKa074K4/.
04 = $2a$07$e895$$$$$$$$$$$$$$$$$.ZGSPawtL.pOeNI74nhhnHowYrJBrLuW <-
05 = $2a$07$e8955$$$$$$$$$$$$$$$$.ZGSPawtL.pOeNI74nhhnHowYrJBrLuW <-
06 = $2a$07$e8955b$$$$$$$$$$$$$$$.2UumGVfyc4SgAZBs5P6IKlUYma7sxqa
07 = $2a$07$e8955be$$$$$$$$$$$$$$.gb6deOAckxHP/WIZOGPZ6/P3oUSQkPm
08 = $2a$07$e8955be6$$$$$$$$$$$$$.5gox0YOqQMfF6FBU9weAz5RmcIKZoki <-
09 = $2a$07$e8955be61$$$$$$$$$$$$.5gox0YOqQMfF6FBU9weAz5RmcIKZoki <-
10 = $2a$07$e8955be616$$$$$$$$$$$.hWHhdkS9Z3m7/PMKn1Ko7Qf2S7H4ttK
11 = $2a$07$e8955be6162$$$$$$$$$$.meHPOa25CYG2G8JrbC8dPQuWf9yw0Iy
12 = $2a$07$e8955be61624$$$$$$$$$.vcp/UGtAwLJWvtKTndM7w1/30NuYdYa <-
13 = $2a$07$e8955be616246$$$$$$$$.vcp/UGtAwLJWvtKTndM7w1/30NuYdYa <-
14 = $2a$07$e8955be6162468$$$$$$$.OTzcPMwrtXxx6YHKtaX0mypWvqJK5Ye
15 = $2a$07$e8955be6162468d$$$$$$.pDcOFp68WnHqU8tZJxuf2V0nqUqwc0W
16 = $2a$07$e8955be6162468de$$$$$.YDv5tkOeXkOECJmjl1R8zXVRMlU0rJi <-
17 = $2a$07$e8955be6162468deb$$$$.YDv5tkOeXkOECJmjl1R8zXVRMlU0rJi <-
18 = $2a$07$e8955be6162468deb0$$$.aNZIHogUlCn8H7W3naR50pzEsQgnakq
19 = $2a$07$e8955be6162468deb0d$$.ytfAwRL.czZr/K3hGPmbgJlheoZUyL2
20 = $2a$07$e8955be6162468deb0da$.0xhS8VgxJOn4skeI02VNI6jI6324EPe <-
21 = $2a$07$e8955be6162468deb0da3.0xhS8VgxJOn4skeI02VNI6jI6324EPe <-
22 = $2a$07$e8955be6162468deb0da3ucYVpET7X/5YddEeJxVqqUIxs3COrdym

The conclusion? Twofold. First, it's working as intended, and second, know your own salt or don't roll your own salt.