Why does this random value have a 25/75 distribution instead of 50/50?

gvlasov picture gvlasov · Dec 23, 2014 · Viewed 9.2k times · Source

Edit: So basically what I'm trying to write is a 1 bit hash for double.

I want to map a double to true or false with a 50/50 chance. For that I wrote code that picks some random numbers (just as an example, I want to use this on data with regularities and still get a 50/50 result), checks their last bit and increments y if it is 1, or n if it is 0.

However, this code constantly results in 25% y and 75% n. Why is it not 50/50? And why such a weird, but straight-forward (1/3) distribution?

public class DoubleToBoolean {
    @Test
    public void test() {

        int y = 0;
        int n = 0;
        Random r = new Random();
        for (int i = 0; i < 1000000; i++) {
            double randomValue = r.nextDouble();
            long lastBit = Double.doubleToLongBits(randomValue) & 1;
            if (lastBit == 1) {
                y++;
            } else {
                n++;
            }
        }
        System.out.println(y + " " + n);
    }
}

Example output:

250167 749833

Answer

harold picture harold · Dec 23, 2014

Because nextDouble works like this: (source)

public double nextDouble()
{
    return (((long) next(26) << 27) + next(27)) / (double) (1L << 53);
}

next(x) makes x random bits.

Now why does this matter? Because about half the numbers generated by the first part (before the division) are less than 1L << 52, and therefore their significand doesn't entirely fill the 53 bits that it could fill, meaning the least significant bit of the significand is always zero for those.


Because of the amount of attention this is receiving, here's some extra explanation of what a double in Java (and many other languages) really looks like and why it mattered in this question.

Basically, a double looks like this: (source)

double layout

A very important detail not visible in this picture is that numbers are "normalized"1 such that the 53 bit fraction starts with a 1 (by choosing the exponent such that it is so), that 1 is then omitted. That is why the picture shows 52 bits for the fraction (significand) but there are effectively 53 bits in it.

The normalization means that if in the code for nextDouble the 53rd bit is set, that bit is the implicit leading 1 and it goes away, and the other 52 bits are copied literally to the significand of the resulting double. If that bit is not set however, the remaining bits must be shifted left until it becomes set.

On average, half the generated numbers fall into the case where the significand was not shifted left at all (and about half those have a 0 as their least significant bit), and the other half is shifted by at least 1 (or is just completely zero) so their least significant bit is always 0.

1: not always, clearly it cannot be done for zero, which has no highest 1. These numbers are called denormal or subnormal numbers, see wikipedia:denormal number.