Edit: So basically what I'm trying to write is a 1 bit hash for double
.
I want to map a double
to true
or false
with a 50/50 chance. For that I wrote code that picks some random numbers (just as an example, I want to use this on data with regularities and still get a 50/50 result), checks their last bit and increments y
if it is 1, or n
if it is 0.
However, this code constantly results in 25% y
and 75% n
. Why is it not 50/50? And why such a weird, but straight-forward (1/3) distribution?
public class DoubleToBoolean {
@Test
public void test() {
int y = 0;
int n = 0;
Random r = new Random();
for (int i = 0; i < 1000000; i++) {
double randomValue = r.nextDouble();
long lastBit = Double.doubleToLongBits(randomValue) & 1;
if (lastBit == 1) {
y++;
} else {
n++;
}
}
System.out.println(y + " " + n);
}
}
Example output:
250167 749833
Because nextDouble works like this: (source)
public double nextDouble()
{
return (((long) next(26) << 27) + next(27)) / (double) (1L << 53);
}
next(x)
makes x
random bits.
Now why does this matter? Because about half the numbers generated by the first part (before the division) are less than 1L << 52
, and therefore their significand doesn't entirely fill the 53 bits that it could fill, meaning the least significant bit of the significand is always zero for those.
Because of the amount of attention this is receiving, here's some extra explanation of what a double
in Java (and many other languages) really looks like and why it mattered in this question.
Basically, a double
looks like this: (source)
A very important detail not visible in this picture is that numbers are "normalized"1 such that the 53 bit fraction starts with a 1 (by choosing the exponent such that it is so), that 1 is then omitted. That is why the picture shows 52 bits for the fraction (significand) but there are effectively 53 bits in it.
The normalization means that if in the code for nextDouble
the 53rd bit is set, that bit is the implicit leading 1 and it goes away, and the other 52 bits are copied literally to the significand of the resulting double
. If that bit is not set however, the remaining bits must be shifted left until it becomes set.
On average, half the generated numbers fall into the case where the significand was not shifted left at all (and about half those have a 0 as their least significant bit), and the other half is shifted by at least 1 (or is just completely zero) so their least significant bit is always 0.
1: not always, clearly it cannot be done for zero, which has no highest 1. These numbers are called denormal or subnormal numbers, see wikipedia:denormal number.