Weird Integer boxing in Java

Joel picture Joel · Jun 28, 2010 · Viewed 19.7k times · Source

I just saw code similar to this:

public class Scratch
{
    public static void main(String[] args)
    {
        Integer a = 1000, b = 1000;
        System.out.println(a == b);

        Integer c = 100, d = 100;
        System.out.println(c == d);
    }
}

When ran, this block of code will print out:

false
true

I understand why the first is false: because the two objects are separate objects, so the == compares the references. But I can't figure out, why is the second statement returning true? Is there some strange autoboxing rule that kicks in when an Integer's value is in a certain range? What's going on here?

Answer

Jon Skeet picture Jon Skeet · Jun 28, 2010

The true line is actually guaranteed by the language specification. From section 5.1.7:

If the value p being boxed is true, false, a byte, a char in the range \u0000 to \u007f, or an int or short number between -128 and 127, then let r1 and r2 be the results of any two boxing conversions of p. It is always the case that r1 == r2.

The discussion goes on, suggesting that although your second line of output is guaranteed, the first isn't (see the last paragraph quoted below):

Ideally, boxing a given primitive value p, would always yield an identical reference. In practice, this may not be feasible using existing implementation techniques. The rules above are a pragmatic compromise. The final clause above requires that certain common values always be boxed into indistinguishable objects. The implementation may cache these, lazily or eagerly.

For other values, this formulation disallows any assumptions about the identity of the boxed values on the programmer's part. This would allow (but not require) sharing of some or all of these references.

This ensures that in most common cases, the behavior will be the desired one, without imposing an undue performance penalty, especially on small devices. Less memory-limited implementations might, for example, cache all characters and shorts, as well as integers and longs in the range of -32K - +32K.