Which is the first integer that an IEEE 754 float is incapable of representing exactly?

types floating-point ieee-754

Floomi · Sep 25, 2010 · Viewed 38.4k times · Source

Answer

2^{mantissa bits + 1} + 1

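The +1 in the exponent (mantissa bits + 1) comes from the implicit leading bit. If the stored mantissa is abcdef..., the (normal) number it represents is actually 1.abcdef... × 2^e, so there is one more bit of precision than is stored. As a result, every integer from 0 up to 2^{mantissa bits + 1} fits exactly. 2^{mantissa bits + 1} + 1 would need one more significant bit. It lies exactly halfway between its representable neighbours 2^{mantissa bits + 1} and 2^{mantissa bits + 1} + 2, and the default round-to-nearest, ties-to-even mode rounds it down to 2^{mantissa bits + 1}.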
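The implicit bit can be seen by pulling apart the bit fields of a double, which stores 52 mantissa bits. This is a minimal sketch using Python's standard `struct` module. It assumes the platform's `float` is an IEEE 754 binary64 value, which holds on mainstream platforms. The helper name `double_fields` is hypothetical and chosen only for illustration.

```python
import struct

def double_fields(x: float):
    """Split an IEEE 754 double into (sign, biased exponent, stored mantissa bits)."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF      # 11-bit biased exponent (bias 1023)
    mantissa = bits & ((1 << 52) - 1)    # the 52 explicitly stored bits
    return sign, exponent, mantissa

# 2**53 - 1 is the largest odd integer a double holds exactly.
sign, exponent, mantissa = double_fields(2.0**53 - 1)
significand = (1 << 52) | mantissa       # prepend the implicit leading 1

print(exponent - 1023)                   # 52: the unbiased exponent e
print(mantissa == (1 << 52) - 1)         # True: all 52 stored bits are set
print(significand.bit_length())          # 53: one more bit than is stored
```

The same decomposition for float would use the `"<f"`/`"<I"` formats, an 8-bit exponent (bias 127), and 23 stored bits. Together these give a 24-bit significand.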

Therefore, the first positive integer that cannot be represented exactly, and is instead rounded, is:
For float (23 stored mantissa bits), 16,777,217 (2²⁴ + 1).
For double (52 stored mantissa bits), 9,007,199,254,740,993 (2⁵³ + 1).
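Both boundaries can be checked quickly in Python. CPython's `float` is a double on IEEE 754 platforms, and packing a value with `struct`'s `"f"` format rounds it to single precision. This is a sketch under those assumptions. `to_float32`, `exact_in_float32` and `exact_in_double` are hypothetical helper names.

```python
import struct

def to_float32(x: float) -> float:
    """Round a double to the nearest IEEE 754 single and widen it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def exact_in_float32(n: int) -> bool:
    # float(n) is exact for |n| <= 2**53, so the only rounding is to single.
    return to_float32(float(n)) == n

def exact_in_double(n: int) -> bool:
    # int -> float rounds to nearest, ties to even; int/float comparison is exact.
    return float(n) == n

for n in (2**24 - 1, 2**24, 2**24 + 1, 2**24 + 2):
    print(n, exact_in_float32(n))        # only 2**24 + 1 reports False

print(to_float32(float(2**24 + 1)))      # 16777216.0: the tie goes to the even neighbour
print(exact_in_double(2**53), exact_in_double(2**53 + 1))  # True False
```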

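In Python, whose float is an IEEE 754 double, the literal is rounded to the nearest representable value:

>>> 9007199254740993.0
9007199254740992.0
>>> 9007199254740993.0 == 9007199254740992.0
True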

Question

For clarity, if I'm using a language that implements IEEE 754 floats and I declare:

float f0 = 0.f;
float f1 = 1.f;

...and then print them back out, I'll get 0.0000 and 1.0000, exactly.

But IEEE 754 cannot represent every number on the real line. Close to zero, the gaps between representable values are small; the further you get from zero, the larger the gaps become.
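Measuring the gaps directly makes this concrete. This is a minimal sketch that assumes Python 3.9+ for `math.ulp`, which returns the gap from a double to the next larger double. For single precision, it relies on the fact that, for non-negative finite floats, incrementing the bit pattern by one gives the next representable value. `float32_gap` is a hypothetical helper name.

```python
import math
import struct

def float32_gap(x: float) -> float:
    """Gap from float32(x) to the next float32 above it (x finite, x >= 0)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    (here,) = struct.unpack("<f", struct.pack("<I", bits))
    (above,) = struct.unpack("<f", struct.pack("<I", bits + 1))  # next bit pattern up
    return above - here

for x in (1.0, 2.0**23, 2.0**24):
    # math.ulp reports the corresponding gap for doubles.
    print(x, float32_gap(x), math.ulp(x))
```

For float, the gap is 1 at 2²³ and becomes 2 at 2²⁴, which is exactly where odd integers stop being representable.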

So, my question is: for an IEEE 754 float, which is the first (closest to zero) integer which cannot be exactly represented? I'm only really concerned with 32-bit floats for now, although I'll be interested to hear the answer for 64-bit if someone gives it!

I thought this would be as simple as calculating 2^{bits_of_mantissa} and adding 1, where bits_of_mantissa is how many bits the standard exposes. I tried this for 32-bit floats on my machine (MSVC++, Win64), though, and that number seemed to be represented fine.
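With 23 stored bits, the guess 2^{bits_of_mantissa} + 1 gives 8,388,609 (2²³ + 1). That value still fits in float's 24-bit significand, which would explain why the test seemed fine. The sketch below models a hypothetical binary format with a p-bit significand (implicit bit included) and an unbounded exponent range. It brute-forces the first positive integer that format cannot hold, then cross-checks float32 using `struct`. `fits_in_significand`, `first_unrepresentable` and `float32_round_trips` are illustrative names, not library functions.

```python
import struct

def fits_in_significand(n: int, p: int) -> bool:
    """True if the positive integer n equals m * 2**k with m < 2**p and k >= 0.

    Models a binary format with a p-bit significand (implicit bit included)
    and an unlimited exponent range.
    """
    while n % 2 == 0:
        n //= 2                          # trailing zeros only shift the exponent
    return n.bit_length() <= p

def first_unrepresentable(p: int) -> int:
    n = 1
    while fits_in_significand(n, p):
        n += 1
    return n

def float32_round_trips(n: int) -> bool:
    """True if n survives conversion to IEEE 754 single precision unchanged."""
    return struct.unpack("<f", struct.pack("<f", float(n)))[0] == n

print([first_unrepresentable(p) for p in range(1, 9)])  # [3, 5, 9, 17, 33, 65, 129, 257]
print(float32_round_trips(2**23 + 1))   # True: 2**23 + 1 is still exact
print(float32_round_trips(2**24 + 1))   # False: 24 significand bits are not enough
```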
