I am struggling with floating-point arithmetic, and I really want to understand this topic!
I know that the numbers can be represented in scientific notation.
So for both numbers the exponent should look like:
Denormalized Number: 11....11 so (1+1/2 + 1/2^2 + ... + 1/2^52)*2^1023
Normalized Number: 11....11 so (1+1/2 + 1/2^2 + ... + 1/2^52)*2^1024
However, I am not sure whether this is correct.
I would really appreciate your answer!
P.S.: Wikipedia gives the number, but I do not know how they came up with it...
As you know, the double-precision format consists of, from the most significant bit down, 1 sign bit, an 11-bit exponent field, and a 52-bit fraction field.
The key to understanding denormalized numbers is that they are not actually floating-point numbers; instead, they use a fixed-point micro-format built from the bit patterns that the 'normal' format leaves unused.
Normal floating-point numbers are of the form m*2^e, where e is found by subtracting the bias (1023 for double precision) from the exponent field above, and m is a number with 1 <= m < 2, where the bits after the 'binary' point are given by the fraction above. The 1 in front of the binary point is not stored, because it is known to always be 1. The exponent field has a value from 1 to 2046. The values 0 (all zeroes) and 2047 (all ones) are reserved for special uses.
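As a rough illustration of the decoding just described, here is a minimal Python sketch. The helper names `split_fields` and `decode_normal` are made up for this example, and it assumes Python's `float` is an IEEE 754 double, which holds on common platforms.

```python
import struct

BIAS = 1023  # exponent bias for double precision


def split_fields(x: float):
    """Return (sign, exponent_field, fraction_field) of a double."""
    # Reinterpret the 8 bytes of the double as one unsigned 64-bit integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                        # 1 bit
    exponent_field = (bits >> 52) & 0x7FF    # 11 bits
    fraction_field = bits & ((1 << 52) - 1)  # 52 bits
    return sign, exponent_field, fraction_field


def decode_normal(x: float) -> float:
    """Rebuild a normal double as m * 2**e with 1 <= m < 2."""
    sign, exponent_field, fraction_field = split_fields(x)
    assert 1 <= exponent_field <= 2046, "not a normal number"
    m = 1 + fraction_field / 2**52  # the implicit leading 1 is added back
    e = exponent_field - BIAS       # subtract the bias
    return (-1) ** sign * m * 2.0 ** e


print(split_fields(1.0))   # (0, 1023, 0)
print(decode_normal(6.5))  # 6.5
```

For 1.0 the fraction is all zeroes and the exponent field equals the bias, so m = 1 and e = 0.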
All ones in the exponent field means we have either an infinity or a NaN (Not-a-Number).
All zeroes means we're dealing with denormal floating-point numbers. These are still of the same form, m*2^e, but the values of m and e are derived differently. m is now a number with 0 <= m < 1, so there is a 0 in front of the binary point instead of the 1 used for normal numbers. e always has the same value: -1022. So the exponent is a constant, which is why I called it a fixed-point format earlier.
So, the largest possible values for each are: