I'm trying to convert an int into a custom float, in which the user specifies the number of bits reserved for the exponent and mantissa, but I don't understand how the conversion works. My function takes an int value and an int exp to represent the number (value * 2^exp), e.g. value = 12, exp = 4 represents 192, but I don't understand the process I need to follow to convert these. I've been looking at this for days and playing with IEEE converter web apps, but I just don't understand what the normalization process is. I see that it's "move the binary point and adjust the exponent", but I have no idea what this means. Can anyone give me an example to go off of? Also, I don't understand what the exponent bias is. The only info I have is that you just add a number to your exponent, but I don't understand why. I've been searching Google for an example I can understand, but this just isn't making any sense to me.
A floating point number is normalized when we force the integer part of its mantissa to be exactly 1 and allow its fraction part to be whatever we like.
For example, if we were to take the number 13.25, which is 1101.01 in binary, 1101 would be the integer part and 01 would be the fraction part.
I could represent 13.25 as 1101.01*(2^0), but this isn't normalized because the integer part is not 1. However, we are allowed to shift the mantissa one digit to the right (that is, move the binary point one place to the left) as long as we increase the exponent by 1 each time:
1101.01*(2^0)
= 110.101*(2^1)
= 11.0101*(2^2)
= 1.10101*(2^3)
This representation, 1.10101*(2^3), is the normalized form of 13.25.
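If it helps to see the shifting as code, here is a minimal Python sketch (Python is my choice, since the question doesn't name a language). It assumes a positive integer value scaled by 2^exp, as in your function. The name normalize is made up for illustration and is not part of any library.

```python
def normalize(value, exp=0):
    """Rewrite value * 2**exp as 1.fff... * 2**exponent.

    Returns the normalized mantissa as a string plus the new exponent.
    Only positive integers are handled in this sketch.
    """
    if value <= 0:
        raise ValueError("this sketch only handles positive integers")
    bits = bin(value)[2:]      # e.g. 53 -> "110101"
    shift = len(bits) - 1      # places the binary point moves to the left
    fraction = bits[1:] or "0" # everything after the leading 1
    # Each place the point moves left costs +1 on the exponent.
    return bits[0] + "." + fraction, exp + shift


# 13.25 is 53 * 2**-2, i.e. 110101 with the point two places in.
print(normalize(53, -2))
# The question's example: 12 * 2**4 = 192.
print(normalize(12, 4))
```

For 13.25 this yields 1.10101 with exponent 3, matching the steps above. For the question's value = 12, exp = 4 it yields 1.100 with exponent 7, which is 1.5 * 128 = 192.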
That said, we know that normalized floating point numbers will always come in the form 1.fffffff * (2^exp). For efficiency's sake, we don't bother storing the 1 integer part in the binary representation itself; we just pretend it's there. So if we were to give your custom-made float type 5 bits for the mantissa, we would know the bits 10100 would actually stand for 1.10100.
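Here is an example with the standard 23-bit mantissa: the normalized 13.25 from above, 1.10101*(2^3), is stored with the mantissa bits 10101000000000000000000 (the fraction 10101 padded on the right with zeros to 23 bits).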
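Dropping the hidden 1 can be sketched in a few lines of Python. This assumes a positive integer input and simply truncates any bits that don't fit, whereas a real format would round. The helper name fraction_field is hypothetical.

```python
def fraction_field(value, mantissa_bits):
    """Return the stored mantissa bits for a positive integer.

    The leading 1 is dropped (it is implied), and the rest is padded
    with zeros on the right or truncated to fit mantissa_bits.
    Truncation instead of rounding is a simplifying assumption.
    """
    if value <= 0:
        raise ValueError("this sketch only handles positive integers")
    bits = bin(value)[2:]   # e.g. 13 -> "1101"
    fraction = bits[1:]     # drop the implied leading 1 -> "101"
    return fraction.ljust(mantissa_bits, "0")[:mantissa_bits]


print(fraction_field(13, 5))   # 5-bit custom mantissa
print(fraction_field(53, 23))  # 23-bit mantissa, same digits as 13.25
```

With 5 mantissa bits, 13 (binary 1101) is stored as 10100, which stands for 1.10100 as described above.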
As for the exponent bias, let's take a look at the standard 32-bit float format, which is broken into 3 parts: 1 sign bit, 8 exponent bits, and 23 mantissa bits:
s eeeeeeee mmmmmmmmmmmmmmmmmmmmmmm
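To look at these three fields on a real value, something like the following should work. It uses Python's standard struct module to reinterpret a 32-bit float as an unsigned integer. The function name float32_fields is just an illustrative choice.

```python
import struct


def float32_fields(x):
    """Split a number's 32-bit float encoding into (sign, exponent, mantissa)."""
    # Pack as a big-endian 32-bit float, then reread the same 4 bytes
    # as a big-endian unsigned 32-bit integer.
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    bits = format(raw, "032b")
    return bits[0], bits[1:9], bits[9:]


sign, exponent, mantissa = float32_fields(13.25)
print(sign, exponent, mantissa)
```

For 13.25 the exponent field comes out as 10000010, which is 130 rather than 3. That offset is the bias.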
The exponents 00000000 and 11111111 have special purposes (like representing Inf and NaN), so with 8 exponent bits, we could represent 254 different exponents, say 2^1 to 2^254, for example. But what if we want to represent 2^-3? How do we get negative exponents?
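The format fixes this problem by automatically subtracting 127 (the bias) from the stored exponent. When encoding, you do the reverse and add 127 to the true exponent before storing it. Therefore: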
0000 0001 would be 1-127 = -126
0010 1101 would be 45-127 = -82
0111 1111 would be 127-127 = 0
1001 0010 would be 146-127 = 19
This changes the exponent range from 2^1 ... 2^254 to 2^-126 ... 2^+127 so we can represent negative exponents.
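Putting normalization and the bias together, a rough sketch of the whole conversion for a custom format might look like this. Several things here are assumptions on my part: the bias for a custom exponent width is taken as 2^(exp_bits-1) - 1 (the IEEE 754 convention, which gives 127 for 8 bits), only positive values are handled (so no sign bit), extra mantissa bits are truncated rather than rounded, and the all-zeros and all-ones exponent fields are treated as reserved. The names bias_for and to_custom_float are made up.

```python
def bias_for(exp_bits):
    """Bias under the IEEE 754 convention: 127 for 8 exponent bits."""
    return (1 << (exp_bits - 1)) - 1


def to_custom_float(value, exp, exp_bits, mantissa_bits):
    """Encode value * 2**exp as (exponent_field, mantissa_field) bit strings.

    Sketch only: positive values, truncation instead of rounding,
    no zero, subnormals, Inf or NaN.
    """
    if value <= 0:
        raise ValueError("this sketch only handles positive integers")
    bits = bin(value)[2:]

    # Normalize: moving the point left len(bits) - 1 places
    # raises the exponent by the same amount.
    true_exponent = exp + len(bits) - 1

    # Bias: store true exponent + bias so the field is never negative.
    stored = true_exponent + bias_for(exp_bits)
    if not 1 <= stored <= (1 << exp_bits) - 2:
        raise OverflowError("exponent does not fit in the exponent field")

    exponent_field = format(stored, f"0{exp_bits}b")
    # Drop the implied leading 1, then pad or truncate the fraction.
    mantissa_field = bits[1:].ljust(mantissa_bits, "0")[:mantissa_bits]
    return exponent_field, mantissa_field


# 13.25 = 53 * 2**-2 in the standard 8/23 layout.
print(to_custom_float(53, -2, 8, 23))
# The question's example, 12 * 2**4 = 192, with 5 exponent and 5 mantissa bits.
print(to_custom_float(12, 4, 5, 5))
```

For 13.25 in the 8/23 layout the true exponent 3 is stored as 3 + 127 = 130, i.e. 10000010. For the 5/5 layout the bias would be 15 under this convention, so 192 = 1.1*(2^7) stores 7 + 15 = 22, i.e. 10110, with mantissa bits 10000.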