Floating point versus fixed point: what are the pros/cons?

jokoon · Sep 11, 2010 · Viewed 15.2k times

A floating-point type represents a number by storing its significant digits and its exponent in separate bit fields, so that the whole value fits in 16, 32, 64 or 128 bits.

A fixed-point type stores a number in two words: one representing the integer part, the other representing the part past the radix point, whose bits are weighted by negative powers of two (2^-1, 2^-2, 2^-3, etc.).

Floats are better in that they cover a wider range thanks to the exponent, but not if one wants to store numbers with more precision within a limited range, for example when the integer part only runs from -16 to 16, which leaves more bits to hold the digits past the radix point.
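To make the range-versus-precision trade-off concrete, here is a small C++ sketch. It assumes a 32-bit IEEE 754 `float` and a hypothetical signed 32-bit fixed-point layout covering [-16, 16) with 27 fractional bits; the names `float_step_at` and `fixed_step_27` are made up for illustration.

```cpp
#include <cmath>

// Gap between x and the next representable float above it.
// For a float this gap grows with the magnitude of x.
inline float float_step_at(float x) {
    return std::nextafter(x, INFINITY) - x;
}

// Hypothetical fixed-point layout: 32-bit two's-complement mantissa
// scaled by 2^-27, covering [-16, 16). The gap between adjacent values
// is the same everywhere in the range: 2^-27.
constexpr double fixed_step_27 = 1.0 / (1 << 27);
```

Under these assumptions the float is coarser than the fixed-point format near the top of the range (the gap at 8.0 is 2^-20) and finer close to zero, which is the trade-off described above.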

In terms of performance, which one is faster, or are there cases where one is faster than the other?

In video game programming, does everybody use floating point because the FPU makes it faster, or because the performance drop is negligible, or do people write their own fixed-point types?

Why isn't there a fixed-point type in C/C++?

Answer

Ben Voigt · Sep 11, 2010

That definition covers a very limited subset of fixed point implementations.

It would be more correct to say that in fixed point only the mantissa is stored and the exponent is a constant determined a priori. There is no requirement for the binary point to fall inside the mantissa, and definitely no requirement that it fall on a word boundary. For example, all of the following are "fixed point":

  • 64 bit mantissa, scaled by 2^-32 (this fits the definition listed in the question)
  • 64 bit mantissa, scaled by 2^-33 (now the integer and fractional parts cannot be separated by an octet boundary)
  • 32 bit mantissa, scaled by 2^4 (now there is no fractional part)
  • 32 bit mantissa, scaled by 2^-40 (now there is no integer part)
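As a rough illustration of "mantissa plus a constant exponent", the sketch below converts between an integer mantissa and the real value it stands for. The helper names `fixed_to_double` and `double_to_fixed` are hypothetical, and the exponent is a template parameter only to stress that it is fixed ahead of time rather than stored with the value.

```cpp
#include <cmath>
#include <cstdint>

// Real value represented by an integer mantissa with a constant binary
// exponent: value = mantissa * 2^Exponent.
template <int Exponent>
double fixed_to_double(std::int64_t mantissa) {
    return std::ldexp(static_cast<double>(mantissa), Exponent);
}

// Nearest mantissa for a real value at the same constant exponent.
// No overflow check: the caller is assumed to stay within range.
template <int Exponent>
std::int64_t double_to_fixed(double value) {
    return static_cast<std::int64_t>(std::llround(std::ldexp(value, -Exponent)));
}
```

With an exponent of -32 the mantissa `3 << 31` stands for 1.5; with an exponent of 4 the mantissa 5 stands for 80 and nothing finer than a multiple of 16 can be represented; with an exponent of -40 a 32-bit mantissa can only hold values below 2^-8.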

GPUs tend to use fixed point with no integer part (typically 32-bit mantissa scaled by 2^-32). Therefore APIs such as OpenGL and Direct3D often use floating-point types which are capable of holding these values. However, manipulating the integer mantissa is often more efficient so these APIs allow specifying coordinates (in texture space, color space, etc.) this way as well.
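A minimal sketch of what working directly on such a mantissa can look like, assuming the 32-bit, 2^-32 scaling mentioned above. The function names are invented for this example and do not correspond to any OpenGL or Direct3D call.

```cpp
#include <cstdint>

// Interpret a 32-bit unsigned mantissa as a pure fraction in [0, 1):
// value = mantissa * 2^-32. A double holds every such value exactly.
constexpr double fraction_from_u32(std::uint32_t mantissa) {
    return static_cast<double>(mantissa) / 4294967296.0;  // divide by 2^32
}

// Multiply two fractions using integer arithmetic only:
// (a * 2^-32) * (b * 2^-32) = (a * b) * 2^-64, so the top 32 bits of the
// 64-bit product are the result at scale 2^-32 (truncated, not rounded).
constexpr std::uint32_t multiply_fractions(std::uint32_t a, std::uint32_t b) {
    return static_cast<std::uint32_t>((static_cast<std::uint64_t>(a) * b) >> 32);
}
```

For instance, `0x80000000` stands for 0.5, and multiplying it by itself yields `0x40000000`, i.e. 0.25, without any floating-point operation.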

As for your claim that C++ doesn't have a fixed point type, I disagree. All integer types in C++ are fixed point types. The exponent is often assumed to be zero, but this isn't required and I have quite a bit of fixed-point DSP code implemented in C++ this way.
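To show how a plain integer type can serve as fixed point with a nonzero exponent, here is a sketch in the common DSP "Q15" style: a 16-bit mantissa with an assumed exponent of -15. This is an illustration only, not the code referred to above, and the names `q15_t`, `q15_to_double` and `q15_mul` are chosen for this example.

```cpp
#include <cstdint>

// Q15: a 16-bit integer mantissa with an implied exponent of -15,
// so the representable values lie in [-1, 1) in steps of 2^-15.
using q15_t = std::int16_t;

constexpr double q15_to_double(q15_t x) {
    return x / 32768.0;  // mantissa * 2^-15
}

// Addition needs no adjustment because both operands share the exponent.
// Multiplication yields an exponent of -30, so the 32-bit product is
// shifted right by 15 (after adding half an LSB for rounding) to return
// to Q15. Assumptions: no saturation (-1 * -1 overflows), and >> on a
// negative value is an arithmetic shift (guaranteed only since C++20).
constexpr q15_t q15_mul(q15_t a, q15_t b) {
    return static_cast<q15_t>(
        (static_cast<std::int32_t>(a) * b + (1 << 14)) >> 15);
}
```

Here 16384 stands for 0.5, so `q15_mul(16384, 16384)` gives 8192, which stands for 0.25. The compiler only ever sees integer arithmetic; the exponent lives in the programmer's conventions.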