Recall from our float bits lecture that floats are stored using 32 perfectly ordinary bits:

| Sign | Exponent | Fraction (or "Mantissa") |
|---|---|---|
| 1 bit: 0 for positive, 1 for negative | 8 unsigned bits: 127 means 2^0, 137 means 2^10 | 23 bits: a binary fraction. Don't forget the implicit leading 1! |
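As a concrete check of the layout above, here is a minimal sketch that splits a float into its three fields. The `FloatFields` struct and `split_float` function are hypothetical names made up for this illustration, and the sketch assumes a 32-bit IEEE float; it copies the bits out with `std::memcpy`, which is one portable way to get at them.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper type (not from the lecture): the three bit fields of a float.
struct FloatFields {
    std::uint32_t sign;      // 1 bit: 0 for positive, 1 for negative
    std::uint32_t exponent;  // 8 bits, stored with a bias of 127
    std::uint32_t fraction;  // 23 bits; the implicit leading 1 is not stored
};

// Hypothetical helper: pull the fields out of a float's raw bits.
FloatFields split_float(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);  // copy the raw bits out of the float
    return FloatFields{bits >> 31, (bits >> 23) & 0xFFu, bits & 0x7FFFFFu};
}
```

With this sketch, `split_float(1.0f)` gives sign 0, exponent 127 (meaning 2^0), and fraction 0, while `split_float(-1024.0f)` gives sign 1, exponent 137 (meaning 2^10), and fraction 0.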

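The usual way to see the bits inside a float is to use an "unholy union". Officially, only C blesses reading a union member other than the one you last wrote; in C++ it is undefined behavior that compilers such as GCC nonetheless document as supported. Here it is: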

union unholy_t { /* a union between a float and an integer */
public:
    float f;
    int i;
};

int foo(void) {
    unholy_t unholy;
    unholy.f=3.0; /* put in a float */
    return unholy.i; /* take out an integer: the float's raw bits, 0x40400000 for 3.0 */
}

For example, we can use integer bitwise operations to zero out the float's sign bit, making a quite cheap floating-point absolute value operation:

float val=-3.1415;

int foo(void) {
    unholy_t unholy;
    unholy.f=val; /* put in a negative float */
    unholy.i=unholy.i&0x7fFFffFF; /* mask off the float's sign bit */
    return unholy.f; /* now the float is positive! (the int return type then truncates it to 3) */
}

Back before SSE, floating-point to integer conversion in C++ was really, really slow. The problem is that the same x86 FPU control word bits set the rounding mode both for float operations like addition, which normally round to nearest, and for float-to-int conversion, which C++ requires to truncate. For example, this float-to-int code takes 55 ns (!) on a pre-SSE Pentium III:

float val=+3.1415;

int foo(void) {
    return (int)(val+0.0001); /* the cast is the slow float-to-int conversion */
}

The problem is evident in the assembly code: you've got to save the old control word out to memory, switch its rounding mode to truncation (round toward zero), load the new control word, do the integer conversion, and finally load the original control word to resume normal operation.
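To make the rounding mismatch concrete, here is a minimal sketch contrasting the two behaviors. The helper names are hypothetical, and it assumes the default round-to-nearest mode is in effect. `std::lrint` is the standard library function that converts using whatever rounding mode is currently set, whereas a cast must always truncate.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: what a C++ cast must do, which is to discard
// the fraction (round toward zero) regardless of the FPU's rounding mode.
int convert_truncating(float x) {
    return (int)x;
}

// Hypothetical helper: convert using the current rounding mode
// (round-to-nearest-even unless the program has changed it).
long convert_current_mode(float x) {
    return std::lrint(x);
}
```

Here `convert_truncating(2.7f)` gives 2, while `convert_current_mode(2.7f)` gives 3 under the default mode. That difference is why the cast cannot simply reuse the rounding mode the FPU is already in.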

Unholy union to the rescue! A float has only 23 fraction bits, so if you add a value like 1<<23 (that is, 2^23) to a small nonnegative float, the sum has no room left for anything after the binary point: the floating-point hardware rounds those bits off (to nearest in the default rounding mode, rather than truncating) and leaves the integer value of the float sitting in the low fraction bits. We can then extract those bits with the float-to-int union above, mask away the sign and exponent, and we've sped up float-to-int conversion about sixfold.

union unholy_t { /* a union between a float and an integer */
public:
    float f;
    int i;
};

float val=+3.1415;

int foo(void) {
    unholy_t unholy;
    unholy.f=val+(1<<23); /* scrape off the fraction bits with the weird constant */
    return unholy.i&0x7FffFF; /* mask off the float's sign and exponent bits */
}

This "fast float-to-integer trick" has been independently discovered by many smart people, including:

- Chris Hecker, "Let's get to the (floating) point": includes a good history of floating point for 3D game programming.
- Mike Herf, "know your FPU".
- Sree Kotay, "Fixing Floating Fast".
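The add-and-mask trick above comes with two caveats that are easy to miss: it rounds instead of truncating, and it only works over a limited input range. Here is a minimal sketch that spells out those assumptions. The function name `fast_float_to_int` is hypothetical, the sketch assumes a 32-bit IEEE float and the default round-to-nearest mode, and it uses `std::memcpy` in place of the union to read the bits.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: float-to-int via the add-2^23 trick.
// Assumes a 32-bit IEEE float, the default round-to-nearest mode,
// and 0 <= val <= 2^23 - 1; outside that range the result is wrong.
int fast_float_to_int(float val) {
    assert(val >= 0.0f && val <= 8388607.0f);
    float shifted = val + 8388608.0f;  // add 2^23: the fraction bits get rounded away
    std::uint32_t bits;
    std::memcpy(&bits, &shifted, sizeof bits);  // the same bits the union's .i would show
    return (int)(bits & 0x7FFFFFu);  // keep only the 23 fraction bits
}
```

With this sketch, `fast_float_to_int(3.1415f)` gives 3, but `fast_float_to_int(3.7f)` gives 4 where an `(int)` cast would give 3. So the trick is a drop-in replacement only where round-to-nearest is acceptable.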