Optimization: Loop Interchange & Floating-Point

CS 301 Lecture, Dr. Lawlor, 2005/11/14

Memory accesses have performance that's:
- Fast when you walk through memory sequentially (the cache keeps feeding you hits), and
- Slow when you jump around in memory (every access can be a cache miss).

So if you're getting bad performance, you can either:
- Change the order you access memory in, or
- Change the layout of your data in memory.

For example, this program has terrible performance -- 75 ns/pixel:
(Executable NetRun Link)
enum {sx=512};
enum {sy=512};
double img[sx*sy];

int inc_img(void) {
	for (int x=0;x<sx;x++)
	for (int y=0;y<sy;y++)
		img[x+y*sy]++; /* inner y loop jumps sy doubles per step--cache-hostile! */
	return 0;
}

int foo(void) {
	double t=time_function(inc_img);
	printf("%.3f ns/pixel\n",t/(sx*sy)*1.0e9);
	return 0;
}
We can speed the program up by either:
- Interchanging the two loops, so y becomes the outer loop and the inner x loop walks memory one double at a time; or
- Keeping the loop order and changing the indexing to img[y+x*sy], which makes the inner loop sequential instead.

Floating-Point Numbers

Floating-point numbers nowadays have roughly the same performance as integers: addition takes a little over a nanosecond (slightly slower than integer addition); multiplication takes a few nanoseconds; and division takes a dozen or more nanoseconds.  That is, floats are now cheap, and you can consider using floats for all sorts of stuff--even when you don't care about fractions.

The sizes of the floating-point types are:

C Datatype     Size                         Approx. Precision    Approx. Range
float          4 bytes (everywhere)         1.0x10^-7            10^38
double         8 bytes (everywhere)         2.0x10^-15           10^308
long double    12-16 bytes (if it exists)   2.0x10^-20           10^4932

A "float" as defined by IEEE Standard 754 consists of three bitfields:

Sign: 1 bit--0 for positive, 1 for negative.
Exponent: 8 bits--biased by 127, so 127 means 2^0 and 137 means 2^10.
Mantissa (or Fraction): 23 bits--a binary fraction.

The hardware interprets a float as having the value:

    value = (-1)^sign * 2^(exponent-127) * 1.fraction

Note that the mantissa has an implicit leading 1 applied (in normal cases).   There are other special interpretations if the exponent is 0 (for 0 or denormalized numbers) or 255 (for infinities and NaNs).

For example, the value "8" would be stored with sign bit 0, exponent 130 (==3+127), and mantissa 000... (without the leading 1), since:

    8 = (-1)^0 * 2^(130-127) * 1.0000....

If we want to take the log-base-2 of a number, we can actually do it by just looking at the exponent bits inside the "float" representation of the number.  Here's the code:
(Executable NetRun Link)
void print_log2(int i) {
	float f=i; /* turn number into a float */
	int v=*(int *)&f; /* look at bits of float representation */
	int exponent=(v>>23)&0xff; /* extract IEEE exponent field */
	exponent-=127; /* remove exponent bias */
	printf(" i=%d float bit=0x%08X exp=%d\n",
		i,v,exponent);
}
This is several times faster than the equivalent "(int)(log(i)/log(2.0))", and it works on any machine that stores floats in IEEE format--which today is essentially all of them.

See page 80 of the textbook for piles of examples and a more detailed definition.