Memory accesses have performance that's:

- Good *if* you're accessing a small buffer, because the buffer will be copied to fast cache memory.
- Good *if* you're accessing sequentially, because then the memory chips don't have to change rows.
- Terrible *if* you're accessing a big buffer (too big to fit in cache) in a nonsequential way.

- Make things smaller (to try to fit in cache)

- Make your accesses more regular (to get sequential access)

(Executable NetRun Link)

enum {sx=512};We can speed the program up by either:

enum {sy=512};

double img[sx*sy];

int inc_img(void) {

for (int x=0;x<sx;x++)

for (int y=0;y<sy;y++)

img[x+y*sy]++;

return 0;

}

int foo(void) {

double t=time_function(inc_img);

printf("%.3f ns/pixel\n",t/(sx*sy)*1.0e9);

return 0;

}

- Making "img" smaller, by changing it to a "char" (1 byte) instead of "double" (8 bytes). This speeds the program up to 6ns/pixel (try it!).
- Making the access more regular, by either interchanging the x and
y loops, or by changing how the array is indexed (so it's
"img[x*sy+y]"). This speeds the program up to 4ns/pixel.

- Doing both--small data with regular access. This speeds the program up to about 1ns/pixel--a 70x improvement!

The sizes of the floating-point types are:

C Datatype |
Size |
Approx. Precision |
Approx. Range |

float |
4 bytes (everywhere) |
1.0x10^{-7} |
10^{38} |

double |
8 bytes (everywhere) |
2.0x10^{-15} |
10^{308} |

long double |
12-16 bytes (if it exists) |
2.0x10^{-20} |
10^{4932} |

A "float" as as defined by IEEE Standard 754 consists of three bitfields:

Sign |
Exponent |
Mantissa (or Fraction) |

1 bit-- 0 for positive 1 for negative |
8 bits-- 127 means 2 ^{0
} 137 means 2^{10
} |
23 bits-- a binary fraction. |

The hardware interprets a float as having the value:

value = (-1)

Note that the mantissa has an implicit leading 1 applied (in normal cases). There are other special interpretations if the exponent is 0 (for 0 or denormalized numbers) or 255 (for infinities and NaNs).

For example, the value "8" would be stored with sign bit 0, exponent 130 (==3+127), and mantissa 000... (without the leading 1), since:

8 = (-1)

If we want to take the log-base-2 of a number, we can actually do it by just looking at the exponent bits inside the "float" representation of the number. Here's the code:

(Executable NetRun Link)

void print_log2(int i) {This is several times faster than the equivalent "(int)log(i)/log(2.0)", and it's totally portable.

float f=i; /* turn number into a float */

int v=*(int *)&f; /* look at bits of float representation */

int exponent=(v>>23)&0xff; /* extract IEEE exponent field */

exponent-=127; /* remove exponent bias */

printf(" i=%d float bit=0x%08X exp=%d\n",

i,v,exponent);

}

See page 80 of the textbook for piles of examples and a more detailed definition.