# Optimization: Loop Interchange & Floating-Point

CS 301 Lecture, Dr. Lawlor, 2005/11/14

Memory accesses have performance that's:
• Good *if* you're accessing a small buffer, because the buffer will be copied to fast cache memory.
• Good *if* you're accessing sequentially, because then the memory chips don't have to change rows.
• Terrible *if* you're accessing a big buffer (too big to fit in cache) in a nonsequential way.
So if you're getting bad performance, you can either:
• Make things smaller (to try to fit in cache)
• Make your accesses more regular (to get sequential access)
For example, this program has terrible performance--75 ns/pixel:

```c
enum {sx=512};
enum {sy=512};
double img[sx*sy];

int inc_img(void) {
	for (int x=0;x<sx;x++)      /* outer loop over x... */
		for (int y=0;y<sy;y++)  /* ...so each inner-loop step jumps sy doubles (4KB) ahead */
			img[x+y*sy]++;
	return 0;
}

int foo(void) {
	double t=time_function(inc_img); /* timing harness used in these examples: seconds per call */
	printf("%.3f ns/pixel\n",t/(sx*sy)*1.0e9);
	return 0;
}
```
We can speed the program up by either:
• Making "img" smaller, by changing it to a "char" (1 byte) instead of a "double" (8 bytes).  This speeds the program up to 6 ns/pixel (try it!).
• Making the access more regular, by either interchanging the x and y loops (see the sketch after this list), or by changing how the array is indexed (so it's "img[x*sy+y]").  This speeds the program up to 4 ns/pixel.
• Doing both--small data with regular access.  This speeds the program up to about 1 ns/pixel--a 70x improvement!
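
Here's what the loop-interchange fix looks like (a sketch reusing img, sx, and sy from the program above); the inner loop now touches consecutive memory locations:

```c
/* Loop interchange: same work as inc_img, but the inner loop
   now walks sequentially through the array (stride of 1 double). */
int inc_img_interchanged(void) {
	for (int y=0;y<sy;y++)      /* outer loop over rows */
		for (int x=0;x<sx;x++)  /* inner loop: img[x+y*sy] advances one double per step */
			img[x+y*sy]++;
	return 0;
}
```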

## Floating-Point Numbers

Floating-point numbers nowadays have roughly the same performance as integers: addition takes a little over a nanosecond (slightly slower than integer addition); multiplication takes a few nanoseconds; and division takes a dozen or more nanoseconds.  That is, floats are now cheap, and you can consider using floats for all sorts of stuff--even when you don't care about fractions.
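
If you want to measure these costs yourself, here's a minimal sketch reusing the time_function harness from the image example above (I'm assuming it takes an int (*)(void) and returns seconds per call, as the earlier usage suggests); note the result includes function-call overhead, which can swamp a one-nanosecond add:

```c
volatile float a=1.1f, b=2.2f, c; /* volatile, so the compiler can't optimize the math away */

int float_add(void) { c=a+b; return 0; }
int float_mul(void) { c=a*b; return 0; }
int float_div(void) { c=a/b; return 0; }

int time_floats(void) {
	printf("add: %.2f ns   mul: %.2f ns   div: %.2f ns\n",
		time_function(float_add)*1.0e9,
		time_function(float_mul)*1.0e9,
		time_function(float_div)*1.0e9);
	return 0;
}
```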

The sizes of the floating-point types are:

| C Datatype | Size | Approx. Precision | Approx. Range |
|-------------|------|-------------------|---------------|
| float | 4 bytes (everywhere) | 1.0x10^-7 | 10^38 |
| double | 8 bytes (everywhere) | 2.0x10^-15 | 10^308 |
| long double | 12-16 bytes (if it exists) | 2.0x10^-20 | 10^4932 |
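
You can double-check these sizes on your own machine; sizeof reports them directly:

```c
#include <stdio.h>

int main(void) {
	printf("float:       %d bytes\n",(int)sizeof(float));
	printf("double:      %d bytes\n",(int)sizeof(double));
	printf("long double: %d bytes\n",(int)sizeof(long double));
	return 0;
}
```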

A "float", as defined by IEEE Standard 754, consists of three bitfields:

| Sign | Exponent | Mantissa (or Fraction) |
|------|----------|------------------------|
| 1 bit: 0 for positive, 1 for negative | 8 bits: 127 means 2^0, 137 means 2^10 | 23 bits: a binary fraction |

The hardware interprets a float as having the value:

value = (-1)^sign * 2^(exponent-127) * 1.fraction

Note that the mantissa has an implicit leading 1 applied (in normal cases).   There are other special interpretations if the exponent is 0 (for 0 or denormalized numbers) or 255 (for infinities and NaNs).

For example, the value "8" would be stored with sign bit 0, exponent 130 (==3+127), and mantissa 000... (without the leading 1), since:

8 = (-1)^0 * 2^(130-127) * 1.0000....
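
We can verify this by pulling the three fields out of the float's bits directly.  This sketch (my naming; it uses the same pointer-cast trick as the log2 code below) prints sign=0 exponent=130 (2^3) mantissa=0x000000 when called with 8:

```c
#include <stdio.h>

void print_float_fields(float f) {
	unsigned int v=*(unsigned int *)&f;  /* raw bits of the float */
	unsigned int sign=(v>>31)&0x1;       /* 1 sign bit */
	unsigned int exponent=(v>>23)&0xff;  /* 8 exponent bits, biased by 127 */
	unsigned int mantissa=v&0x7fffff;    /* 23 mantissa bits, leading 1 implicit */
	printf("sign=%u exponent=%u (2^%d) mantissa=0x%06X\n",
		sign,exponent,(int)exponent-127,mantissa);
}
```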

If we want to take the log-base-2 of a number, we can actually do it by just looking at the exponent bits inside the "float" representation of the number.  Here's the code:
```c
void print_log2(int i) {
	float f=i; /* turn number into a float */
	int v=*(int *)&f; /* look at bits of float representation (pointer-cast type punning) */
	int exponent=(v>>23)&0xff; /* extract IEEE exponent field */
	exponent-=127; /* remove exponent bias */
	printf(" i=%d   float bits=0x%08X   exp=%d\n",
		i,v,exponent);
}
```
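
For example, print_log2(8) prints "i=8   float bits=0x41000000   exp=3", since 8 = 2^3.  For numbers that aren't powers of two, the extracted exponent is the floor of the log-base-2, because the fractional part lands in the mantissa.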