int foo(void) {This prints out exactly what you'd expect-- x=0.1234567890

double x=0.123456789;

printf("x=%.10f \n",x);

return 0;

}

But if we add something and then subtract it from x, like this:

int foo(void) {We *don't* get the same result, even though mathematically x should be unchanged. The problem is that

double y=1.0e10;

double x=0.123456789+y-y;

printf("x=%.10f \n",x);

return 0;

}

10000000000.0If we now round to (say) 17 significant digits, we get just 10000000000.123457; in other words, we've just rounded off the last three digits! When we subtract off 1.0e10 again, those lost digits don't magically grow back, so

+ 0.123456789

----------------------

10000000000.123456789

double y=1.0e10;This gives x=0.1234569550 on my machine--we've lost digits to roundoff! And it gets worse--if y=1.0e20, then x=0. However, note that if we do the additions in another order, like this:

double x=0.123456789+y-y;

double x=0.123456789+(y-y);Now because (y-y) is exactly zero, we get the right answer, x=0.123456789. This is super funky, because the *order* matters! In other words, adding parenthesis can significantly change the result.

In general, roundoff is a problem when:

- You care about lots of digits in the answer, AND

- You mix huge and tiny values in the same computation--the huge values will overwhelm the tiny values, losing digits.

int foo(void) {Without roundoff, this would always print out 0.1234. But due to roundoff, the result is 0.02097152, which is totally wrong.

enum {n=10000}; /* size of image */

float sum=0.0;

for (int y=0;y<n;y++)

for (int x=0;x<n;x++)

sum+=0.1234;

float average=sum/(n*n);

printf("Average: %.10f\n",average);

return 0;

}

What's happening is that eventually "sum" gets so big, that adding 0.1234 doesn't actually do anything--the effect of 0.1234 is just rounded off!

There are two natural solutions to roundoff:

- Use a bigger datatype, like "double" instead of float, or "long
double" instead of double. Using "double" above gives the right
answer (unless the image gets really really big). Using a bigger type
costs memory (and hence cache performance), but it's definitely the
easiest and most common way to avoid roundoff.

- Do the operations in a more roundoff-friendly order. For
example, if we keep the (small) sum for each row protected in another
variable, and only add it to the (big, bad) total sum at the end of the
row, we have significantly less roundoff error:

int foo(void) {

enum {n=10000}; /* size of image */

float sum=0.0;

for (int y=0;y<n;y++) {

float sumRow=0.0;

for (int x=0;x<n;x++)

sumRow+=0.1234;

sum+=sumRow;

}

float average=sum/(n*n);

printf("Average: %.10f\n",average);

return 0;

}

square_it:

lfd %f0,(%r3)

fmul %f1,%f0,%f0

blr

The x86, by contrast, has a really funky floating-point assembly. Instead of normal registers, the x86 keeps a "stack" of 8 slots. This has *nothing* to do with the hardware stack; they're dedicated registers, not memory locations. When you load up a number, it goes on the top of the stack. To return a value, you just leave it on top of the stack. So the simplest x86 assembly program is this:

(Executable NetRun Link)

bar:"fldz" (floating-point load zero) pushes 0.0 onto the floating point stack. The value on top of the stack is the floating-point return value (the equivalent of the integer register eax). So this assembly is equivalent to "double bar(void) {return 0.0;}".

fldz

ret

To do arithmetic, like adding two numbers using "faddp" (floating-point add with pop), the hardware takes the inputs from the top two locations on the stack, and leaves the result sitting on top of the stack. So you can add two numbers like this:

(Executable NetRun Link)

bar:There are alternate flavors of the arithmetic instructions that pull one source value from memory or from deep in the stack (using the funky syntax "%st" to refer to the top of the floating-point stack, or "%st(3)" to refer to the 3rd element in the stack), or to swap, rotate, and otherwise shuffle the values in the stack. But every operation still needs to take at least one input from the top of the stack, which means real programs end up spending a lot of time doing extra loads and stores, or stack shuffles, to find the values they need.

fld1 # load 1.0

fldpi # load pi (3.141592...)

faddp # Pops above two, pushes sum

ret