Floating-Point Roundoff & Assembly
CS 301 Lecture, Dr. Lawlor, 2005/11/18
Roundoff & how to avoid it
Floating-point only keeps a certain number of digits around.
This can lead to some really weird results. For example,
int foo(void) {
double x=0.123456789;
printf("x=%.10f \n",x);
return 0;
}
This prints out exactly what you'd expect-- x=0.1234567890
But if we add something and then subtract it from x, like this:
int foo(void) {
double y=1.0e10;
double x=0.123456789+y-y;
printf("x=%.10f \n",x);
return 0;
}
We *don't* get the same result, even though mathematically x should be unchanged. The problem is that
10000000000.0
+ 0.123456789
----------------------
10000000000.123456789
If we now round to (say) 17 significant digits, we get just 10000000000.123457; in other words, we've just rounded off the last three digits! When we subtract off 1.0e10 again, those lost digits don't magically grow back, so
double y=1.0e10;
double x=0.123456789+y-y;
This gives x=0.1234569550 on my machine--we've lost digits to
roundoff! And it gets worse--if y=1.0e20, then x=0.
However, note that if we do the additions in another order, like this:
double x=0.123456789+(y-y);
Now because (y-y) is exactly zero, we get the right answer, x=0.123456789. This is super funky, because the
*order* matters! In other words, adding
parenthesis can significantly change the result.
In general, roundoff is a problem when:
- You care about lots of digits in the answer, AND
- You mix huge and tiny values in the same computation--the huge values will overwhelm the tiny values, losing digits.
Because of roundoff, you can't
freely perform mathematical transformations, like turning (a+b)+c into
a+(b+c), because these have slightly different roundoff behavior.
Fighting Roundoff
Roundoff happens in real life. For example, let's say you're trying to find the average color of a 10,000x10,000 image:
int foo(void) {
enum {n=10000}; /* size of image */
float sum=0.0;
for (int y=0;y<n;y++)
for (int x=0;x<n;x++)
sum+=0.1234;
float average=sum/(n*n);
printf("Average: %.10f\n",average);
return 0;
}
Without roundoff, this would always print out 0.1234. But due to roundoff, the result is 0.02097152, which is totally wrong.
What's happening is that eventually "sum" gets so big, that adding
0.1234 doesn't actually do anything--the effect of 0.1234 is just
rounded off!
There are two natural solutions to roundoff:
Of course, you can always do both--use bigger datatypes, and use them in a roundoff-friendly way.
Floating-Point Assembly
Normal machines, like the PowerPC, just have a dedicated set of
floating-point registers and instructions--in other words, everything
works in exactly the same way as with integers. For example, this
PowerPC assembly subroutine treats its integer argument as a double
pointer to load up its argument into floating-point register 0 using
lfd (load floating-point double), and then squares its argument using
fmul (floating-point multiply).
square_it:
lfd %f0,(%r3)
fmul %f1,%f0,%f0
blr
The x86, by contrast, has a really funky floating-point assembly.
Instead of normal registers, the x86 keeps a "stack" of 8 slots.
This has *nothing* to do with the hardware stack; they're dedicated
registers, not memory locations. When you load up a number, it
goes on the top of the stack. To return a value, you just leave
it on top of the stack. So the simplest x86 assembly program is
this:
(Executable NetRun Link)
bar:
fldz
ret
"fldz" (floating-point load zero) pushes 0.0 onto the floating point
stack. The value on top of the stack is the floating-point return
value (the equivalent of the integer register eax). So this
assembly is equivalent to "double bar(void) {return 0.0;}".
To do arithmetic, like adding two numbers using "faddp" (floating-point add with pop), the hardware takes the
inputs from the top two locations on the stack, and leaves the result
sitting on top of the stack. So you can add two numbers like this:
(Executable NetRun Link)
bar:
fld1 # load 1.0
fldpi # load pi (3.141592...)
faddp # Pops above two, pushes sum
ret
There are alternate flavors of the arithmetic instructions that pull
one source value from memory or from deep in the stack (using the funky
syntax "%st" to refer to the top of the floating-point stack, or
"%st(3)" to refer to the 3rd element in the stack), or to swap, rotate,
and otherwise shuffle the values in the stack. But every
operation still needs to take at least one input from the top of the
stack, which means real programs end up spending a lot of time doing
extra loads and stores, or stack shuffles, to find the values they need.