Multi-Precision Arithmetic

CS 301 Lecture, Dr. LawlorReminder:
CF--The "carry flag". Set to indicate the bit that carries out of an addition or subtraction. For signed numbers, this doesn't really indicate a problem, but for unsigned numbers, this carry indicates an overflow. Can be used by the "jc" (jump if carry flag is set) instruction. Set by all the arithmetic instructions. Can be added into another arithmetic operation with "adc" (add with carry), which computes:

output register = output register + input register + CF
new CF = carry out of the above addition

For example, you can preserve the bit overflowing out of a big add (between ecx and ecx) like this:

mov ecx, 0x8000ff00
add ecx, ecx
mov eax,0
adc eax,eax ; Adds eax, eax, and the carry flag together

Here's a C++ example:

volatile int lo,hi;

int foo(void) {
	lo=0xffffffff;
	hi=0xabc;
	__asm__(" add $1,lo \n");
	__asm__(" adc $0,hi \n");
	std::cout<<"lo="<<std::hex<<lo<<"\n";
	std::cout<<"hi="<<std::hex<<hi<<"\n";
	return 0;
}

(Try this in NetRun now!

"adc" is used in the compiler's implementation of the 64-bit "long long" datatype on 32-bit machines.

long long v;

void add_to_v(void) {
	v+=3;
}

(Try this in NetRun now!)

On a 64-bit machine, this is just an "add". On a 32-bit machine, it's an add followed by an add with carry.

000000c4 <add_to_v()>:
  c4:	83 05 00 00 00 00 03 	add    DWORD PTR ds:0x0,0x3
			c6: R_386_32	v
  cb:	83 15 04 00 00 00 00 	adc    DWORD PTR ds:0x4,0x0
			cd: R_386_32	v
  d2:	c3                   	ret

Multi-Precision Arithmetic

You can do register-register addition in a single instruction (add). But if you want to add numbers bigger than the registers, you need to break up the addition into register-sized pieces, ordered from lowest value to highest. You'd also have to make sure the carry bits will actually pass between the pieces.   To add a two-int multiprecision number, then, you'd

    add loA,loB
    adc hiA,hiB

This way, the carry bit coming out of the low addition would be read by the higher addition.

To do this "for real", there are several excellent libraries, such as GMP.

Fast Exponentiation Trick

Say you want to compute x to the power y. This is just x*x*x*...*x, with a total of y copies of x:

The obvious way to do this is really quite slow for big values of y:
    int prod=1;
    for (int i=1;i<=y;i++) prod*=x;
    return prod;

But it's *exponentially* faster to compute x raised to the powers of two by repeated squaring, then combine those powers of two to get x to the y. For example, you can compute x to the 16th by squaring x four times, like so:
    int x2=x*x;
    int x4=x2*x2;
    int x8=x4*x4;
    int x16=x8*x8;

Harkening back to our bitwise operators, we can just decompose y into the corresponding powers-of-two of x, by looking at the bits of y:

/* Return x raised to the power y */
double mypow(double x,int y) 
{
    double prod=1; /* will hold x to the y power */
    double xpow=x; /* will take powers of x */
    for (unsigned int bit=0;bit<sizeof(int)*8;bit++) {
       int mask=(1<<bit);
       if (y&mask) prod=prod*xpow; /* include this power of x */
       if (y<mask) break; /* no higher powers of x included in y */
       xpow=xpow*xpow; /* find next higher square */
    }
    return prod;
}

(Try this in NetRun now!)

For example, raising 2 to the 100th power takes only 8 iterations with this "fast exponentiation" method, but over 100 iterations with slow exponentiation.

The fast exponentiation trick applies to all sorts of stuff, but for multi-precision numbers, that exponential speedup gets even bigger. There's also an exact analog for multiplication!

Multi-Precision Floating Point

Add with carry works fine for integers, but how do we do multi-precision floating point? There's an interesting old trick that's been described by Dekker (1971), Priest (1991), and Shewchuk (1997), where we first compute an inaccurate sum, then compute the difference:

class float_hilo {
public:
    float hi; /* big bits */
    float lo; /* small bits (that would have been rounded off from hi) */

    /* The magnitude of a must be bigger than the magnitude of b. */
    void from_add(float a,float b) {  
        hi = a + b; /* high bits are easy--just do add normally */
        float should_b = hi - a; /* in a perfect world, this would be b... */
        lo = b - should_b; /* ... but our arithmetic isn't perfect: calculate the difference. */
    }
};

int foo(void) {
	float a=10000000.0, b=0.1234567;
	float f=a+b;
	float_hilo h; h.from_add(a,b);
	std::cout<<setprecision(20);
	std::cout<<" ordinary: "<<f<<"\n";
	std::cout<<" hi: "<<h.hi<<"    lo: "<<h.lo<<"\n";
	return 0;
}

(Try this in NetRun now!)

This results in:

 ordinary: 10000000
 hi: 10000000    lo: 0.12345670163631439209
Program complete.  Return 0 (0x0)

Note that for this example, we've split up the integer and fraction parts. This isn't always the case; for example, changing the above to set a to 1 million leaves a few bits below the decimal point:

 ordinary: 1000000.125
 hi: 1000000.125    lo: -0.0015432983636856079102
Program complete.  Return 0 (0x0)

Setting a to 10000 leaves more bits after the decimal point:

 ordinary: 10000.123046875
 hi: 10000.123046875    lo: 0.00040991604328155517578
Program complete.  Return 0 (0x0)