(Executable NetRun Link)

/* A fourth-degree polynomial:
     a + b x + c x^2 + d x^3 + e x^4
*/
class polynomial_4 {
public:
	double a,b,c,d,e;
	polynomial_4() :a(0.2),b(0.3),c(0.4),d(0.5),e(0.1) {}
	double evaluate(double x) {
		return a+
			b*x+
			c*pow(x,2)+
			d*pow(x,3)+
			e*pow(x,4);
	}
};

polynomial_4 poly;
double x=0.4;
int foo(void) {
	return (int)poly.evaluate(x);
}

It evaluates a polynomial. As written, it takes about 230ns per call, which is pretty darn slow.

The first thing to notice is that "pow" is slow--close to a hundred nanoseconds per call. So we can replace the calls to pow with plain multiplications, so that evaluate reads:

	return a+
		b*x+
		c*x*x+
		d*x*x*x+
		e*x*x*x*x;

This brings the time down to 21ns per call--a speedup of over 10x!

You might be tempted to go further and use Horner's rule, factoring the polynomial to eliminate the repeated multiplications by x:

	return a+x*(b+x*(c+x*(d+x*e)));

This is indeed easier to write, but it's actually slightly slower--the compiler's already smart enough to optimize away the repeated multiplications into this form.

The compiler won't do the obvious thing and optimize a divide into a multiply-by-inverse, however:

(Executable NetRun Link)

enum {n=1000};
double vals[n];
double c=0.3;
int foo(void) {
	for (int i=0;i<n;i++)
		vals[i]=i/c;
	return (int)vals[100];
}

This runs in 16ns per element, which is pretty slow for such a simple routine. If we replace the guts with

double ic=1.0/c;
for (int i=0;i<n;i++)
	vals[i]=i*ic;

That is, instead of dividing by c, just multiply by 1/c! This now runs in 3ns per element, a 5x speedup.

Why doesn't the compiler do this? The reason is floating-point roundoff: for some numbers, dividing and multiplying by the inverse do not give the same answer!

(Executable NetRun Link)

float f=1000000000;
volatile float f_i=1.0/f; /* the "volatile" keeps the compiler honest */
printf(" f  =%.20f\n",f);
printf(" f/f=%.10f\n",f/f);
printf(" f_i=%.20f\n",f_i);
printf(" f*f_i=%.10f\n",f*f_i);

On my machine, this prints out

 f  =1000000000.00000000000000000000
 f/f=1.0000000000
 f_i=0.00000000099999997172
 f*f_i=0.9999999717

Because dividing and multiplying by the inverse aren't exactly the same, the compiler won't do it (unless, that is, you explicitly pass a "quick and dirty math" flag, like gcc's "-ffast-math"). But can you get away with it? Often, you certainly can--your input floating point numbers are usually inherently inexact anyway, so adding a small additional error doesn't affect the final results much.

The obvious way to write the absolute value (magnitude) of a complex number is just:

abs(const complex<...> &z) {
	double x = z.real();
	double y = z.imag();
	return sqrt(x*x+y*y);
}

But the implementation in /usr/include/g++/complex is this:

abs(const complex<double>& z)
{
	double x = z.real();
	double y = z.imag();
	const double s = max(abs(x), abs(y));
	if (s == 0) // well ...
		return 0;
	x /= s;
	y /= s;
	return s * sqrt(x * x + y * y);
}

Why the madness? Well, say x=z.real() was really really big--like 10 to the 200th power, which (barely!) fits inside a "double". Then x*x would be even bigger--like 10 to the 400th power, which no longer fits inside a "double"! The result of an operation like this overflows to infinity, so instead the standard library does the weird scaling by "s", which costs two divides and a multiply (ouch!). The advantage is that the standard "abs" works for numbers of *any* size, even really close to the representable limit.

Of course, in reality, most programs are guaranteed to work with input much much smaller than the representable limit, so you can just use the obvious "abs" above. If you're comparing the result of the "abs" with something (e.g., to check the difference between two vectors), there's another trick--

if (sqrt(x*x+y*y)>bar) ...

is exactly equivalent (by squaring both sides, which is OK because the numbers are positive) to

if (x*x+y*y>bar*bar) ...

but this version avoids the expensive square root.