Optimization: Speed of Operations

CS 301 Lecture, Dr. Lawlor, 2005/10/31

The simplest possible performance model is the constant model:
t = C
That is, the time t taken to do something is always a constant, C.

The next-simplest model is the linear model:
t(n) = A + B n
For example, the time t(n) taken to execute most loops is just the loop startup time, A, plus some fixed time B for each of the n iterations of the loop. To find A, we can just time the loop with no iterations, since t(0) = A. To find B, we can time the loop with many iterations (large n) and solve the model for B: B = (t(n)-A)/n.

Here's an example where we use the NetRun builtin timing routine "time_function" to experimentally determine A and B:

#include <stdio.h> /* for printf; time_function itself is supplied by NetRun */

int rep=0; /* Number of repetitions for do_it to execute */
int do_it(void) {
	unsigned int i, max=rep, sum=0;
	for (i=0;i<max;i++) sum=sum*(i+1);
	return (int)sum;
}

int foo(void) {
	rep=0;
	double A=time_function(do_it); /* time to do no iterations */
	
	int n=100000;
	rep=n;
	double B=(time_function(do_it)-A)/n; /* time per iteration */
	
	printf("Per call: %.2f ns.  Per iteration: %.2f ns\n",
		A*1.0e9, B*1.0e9);
	return 0;
}
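Outside NetRun there is no time_function, so here is a minimal sketch of the same measurement using only the standard C clock() routine. The names time_function_sketch, sketch_do_it, sketch_rep, and fit_linear are hypothetical, and the repeat counts are assumptions picked so the total run rises above clock()'s coarse resolution; this is not how NetRun's own timer is implemented.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical stand-in for NetRun's time_function (an assumption, not
   NetRun's code): average CPU seconds per call of fn over `calls` calls. */
double time_function_sketch(int (*fn)(void), int calls) {
	assert(calls > 0);
	volatile unsigned int sink = 0; /* consume results so the calls are kept */
	clock_t start = clock();
	for (int c = 0; c < calls; c++) sink += (unsigned int)fn();
	clock_t stop = clock();
	return (double)(stop - start) / CLOCKS_PER_SEC / calls;
}

/* Same shape as do_it: the iteration count comes in through a global */
static unsigned int sketch_rep = 0;
int sketch_do_it(void) {
	unsigned int i, max = sketch_rep, sum = 1;
	for (i = 0; i < max; i++) sum = sum * (i + 1);
	return (int)sum;
}

/* Fit t(n) = A + B n from a zero-iteration run and an n-iteration run */
void fit_linear(int n, double *A, double *B) {
	assert(n > 0);
	sketch_rep = 0;
	*A = time_function_sketch(sketch_do_it, 1000000); /* many cheap calls */
	sketch_rep = (unsigned int)n;
	*B = (time_function_sketch(sketch_do_it, 100) - *A) / n; /* seconds per iteration */
}
```

Calling fit_linear(100000, &A, &B) and printing A*1.0e9 and B*1.0e9 gives the same two numbers as foo, in nanoseconds. Because clock() measures CPU time at fairly coarse resolution, A is especially noisy; raising the call counts trades run time for accuracy.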

We can now put any operation we want inside do_it's loop, and see how fast it is. Here are the results for the default NetRun x86 machine, a 2.8GHz (0.36ns/clock) Pentium 4. In all of these measurements the loop startup time A is about 5 ns.

Time per iteration B	Operations
0.8 ns	Add, subtract, AND, OR, XOR, NOT, bitshift (by a constant)
1.0 ns	Bitshift by a non-constant: sum=sum>>i;
3.6 ns	Multiply: sum=sum*(i+1);
24 ns	Divide: sum=sum/(i+1);
about 100 ns	pow, log, exp, sqrt, sin, cos, malloc/free
about 200 ns	fprintf (to a file; printing to the screen is >10x slower)
thousands of ns	fopen/fclose, network connections,
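Plugging the measured values back into the linear model lets us predict how long a loop should take before we run it. A small sketch, assuming the A of about 5 ns and the per-iteration costs from the table above (predict_ns is a hypothetical helper name):

```c
#include <assert.h>

/* Linear model t(n) = A + B n, with every time in nanoseconds */
double predict_ns(double A_ns, double B_ns, long n) {
	assert(n >= 0);
	return A_ns + B_ns * (double)n;
}
```

For a million-iteration loop, predict_ns(5.0, 24.0, 1000000) gives 24000005 ns, or about 24 ms for the divide loop, while the same loop with a multiply (B = 3.6 ns) comes to about 3.6 ms. For large n the startup time A hardly matters; the per-iteration cost B dominates.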