Optimization: Branch & Switch Statements

CS 301 Lecture, Dr. Lawlor, 2005/10/31

We've seen what branch statements look like in assembly. The performance, though, is curious--
(Executable NetRun Link)

int rep=0; /* Number of repetitions for do_it to execute*/
int initsum=0; /* Value sum starts out with */

int do_it(void) {
	int i, max=rep, sum=initsum;
	for (i=0;i<max;i++) {
		if (sum<0)
			sum++; /* too small--make it bigger */
		else 
			sum--; /* too big-- make it smaller */
	}
	return (int)sum;
}

void print_it(int sum) {
	initsum=sum;
	rep=0;
	double A=time_function(do_it); /* time to do no iterations */
	
	int n=200;
	rep=n;
	double B=(time_function(do_it)-A)/n; /* time per iteration */
	
	printf("initsum=%9d: %.2f ns.  Per iteration: %.2f ns\n",
		initsum, A*1.0e9, B*1.0e9);
}

int foo(void) {
	print_it(-1000000);
	print_it(-20);
	print_it(+0);
	print_it(+20);
	print_it(+1000000);
	return 0;
}

The inner loop's assembly is:

  17:	test   %eax,%eax           /* sum is in eax */
  19:	jns    1e <do_it()+0x1e>   /* Jump to else if sum is not negative (jns==Jump if Not Signed) */
  1b:	inc    %eax                /*   sum++ */
  1c:	jmp    1f <do_it()+0x1f>   /* skip else branch */
  1e:	dec    %eax                /*   sum-- */
  1f:	inc    %edx                /* i++ */
  20:	cmp    %ecx,%edx           /* keep looping if i<max */
  22:	jl     17 <do_it()+0x17>

The performance looks like:

initsum= -1000000: 5.00 ns.  Per iteration: 1.52 ns
initsum=      -20: 5.05 ns.  Per iteration: 1.89 ns
initsum=        0: 5.00 ns.  Per iteration: 1.51 ns
initsum=       20: 4.99 ns.  Per iteration: 1.98 ns
initsum=  1000000: 4.97 ns.  Per iteration: 1.26 ns

There aren't any big suprises if the sum is really big or small--the branch always goes one way, so the performance is just the speed of that side of the loop: the "else" is slightly faster, because the "if" side has to skip over it. More suprising is the "0" case, which just alternates between sides of the loop--it's always the speed of the *slow* side of the loop.

Even more suprising is the "20" or "-20" cases: the first few times, the branch always goes the same way. This lulls the processor into a false sense of security--it's now almost sure the branch goes only that way. But once sum hits 0 the branch begins to alternate back and forth, so the processor mispredicts a bunch of branches, which costs a fair amount of performance.

Switch Statements

The "switch" statement in C can be implemented in one of two ways:

The compiler could put in a test for each "case", and jump to the appropriate chunk of code. This is fine for small "switch"es, but gets slower and slower as you add cases (because you need more and more tests).
The compiler can build a little "jump table" pointing to the code to jump to for each case, and then just do a computed jump, using something like "jmp *the_table(,%edx,4)", which means to jump to the pointer stored in "the_table[%edx]". The advantage of this approach is that it costs the same amount for 10 cases as it does for 1000 cases (because indexing a 10-entry table is about as fast as a 1000-entry table).

In each case, a "break" is always just a "goto end_of_switch".

The Thing that should not be C

Here's the unholy hybrid offspring of a switch statement and a for loop.
(Executable NetRun Link)

int foo(void) {
	int sum=0,j=0;
	switch (sum&0x7) {
	       for (;j<100;j++) { 
	case 7:  sum++;
	case 6:  sum++;
	case 5:  sum++; 
	case 4:  sum++;
	case 3:  sum++;
	case 2:  sum++;
	case 1:  sum++;
	case 0:  sum++; 
	       }
	};
	return (int)sum;
}

Note the return value--*both* the switch statement *and* the for loop are running!

Can you think of a use for this monstrosity?