void add_to_floats(float *dest,int n) {
	int i;
	for (i=0;i<n;i++) dest[i]+=1.0;
}
void add_to_ints(int *dest,int n) {
	int i;
	for (i=0;i<n;i++) dest[i]+=1;
}
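#define N 1000 /* elements per array; uppercase so the macro cannot collide with the parameter n above */
float some_floats[N];  int some_ints[N];
/* Zero-argument wrappers, so time_function can call each loop */
int add_floats(void) { add_to_floats(some_floats,N); return 0;}
int add_ints(void) { add_to_ints(some_ints,N); return 0;}
int foo(void) { 
	int i; double per_call; /* per_call is in seconds per call */
	for (i=0;i<N;i++) some_floats[i]=0.0f;
	per_call=time_function(add_ints);
	printf("%.2f ns/int\n",1.0e9*per_call/N); /* seconds -> ns, spread over N elements */
	per_call=time_function(add_floats);
	printf("%.2f ns/float\n",1.0e9*per_call/N);
	return 0;
}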
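The listing relies on time_function, which NetRun supplies and which is not defined here. From the way its result is scaled by 1.0e9 to get nanoseconds, it evidently returns seconds per call. To try the listing outside NetRun, a rough stand-in might look like the sketch below. The name time_function_sketch, the 0.01 second threshold, and the bump demo function are all assumptions made for illustration, not NetRun's actual implementation. It uses the standard C clock() call, which measures CPU time at fairly coarse resolution, so the function is called repeatedly until enough time has accumulated.

```c
#include <stdio.h>
#include <time.h>
#include <assert.h>

typedef int (*timed_fn)(void);

/* Hypothetical stand-in for NetRun's time_function (NOT its real code):
   returns the average CPU time, in seconds, of one call to fn. */
double time_function_sketch(timed_fn fn) {
	long repeats = 1, r;
	assert(fn != NULL);
	for (;;) {
		clock_t start = clock();
		double elapsed;
		for (r = 0; r < repeats; r++) fn();
		elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
		/* Stop once the run is long enough to measure (0.01 s is an
		   arbitrary choice), or after a hard cap on the repeat count. */
		if (elapsed >= 0.01 || repeats >= (1L << 24))
			return elapsed / repeats;
		repeats *= 2; /* too short to measure: double the work and retry */
	}
}

/* Trivial function to time; volatile keeps the increment from being optimized away. */
static volatile int counter = 0;
int bump(void) { counter++; return 0; }

int foo(void) {
	double per_call = time_function_sketch(bump);
	printf("%.2f ns/call\n", 1.0e9 * per_call);
	return 0;
}
```

As in the listing above, foo is the entry point; outside NetRun, call foo() from main.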
Here's the performance of the add_to_ints and add_to_floats loops on various hardware (all on NetRun):
| Hardware | ns/int | ns/float | ns/clock | instructions per loop | clocks per instruction | Discussion |
|---|---|---|---|---|---|---|
| Intel 486, 50MHz, 1991 | 181 ns/int | 763 ns/float | 20 ns/clock | 4 (int) 8 (float) | 3 (int) 5 (float) | Classic non-pipelined CPU: many clocks/instruction. |
| MIPS R5000, 180MHz, 1996 | 51 ns/int | 114 ns/float | 5.5 ns/clock | 9 (int) 11 (float) | 1 (int) 2 (float) | Classic fully pipelined CPU: one instruction/clock.  Note how there are more instructions than CISC, but each instruction runs faster! |
| PowerPC G4, 768MHz, 2001 | 8.3 ns/int | 10.2 ns/float | 1.3 ns/clock | 8 (int) 9 (float) | 0.8 (int) 0.87 (float) | Superscalar RISC CPU: multiple instructions per clock cycle.  The PowerPC wasn't amazingly good at doing superscalar work yet, but it was superscalar. |
| Intel Pentium III, 1133MHz, 2002 | 3.58 ns/int | 2.87 ns/float | 0.88 ns/clock | 4 (int) 7 (float) | 1 (int) 0.5 (float) | Superscalar x86 CPU: the integer unit is fully pipelined, so we get one instruction per clock cycle.  But floating point runs *simultaneously* with the integer stuff! |
| Intel Pentium 4, 2.8GHz, 2005 | 1.22 ns/int | 1.57 ns/float | 0.36 ns/clock | 4 (int) 6 (float) | 0.84 (int) 0.73 (float) | Modern CPUs are able to run even integer code superscalar.  Yet some code actually takes more clock cycles per instruction on the Pentium 4, due to its deep pipeline. |
| Intel Q6600, 2.4GHz, 2008 | 0.84 ns/int | 0.85 ns/float | 0.42 ns/clock | 4 (int) 6 (float) | 0.5 (int) 0.33 (float) | We're even more superscalar, executing 2 or 3 instructions per clock cycle. |
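
The clocks-per-instruction column follows from the other columns: dividing the measured ns per element by ns per clock gives clocks per loop iteration, and dividing that by the instructions per loop gives clocks per instruction. Here is a minimal sketch of that arithmetic, using a few rows copied from the table. The struct and function names are made up for illustration, and foo is the entry point as in NetRun (elsewhere, call foo() from main).

```c
#include <stdio.h>
#include <assert.h>

/* One measurement from the table (values copied from it, not re-measured). */
struct cpu_row {
	const char *name;
	double ns_per_elem;   /* measured ns/int or ns/float */
	double ns_per_clock;  /* clock period: 1000 / (clock rate in MHz) */
	int instructions;     /* instructions per loop iteration */
};

/* clocks per instruction = (ns per element / ns per clock) / instructions per loop */
double clocks_per_instruction(double ns_per_elem, double ns_per_clock, int instructions) {
	double clocks_per_loop;
	assert(ns_per_clock > 0.0 && instructions > 0);
	clocks_per_loop = ns_per_elem / ns_per_clock;
	return clocks_per_loop / instructions;
}

int foo(void) {
	static const struct cpu_row rows[] = {
		{"MIPS R5000 (int)",    51.0, 5.5,  9},
		{"Pentium III (int)",   3.58, 0.88, 4},
		{"Pentium III (float)", 2.87, 0.88, 7},
		{"Q6600 (int)",         0.84, 0.42, 4},
	};
	int i, count = (int)(sizeof(rows) / sizeof(rows[0]));
	for (i = 0; i < count; i++)
		printf("%s: %.2f clocks/instruction\n", rows[i].name,
			clocks_per_instruction(rows[i].ns_per_elem,
				rows[i].ns_per_clock, rows[i].instructions));
	return 0;
}
```

A result below 1.0 means the CPU is finishing more than one instruction per clock on average, which is the superscalar case the Discussion column describes.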