Complex Control Flow: Real C++ Loops

CS 301 Lecture, Dr. Lawlor

You can actually write a very peculiar variant of C, where "if" statements only contain "goto" statements. This is perfectly legal C/C++:

int main() {
	int i=0;
	if (i>=10) goto byebye;
	std::cout<<"Not too big!\n";
byebye:	return 0;
}

This way of writing C is quite similar to assembly--in fact, there's a one-to-one correspondence between lines of C code written this way and machine language instructions. More complicated C, like the "for" construct, expands out to many lines of assembly.

	int i, n=10;
	for (i=0;i<n;i++) {
		std::cout<<"In loop: i=="<<i<<"\n";
	}

Here's an expanded version of this C/C++ "for" loop:

	int i=0, n=10;
start:	std::cout<<"In loop: i=="<<i<<"\n";
	i++;
	if (i<n) goto start;

(executable NetRun link)

You've got to convince yourself that this is really equivalent to the "for" loop in all cases. Careful--if n is a parameter, it's not! (What if n>=i?)

All C flow-control constructs can be written using just "if" and "goto", which usually map one-to-one to a compare-and-jump sequence in assembly.

Normal C	Expanded C
if (A) { ... }	if (!A) goto END; { ... } END:
if (!A) { ... }	if (A) goto END; { ... } END:
if (A&&B) { ... }	if (!A) goto END; if (!B) goto END; { ... } END:
if (A\|\|B) { ... }	if (A) goto STUFF; if (B) goto STUFF; goto END; STUFF: { ... } END:
while (A) { ... }	goto TEST; START: { ... } TEST: if (A) goto START;
do { ... } while (A)	START: { ... } if (A) goto START;
for (i=0;i<n;i++) { ... }	i=0; /* Version A */ goto TEST; START: { ... } i++; TEST: if (i<n) goto START;
for (i=0;i<n;i++) { ... }	i=0; /* Version B */ START: if (i>=n) goto END; { ... } i++; goto START; END:

Note that the last two translations of the "for" concept (labelled Version A and Version B) both compute the same thing. Which one is faster? If the loop iterates many times, I claim version (A) is faster, since there's only one (conditional) goto each time around the loop, instead of two gotos in version (B)--one conditional and one unconditional. But version (B) is probably faster if n is often 0, because in that case it quickly jumps to END (in one conditional jump).

Flags

The "cmp" instruction tells the subsequent conditional jump about the comparison via the "EFLAGS" register.

The "EFLAGS" register on x86 stores a bunch of flags, as shown on page 73 of the Intel arch manual Volume 1. The important flags include:

ZF-- The "zero flag". Set whenever the previous arithmetic result was zero. Can be used by the "jz" (jump if last result was zero) or "jnz" instructions. "je" (jump if equal) and "jne" (jump if not equal) are just aliases of jz & jnz, because if the difference is zero, then the two values are equal. For example, this code checks if the input is equal to 4:
    extern read_input
    call read_input
    cmp eax,4
    je equal
    add eax, 20 ; If not equal, add
equal: ret; If equal, just return
CF--The "carry flag". Set to indicate the bit that carries out of an addition or subtraction. For signed numbers, this doesn't really indicate a problem, but for unsigned numbers, this indicates an overflow. Can be used by the "jc" (jump if carry flag is set) instruction. Set by all the arithmetic instructions. Can be added into another arithmetic operation with "adc" (add with carry). For example, you can preserve the bit overflowing out of a big add like this:
mov ecx, 0x8000ff00
add ecx, ecx
mov eax,0
adc eax,eax ; Adds eax, eax, and the carry flag together
"adc" is used in the compiler's implementation of the 64-bit "long long" datatype, and in general in "multiple precision arithmetic" software, like the GNU Multiple Precision Arithmetic Library. It'd also be used to implement overflow checking by a careful compiler. The carry and zero flags are also used by the unsigned comparison instructions: "jb" (jump if unsigned below), "jbe" (jump if unsigned below or equal), "ja" (jump if unsigned above), and "jae" (jump if unsigned above or equal) in the usual way.
SF-- The "sign flag", which indicates a negative signed result. Used together with OF to implement signed comparison.
OF-- The "overflow flag". Set by subtract, add, and compare, and used in the signed comparison instructions "jl" (jump if less than), "jle" (jump if less than or equal to), "jg" (jump if greater than), and "jge" (jump if greater than or equal to) instructions. If you stare at it hard enough, you can read the definitions, work out exactly what SF and OF do, and convince yourself they do the right thing. They do.

You've also got to be aware of which instructions set which flags. For example, the "cmp", "and" (bitwise AND), "sub", and "add" instructions set all the flags; "inc" (increment by 1) and "dec" (decrement by 1) set everything but CF; while "mov" and all the jump instructions don't mess with the flags. It's easy to accidentally overwrite flags you care about, if you leave too much stuff between the time the flag is set and the time it's read!

You can actually look at the flags with the "lahf" instruction, which copies the important bits of EFLAGS into register ah--that is, bits 8-16 of eax get EFLAGS(SF:ZF:0:AF:0:PF:1:CF).

The various funky jump instructions, like "jc" (jump if CF is set), or "jo" (jump if OF is set), also read the EFLAGS register.

Note there's NO way to get at the flags, or to directly call the flag-using instructions in C! None! C/C++ compilers ignore integer overflow, and there's no way to fix this in C/C++.

Comparison Instruction

OK, so you want to know how some number A relates to some other number B. So you subtract them.

If A-B = 0, then A=B.
If A-B > 0, then A > B.
If A-B < 0, then A < B.

Yup, so "cmp eax,10" actually internally subtracts 10 from the value in eax. If the difference is zero, the CPU sets flag ZF (the Zero Flag). If the difference is positive or negative, the CPU sets some other hideous flags to indicate this (the CPU sets various flags for both the signed and unsigned comparisons).

Turns out, "sub eax,10" actually sets all the same flags. So you can compare two numbers with "cmp A,B" or "sub A,B", and you'll get the same result (but they're not totally interchangeable: "cmp" won't change A!).

So then, you want to jump if the previous comparison came out equal. You use the "je" instruction (Jump if Equal).
Or you want to jump if the previous subtraction came out zero. You use the "jz" instruction (Jump if Zero).

Turns out, "je" and "jz" are the same machine language instruction, because they both do entirely the same thing.

The bottom line is to do comparisons in assembly, you first do either a cmp or sub instruction, and then:

English	Less Than	Less or Equal	Equal	Greater or Equal	Greater Than	Not Equal
C/C++	<	<=	==	>=	>	!=
Assembly (signed)	jl	jle	je or jz	jg	jge	jne or jnz
Assembly (unsigned)	jb	jbe	je or jz	ja	jae	jne or jnz

The "b" in the unsigned comparison instructions stands for "below", and the "a" for "above".

In C/C++, the compiler can tell whether you want a signed and unsigned comparison based on the variable's types. There aren't any types in assembly, so it's up to you to pick the right instruction!

Compare vs. Subtract: Examples

Subtract sets all the same comparison flags as "cmp". So this code returns 1, because 5 < 7.

mov ecx,5

sub ecx,7
jl yes_it_jumped
; ... else no, it didn't jump: return 0
mov eax,0
ret

yes_it_jumped: ; ... so return 1
mov eax,1
ret

(executable NetRun link)

Subtract also sets the zero flag, so here's a very small downward-counting loop:

mov edx,5
mov eax,0

loop_start:
  add eax,7
  sub edx,1
  jnz loop_start ; Jumps if edx is still nonzero

ret

(executable NetRun link)