Complex Control Flow: Real C++ Loops

CS 301 Lecture, Dr. Lawlor

You can actually write a very peculiar variant of C, where "if" statements only contain "goto" statements.  This is perfectly legal C/C++:
#include <iostream>

int main() {
    int i=0;
    if (i>=10) goto byebye;
    std::cout<<"Not too big!\n";
byebye: return 0;
}
This way of writing C is quite similar to assembly--in fact, there's a one-to-one correspondence between lines of C code written this way and machine language instructions.  More complicated C, like the "for" construct, expands out to many lines of assembly.
int i, n=10;
for (i=0;i<n;i++) {
    std::cout<<"In loop: i=="<<i<<"\n";
}
Here's an expanded version of this C/C++ "for" loop:
int i=0, n=10;
start: std::cout<<"In loop: i=="<<i<<"\n";
i++;
if (i<n) goto start;
(executable NetRun link)

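You've got to convince yourself that this is really equivalent to the "for" loop in all cases.  Careful--if n is a parameter, it's not!  The expanded version only tests i<n after the body has already run once, so it matches the "for" loop only when n>=1.  (What if n<=0?)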
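To make that warning concrete, here's a minimal sketch that counts how many times the loop body runs in each form.  The function names count_for and count_goto are made up for this illustration, and n is passed in as a parameter so it can be zero or negative.

```cpp
// Counts body executions of the real "for" loop.
int count_for(int n) {
    int count = 0;
    for (int i = 0; i < n; i++) {
        count++;            // stands in for the loop body
    }
    return count;
}

// Counts body executions of the expanded version, which tests at the bottom.
int count_goto(int n) {
    int count = 0;
    int i = 0;
start:
    count++;                // the body runs before i<n is ever tested
    i++;
    if (i < n) goto start;
    return count;
}
```

With n=10 both functions return 10.  With n=0, count_for returns 0 but count_goto returns 1, because the expanded version always runs the body once.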

All C flow-control constructs can be written using just "if" and "goto", which usually map one-to-one to a compare-and-jump sequence in assembly.
Each pair below shows the normal C construct first, then its expanded form.

Normal C:
if (A) {
  ...
}
Expanded C:
if (!A) goto END;
{
  ...
}
END:

Normal C:
if (!A) {
  ...
}
Expanded C:
if (A) goto END;
{
  ...
}
END:

Normal C:
if (A&&B) {
  ...
}
Expanded C:
if (!A) goto END;
if (!B) goto END;
{
  ...
}
END:

Normal C:
if (A||B) {
  ...
}
Expanded C:
if (A) goto STUFF;
if (B) goto STUFF;
goto END;
STUFF:
{
  ...
}
END:

Normal C:
while (A)  {
  ...
}
Expanded C:
goto TEST;
START:
{
  ...
}
TEST: if (A) goto START;

Normal C:
do {
  ...
} while (A);
Expanded C:
START:
{
  ...
}
if (A) goto START;

Normal C:
for (i=0;i<n;i++)
{
  ...
}
Expanded C:
i=0;         /* Version A */
goto TEST;
START:
{
  ...
}
i++;
TEST: if (i<n) goto START;

Normal C:
for (i=0;i<n;i++)
{
  ...
}
Expanded C:
i=0;          /* Version B */
START: if (i>=n) goto END;
{
  ...
}
i++;
goto START;
END:
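As a sanity check on the "while" translation, here's a small sketch that sums 0 through n-1 both ways.  The function names sum_while and sum_goto are invented for this example; the labels follow the expanded "while" form listed above.

```cpp
// Sum 0..n-1 with an ordinary "while" loop.
int sum_while(int n) {
    int i = 0, sum = 0;
    while (i < n) {
        sum += i;
        i++;
    }
    return sum;
}

// The same loop expanded into the "goto TEST" form.
int sum_goto(int n) {
    int i = 0, sum = 0;
    goto TEST;              // jump straight to the test, so n<=0 skips the body
START:
    {
        sum += i;
        i++;
    }
TEST:
    if (i < n) goto START;
    return sum;
}
```

The two functions should agree for every n, including n<=0, since the initial "goto TEST" makes sure the condition is checked before the body ever runs.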

Note that the last two translations of the "for" concept (labelled Version A and Version B) both compute the same thing.  Which one is faster? If the loop iterates many times, I claim version (A) is faster, since there's only one (conditional) goto each time around the loop, instead of two gotos in version (B)--one conditional and one unconditional.  But version (B) is probably faster if n is often 0, because in that case it quickly jumps to END (in one conditional jump).
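You can check this claim by counting jumps directly.  The sketch below runs both versions with an empty body and tallies every "goto" statement executed, whether or not a conditional one is actually taken.  The JumpCount struct and the function names are made up for this example, and counting executed jumps is only a rough stand-in for real timing.

```cpp
struct JumpCount {
    int conditional;    // executed "if (...) goto" tests
    int unconditional;  // executed plain "goto" statements
    int total() const { return conditional + unconditional; }
};

// Version A: test at the bottom of the loop.
JumpCount count_version_a(int n) {
    JumpCount c = {0, 0};
    int i = 0;
    c.unconditional++;          // about to execute "goto TEST"
    goto TEST;
START:
    /* loop body would go here */
    i++;
TEST:
    c.conditional++;            // about to execute "if (i<n) goto START"
    if (i < n) goto START;
    return c;
}

// Version B: test at the top of the loop.
JumpCount count_version_b(int n) {
    JumpCount c = {0, 0};
    int i = 0;
START:
    c.conditional++;            // about to execute "if (i>=n) goto END"
    if (i >= n) goto END;
    /* loop body would go here */
    i++;
    c.unconditional++;          // about to execute "goto START"
    goto START;
END:
    return c;
}
```

For n=10, version A executes 12 jumps in total (1 unconditional, 11 conditional) while version B executes 21 (10 unconditional, 11 conditional).  For n=0, version A executes 2 jumps and version B just 1.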

Flags

The "cmp" instruction tells the subsequent conditional jump about the comparison via the "EFLAGS" register.

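The "EFLAGS" register on x86 stores a bunch of flags, as shown on page 73 of the Intel arch manual Volume 1.  The important flags include:
ZF, the zero flag: set when the last result was zero.
CF, the carry flag: set when the last add carried out of the top bit, or the last subtract had to borrow.  This is overflow for unsigned numbers.
SF, the sign flag: a copy of the top bit of the last result.
OF, the overflow flag: set when the last result overflowed as a signed number.
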
You've also got to be aware of which instructions set which flags.  For example, the "cmp", "sub", and "add" instructions set all the flags; "and" (bitwise AND) sets ZF, SF, and PF from its result and clears CF and OF; "inc" (increment by 1) and "dec" (decrement by 1) set everything but CF; while "mov" and all the jump instructions don't mess with the flags.  It's easy to accidentally overwrite flags you care about, if you leave too much stuff between the time the flag is set and the time it's read!

You can actually look at the flags with the "lahf" instruction, which copies the important bits of EFLAGS into register ah--that is, bits 8-15 of eax get EFLAGS(SF:ZF:0:AF:0:PF:1:CF), listed from the high bit down to the low bit.
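Once you've got that byte, picking the flags back out is just shifting and masking.  Here's a minimal sketch that decodes a value of ah using the SF:ZF:0:AF:0:PF:1:CF layout above (SF in bit 7, CF in bit 0).  The AhFlags struct and decode_ah function are names invented for this example; the code only decodes a byte you hand it, it doesn't run "lahf" itself.

```cpp
#include <cstdint>

struct AhFlags {
    bool sf, zf, af, pf, cf;
};

// Decode the byte that "lahf" leaves in ah.
AhFlags decode_ah(std::uint8_t ah) {
    AhFlags f;
    f.sf = (ah >> 7) & 1;   // sign flag
    f.zf = (ah >> 6) & 1;   // zero flag
    f.af = (ah >> 4) & 1;   // auxiliary carry flag
    f.pf = (ah >> 2) & 1;   // parity flag
    f.cf = (ah >> 0) & 1;   // carry flag
    return f;
}
```

For example, ah=0x46 (binary 0100 0110) decodes to ZF and PF set, with SF, AF, and CF clear.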

The various funky jump instructions, like "jc" (jump if CF is set), or "jo" (jump if OF is set), also read the EFLAGS register.

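Note there's NO standard way to get at the flags, or to directly call the flag-using instructions, in C!  None!  Unsigned arithmetic silently wraps around, signed overflow is undefined behavior, and the language itself never shows you CF or OF.  The closest you can get is a compiler extension, such as the __builtin_add_overflow function in GCC and Clang, or an explicit range check written before the arithmetic.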
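Since the overflow flag itself stays out of reach, C++ code that cares about overflow has to examine the operands before adding.  Here's a minimal sketch of that workaround; the function name add_would_overflow is made up for this example, and it never looks at the CPU flags at all.

```cpp
#include <climits>

// Returns true if a+b would not fit in an int.
// The checks are arranged so that they never overflow themselves.
bool add_would_overflow(int a, int b) {
    if (b > 0) return a > INT_MAX - b;   // would pass INT_MAX
    if (b < 0) return a < INT_MIN - b;   // would pass INT_MIN
    return false;                        // adding zero never overflows
}
```

In assembly the same question is a single "add" followed by "jo".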

Comparison Instruction

OK, so you want to know how some number A relates to some other number B.  So you subtract them.

If A-B = 0, then A=B.
If A-B > 0, then A > B.
If A-B < 0, then A < B.

Yup, so "cmp eax,10" actually internally subtracts 10 from the value in eax.  If the difference is zero, the CPU sets flag ZF (the Zero Flag).  If the difference is positive or negative, the CPU sets some other hideous flags to indicate this (the CPU sets various flags for both the signed and unsigned comparisons).

Turns out, "sub eax,10" actually sets all the same flags.  So you can compare two numbers with "cmp A,B" or "sub A,B", and you'll get the same result (but they're not totally interchangeable: "cmp" won't change A!). 
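To see what flags a 32-bit "cmp a,b" or "sub a,b" would leave behind, you can model the subtraction in C++.  This is just a sketch: the Flags struct and flags_after_cmp function are names invented here, only four of the flags are modeled, and it describes the results, not how the CPU is built.

```cpp
#include <cstdint>

struct Flags {
    bool zf;  // zero flag: the difference is zero
    bool sf;  // sign flag: top bit of the difference
    bool cf;  // carry flag: the subtraction needed a borrow (unsigned a < b)
    bool of;  // overflow flag: the signed difference doesn't fit in 32 bits
};

Flags flags_after_cmp(std::uint32_t a, std::uint32_t b) {
    std::uint32_t diff = a - b;          // unsigned math wraps modulo 2^32
    Flags f;
    f.zf = (diff == 0);
    f.sf = (diff >> 31) & 1;
    f.cf = (a < b);
    // Signed overflow: a and b have different signs, and the
    // result's sign differs from a's sign.
    f.of = (((a ^ b) & (a ^ diff)) >> 31) & 1;
    return f;
}
```

For example, flags_after_cmp(10,10) sets ZF, and flags_after_cmp(5,7) sets SF and CF but not OF.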

So then, you want to jump if the previous comparison came out equal.  You use the "je" instruction (Jump if Equal). 
Or you want to jump if the previous subtraction came out zero.  You use the "jz" instruction (Jump if Zero).

Turns out, "je" and "jz" are the same machine language instruction, because they both do entirely the same thing.

The bottom line: to do comparisons in assembly, you first do either a cmp or sub instruction, and then use the matching conditional jump:

English            C/C++   Assembly (signed)   Assembly (unsigned)
Less Than          <       jl                  jb
Less or Equal      <=      jle                 jbe
Equal              ==      je or jz            je or jz
Greater or Equal   >=      jge                 jae
Greater Than       >       jg                  ja
Not Equal          !=      jne or jnz          jne or jnz

The "b" in the unsigned comparison instructions stands for "below", and the "a" for "above". 

In C/C++, the compiler can tell whether you want a signed or unsigned comparison based on the variables' types.  There aren't any types in assembly, so it's up to you to pick the right instruction!
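Here's a sketch of why the choice matters: the same 32 bits can compare differently depending on whether you read them as signed or unsigned.  The function names as_signed, jumps_jl, and jumps_jb are made up for this example; they model the decision each jump would make after "cmp a,b", using C++ types to pick the interpretation.

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret 32 register bits as a signed (two's complement) integer.
std::int32_t as_signed(std::uint32_t bits) {
    std::int32_t value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

// The decision "jl" makes after "cmp a,b": signed less-than.
bool jumps_jl(std::uint32_t a, std::uint32_t b) {
    return as_signed(a) < as_signed(b);
}

// The decision "jb" makes after "cmp a,b": unsigned below.
bool jumps_jb(std::uint32_t a, std::uint32_t b) {
    return a < b;
}
```

With a=0xFFFFFFFF and b=1, jumps_jl is true (read as signed, a is -1), but jumps_jb is false (read as unsigned, a is about 4 billion).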

Compare vs. Subtract: Examples

Subtract sets all the same comparison flags as "cmp".  So this code returns 1, because 5 < 7.
mov ecx,5

sub ecx,7
jl yes_it_jumped
; ... else no, it didn't jump: return 0
mov eax,0
ret

yes_it_jumped: ; ... so return 1
mov eax,1
ret
(executable NetRun link)
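For comparison, here's roughly the same code in the "if ... goto" style of C++ from earlier.  This is only a sketch: the variable is named after the register for readability, and the test "ecx < 0" matches "jl" here because this particular subtraction doesn't overflow.

```cpp
int sub_then_jl() {
    int ecx = 5;                      // mov ecx,5
    ecx -= 7;                         // sub ecx,7
    if (ecx < 0) goto yes_it_jumped;  // jl yes_it_jumped
    return 0;                         // ... else no, it didn't jump
yes_it_jumped:
    return 1;                         // ... so return 1
}
```

Like the assembly, this returns 1.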

Subtract also sets the zero flag, so here's a very small downward-counting loop:
mov edx,5
mov eax,0

loop_start:
add eax,7
sub edx,1
jnz loop_start ; Jumps if edx is still nonzero

ret
(executable NetRun link)
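The same loop in "if ... goto" style C++ looks like this.  Again just a sketch, with variables named after the registers they stand in for.

```cpp
int countdown_loop() {
    int edx = 5;                      // mov edx,5
    int eax = 0;                      // mov eax,0
loop_start:
    eax += 7;                         // add eax,7
    edx -= 1;                         // sub edx,1
    if (edx != 0) goto loop_start;    // jnz loop_start
    return eax;                       // ret
}
```

The loop body runs 5 times, adding 7 each time, so this returns 35.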