If statements, Flags, and Control Flow in Assembly

CS 301 Lecture, Dr. Lawlor

A jump instruction, like "jmp", just switches the CPU to executing a different piece of code.  It's the assembly equivalent of "goto", but unlike goto, jumps are not considered shameful in assembly.

You say where to jump to using a "jump label", which is just any string with a colon after it.  (The exact same syntax is used in C/C++.)

Assembly jump:
mov eax,3
jmp lemme_outta_here
mov eax,999 ; <- not executed!
lemme_outta_here:
ret

(Try this in NetRun now!)

C++ goto:
int x=3;
goto quiddit;
x=999;
quiddit:
return x;

(Try this in NetRun now!)

 
In both cases, we return 3, because we jump right over the 999 assignment.  Jumping is somewhat useful for skipping over bad code, but it really gets useful when you add conditional jumps:
Instruction   Useful to...                               Flags (see below)
jmp           Always jump                                None
ja            Unsigned >                                 CF=0 and ZF=0
jae           Unsigned >=                                CF=0
jb            Unsigned <                                 CF=1
jbe           Unsigned <=                                CF=1 or ZF=1
jc            Unsigned overflow, or multiprecision add   CF=1
jecxz         Compare ecx with 0                         ecx=0
je or jz      Equality                                   ZF=1
jg            Signed >                                   ZF=0 and SF=OF
jge           Signed >=                                  SF=OF
jl            Signed <                                   SF!=OF
jle           Signed <=                                  ZF=1 or SF!=OF
jne or jnz    Inequality                                 ZF=0
jo            Signed overflow                            OF=1
jp or jpe     Parity check (even)                        PF=1
jpo           Parity check (odd)                         PF=0
js            Jump if negative                           SF=1
There are also "n" NOT versions for each jump; for example "jno" jumps if there is NOT overflow.

Conditional Jumps: Branching in Assembly

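In assembly, all branching is done with two types of instruction: a compare instruction, like "cmp", which compares two values; and a conditional jump instruction, like "je", which jumps only if that comparison came out a particular way.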
Here's how to use compare and jump-if-equal ("je"):
mov eax,3
cmp eax,3 ; how does eax compare with 3?
je lemme_outta_here ; if it's equal, then jump
mov eax,999 ; <- not executed *if* we jump over it
lemme_outta_here:
ret

(Try this in NetRun now!)

Here's compare and jump-if-less-than ("jl"):
mov eax,1
cmp eax,3 ; how does eax compare with 3?
jl lemme_outta_here ; if it's less, then jump
mov eax,999 ; <- not executed *if* we jump over it
lemme_outta_here:
ret

(Try this in NetRun now!)

The C++ equivalent to compare-and-jump-if-whatever is "if (something) goto somewhere;".
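As a rough sketch of that correspondence, here are the two assembly snippets above transcribed into goto-style C++.  The function names are made up for illustration, and the variable x stands in for eax.

```cpp
// x plays the role of eax; each "if (...) goto" stands for a cmp plus a conditional jump.
int jump_if_equal() {
    int x = 3;                          // mov eax,3
    if (x == 3) goto lemme_outta_here;  // cmp eax,3  then  je lemme_outta_here
    x = 999;                            // skipped, because the jump is taken
lemme_outta_here:
    return x;                           // ret
}

int jump_if_less() {
    int x = 1;                          // mov eax,1
    if (x < 3) goto lemme_outta_here;   // cmp eax,3  then  jl lemme_outta_here
    x = 999;                            // skipped, because 1 < 3
lemme_outta_here:
    return x;                           // ret
}
```

Calling jump_if_equal() gives 3 and jump_if_less() gives 1, matching what the assembly versions leave in eax.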

Also, check out the machine code generated for the conditional jump--the jump destination is encoded as the number of bytes of machine code to skip over.  For example, the "jl" above gets encoded in machine code like this:
   0:   b8 01 00 00 00      mov    eax,0x1
   5:   83 f8 03            cmp    eax,0x3
   8:   7c 05               jl     f <foo+0xf>
   a:   b8 e7 03 00 00      mov    eax,0x3e7
   f:   c3                  ret
The distance to jump, the "05" in the "7c 05" encoding of "jl", is five bytes, because the code we're skipping over is five bytes long.  Note that a jump label doesn't show up in machine code at all--it's just used by the assembler to figure out how far to jump.
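Here's a small sketch of the arithmetic the assembler does to fill in that distance byte.  The helper name short_jump_displacement is made up, and the sketch assumes the two-byte short form of the jump seen in the listing (one opcode byte plus one signed distance byte), where the distance is counted from the start of the next instruction.

```cpp
#include <cstdint>

// Distance byte for a two-byte short jump, as in "7c 05".
// The CPU adds this distance to the address of the NEXT instruction,
// which starts two bytes after the jump itself.
constexpr std::int8_t short_jump_displacement(std::uint32_t jump_address,
                                               std::uint32_t target_address) {
    return static_cast<std::int8_t>(target_address - (jump_address + 2));
}

// In the listing, "jl" sits at address 0x8 and its target ("ret") at 0xf.
static_assert(short_jump_displacement(0x8, 0xf) == 5, "matches the 05 in 7c 05");
```

A backward jump just comes out negative under the same arithmetic.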

More Complex Control Flow: C--

You can actually write a very peculiar variant of C++, where "if" statements only contain "goto" statements.  My joke name for this assembly-style C++ where you only use "+=" and "*=" arithmetic, and "if (simple test) goto somewhere;" flow control is "C--"!

For example this is perfectly legal C++ in the "C--" style:
#include <iostream>
int main() {
int i=0;
if (i>=10) goto byebye;
std::cout<<"Not too big!\n";
byebye: return 0;
}
This way of writing C++ is quite similar to assembly--in fact, there's a one-to-one correspondence between lines of C code written this way and machine language instructions.  More complicated C++, like the "for" construct, expands out to many lines of assembly.
int i, n=10;
for (i=0;i<n;i++) {
std::cout<<"In loop: i=="<<i<<"\n";
}
Here's one expanded version of this C/C++ "for" loop:
int i=0, n=10;
start: std::cout<<"In loop: i=="<<i<<"\n";
i++;
if (i<n) goto start;
(executable NetRun link)

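You've got to convince yourself that this is really equivalent to the "for" loop in all cases.  Careful--if n is a parameter, it's not!   (What if n<=0?  The "for" loop tests i<n before the first pass and skips the body entirely, but the expanded version runs the body once before it ever tests.)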
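One way to convince yourself is to count how many times the body runs.  This is only a sketch: the function names are invented, and the loop body is replaced by a counter so the two versions can be compared directly.

```cpp
// Counts how many times the body of "for (i=0;i<n;i++)" runs.
int count_for(int n) {
    int count = 0;
    for (int i = 0; i < n; i++) {
        count++;
    }
    return count;
}

// Counts body executions of the expanded version, which tests i<n only AFTER the body.
int count_expanded(int n) {
    int count = 0;
    int i = 0;
start:
    count++;                 // the loop body
    i++;
    if (i < n) goto start;
    return count;
}
```

For n=10 both versions count to 10.  For n=0 or a negative n, count_for returns 0 but count_expanded returns 1, because the body runs once before the first test.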

All C flow-control constructs can be written using just "if" and "goto", which usually map one-to-one to a compare-and-jump sequence in assembly.
Normal C                Expanded C

if (A) {                if (!A) goto END;
  ...                   {
}                         ...
                        }
                        END:

if (!A) {               if (A) goto END;
  ...                   {
}                         ...
                        }
                        END:

if (A&&B) {             if (!A) goto END;
  ...                   if (!B) goto END;
}                       {
                          ...
                        }
                        END:

if (A||B) {             if (A) goto STUFF;
  ...                   if (B) goto STUFF;
}                       goto END;
                        STUFF:
                        {
                          ...
                        }
                        END:

while (A)  {            goto TEST;
  ...                   START:
}                       {
                          ...
                        }
                        TEST: if (A) goto START;

do {                    START:
  ...                   {
} while (A);              ...
                        }
                        if (A) goto START;

for (i=0;i<n;i++)       i=0;         /* Version A */
{                       goto TEST;
  ...                   START:
}                       {
                          ...
                        }
                        i++;
                        TEST: if (i<n) goto START;

for (i=0;i<n;i++)       i=0;          /* Version B */
{                       START: if (i>=n) goto END;
  ...                   {
}                         ...
                        }
                        i++;
                        goto START;
                        END:

Note that the last two translations of the "for" concept (labelled Version A and Version B) both compute the same thing.  Which one is faster? If the loop iterates many times, I claim version (A) is faster, since there's only one (conditional) goto each time around the loop, instead of two gotos in version (B)--one conditional and one unconditional.  But version (B) is probably faster if n is often 0, because in that case it quickly jumps to END (in one conditional jump).
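To put rough numbers on that claim, here's a sketch that runs both versions and counts the gotos each one executes (a conditional goto counts whether or not it's taken).  The struct and function names are invented for this example, and counting gotos is only a stand-in for real timing.

```cpp
struct JumpCount {
    int body_runs;       // times the loop body executed
    int jumps_executed;  // conditional and unconditional gotos executed
};

// Version A: one unconditional goto on entry, then one conditional goto per trip.
JumpCount version_a(int n) {
    JumpCount c{0, 0};
    int i = 0;
    c.jumps_executed++;          // goto TEST
    goto TEST;
START:
    c.body_runs++;               // the loop body
    i++;
TEST:
    c.jumps_executed++;          // if (i<n) goto START
    if (i < n) goto START;
    return c;
}

// Version B: one conditional and one unconditional goto per trip.
JumpCount version_b(int n) {
    JumpCount c{0, 0};
    int i = 0;
START:
    c.jumps_executed++;          // if (i>=n) goto END
    if (i >= n) goto END;
    c.body_runs++;               // the loop body
    i++;
    c.jumps_executed++;          // goto START
    goto START;
END:
    return c;
}
```

With n=10, both run the body 10 times, but version A executes 12 gotos and version B executes 21.  With n=0, version A executes 2 and version B only 1.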

Comparison Instruction

OK, so you want to know how some number A relates to some other number B.  So you subtract them.

If A-B = 0, then A=B.
If A-B > 0, then A > B.
If A-B < 0, then A < B.

Yup, so "cmp eax,10" actually internally subtracts 10 from the value in eax.  If the difference is zero, the CPU sets flag ZF (the Zero Flag).  If the difference is positive or negative, the CPU sets some other hideous flags to indicate this (the CPU sets various flags for both the signed and unsigned comparisons).

Turns out, "sub eax,10" actually sets all the same flags.  So you can compare two numbers with "cmp A,B" or "sub A,B", and you'll get the same result (but they're not totally interchangeable: "cmp" won't change A!). 

So then, you want to jump if the previous comparison came out equal.  You use the "je" instruction (Jump if Equal). 
Or you want to jump if the previous subtraction came out zero.  You use the "jz" instruction (Jump if Zero).

Turns out, "je" and "jz" are the same machine language instruction, because they both do entirely the same thing.
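If it helps to see the subtraction spelled out, here's a sketch of a software model of the flags that "cmp a,b" (or "sub a,b") leaves behind for 32-bit values.  The Flags struct and the compare function are invented for this example; the real CPU computes these in hardware.

```cpp
#include <cstdint>

struct Flags {
    bool zf;  // zero flag: the difference was zero
    bool cf;  // carry flag: the subtraction needed a borrow (a < b as unsigned)
    bool sf;  // sign flag: top bit of the 32-bit difference
    bool of;  // overflow flag: the signed difference did not fit in 32 bits
};

// Model of the flags set by "cmp a,b" on 32-bit values.
Flags compare(std::uint32_t a, std::uint32_t b) {
    std::uint32_t diff = a - b;  // wraps around modulo 2^32, like the CPU
    Flags f;
    f.zf = (diff == 0);
    f.cf = (a < b);
    f.sf = (diff >> 31) != 0;
    // Signed overflow: a and b have different signs, and the result's sign differs from a's.
    f.of = (((a ^ b) & (a ^ diff)) >> 31) != 0;
    return f;
}
```

With this model, "je" and "jz" are both just a test of compare(a,b).zf, which is why they can share one machine language instruction.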

The bottom line: to do a comparison in assembly, you first do either a cmp or sub instruction, and then pick the conditional jump that matches:
English               Less Than   Less or Equal   Equal      Greater or Equal   Greater Than   Not Equal
C/C++                 <           <=              ==         >=                 >              !=
Assembly (signed)     jl          jle             je or jz   jge                jg             jne or jnz
Assembly (unsigned)   jb          jbe             je or jz   jae                ja             jne or jnz
The "b" in the unsigned comparison instructions stands for "below", and the "a" for "above". 

In C/C++, the compiler can tell whether you want a signed or unsigned comparison based on the variables' types.  There aren't any types in assembly, so it's up to you to pick the right instruction!
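Here's a sketch of why the choice matters: the same 32 bits compare differently depending on the type.  The function names are made up, and which instruction a compiler actually emits is up to the compiler; typically the signed version ends up as a "jl"-style test and the unsigned one as a "jb"-style test.

```cpp
#include <cstdint>

// Interpret both bit patterns as signed numbers, then compare.
bool less_as_signed(std::uint32_t a_bits, std::uint32_t b_bits) {
    return static_cast<std::int32_t>(a_bits) < static_cast<std::int32_t>(b_bits);
}

// Compare the very same bit patterns as unsigned numbers.
bool less_as_unsigned(std::uint32_t a_bits, std::uint32_t b_bits) {
    return a_bits < b_bits;
}
```

The bit pattern 0xFFFFFFFF is -1 as a signed number, so it's less than 1; as an unsigned number it's 4294967295, which is not less than 1.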

Compare vs. Subtract: Examples

Subtract sets all the same comparison flags as "cmp".  So this code returns 1, because 5 < 7.
mov ecx,5

sub ecx,7
jl yes_it_jumped
; ... else no, it didn't jump: return 0
mov eax,0
ret

yes_it_jumped: ; ... so return 1
mov eax,1
ret
(executable NetRun link)

Subtract also sets the zero flag, so here's a very small downward-counting loop:
mov edx,5
mov eax,0

loop_start:
add eax,7
sub edx,1
jnz loop_start ; Jumps if edx is still nonzero

ret
(executable NetRun link)
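For comparison, here's the same countdown loop sketched in goto-style C++.  The function name is made up, and the register names are reused as variable names.

```cpp
int count_down() {
    int edx = 5;                    // mov edx,5
    int eax = 0;                    // mov eax,0
loop_start:
    eax += 7;                       // add eax,7
    edx -= 1;                       // sub edx,1
    if (edx != 0) goto loop_start;  // jnz loop_start
    return eax;                     // ret
}
```

The loop runs five times, adding 7 each time, so count_down() returns 35.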

Flags and The Gory Details

The "cmp" instruction tells the subsequent conditional jump about the comparison via the "EFLAGS" register.

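The "EFLAGS" register on x86 stores a bunch of flags, as shown on page 73 of the Intel arch manual Volume 1.  The important flags include:
ZF, the zero flag: set if the result was zero.
CF, the carry flag: set on a carry or borrow out of the top bit (unsigned overflow).
SF, the sign flag: a copy of the result's top bit, so it's set if the result is negative as a signed number.
OF, the overflow flag: set on signed overflow.
PF, the parity flag: set if the low byte of the result has an even number of 1 bits.
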
You've also got to be aware of which instructions set which flags.  For example, the "cmp", "and" (bitwise AND), "sub", and "add" instructions set all the flags; "inc" (increment by 1) and "dec" (decrement by 1) set everything but CF; while "mov" and all the jump instructions don't mess with the flags.  It's easy to accidentally overwrite flags you care about, if you leave too much stuff between the time the flag is set and the time it's read!

You can actually look at the flags with the "lahf" instruction, which copies the important bits of EFLAGS into register ah--that is, bits 8-15 of eax get EFLAGS(SF:ZF:0:AF:0:PF:1:CF).  Here's some code that returns a weird hex constant depending on the flags.  It tests each flag with a conditional jump rather than "lahf", and it repeats the comparison before each test because the "add" in between overwrites the flags:
mov ecx,4 ; values to compare
mov edx,7

mov eax,0

cmp ecx,edx
jne skipe ; Equal to zero flag
add eax,0xE
skipe:

cmp ecx,edx
jno skipo ; Overflow flag (assuming signed)
add eax,0x80
skipo:

cmp ecx,edx
jns skips ; Sign flag (negative result)
add eax,0x500
skips:

cmp ecx,edx
jnc skipc ; Carry flag (33rd bit of result)
add eax,0xC000
skipc:

ret

(Try this in NetRun now!)
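The "lahf" bit layout mentioned above, SF:ZF:0:AF:0:PF:1:CF, can be unpacked with a few shifts.  This is only a sketch: the AhFlags struct and decode_ah helper are invented names, and the byte passed in is assumed to have been read out of ah after a "lahf".

```cpp
#include <cstdint>

// The flag bits that "lahf" copies into ah, in the order SF:ZF:0:AF:0:PF:1:CF.
struct AhFlags {
    bool sf;  // bit 7: sign flag
    bool zf;  // bit 6: zero flag
    bool af;  // bit 4: auxiliary carry flag
    bool pf;  // bit 2: parity flag
    bool cf;  // bit 0: carry flag
};

// Hypothetical helper: split a byte read from ah into its individual flags.
AhFlags decode_ah(std::uint8_t ah) {
    AhFlags f;
    f.sf = (ah >> 7) & 1;
    f.zf = (ah >> 6) & 1;
    f.af = (ah >> 4) & 1;
    f.pf = (ah >> 2) & 1;
    f.cf = (ah >> 0) & 1;
    return f;
}
```

For example, the byte 0x93 (binary 10010011) decodes to SF, AF, and CF set, with ZF and PF clear.  Note that OF isn't in this byte at all: it lives in bit 11 of EFLAGS, so "lahf" doesn't capture it.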


The various funky jump instructions, like "jc" (jump if CF is set), or "jo" (jump if OF is set), also read the flags.  Note there's NO way in standard C++ to get at the flags, or to directly call the flag-using instructions!  None!  C/C++ compilers ignore integer overflow, and the best you can do in portable C/C++ is recompute the overflow condition by hand (or reach for a compiler-specific extension such as GCC and Clang's __builtin_add_overflow), but in assembly it's as easy as a "jo"!
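C++ can't read OF, but the condition that "jo" would test after a 32-bit "add" can be recomputed by hand.  Here's one possible sketch, assuming 64-bit integers are available; the function name add_overflows is made up, and it's a workaround rather than a way to reach the real flag.

```cpp
#include <cstdint>
#include <limits>

// Returns true if a+b does not fit in a signed 32-bit integer,
// which is the situation where the CPU would set OF after "add".
bool add_overflows(std::int32_t a, std::int32_t b) {
    std::int64_t wide = static_cast<std::int64_t>(a) + b;  // exact sum, no wraparound
    return wide > std::numeric_limits<std::int32_t>::max() ||
           wide < std::numeric_limits<std::int32_t>::min();
}
```

For instance, add_overflows(2000000000, 2000000000) is true, while add_overflows(1, 2) is false.  Compare that with the assembly version, which is one "add" followed by one "jo".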

Bonus Horror!  The Gory Details of Comparisons

It takes quite a bit of staring to convince yourself that the various jumps are actually doing the right comparison. 

The easy case is for unsigned numbers, where "ja" and "jb" look at the carry flag, CF:
v minus ->    4000000000       3000000000       2000000000       1000000000       0
4000000000    0 ZF             1000000000       2000000000       3000000000       4000000000
3000000000    3294967296 CF    0 ZF             1000000000       2000000000       3000000000
2000000000    2294967296 CF    3294967296 CF    0 ZF             1000000000       2000000000
1000000000    1294967296 CF    2294967296 CF    3294967296 CF    0 ZF             1000000000
0             294967296 CF     1294967296 CF    2294967296 CF    3294967296 CF    0 ZF
(3294967296 is the 32-bit version of -1000000000, 2294967296 of -2000000000, 1294967296 of -3000000000, and 294967296 of -4000000000.)
Unsigned subtraction table. Note that CF is set when the answer should be negative; this indicates the first number is smaller (unsigned below), so "jb" checks CF==1. "jbe" checks CF==1 or ZF==1.  Here's a tiny version of the above table:
v minus ->   4bn       3bn            2bn            1bn            0
4bn          ZF (=)    no flags (>)   no flags (>)   no flags (>)   no flags (>)
3bn          CF (<)    ZF (=)         no flags (>)   no flags (>)   no flags (>)
2bn          CF (<)    CF (<)         ZF (=)         no flags (>)   no flags (>)
1bn          CF (<)    CF (<)         CF (<)         ZF (=)         no flags (>)
0            CF (<)    CF (<)         CF (<)         CF (<)         ZF (=)
Not too bad, right?  For unsigned, CF means less than.  Note that SF and OF are going crazy during this table, but for unsigned, we can ignore them.
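Any cell of the tiny unsigned table can be recomputed in a few lines.  In this sketch the function name unsigned_cell is invented, and the subtraction is done in 64 bits so the borrow out of the 32-bit result, which the CPU records in CF, is visible as bit 32.

```cpp
#include <cstdint>
#include <string>

// Returns the tiny-table entry for "a minus b" on unsigned 32-bit values.
std::string unsigned_cell(std::uint32_t a, std::uint32_t b) {
    // If a < b, the 64-bit difference wraps and bit 32 comes out set: that's the borrow.
    std::uint64_t wide = static_cast<std::uint64_t>(a) - b;
    bool cf = ((wide >> 32) & 1) != 0;
    bool zf = static_cast<std::uint32_t>(wide) == 0;
    if (zf) return "ZF (=)";
    if (cf) return "CF (<)";
    return "no flags (>)";
}
```

For example, unsigned_cell(3000000000u, 4000000000u) gives "CF (<)", matching the 3bn row of the table.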

Here's the same table for signed numbers:
v minus ->     2000000000       1000000000       0                -1000000000         -2000000000
2000000000     0 ZF             1000000000       2000000000       -1294967296 SF OF   -294967296 SF OF
1000000000     -1000000000 SF   0 ZF             1000000000       2000000000          -1294967296 SF OF
0              -2000000000 SF   -1000000000 SF   0 ZF             1000000000          2000000000
-1000000000    1294967296 OF    -2000000000 SF   -1000000000 SF   0 ZF                1000000000
-2000000000    294967296 OF     1294967296 OF    -2000000000 SF   -1000000000 SF      0 ZF
(-1294967296 is the 32-bit version of 3000000000, and -294967296 of 4000000000; 1294967296 is the 32-bit version of -3000000000, and 294967296 of -4000000000.)
Signed subtraction table. If the first number is less than the second number, you either get just SF (negative answer) or just OF (overflow). If it's greater, you get either no flags (positive answer, no overflow), or both SF and OF (negative answer due to overflow).  Here's a scaled-down version of the above table:
v minus ->   +2bn      +1bn      0              -1bn           -2bn
+2bn         ZF (=)    no flags (>)   no flags (>)   SF OF (>)      SF OF (>)
+1bn         SF (<)    ZF (=)         no flags (>)   no flags (>)   SF OF (>)
0            SF (<)    SF (<)         ZF (=)         no flags (>)   no flags (>)
-1bn         OF (<)    SF (<)         SF (<)         ZF (=)         no flags (>)
-2bn         OF (<)    OF (<)         SF (<)         SF (<)         ZF (=)
Again, for signed numbers, *either* SF or OF alone means <.  No flags, or *both* SF and OF mean >.  "jl" and "jg" do these tests.
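If you'd rather check the SF and OF rule than stare at the table, here's a sketch that recomputes both flags for a signed 32-bit subtraction and applies the "jl" and "jg" tests.  The function names are invented for this example, and the exact difference is computed in 64 bits to detect overflow.

```cpp
#include <cstdint>

// Would "jl" jump after "cmp a,b"?  It tests SF != OF.
bool jl_would_jump(std::int32_t a, std::int32_t b) {
    std::int64_t exact = static_cast<std::int64_t>(a) - b;  // true difference, no wraparound
    std::uint32_t wrapped = static_cast<std::uint32_t>(a) - static_cast<std::uint32_t>(b);
    bool sf = (wrapped >> 31) != 0;                          // sign bit of the 32-bit result
    bool of = exact > INT32_MAX || exact < INT32_MIN;        // exact answer doesn't fit
    return sf != of;
}

// Would "jg" jump after "cmp a,b"?  It tests ZF=0 and SF=OF.
bool jg_would_jump(std::int32_t a, std::int32_t b) {
    std::int64_t exact = static_cast<std::int64_t>(a) - b;
    std::uint32_t wrapped = static_cast<std::uint32_t>(a) - static_cast<std::uint32_t>(b);
    bool zf = (wrapped == 0);
    bool sf = (wrapped >> 31) != 0;
    bool of = exact > INT32_MAX || exact < INT32_MIN;
    return !zf && sf == of;
}
```

For the overflowing corners of the table, jl_would_jump(-1000000000, 2000000000) is true (just OF), and jg_would_jump(2000000000, -1000000000) is true (both SF and OF), which agrees with the plain signed comparisons.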

You can probably live your whole life, and certainly pass this class, without remembering exactly why "jl" tests for SF!=OF.  You do, however, have to know that SF and OF exist, because they really are useful on their own.