If statements, Flags, and Control Flow in Assembly

CS 301 Lecture, Dr. Lawlor

A jump instruction, like "jmp", just switches the CPU to executing a different piece of code. It's the assembly equivalent of "goto", but unlike goto, jumps are not considered shameful in assembly.

You say where to jump to using a "jump label", which is just any string with a colon after it. (The same exact syntax is used in C/C++)

Assembly jump:

	mov eax,3
	jmp lemme_outta_here
	mov eax,999  ; <- not executed!
lemme_outta_here:
	ret

(Try this in NetRun now!)

C++ goto:

	int x=3;
	goto quiddit;
	x=999;
quiddit:
	return x;

(Try this in NetRun now!)

In both cases, we return 3, because we jump right over the 999 assignment. Jumping is somewhat useful for skipping over bad code, but it really gets useful when you add conditional jumps:

Instruction	Useful to...	Flags (see below)
jmp	Always jump	None
ja	Unsigned >	CF=0 and ZF=0
jae	Unsigned >=	CF=0
jb	Unsigned <	CF=1
jbe	Unsigned <=	CF=1 or ZF=1
jc	Unsigned overflow, or multiprecision add	CF=1
jecxz	Compare ecx with 0	ecx=0
je or jz	Equality	ZF=1
jg	Signed >	ZF=0 and SF=OF
jge	Signed >=	SF=OF
jl	Signed <	SF!=OF
jle	Signed <=	ZF=1 or SF!=OF
jne or jnz	Inequality	ZF=0
jo	Signed overflow	OF=1
jp or jpe	Parity check (even)	PF=1
jpo	Parity check (odd)	PF=0
js	Jump if negative	SF=1

There are also "n" NOT versions of most of these jumps; for example, "jno" jumps if there is NOT overflow.

Conditional Jumps: Branching in Assembly

In assembly, all branching is done with two types of instruction:

A compare instruction, like "cmp", compares two values.
A conditional jump instruction, like "je" (jump-if-equal), does a goto somewhere if the two values satisfy the right condition.

Here's how to use compare and jump-if-equal ("je"):

	mov eax,3
	cmp eax,3 ; how does eax compare with 3?
	je lemme_outta_here  ; if it's equal, then jump
	mov eax,999  ; <- not executed *if* we jump over it
lemme_outta_here:
	ret

(Try this in NetRun now!)

Here's compare and jump-if-less-than ("jl"):

	mov eax,1
	cmp eax,3 ; how does eax compare with 3?
	jl lemme_outta_here  ; if it's less, then jump
	mov eax,999  ; <- not executed *if* we jump over it
lemme_outta_here:
	ret

(Try this in NetRun now!)

The C++ equivalent to compare-and-jump-if-whatever is "if (something) goto somewhere;".
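As a rough sketch of that correspondence, the "jl" example above could be written in C++ like this. The function name compare_and_jump and its start parameter are made up for illustration, and the variable eax is just an ordinary int standing in for the register.

```cpp
// Hypothetical C++ mirror of the "jl" assembly example above.
// "eax" is a plain int standing in for the register of the same name.
int compare_and_jump(int start) {
    int eax = start;                     // mov eax,<start>
    if (eax < 3) goto lemme_outta_here;  // cmp eax,3 then jl lemme_outta_here
    eax = 999;                           // skipped whenever the jump is taken
lemme_outta_here:
    return eax;                          // ret
}
```

Called with 1, this returns 1, just like the assembly. Called with 3 or more, the jump is not taken and it returns 999.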

Also, check out the machine code generated for the conditional jump--the jump destination is encoded as the number of bytes of machine code to skip over. For example, the "jl" above gets encoded in machine code like this:

   0:	b8 01 00 00 00       	mov    eax,0x1
   5:	83 f8 03             	cmp    eax,0x3
   8:	7c 05                	jl     f <foo+0xf>
   a:	b8 e7 03 00 00       	mov    eax,0x3e7
   f:	c3                   	ret

The distance to jump, the "05" byte in the "7c 05" encoding of the "jl" above, is five bytes, because the code we're skipping over is five bytes long. Note that a jump label doesn't show up in machine code at all--it's just used by the assembler to figure out how far to jump.
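Here's a small sketch of the arithmetic the CPU does with that offset byte, assuming the short two-byte form of the jump shown in the listing. The offset is a signed byte counted from the end of the jump instruction. The helper name jump_target is made up for illustration, and the signed cast assumes the usual two's complement machine.

```cpp
#include <cstdint>

// Sketch: where does a short (two-byte) conditional jump land?
// jump_target is a made-up helper, not part of any assembler.
std::uint32_t jump_target(std::uint32_t jump_address, std::uint8_t offset_byte) {
    const std::uint32_t jump_length = 2;  // one opcode byte (0x7c) + one offset byte
    // The offset byte is signed, so 0xfe means "go back 2 bytes".
    std::int32_t displacement = static_cast<std::int8_t>(offset_byte);
    return jump_address + jump_length + static_cast<std::uint32_t>(displacement);
}
```

For the listing above, the "jl" sits at address 0x8 with offset byte 0x05, so jump_target(0x8, 0x05) gives 0xf, the address of the "ret".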

More Complex Control Flow: C--

You can actually write a very peculiar variant of C++, where "if" statements only contain "goto" statements. My joke name for this assembly-style C++ where you only use "+=" and "*=" arithmetic, and "if (simple test) goto somewhere;" flow control is "C--"!

For example this is perfectly legal C++ in the "C--" style:

#include <iostream>

int main() {
	int i=0;
	if (i>=10) goto byebye;
	std::cout<<"Not too big!\n";
byebye:	return 0;
}

This way of writing C++ is quite similar to assembly--in fact, there's nearly a one-to-one correspondence between lines of C++ code written this way and machine language instructions. More complicated C++, like the "for" construct, expands out to many lines of assembly.

	int i, n=10;
	for (i=0;i<n;i++) {
		std::cout<<"In loop: i=="<<i<<"\n";
	}

Here's one expanded version of this C/C++ "for" loop:

	int i=0, n=10;
start:	std::cout<<"In loop: i=="<<i<<"\n";
	i++;
	if (i<n) goto start;

(executable NetRun link)

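You've got to convince yourself that this is really equivalent to the "for" loop in all cases. Careful--if n is a parameter, it's not! This expanded version always runs the loop body once before the first test, so it differs whenever i<n is false at the start. (What if n<=0?)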
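One way to check the equivalence is to count how many times the loop body runs in each version. This is only a sketch: both function names are made up, and the body is replaced by a counter.

```cpp
// Count how many times the loop body runs in each version.
// Both function names are made up for this comparison.
int count_for_loop(int n) {
    int body_runs = 0;
    for (int i = 0; i < n; i++) body_runs++;
    return body_runs;
}

int count_expanded(int n) {
    int body_runs = 0;
    int i = 0;
start:
    body_runs++;            // the body runs BEFORE any test happens
    i++;
    if (i < n) goto start;
    return body_runs;
}
```

For n=10 both return 10. For n=0 the "for" version returns 0, but the expanded version returns 1, because it behaves like a do-while loop.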

All C flow-control constructs can be written using just "if" and "goto", which usually map one-to-one to a compare-and-jump sequence in assembly.

Normal C	Expanded C
if (A) { ... }	if (!A) goto END; { ... } END:
if (!A) { ... }	if (A) goto END; { ... } END:
if (A&&B) { ... }	if (!A) goto END; if (!B) goto END; { ... } END:
if (A||B) { ... }	if (A) goto STUFF; if (B) goto STUFF; goto END; STUFF: { ... } END:
while (A) { ... }	goto TEST; START: { ... } TEST: if (A) goto START;
do { ... } while (A)	START: { ... } if (A) goto START;
for (i=0;i<n;i++) { ... }	i=0; /* Version A */ goto TEST; START: { ... } i++; TEST: if (i<n) goto START;
for (i=0;i<n;i++) { ... }	i=0; /* Version B */ START: if (i>=n) goto END; { ... } i++; goto START; END:

Note that the last two translations of the "for" concept (labelled Version A and Version B) both compute the same thing. Which one is faster? If the loop iterates many times, I claim version (A) is faster, since there's only one (conditional) goto each time around the loop, instead of two gotos in version (B)--one conditional and one unconditional. But version (B) is probably faster if n is often 0, because in that case it quickly jumps to END (in one conditional jump).
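To make the two "for" translations concrete, here's a sketch of both as real C++ functions that sum the numbers 0 through n-1. The function names are made up for illustration; the labels match the table.

```cpp
// Version A and Version B from the table, each summing 0..n-1.
// sum_version_a and sum_version_b are illustrative names.
int sum_version_a(int n) {
    int total = 0;
    int i = 0;
    goto TEST;               // one extra jump, paid once at the start
START:
    total += i;
    i++;
TEST:
    if (i < n) goto START;   // the only jump per iteration
    return total;
}

int sum_version_b(int n) {
    int total = 0;
    int i = 0;
START:
    if (i >= n) goto END;    // conditional jump every iteration...
    total += i;
    i++;
    goto START;              // ...plus an unconditional one
END:
    return total;
}
```

Both give 45 for n=10 and 0 for n=0, so unlike the shortcut expansion earlier, these really do match the "for" loop even when the loop runs zero times.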

Comparison Instruction

OK, so you want to know how some number A relates to some other number B. So you subtract them.

If A-B = 0, then A=B.
If A-B > 0, then A > B.
If A-B < 0, then A < B.

Yup, so "cmp eax,10" actually internally subtracts 10 from the value in eax. If the difference is zero, the CPU sets flag ZF (the Zero Flag). If the difference is positive or negative, the CPU sets some other hideous flags to indicate this (the CPU sets various flags for both the signed and unsigned comparisons).

Turns out, "sub eax,10" actually sets all the same flags. So you can compare two numbers with "cmp A,B" or "sub A,B", and you'll get the same result (but they're not totally interchangeable: "cmp" won't change A!).

So then, you want to jump if the previous comparison came out equal. You use the "je" instruction (Jump if Equal).
Or you want to jump if the previous subtraction came out zero. You use the "jz" instruction (Jump if Zero).

Turns out, "je" and "jz" are the same machine language instruction, because they both do entirely the same thing.

The bottom line: to do comparisons in assembly, you first do either a cmp or sub instruction, and then use the jump from this table:

English	Less Than	Less or Equal	Equal	Greater or Equal	Greater Than	Not Equal
C/C++	<	<=	==	>=	>	!=
Assembly (signed)	jl	jle	je or jz	jge	jg	jne or jnz
Assembly (unsigned)	jb	jbe	je or jz	jae	ja	jne or jnz

The "b" in the unsigned comparison instructions stands for "below", and the "a" for "above".

In C/C++, the compiler can tell whether you want a signed or unsigned comparison based on the variables' types. There aren't any types in assembly, so it's up to you to pick the right instruction!
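Here's a sketch of why the choice matters: the very same 32 bits can compare differently depending on whether you treat them as signed or unsigned. The function names are made up, and the cast to a signed type assumes the usual two's complement machine.

```cpp
#include <cstdint>

// The same 32 bits, compared two ways. In assembly you would pick
// jl (signed) or jb (unsigned); in C++ the types pick for you.
// Both function names are made up for illustration.
bool less_signed(std::uint32_t a_bits, std::uint32_t b_bits) {
    // Reinterpret the bits as signed, like following cmp with jl.
    return static_cast<std::int32_t>(a_bits) < static_cast<std::int32_t>(b_bits);
}

bool less_unsigned(std::uint32_t a_bits, std::uint32_t b_bits) {
    // Leave the bits unsigned, like following cmp with jb.
    return a_bits < b_bits;
}
```

With a_bits=0xFFFFFFFF and b_bits=1, the signed version says "less" (it sees -1 < 1), while the unsigned version says "not less" (it sees 4294967295 > 1).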

Compare vs. Subtract: Examples

Subtract sets all the same comparison flags as "cmp". So this code returns 1, because 5 < 7.

mov ecx,5

sub ecx,7
jl yes_it_jumped
; ... else no, it didn't jump: return 0
mov eax,0
ret

yes_it_jumped: ; ... so return 1
mov eax,1
ret

(executable NetRun link)

Subtract also sets the zero flag, so here's a very small downward-counting loop:

mov edx,5
mov eax,0

loop_start:
  add eax,7
  sub edx,1
  jnz loop_start ; Jumps if edx is still nonzero

ret

(executable NetRun link)
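For comparison, here's a sketch of the same downward-counting loop in the "C--" style. The function name count_down and its count parameter are made up, and the variables are named after the registers they stand in for. It assumes count is at least 1, just like the assembly does.

```cpp
// C++ mirror of the downward-counting loop; count_down is a made-up name.
// Assumes count >= 1: like the assembly, it tests only AFTER subtracting.
int count_down(int count) {
    int edx = count;  // mov edx,5 in the assembly
    int eax = 0;      // mov eax,0
loop_start:
    eax += 7;         // add eax,7
    edx -= 1;         // sub edx,1 (in assembly this also sets ZF at zero)
    if (edx != 0) goto loop_start;  // jnz loop_start
    return eax;       // ret
}
```

With count=5 the loop body runs five times, so this returns 35, matching the assembly.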

Flags and The Gory Details

The "cmp" instruction tells the subsequent conditional jump about the comparison via the "EFLAGS" register.

The "EFLAGS" register on x86 stores a bunch of flags, as shown on page 73 of the Intel arch manual Volume 1. The important flags include:

ZF-- The "zero flag". Set whenever the previous arithmetic result was zero. Can be used by the "jz" (jump if last result was zero) or "jnz" instructions. "je" (jump if equal) and "jne" (jump if not equal) are just aliases of jz & jnz, because if the difference is zero, then the two values are equal. For example, this code checks if the input is equal to 4:
    extern read_input
    call read_input
    cmp eax,4
    je equal
    add eax, 20 ; If not equal, add
equal: ret; If equal, just return
CF--The "carry flag". Set to indicate the bit that carries out of an addition or subtraction. For signed numbers, this doesn't really indicate a problem, but for unsigned numbers, this indicates an overflow. Can be used by the "jc" (jump if carry flag is set) instruction. Set by most of the arithmetic instructions (but not "inc" or "dec"). Can be added into another arithmetic operation with "adc" (add with carry). For example, you can preserve the bit overflowing out of a big add like this:
mov ecx, 0x8000ff00
add ecx, ecx
mov eax,0
adc eax,eax ; Adds eax, eax, and the carry flag together
"adc" is used in the compiler's implementation of the 64-bit "long long" datatype, and in general in "multiple precision arithmetic" software, like the GNU Multiple Precision Arithmetic Library. It'd also be used to implement overflow checking by a careful compiler. The carry and zero flags are also used by the unsigned comparison instructions: "jb" (jump if unsigned below), "jbe" (jump if unsigned below or equal), "ja" (jump if unsigned above), and "jae" (jump if unsigned above or equal) in the usual way.
SF-- The "sign flag", which indicates a negative signed result. This is just the highest bit of the result, and it ignores any overflow. Because overflow can screw up your answer, "jl" looks at both SF and OF to implement signed comparison.
OF-- The signed "overflow flag". Set by subtract, add, and compare in a surprisingly clever way: for example, if you add two positive numbers, and get a negative number, then overflow happened. If you add a positive and a negative number, overflow can't happen. If you add two negative numbers and get a positive number, then overflow happened. OF is used in the signed comparison instructions "jl" (jump if less than), "jle" (jump if less than or equal to), "jg" (jump if greater than), and "jge" (jump if greater than or equal to). See below for exactly how OF and SF are used. For example:

jae: unsigned >=. Jumps if CF==0. OK, we've just computed a-b, and want to jump if a>=b. If a-b is positive (or zero), then CF==0 and a>=b, so we should jump, and do. If a-b is negative, we'd get a carry, CF==1, and we don't jump.
jge: signed >=. Jumps if SF==OF. Normally, we didn't overflow, so OF is zero, and we jump when SF is also zero, meaning a-b is positive or zero, much like the jae case above. Recall that this is a signed compare, so we may get a carry if we're comparing negative numbers, so it's not worth looking at CF. Curiously, if we overflowed, the sign bit is now wrong, so if OF is one, we compare SF against one, which flips the comparison back the right way again.
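The two rules above can be checked by computing the flags by hand. This is a sketch under stated assumptions: the Flags struct and all three function names are made up, and the flag formulas are the standard ones for a 32-bit subtraction rather than anything taken from the CPU itself.

```cpp
#include <cstdint>

// Flags a 32-bit "cmp a,b" would produce, computed by hand.
// The struct and function names are illustrative, not a real API.
struct Flags { bool zf, cf, sf, of; };

Flags flags_after_cmp(std::uint32_t a, std::uint32_t b) {
    std::uint32_t diff = a - b;  // wraps around modulo 2^32
    Flags f;
    f.zf = (diff == 0);
    f.cf = (a < b);              // unsigned borrow was needed
    f.sf = (diff >> 31) != 0;    // top bit of the result
    // Signed overflow: a and b have different signs, and the
    // result's sign differs from a's sign.
    f.of = (((a ^ b) & (a ^ diff)) >> 31) != 0;
    return f;
}

bool jae_taken(std::uint32_t a, std::uint32_t b) {
    return !flags_after_cmp(a, b).cf;  // CF==0
}

bool jge_taken(std::uint32_t a, std::uint32_t b) {
    Flags f = flags_after_cmp(a, b);
    return f.sf == f.of;               // SF==OF
}
```

For example, comparing 2000000000 with -1000000000 overflows (SF and OF both set), yet SF==OF still holds, so "jge" correctly jumps.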

PF and AF are really bizarre ancient flags, holdovers from the 8 bit days. They both operate only on the low 8 bits of the result. PF is set when those 8 bits contain an even number of 1 bits, so it acts as an odd parity bit, like for serial communication. AF indicates a carry from the low 4 bits up into the high 4 bits, which is used for the ancient technique of "binary coded decimal" (BCD), where "0x23" means decimal twenty-three, not thirty-five like normal hex. It's nasty, now gone, and good riddance!
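If you're curious, here's a sketch of what these two flags compute. The helper names are made up for illustration, and the AF helper only covers addition.

```cpp
#include <cstdint>

// PF: set when the low 8 bits of a result hold an EVEN number of 1 bits.
// AF: set when an add carries out of bit 3 into bit 4.
// Both helper names are made up for illustration.
bool parity_flag(std::uint32_t result) {
    int ones = 0;
    for (int bit = 0; bit < 8; bit++) ones += (result >> bit) & 1;
    return (ones % 2) == 0;   // higher bits are ignored entirely
}

bool adjust_flag_after_add(std::uint32_t a, std::uint32_t b) {
    // Add just the low 4 bits of each; anything over 0xF carried out.
    return ((a & 0xF) + (b & 0xF)) > 0xF;
}
```

For example, adding the BCD values 0x29 and 0x08 sets AF, because 9+8 doesn't fit in the low digit. That's the hint the old BCD fixup instructions used.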

You've also got to be aware of which instructions set which flags. For example, the "cmp", "sub", and "add" instructions set all the flags; "and" (bitwise AND) sets ZF, SF, and PF from its result and always clears CF and OF; "inc" (increment by 1) and "dec" (decrement by 1) set everything but CF; while "mov" and all the jump instructions don't mess with the flags. It's easy to accidentally overwrite flags you care about, if you leave too much stuff between the time the flag is set and the time it's read!

You can actually look at the flags with the "lahf" instruction, which copies the important bits of EFLAGS into register ah--that is, bits 8-15 of eax get EFLAGS(SF:ZF:0:AF:0:PF:1:CF). Here's some code that instead tests the flags one at a time with conditional jumps, and returns a weird hex constant depending on which ones are set. It repeats the comparison before each test, because the "add" instructions overwrite the flags:

mov ecx,4 ; values to compare
mov edx,7

mov eax,0

cmp ecx,edx
jne skipe ; Equal to zero flag
add eax,0xE
skipe:

cmp ecx,edx
jno skipo ; Overflow flag (assuming signed)
add eax,0x80
skipo:

cmp ecx,edx
jns skips ; Sign flag (negative result)
add eax,0x500
skips:

cmp ecx,edx
jnc skipc ; Carry flag (33rd bit of result)
add eax,0xC000
skipc:

ret

(Try this in NetRun now!)

The various funky jump instructions, like "jc" (jump if CF is set), or "jo" (jump if OF is set), also read the flags. Note there's NO way in standard C++ to get at the flags, or to directly call the flag-using instructions! None! C/C++ compilers normally ignore integer overflow, and the only fixes are manual range checks or compiler-specific extensions (like GCC's __builtin_add_overflow), but in assembly it's as easy as a "jo"!
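Even without the flags, portable C++ can work out what CF and OF would have been, just with more effort. The following is a sketch, not how any particular compiler does it: the function and struct names are made up, and add_wide imitates the job "adc" does when adding a 64-bit number as two 32-bit halves.

```cpp
#include <cstdint>
#include <limits>

// Recover what CF would have been after a 32-bit unsigned add.
bool add_carries(std::uint32_t a, std::uint32_t b) {
    std::uint32_t sum = a + b;  // unsigned math wraps, which is well defined
    return sum < a;             // wrapped around exactly when a carry came out
}

// Predict what OF would have been for a signed add, without overflowing.
bool add_overflows(std::int32_t a, std::int32_t b) {
    const std::int32_t max = std::numeric_limits<std::int32_t>::max();
    const std::int32_t min = std::numeric_limits<std::int32_t>::min();
    if (b > 0) return a > max - b;  // would go past the top
    return a < min - b;             // would go past the bottom
}

// A 64-bit add built from 32-bit halves (hypothetical Wide type).
struct Wide { std::uint32_t lo, hi; };

Wide add_wide(Wide x, Wide y) {
    Wide r;
    r.lo = x.lo + y.lo;
    r.hi = x.hi + y.hi + (add_carries(x.lo, y.lo) ? 1u : 0u);  // like adc
    return r;
}
```

Adding 0x8000ff00 to itself with add_wide gives lo=0x0001fe00 and hi=1, the same carried-out bit that the "adc" example earlier preserves in eax. In assembly each of these checks is a single "jc" or "jo".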

Bonus Horror! The Gory Details of Comparisons

It takes quite a bit of staring to convince yourself that the various jumps are actually doing the right comparison.

The easy case is for unsigned numbers, where "ja" and "jb" look at the carry flag, CF:

v minus ->	4000000000	3000000000	2000000000	1000000000	0
4000000000	0 ZF	1000000000	2000000000	3000000000	4000000000
3000000000	3294967296 (32-bit version of -1000000000) CF	0 ZF	1000000000	2000000000	3000000000
2000000000	2294967296 (32-bit version of -2000000000) CF	3294967296 (32-bit version of -1000000000) CF	0 ZF	1000000000	2000000000
1000000000	1294967296 (32-bit version of -3000000000) CF	2294967296 (32-bit version of -2000000000) CF	3294967296 (32-bit version of -1000000000) CF	0 ZF	1000000000
0	294967296 (32-bit version of -4000000000) CF	1294967296 (32-bit version of -3000000000) CF	2294967296 (32-bit version of -2000000000) CF	3294967296 (32-bit version of -1000000000) CF	0 ZF

Unsigned subtraction table. Note that CF is set when the answer should be negative; this indicates the first number is smaller (unsigned below), so "jb" checks CF==1. "jbe" checks CF==1 or ZF==1. Here's a tiny version of the above table:

v minus ->	4bn	3bn	2bn	1bn	0
4bn	ZF (=)	no flags (>)	no flags (>)	no flags (>)	no flags (>)
3bn	CF (<)	ZF (=)	no flags (>)	no flags (>)	no flags (>)
2bn	CF (<)	CF (<)	ZF (=)	no flags (>)	no flags (>)
1bn	CF (<)	CF (<)	CF (<)	ZF (=)	no flags (>)
0	CF (<)	CF (<)	CF (<)	CF (<)	ZF (=)

Not too bad, right? For unsigned, CF means less than. Note that SF and OF are going crazy during this table, but for unsigned, we can ignore them.
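You can reproduce any cell of the unsigned table with ordinary C++ unsigned arithmetic, which wraps around the same way the CPU does. This is a sketch: the struct and function names are made up, and the carry flag is worked out by comparison rather than read from the hardware.

```cpp
#include <cstdint>

// One cell of the unsigned subtraction table: the wrapped 32-bit result
// plus the carry (borrow) flag. Names here are made up for illustration.
struct UnsignedCell { std::uint32_t result; bool cf; };

UnsignedCell unsigned_subtract(std::uint32_t row, std::uint32_t column) {
    UnsignedCell cell;
    cell.result = row - column;  // wraps modulo 2^32 when row < column
    cell.cf = (row < column);    // a borrow was needed, so "jb" would jump
    return cell;
}
```

For instance, unsigned_subtract(3000000000, 4000000000) gives 3294967296 with CF set, matching the table.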

Here's the same table for signed numbers:

v minus ->	2000000000	1000000000	0	-1000000000	-2000000000
2000000000	0 ZF	1000000000	2000000000	-1294967296 (32-bit version of 3000000000) SF OF	-294967296 (32-bit version of 4000000000) SF OF
1000000000	-1000000000 SF	0 ZF	1000000000	2000000000	-1294967296 (32-bit version of 3000000000) SF OF
0	-2000000000 SF	-1000000000 SF	0 ZF	1000000000	2000000000
-1000000000	1294967296 (32-bit version of -3000000000) OF	-2000000000 SF	-1000000000 SF	0 ZF	1000000000
-2000000000	294967296 (32-bit version of -4000000000) OF	1294967296 (32-bit version of -3000000000) OF	-2000000000 SF	-1000000000 SF	0 ZF

Signed subtraction table. If the first number is less than the second number, you either get just SF (negative answer) or just OF (overflow). If it's greater, you get either no flags (positive answer, no overflow), or both SF and OF (negative answer due to overflow). Here's a scaled-down version of the above table:

v minus ->	+2bn	+1bn	0	-1bn	-2bn
+2bn	ZF (=)	no flags (>)	no flags (>)	SF OF (>)	SF OF (>)
+1bn	SF (<)	ZF (=)	no flags (>)	no flags (>)	SF OF (>)
0	SF (<)	SF (<)	ZF (=)	no flags (>)	no flags (>)
-1bn	OF (<)	SF (<)	SF (<)	ZF (=)	no flags (>)
-2bn	OF (<)	OF (<)	SF (<)	SF (<)	ZF (=)

Again, for signed numbers, *either* SF or OF alone means <. No flags, or *both* SF and OF mean >. "jl" and "jg" do these tests.

You can probably live your whole life, and certainly pass this class, without remembering exactly why "jl" tests for SF!=OF. You do, however, have to know that SF and OF exist, because they really are useful on their own.