Computing!

CS 441 Lecture, Dr. Lawlor
So here's some assembly language:

        Machine Code:           Assembly Code:
Address                    Instruction Operands
   0:	55                   	push   ebp
   1:	89 e5                	mov    ebp,esp
   3:	b8 07 00 00 00       	mov    eax,0x7
   8:	5d                   	pop    ebp
   9:	c3                   	ret

Here's the terminology we'll be using for the rest of the semester:

Address. A byte count, indicating where you are in a big list of bytes. The first byte has address zero. An address can be thought of as an array index into a big array of chars.
Machine Code. A set of bytes that the CPU can treat as instructions and execute. For example, the byte "0xC3" tells the CPU to execute the instruction "ret" (return from function). Human beings can write machine code (just wait for HW3!), but usually humans write Assembly Code instead.
Assembly Code. The human-readable counterpart to machine code. Assembly code is line-oriented human readable text that lots of people write by hand. You use an "assembler" to turn assembly code into executable machine code. You can use a "disassembler" to turn executable code into assembly code (try the NetRun "Disassemble" checkbox for the code above!).
Instruction. One command for the CPU to execute. Can be thought of as: one line of assembly code, or one set of machine code bytes. Depending on the context, sometimes "instruction" includes the operands ("the 'push ebp' instruction"), sometimes it doesn't ("the 'push' instruction, with operand 'ebp'.").
Operands. The parameters taken by various instructions. These parameters can be constants, like the "0x7" above, addresses in memory (which we'll talk about in a week or so), or registers.
Registers. Registers are little storage locations built into the CPU. They're used like variables in assembly language--you spend most of your time putting values into registers, doing arithmetic on register values, and moving values around between registers. For example, "ebp", "esp", and "eax" above are all registers. Unlike variables, there are a fixed number of registers (built into the design of the CPU), and the registers have fixed names (built into the assembler).
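
To make the "address is just an array index" idea concrete, here's a rough Python sketch that walks the machine code bytes from the listing above. The KNOWN table and the disassemble function are made up for this illustration, and they only understand the five instructions shown; a real disassembler handles far more cases.

```python
# Machine code from the listing above, as one flat array of bytes.
code = bytes.fromhex("55 89e5 b807000000 5d c3")

# Hypothetical mini-table: first byte -> (total length in bytes, assembly text).
# It only covers the five instructions in the listing.
KNOWN = {
    0x55: (1, "push ebp"),
    0x89: (2, "mov ebp,esp"),       # assumes the second byte is 0xe5
    0xB8: (5, "mov eax,{imm:#x}"),  # opcode, then a 4-byte constant
    0x5D: (1, "pop ebp"),
    0xC3: (1, "ret"),
}

def disassemble(code):
    """Return a list of (address, assembly text) pairs."""
    listing = []
    address = 0                      # the first byte has address zero
    while address < len(code):
        length, text = KNOWN[code[address]]
        if length == 5:
            # The constant is stored little-endian right after the opcode.
            imm = int.from_bytes(code[address + 1:address + 5], "little")
            text = text.format(imm=imm)
        listing.append((address, text))
        address += length            # the next instruction starts here
    return listing

for address, text in disassemble(code):
    print(f"{address:4x}:  {text}")
```

The addresses it prints (0, 1, 3, 8, 9) match the left column of the listing, because each address is just the running total of the instruction lengths before it.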

Here's a typical line of assembly code. It's one CPU instruction, with a comment:

	mov eax,1234 ;  I'm returning 1234, like the homework says...

(executable NetRun link)

There are several parts to this line:

"mov" is the "opcode", "instruction", or "mnemonic". It corresponds to the first byte (or so!) of the machine code, and it tells the CPU what to do: in this case, move a value from one place to another.
"eax" is the destination of the move, also known as the "destination operand". It's a register, and it happens to be 32 bits wide, so this is a 32-bit move.
1234 is the source of the moved data, also known as the "source operand". It's a constant, so you could use an expression (like "2+5*8") or a label (like "foo") instead.
The semicolon indicates the start of a comment. Unlike C/C++, you don't need a semicolon at the end of each line: semicolons are OPTIONAL in assembly!
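
Here's a small Python sketch of how a line like that breaks into its parts. The split_line function is invented for this example and is much simpler than what a real assembler does (it doesn't handle labels or quoted strings, for instance).

```python
def split_line(line):
    """Split one line of assembly into (mnemonic, operands, comment)."""
    code, _, comment = line.partition(";")  # everything after the first ';' is comment
    parts = code.split(None, 1)             # the mnemonic, then the rest of the line
    mnemonic = parts[0] if parts else ""
    if len(parts) > 1:
        operands = [op.strip() for op in parts[1].split(",")]
    else:
        operands = []
    return mnemonic, operands, comment.strip()

print(split_line("mov eax,1234 ;  I'm returning 1234, like the homework says..."))
print(split_line("mov eax,"))
```

The second call comes back with an empty string where the source operand should be, because everything has to be on the one line.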

Unlike C/C++, assembly is line-oriented, so the following WILL NOT WORK:

	mov eax,
	         1234 ;  I'm returning 1234, like the homework says...

Yup, line-oriented stuff is indeed annoying. Be careful that your editor doesn't mistakenly add newlines!

Instructions

A list of all possible x86 instructions can be found in:

Roger Jegerlehner's CodeTable, in categorized form.
Gary Burt's HTML table, just the basics, listed in alphabetical order, but maybe too terse.
Giant Intel PDF reference manual (section 3.2), impossibly complete, but nearly impossible to understand.

The really important opcodes are listed in my cheat sheet. Most programs can be written with mov, the arithmetic instructions (add/sub/mul), the function call instructions (call/ret), the stack instructions (push/pop), and the compare and jump instructions (cmp/jmp/jl/je/jg/...). We'll learn about these over the next few weeks!

Registers

Here are the commonly-used x86 registers:

eax. This is the register that stores a function's return value.
eax, ecx, edx. "Scratch" registers you can always overwrite with any value. (esi and edi are also scratch in 64-bit Linux and Mac code, but 32-bit code has to save and restore them.)
rdi, rsi, rdx, rcx, ... In 64-bit mode on Linux and Mac, these registers contain function arguments, in left-to-right order.
esp, ebp. Registers used to run the stack. Be careful with these!

There are also some older, newer, and much more rarely used x86 registers:

Size	Register names	Meaning (note: not the official meanings!)	Introduced in
8-bit	al,ah, bl,bh, cl,ch, dl,dh	"Low" and "High" parts of bigger registers	1972, Intel 8008
16-bit	ax, bx, cx, dx, si, di, sp, bp	"eXtended" versions of the original 8-bit registers	1978, Intel 8086/8088
32-bit	eax, ebx, ecx, edx, esi, edi, esp, ebp	"Extended eXtended" registers	1985, Intel 80386
64-bit	rax, rbx, rcx, rdx, rsi, rdi, rsp, rbp, r8, r9, r10, r11, r12, r13, r14, r15	"Really eXtended" registers	2003, AMD Opteron / Athlon64; 2004, Intel EM64T CPUs

x86 is unusual in that all the smaller registers from bygone eras are still right there as *part* of the new, longer registers. So for example, this code returns 0x0000AB00, because 0xAB is put into the next-to-lowest byte of eax:

	mov eax,0 ; Clear eax
	mov ah,0xAB ;  Move "0xAB" into the next-to-lowest byte of eax
	ret ; Return the value in eax

(executable NetRun link)
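
If the overlapping registers are hard to picture, here's a little Python model of them using bit masks. The helper names (set_ah, set_al, get_ax) are made up for this sketch; the CPU does all this in hardware.

```python
def set_ah(eax, value):
    """What 'mov ah,value' does: replace bits 8..15 of eax, keep the rest."""
    return (eax & 0xFFFF00FF) | ((value & 0xFF) << 8)

def set_al(eax, value):
    """What 'mov al,value' does: replace bits 0..7 of eax, keep the rest."""
    return (eax & 0xFFFFFF00) | (value & 0xFF)

def get_ax(eax):
    """ax is just the low 16 bits of eax."""
    return eax & 0xFFFF

eax = 0                  # mov eax,0
eax = set_ah(eax, 0xAB)  # mov ah,0xAB
print(f"0x{eax:08X}")
```

Following up with set_al(eax, 0xCD) would make ax read as 0xABCD, since ah and al are the two halves of ax.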

x86 Floating Point

On many CPUs, floating-point values are stored in special "floating-point registers", and are added, subtracted, etc. with special "floating-point instructions", but other than the names, these registers and instructions are exactly analogous to regular integer registers and instructions. For example, the integer PowerPC assembly code to add registers 1 and 2 into register 3 is "add r3,r1,r2"; the floating-point code to add floating-point registers 1 and 2 into floating-point register 3 is "fadd fr3,fr1,fr2".

x86 is not like that.

The problem is that the x86 instruction set wasn't designed with floating-point in mind; they added floating-point instructions to the CPU later (with the 8087, a separate chip that handled all floating-point instructions). Unfortunately, there weren't many unused opcode bytes left, and (being the 1980's, when bytes were expensive) the designers really didn't want to make the instructions longer. So instead of the usual instructions like "add register A to register B", x86 floating-point has just "add", which saves the bits that would be needed to specify the source and destination registers!

But the question is, what the heck are you adding? The answer is the "top two values on the floating-point register stack". That's not "the stack" (the memory area used by function calls), it's a separate set of values totally internal to the CPU's floating-point hardware. There are various load instructions that push values onto the floating-point register stack, and most of the arithmetic instructions read from the top of the floating-point register stack. So to compute stuff, you load the values you want to manipulate onto the floating-point register stack, and then use some arithmetic instructions.

For example, to add together the three values a, b, and c, you'd "load a; load b; add; load c; add;". Or, you could "load a; load b; load c; add; add;". If you've ever used an HP calculator, or written Postscript or Forth code, you've seen this "Reverse Polish Notation".
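
Here's a quick Python sketch of both orderings, using an ordinary list as the stack. The run function and its little "load"/"add" language are invented for this example; they aren't real x86 instructions.

```python
def run(program, values):
    """Run a tiny stack program; return (final stack, deepest stack seen)."""
    stack = []
    deepest = 0
    for step in program.split(";"):
        words = step.split()
        if not words:
            continue                        # skip the empty piece after the last ';'
        if words[0] == "load":
            stack.append(values[words[1]])  # push a value onto the stack
        elif words[0] == "add":
            top = stack.pop()               # pop the top value...
            stack[-1] = stack[-1] + top     # ...and add it into the new top
        deepest = max(deepest, len(stack))
    return stack, deepest

values = {"a": 1.0, "b": 2.0, "c": 4.0}
print(run("load a; load b; add; load c; add;", values))
print(run("load a; load b; load c; add; add;", values))
```

Both orderings leave the same sum on the stack, but the second one needs the stack to get three deep instead of two. That matters on real hardware, because the x86 floating-point register stack only holds eight values.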

x86 Floating-Point in Practice

Here's what this looks like. The whole bottom chunk of code just prints the float on the top of the x86 register stack, with the assembly equivalent of the C code: printf("Yo! Here's our float: %f\n",f);

fldpi ; Push "pi" onto floating-point stack

sub esp,8 ; Make room on the stack for an 8-byte double
fstp QWORD [esp] ; Pop our float off the floating-point stack, and store it as printf's double parameter
push my_string ; Push printf's string parameter (below)
extern printf
call printf  ; Print string
add esp,12    ; Clean up stack

ret ; Done with function

my_string: db "Yo!  Here's our float: %f",0xa,0

(Try this in NetRun now!)

There are lots of useful floating-point instructions:

Assembly	Description
fld1	Pushes into the floating-point registers the constant 1.0
fldz	Pushes into the floating-point registers the constant 0.0
fldpi	Pushes the constant pi. (Try this in NetRun now!)
fld DWORD [eax]	Pushes into the floating-point registers the 4-byte "float" loaded from memory at address eax. This is how most constants get loaded into the program. (Try this in NetRun now!)
fild DWORD [eax]	Pushes into the floating-point registers the 4-byte "int" loaded from memory at address eax.
fld QWORD [eax]	Pushes an 8-byte "double" loaded from address eax. (Try this in NetRun now!)
fld st0	Duplicates the top float, so there are now two copies of it. (Try this in NetRun now!)
fstp DWORD [eax]	Pops the top floating-point value, and stores it as a "float" to address eax.
fst DWORD [eax]	Reads the top floating-point value and stores it as a "float" to address eax. This doesn't change the value stored on the floating-point stack.
fstp QWORD [eax]	Pops the top floating-point value, and stores it as a "double" to address eax.
faddp	Pops the top two values, adds them, and pushes the result. (Try this in NetRun now!)
fsubp	Pops the top two values, subtracts them, and pushes the result. Note "fld A; fld B; fsubp;" computes A-B. (Try this in NetRun now!) There's also a "fsubrp" that subtracts in the opposite order (computing B-A).
fmulp	Multiply the top two values.
fdivp	Divide the top two values. Note "fld A; fld B; fdivp;" computes A/B. (Try this in NetRun now!) There's also a "fdivrp" that divides in the opposite order (computing B/A).
fabs	Take the absolute value of the top floating-point value.
fsqrt	Take the square root of the top floating-point value.
fsin	Take the sin() of the top floating-point value, treated as radians. (Try this in NetRun now!)

Remember, "stack" here means the floating-point register stack, not the memory area used for passing parameters and such.

In general, the "p" instructions pop a value from the floating-point stack.

The non-"p" instructions don't. For example, there isn't a "fsinp" instruction, since sin only takes one argument, so the stack stays the same height after doing a sin().
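
The operand order and the popping are easy to mix up, so here's a toy Python model of a few of these instructions. The FPStack class is made up for this sketch: it ignores the eight-register limit and the hardware's 80-bit precision, and only models which values get combined and what's left on the stack.

```python
import math

class FPStack:
    """Toy model of the floating-point register stack; regs[-1] is the top (st0)."""
    def __init__(self):
        self.regs = []

    def fld(self, x):
        self.regs.append(float(x))           # push a value

    def fsubp(self):
        top = self.regs.pop()                 # pop B...
        self.regs[-1] = self.regs[-1] - top   # ...leaving A-B on top

    def fsubrp(self):
        top = self.regs.pop()                 # pop B...
        self.regs[-1] = top - self.regs[-1]   # ...leaving B-A on top

    def fsin(self):
        self.regs[-1] = math.sin(self.regs[-1])  # no pop: same stack height

def demo(op):
    """Do 'fld 10; fld 3; op' and return what's left on the stack."""
    fpu = FPStack()
    fpu.fld(10.0)   # A
    fpu.fld(3.0)    # B
    getattr(fpu, op)()
    return fpu.regs

print(demo("fsubp"))    # A-B
print(demo("fsubrp"))   # B-A
print(len(demo("fsin")))
```

The two subtracts each leave one value behind (they popped), while fsin leaves both values on the stack and only changes the top one.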

x86 has quite a few really bizarre-sounding floating-point instructions. Intel's Reference Volume 2 has the complete list (Section 3, alphabetized under "f"). The "+1" and "-1" versions are designed to decrease roundoff, by shifting the input to the most sensitive region.

F2XM1	2^x - 1
FYL2X	y*log₂(x), where x is on top of the floating-point stack.
FYL2XP1	y*log₂(x+1), where x is on top
FCHS	-x
FSINCOS	Computes both sin(x) and cos(x). cos(x) ends up on top.
FPATAN	atan2(a,b), the arctangent of a/b, where b is on top
FPREM	fmod(a,b), where a is on top and b is just below it
FRNDINT	Round to the nearest integer
FXCH	Swap the top two values on the floating-point stack
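
To see why the "-1" and "+1" versions are worth having, here's a Python sketch of the roundoff problem. Python can't run the x86 instructions directly, so this uses math.expm1 and math.log1p as stand-ins; they're library functions built around the same idea, not the instructions themselves.

```python
import math

x = 1e-17   # a tiny input, far smaller than double-precision roundoff near 1.0

# Naive 2^x - 1: 2^x rounds to exactly 1.0, so the subtraction leaves nothing.
naive_exp = 2.0**x - 1.0
# Careful version, in the spirit of F2XM1: never forms the "1.0 + tiny" value.
careful_exp = math.expm1(x * math.log(2.0))

# Naive log2(x+1): x+1 rounds to exactly 1.0, and log2(1.0) is 0.
naive_log = math.log2(1.0 + x)
# Careful version, in the spirit of FYL2XP1 (with y=1).
careful_log = math.log1p(x) / math.log(2.0)

print(naive_exp, careful_exp)
print(naive_log, careful_log)
```

Both naive versions come out as exactly zero, because the tiny input gets rounded away as soon as it's combined with 1.0. The careful versions keep it.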

PowerPC

All of the above is for ordinary x86 machines (Intel Pentiums, etc.). What about PowerPC machines, like old Macs, the Xbox 360, or the PlayStation 3? Well, the assembly code is very different in the gory details, but in the abstract it is absolutely identical:

        Machine Code:   Assembly Code:
Address             Instruction Operands
   0:	38 60 00 07 	li	r3,7
   4:	4e 80 00 20 	blr

Like x86, PowerPC machine code consists of bytes, with addresses, that represent assembly instructions and operands. PowerPC machine code also spends most of its time manipulating values in registers.

Unlike x86, there are 32 PowerPC registers, but the registers have uninteresting names (they're called r0 through r31). The names of the instructions are different; "li" in PowerPC (Load Immediate) is about like a "mov" in x86; "blr" (Branch to Link Register) serves the same purpose as "ret" in x86.

PowerPC machine code always uses four bytes for every instruction (it's RISC), while x86 uses anywhere from one to 15 bytes per instruction (it's CISC). Here's a good but long retrospective article on the RISC-vs-CISC war, which got pretty intense during the 1990's. Nowadays, RISC machines compress their instructions (like CISC), while CISC machines decode their instructions into fixed-size blocks (like RISC), so the war ended in the best possible way--both sides have basically joined forces!

Deadbeef Survey

It's informative to look at the disassembly of this code on several CPUs:

return 0xdeadbeef;

(Try this in NetRun now!)

On x86, there's a one-byte opcode (0xb8, meaning "move a constant into eax"), followed by the 4-byte little-endian constant:

   0:	b8 ef be ad de       	mov    eax,0xdeadbeef
   5:	c3                   	ret
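
You can check the byte order with Python's struct module, which packs an integer into bytes in whichever order you ask for. This is just a sketch of what the bytes look like in memory; the variable names are my own.

```python
import struct

value = 0xDEADBEEF

little = struct.pack("<I", value)  # "<" = little-endian: lowest byte first
big = struct.pack(">I", value)     # ">" = big-endian: highest byte first

print(" ".join(f"{b:02x}" for b in little))
print(" ".join(f"{b:02x}" for b in big))
```

The little-endian bytes match the four bytes after the b8 in the x86 machine code above.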

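On PowerPC, because instructions are just 32 bits, you've got to split the 4-byte constant across two instructions: "load immediate shifted" (lis) loads the high 16 bits, then "or immediate" (ori) pastes in the low 16 bits. The disassembler prints both halves in decimal, and shows the high half as a signed number: 0xdead is -8531, and 0xbeef is 48879. PowerPC is big-endian.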

   0:	3c 60 de ad 	lis	r3,-8531
   4:	60 63 be ef 	ori	r3,r3,48879
   8:	4e 80 00 20 	blr
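
Here's the same two-step construction as a Python sketch. It assumes 32-bit registers, and to_signed16 is a helper made up for this example to reproduce the way the disassembler prints the high half.

```python
def to_signed16(x):
    """Reinterpret a 16-bit pattern as a signed number."""
    return x - 0x10000 if x & 0x8000 else x

value = 0xDEADBEEF
high = value >> 16      # 0xdead, the operand of lis
low = value & 0xFFFF    # 0xbeef, the operand of ori

r3 = (high << 16) & 0xFFFFFFFF  # lis r3,high : load the high half, low half zeroed
r3 = r3 | low                   # ori r3,r3,low : paste in the low half

print(to_signed16(high), low, hex(r3))
```

The OR is safe here because lis leaves the low 16 bits zeroed, so there's nothing for the low half to collide with.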

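On MIPS, you also have the low half/high half split. MIPS also has a "branch delay slot" after every branch: the instruction in that slot always gets executed, even if the branch is taken. That's why the "ori" below can sit after the "jr" return and still run: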

[   0] 0x   c:  3c 02 de ad           lui	r2,0xdead	
[   0] 0x  10:  03 e0 00 08           jr	r31
[   0] 0x  14:  34 42 be ef           ori	r2,r2,0xbeef

On SPARC, you also have a branch delay, and the constant is split up across instructions. But it's split oddly--first you get the high 22 bits with "sethi", and then the low 10 bits:

   0:	11 37 ab 6f 	sethi  %hi(0xdeadbc00), %o0
   4:	81 c3 e0 08 	retl 
   8:	90 12 22 ef 	or  %o0, 0x2ef, %o0	! deadbeef <foo+0xdeadbeef>
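
The 22/10 split is easy to verify with a few shifts and masks. This is just a Python sketch of the arithmetic, with variable names of my own choosing.

```python
value = 0xDEADBEEF

hi22 = value >> 10    # the top 22 bits, the constant stored inside sethi
lo10 = value & 0x3FF  # the bottom 10 bits, the constant in the "or"

o0 = hi22 << 10       # sethi: top 22 bits set, bottom 10 bits zero
print(hex(o0))
o0 = o0 | lo10        # or: paste in the bottom 10 bits
print(hex(lo10), hex(o0))
```

The 22-bit value 0x37ab6f also shows up directly in the sethi machine code bytes above (11 37 ab 6f).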

On DEC Alpha, the only big surprise is that the machine code is little-endian. "lda" actually adds, rather than ORs, its sign-extended constant "0xffffbeef" to the "0xdeae0000" loaded by "ldah". The high half is 0xdeae instead of 0xdead to make up for that sign extension, so the sum comes out to "0xdeadbeef" in register v0 on return.

   0:	ae de 1f 24 	ldah	v0,-8530
   4:	ef be 00 20 	lda	v0,-16657(v0)
   8:	01 80 fa 6b 	ret
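
Here's that add-with-sign-extension worked through in Python. Alpha registers are really 64 bits wide; this sketch only tracks the low 32 bits (that's the MASK32 below), which is enough to see why the high half has to be 0xdeae.

```python
MASK32 = 0xFFFFFFFF   # assumption: only model the low 32 bits of the register

v0 = (-8530 << 16) & MASK32    # ldah v0,-8530 : high half, shifted up 16 bits
print(hex(v0))
v0 = (v0 + (-16657)) & MASK32  # lda v0,-16657(v0) : ADD the signed low half
print(hex(v0))

# If the low half were ORed in instead, the extra 1 in the high half would stay.
wrong = ((-8530 << 16) & MASK32) | 0xBEEF
print(hex(wrong))
```

Adding the negative low half borrows one from the high half, which turns 0xdeae back into 0xdead.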

Overall, you can see that all these RISC machines use four bytes per instruction.