Course Review for Final Exam
CS 301 Lecture, Dr. Lawlor
Here's a compressed summary of the topics we've covered in the entire
class so far. See the lecture notes for all the gory details.
There will be questions on the final covering everything, from bits to
function calls, but the exam will mostly cover the performance and
floating-point material we've done since the midterm.
- Data storage: bits, hex digits (4 bits each), bytes (8 bits), int (32 bits; 4
bytes), long long (64 bits; 8 bytes), float (32 bits; 4 bytes), double
(64 bits; 8 bytes), SSE __m128 (128 bits; 16 bytes).
- Bitwise operations: & | ^ ~ << >>, and how to use them to simulate branches.
- Machine code: representing assembly language instructions as
bytes. Running machine code on an emulated CPU, or the real CPU.
- Registers: trashable (eax, ecx, edx, xmm0...xmm7) versus
preserved (esp, ebp, esi, edi, ebx) registers. You can save
preserved registers by pushing them at the start of your function and
popping them before you return.
- Branching / jumps / flow control:
- All assembly flow control boils down to "if (something) goto somewhere;"--for example, "if (eax != 7) goto somewhere;" is written as "cmp eax,7; jne somewhere;"
- A single "if" isn't too hard to write. But you really
have to pay attention for complicated multi-part conditionals like "if
(A || (B && C))"!
- Loops are simple in principle: if we're not done with the loop, jump back to the start of the loop.
- Be careful of iteration 0. Sometimes the loop body shouldn't execute at all, like "for (int i=0;i<n;i++)" when n==0.
- Memory access:
- You can allocate memory from the stack ("sub esp,100"), via
malloc ("push 100; call malloc;"), or statically in your program
("section .data; dd 3").
- Given a pointer to the first element of an int array, element i
of the array is 4*i bytes higher in memory ("mov eax, DWORD[ecx +
4*esi]", if ecx points to the array and esi holds i).
- Given a pointer to a class, the pointer actually points to the
class's first member. Subsequent members are higher in memory,
but be careful, because the compiler may add padding for alignment!
- The stack:
- Above the stack pointer is claimed; below the stack pointer is unclaimed.
- Move the stack pointer down to claim space (sub); move it up to release space (add).
- "push" moves the stack pointer down and copies a new value into the claimed space.
- "pop" copies the stack value out, and moves the stack pointer back.
- You MUST restore the stack pointer to its original value before you return!
- Sometimes it's handy to save a backup copy of the original stack pointer in register ebp.
- Function calls:
- Declare an outside function to be callable with "extern hisFunc".
- Call a function with "call someFunc".
- Declare one of your own functions to be visible outside with "global myFunc". From C++, declare your function extern "C" void myFunc(int someParameter);
- Push a function's arguments onto the stack in right-to-left order (the first argument ends up sitting on top of the stack).
- A function's return address gets pushed on the stack during a
"call" instruction, and popped off again during a "ret" instruction.
- So you grab your own arguments from the stack by skipping over
the return address (at DWORD[esp]) to get the first argument at
DWORD[esp+4], the second argument at DWORD[esp+8], and so on.
- Floating point in general:
- Floating point numbers actually have three separate parts: the
sign bit (positive or negative), the exponent (the power of two /
location of the decimal point), and the mantissa (fraction part).
- Each of these parts is a fixed size: for a single-precision float, 1 sign bit, 8 exponent bits, and 23 mantissa bits.
- The exponent field is stored "biased" by 127, so you represent 2^0 with an exponent field of 127, 2^1 with an exponent field of 128, 2^2 with an exponent field of 129, etc.
- The mantissa field is stored with an "implicit leading 1" in
front of the decimal point. This 1 doesn't actually appear in the
bits of the float, but does contribute to the float's value!
- The mantissa field is a fixed size. This fixed size isn't
enough to correctly store some values, including infinitely repeating
decimals like 1/3, or infinitely repeating binary numbers like 1/10, or
a variety of large values added to or subtracted from smaller
values. In these cases, you get "roundoff" error--the wrong answer.
- Roundoff can often be reduced by changing your data type (using
a different number of mantissa bits), or changing the order you perform
your arithmetic.
- Due to roundoff, floating point values don't give the same
answer when added in different orders: (a+b) + c might be
slightly different from a + (b+c).
- Unlike integers, floats never wrap around--if you keep adding
to a float, it eventually stops changing, either at a finite value due
to roundoff, or at the special "infinity" value.
- Floats have several special values: "infinity", "nan" (an error
value), and "denormal" (very tiny) numbers. There is normally a
performance impact from processing these special values.
- 1970's stack-based floating point assembly:
- "fld" or "fild" instructions load a float or int from memory into the "floating point register stack".
- "fst" or "fist" instructions store the top of the
floating-point register stack to memory. "fstp" stores and then pops
the top of the stack.
- "faddp" adds the top two things on the floating point register
stack, and replaces them with their sum. Ditto for "fmulp", etc.
- New SSE floating point assembly:
- "movss xmm1, DWORD[ecx]" loads a single float from memory.
- "movss DWORD[ecx], xmm1" stores a single float to memory.
- "cvtsi2ss xmm1,eax" converts an integer to a single-precision float. "cvtss2si" goes the other way.
- "addss xmm1,xmm4" adds single floats, just like "subss", "mulss", "divss", "sqrtss", etc.
- "addps xmm1,xmm4"
adds packed floats--four floats at once. "subps", "mulps",
"divps", "sqrtps", etc do the same thing. Doing four things at
once can be substantially faster than doing one thing at a time!
- "movaps xmm1,[ecx]" loads four floats from aligned memory. The memory
must be 16-byte aligned, or this instruction will crash your program.
- "movups xmm1,[ecx]"
loads four floats from unaligned memory. The memory need not be
16-byte aligned, but this instruction is substantially slower than the
aligned version.
- SSE can also be accessed from C++ via the "xmmintrin.h" functions: __m128 datatype, _mm_load_ps, _mm_add_ps, etc.
; Save the preserved registers we will use
push esi
push edi
push ebx
extern read_input, malloc, iarray_print, free
; Figure out how many values are coming
call read_input
; eax == number of elements to read
mov esi,0 ; i
mov edi,eax ; n
; Allocate space for read-in values
imul eax,4 ; number of bytes to allocate
push eax
call malloc
add esp,4 ; clean off stack
mov ebx,eax ; ebx == pointer to malloc'd region
; Loop over input values,
; for (i=0;i<n;i++) arr[i]=read_input();
jmp loopcompare ; subtle: need this for n<=0 case...
loopstart:
call read_input
mov DWORD[ebx + 4 * esi],eax
add esi,1
loopcompare:
cmp esi,edi
jl loopstart
; Print out our array of input values
push edi ; number of ints to print
push ebx ; pointer to ints to print
call iarray_print ; (NetRun's int-array print helper)
add esp,8 ; clean stack
; Clean up allocated memory
push ebx
call free
add esp,4 ; clean stack
; Restore registers and return
pop ebx
pop edi
pop esi
ret
(Try this in NetRun now!)
extern read_input, farray_print
foo:
call read_input
cmp eax,8 ; Don't print if the number is <= 8
jle foo
cvtsi2ss xmm0,eax ; Convert eax to float,
mulss xmm0,DWORD[pi] ; multiply by pi,
movss DWORD[myRet],xmm0 ; and store to myRet
push 1 ; number of floats to print
push myRet ; pointer to the float to print
call farray_print ; Print out myRet (NetRun's float-array print helper)
add esp,8 ; clean stack
jmp foo ; keep re-running read_input and our little function

section .data
pi: dd 3.14159265
myRet: dd 0.0
(Try this in NetRun now!)