# Course Review for the Final Exam

CS 301 Lecture, Dr. Lawlor

The final exam is this Friday, 3:15pm, in the usual classroom (Chapman 106).  It's nominally two hours long, but in reality it shouldn't take much longer than the midterm, an hour at most.

Everything you should know for the final exam is contained in the lecture notes and homeworks:
• Basics of assembly language:
    • bits, bytes, and bitwise operations
    • the sizes of char, short, int, long, and pointers
    • preserved and scratch registers for x86-64
    • how the stack works, especially around a function call
    • how flags work, and how "cmp ecx,4" and "jle somewhere" actually work together
    • the existence and purpose of machine code.
• How classes get laid out in memory:
    • size in bytes of a class
    • byte offsets from a pointer to the start of the class
    • padding for alignment
    • the existence and purpose of the vtable pointer.
• Floating-point numbers:
    • SSE instructions, like "addss", and how to use them to solve float problems.
    • Parallel SSE instructions, like "addps", and how to use them to solve bigger problems.
    • Bits used to represent the sign (1 bit), exponent (8 bits), and mantissa (23 bits, plus an implicit leading 1)
• Floating-point roundoff:
    • small + big - big == 0 ?!
    • big + small == big ?!
• Fixes for roundoff:
    • use a bigger datatype (replace float with double)
    • switch to integer (but watch out for overflow!)
    • rearrange floating-point operations to add the small numbers together first, then the bigger ones
• Performance impact of denormal and NaN floats.
• High performance computing:
    • Algebraic rearrangement, like replacing x=x*x*x*x*x*x*x*x; with x=x*x; x=x*x; x=x*x;
    • Strength reduction, like replacing integer x%(2^n) with x&(2^n - 1) like in HW7.2; or floating point a/b with a*(1.0/b) like in HW7.5.
    • Cache optimizations, like:
        • moving your data structures closer together in memory (e.g., eliminate intermediate allocations, or malloc overhead, or just shrink the size of your classes)
        • rearranging memory accesses for better locality (e.g., interchange the x and y loops)
        • avoiding big power-of-two jumps through memory (be nicer to the L1 cache's memory-to-cache-line mapping)
    • GPU computing, like GLSL, requires a complete rewrite but gives excellent performance for floating-point-intensive code with few branches.
    • Multicore computing, like OpenMP, works with most plain old C++ code and is better for branch-, network-, or disk-intensive operations.