Course Review for the Final Exam
CS 301: Assembly Language Programming Lecture, Dr. Lawlor
Overall subject area
- Link to the lecture notes, month/day of lecture
High-Performance Computing
- Performance programming, 10/21
- Units: nanoseconds
- How to use a timer: start_time=t(); repeat test k times; end_time=t(); time_per_test=(end_time-start_time)/k;
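A minimal C sketch of that timing recipe, assuming a POSIX system with `clock_gettime`. The names `now_ns`, `test_op`, and `time_per_test_ns` are hypothetical; `test_op` is only a stand-in for whatever operation you actually want to measure.

```c
#define _POSIX_C_SOURCE 199309L   /* expose clock_gettime / CLOCK_MONOTONIC */
#include <time.h>

/* Current time in nanoseconds from a monotonic clock. */
static double now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1e9 + ts.tv_nsec;
}

/* Hypothetical operation under test; volatile keeps the compiler from deleting it. */
static volatile unsigned sink;
static void test_op(void) { sink = sink * 3u + 1u; }

/* start_time = t(); repeat the test k times; end_time = t(); divide by k.
   k must be greater than zero. */
double time_per_test_ns(void (*test)(void), int k)
{
    double start_time = now_ns();
    for (int i = 0; i < k; i++)
        test();
    double end_time = now_ns();
    return (end_time - start_time) / k;
}
```

Repeating the test many times matters because a single call is usually shorter than the timer's own overhead and resolution.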
- Memory and cache performance, 10/24
- Walking the same memory in a different order can be much faster: sequential access reuses each cache line, while large strides keep missing the cache
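As a sketch of the idea, assuming a 1024x1024 `int` grid (the size and function names are illustrative): both functions below add up exactly the same elements, but C stores 2D arrays row by row, so only the first one walks memory sequentially. Timing each with a timer shows the difference on a given machine.

```c
#define N 1024

static int grid[N][N];      /* C stores this row by row (row-major) */

void fill_grid(void)
{
    for (int r = 0; r < N; r++)
        for (int c = 0; c < N; c++)
            grid[r][c] = r + c;
}

/* Consecutive accesses are neighbors in memory, so each cache line is fully used. */
long sum_row_order(void)
{
    long sum = 0;
    for (int r = 0; r < N; r++)
        for (int c = 0; c < N; c++)
            sum += grid[r][c];
    return sum;
}

/* Same elements, but each access jumps N * sizeof(int) bytes ahead: often much slower. */
long sum_column_order(void)
{
    long sum = 0;
    for (int c = 0; c < N; c++)
        for (int r = 0; r < N; r++)
            sum += grid[r][c];
    return sum;
}
```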
- Threads and OpenMP: multicore parallel programming, 10/26
- A thread runs a function alongside main, until you join it
- Threads take a while to start up (5,000ns)
- Threads share memory, so they can accidentally overwrite each other's work
- 100 threads: plausible. 1 million threads: probably not.
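A small POSIX threads sketch of those points, assuming a Linux-style system (older glibc versions need `-pthread` when linking). Each thread runs `sum_chunk` alongside the caller and writes only its own `partial` slot, so no thread overwrites another's work; the caller then joins every thread before reading the results. `threaded_sum`, `sum_chunk`, and `NUM_THREADS` are illustrative names.

```c
#include <pthread.h>

#define NUM_THREADS 4

struct chunk {            /* each thread gets its own separate data */
    long start, end;      /* half-open range [start, end) */
    long partial;         /* written only by the owning thread */
};

static void *sum_chunk(void *arg)
{
    struct chunk *c = arg;
    long s = 0;
    for (long i = c->start; i < c->end; i++)
        s += i;
    c->partial = s;       /* no other thread touches this slot */
    return NULL;
}

/* Sums 0..n-1 by splitting the range across NUM_THREADS threads.
   Returns -1 if any thread could not be created. */
long threaded_sum(long n)
{
    pthread_t tid[NUM_THREADS];
    struct chunk chunks[NUM_THREADS];
    int created = 0;

    for (int t = 0; t < NUM_THREADS; t++) {
        chunks[t].start = n * t / NUM_THREADS;
        chunks[t].end = n * (t + 1) / NUM_THREADS;
        if (pthread_create(&tid[t], NULL, sum_chunk, &chunks[t]) != 0)
            break;
        created++;
    }

    long total = 0;
    for (int t = 0; t < created; t++) {
        pthread_join(tid[t], NULL);   /* wait until this thread has finished */
        total += chunks[t].partial;
    }
    return created == NUM_THREADS ? total : -1;
}
```

If every thread instead did `shared_total += i` on one shared variable, their read-modify-write steps could interleave and lose updates; giving each thread its own slot avoids that without any locking.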
- Graphics card programming with CUDA, 10/28
- The graphics card has its own separate memory space
- CUDA prefers millions of threads, operating on their own separate data
- The CPU still needs to do all memory allocation, file I/O, network I/O, OS, etc.
- High performance with SIMD & SSE, 10/31
- SSE: 4-float operations (addps) take the same time as 1-float operations (addss)
- SSE: align data in memory to 16 bytes to avoid a segfault on aligned loads and stores such as movaps
- SSE: if 4 floats need to do different things, compute both and use the if-then-else trick
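A sketch of those three SSE points using the standard `<xmmintrin.h>` intrinsics, assuming an x86 machine with SSE. The function name `select_add_or_sub` and the particular choice of "then" and "else" work are illustrative. All four lanes compute both results, and a comparison mask picks one per lane.

```c
#include <xmmintrin.h>   /* SSE intrinsics: __m128, _mm_*_ps */

/* For each of 4 floats: out = (a < b) ? a + b : a - b, with no branches.
   a, b, and out must be 16-byte aligned: _mm_load_ps/_mm_store_ps fault otherwise. */
void select_add_or_sub(const float *a, const float *b, float *out)
{
    __m128 va = _mm_load_ps(a);
    __m128 vb = _mm_load_ps(b);
    __m128 then_v = _mm_add_ps(va, vb);   /* addps: 4 adds in one instruction */
    __m128 else_v = _mm_sub_ps(va, vb);   /* compute the other branch too */
    __m128 mask = _mm_cmplt_ps(va, vb);   /* all-ones lanes where a < b, else zeros */
    /* (mask & then) | (~mask & else); _mm_andnot_ps(m, x) computes ~m & x */
    __m128 result = _mm_or_ps(_mm_and_ps(mask, then_v),
                              _mm_andnot_ps(mask, else_v));
    _mm_store_ps(out, result);
}
```

Callers can get the required alignment with C11 `_Alignas(16)` on their arrays.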
- Bitwise operations, 11/11
- if-then-else trick: result = (mask & then) | (~mask & else);
- AND & for setting bits to zero with a mask
- OR | for sticking bit fields together
- NOT ~ to invert all the bits
- XOR ^ for selectively inverting bits
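A scalar C sketch of the same operators on 32-bit values. All function names are illustrative, and `then_v`/`else_v` are used because `else` is a C keyword. `mask_from` turns a 0/1 condition into an all-zeros or all-ones mask, which is what the if-then-else trick needs.

```c
#include <stdint.h>

/* 0 -> 0x00000000, nonzero -> 0xFFFFFFFF (unsigned wraparound of 0 - 1). */
uint32_t mask_from(int cond) { return (uint32_t)0 - (uint32_t)(cond != 0); }

/* if-then-else trick: bits from then_v where mask is 1, from else_v where it is 0. */
uint32_t select_bits(uint32_t mask, uint32_t then_v, uint32_t else_v)
{
    return (mask & then_v) | (~mask & else_v);
}

/* AND with a mask: force the low 8 bits to zero. */
uint32_t clear_low_byte(uint32_t x) { return x & ~0xFFu; }

/* OR: stick two 8-bit fields together into one value. */
uint32_t pack_bytes(uint8_t hi, uint8_t lo) { return ((uint32_t)hi << 8) | lo; }

/* XOR: invert just one selected bit. */
uint32_t flip_bit(uint32_t x, int bit) { return x ^ (1u << bit); }
```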
Systems Programming
- Memory mapping and the page table, 11/02
- The world you live in is just a sugar-coated topping: your addresses are virtual, and real memory is reached through the page table.
- You can ask the OS to change your view of memory by calling mmap.
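A minimal sketch of asking the OS for new memory with `mmap`, assuming Linux or another POSIX system that supports `MAP_ANONYMOUS`. The helper `mmap_roundtrip` is a hypothetical name: it maps fresh pages, checks they start zero-filled, writes to them, and unmaps them again.

```c
#define _DEFAULT_SOURCE          /* expose MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <sys/mman.h>

/* Returns the value read back from the new mapping, or -1 on failure. */
int mmap_roundtrip(size_t len, unsigned char value)
{
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    int fresh = p[0];            /* anonymous pages start out zero-filled */
    p[len - 1] = value;          /* first touch makes the OS back this page */
    int result = (fresh == 0) ? p[len - 1] : -1;
    munmap(p, len);              /* remove these pages from our view of memory */
    return result;
}
```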
- Operating system calls, 11/04
- Syscall: load registers with the request, crash to the operating system, and the OS will resume you when done
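From C, one way to see this is glibc's generic `syscall()` wrapper, assuming Linux: it loads the request number and arguments into registers, executes the trap instruction, and hands back whatever the kernel left in the return register. `raw_getpid` and `raw_write` are illustrative names.

```c
#define _DEFAULT_SOURCE
#include <sys/syscall.h>   /* SYS_getpid, SYS_write request numbers */
#include <unistd.h>        /* syscall(), getpid() */

/* Ask the kernel directly for our process ID. */
long raw_getpid(void) { return syscall(SYS_getpid); }

/* Ask the kernel to write len bytes to a file descriptor; returns bytes written. */
long raw_write(int fd, const char *buf, unsigned long len)
{
    return syscall(SYS_write, fd, buf, len);
}
```

The ordinary library functions `getpid()` and `write()` end up making these same requests.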
- Macros in C and assembly, 11/14
- Conditional compilation
- Constant-like or function-like macros
- Pure text replacement: need workarounds to take parameters, consume semicolons, etc.
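A short sketch of each macro kind from the list above. The macro names are illustrative, and `DEBUG_TRACE` is a hypothetical switch you would define (for example with `-DDEBUG_TRACE`) to turn tracing on. The extra parentheses and the `do { ... } while (0)` wrapper are the usual workarounds for pure text replacement.

```c
#include <stdio.h>

#define BUFFER_SIZE 64                 /* constant-like macro */

/* Function-like macro: parenthesize each parameter and the whole body,
   otherwise SQUARE(1 + 2) would expand to 1 + 2 * 1 + 2. */
#define SQUARE(x) ((x) * (x))

/* do { ... } while (0) turns several statements into one statement that
   consumes the caller's semicolon, so it is safe inside if/else. */
#define SWAP_INT(a, b) do { int tmp_ = (a); (a) = (b); (b) = tmp_; } while (0)

#ifdef DEBUG_TRACE                     /* conditional compilation */
#define TRACE(msg) fprintf(stderr, "trace: %s\n", msg)
#else
#define TRACE(msg) ((void)0)           /* compiles away to nothing */
#endif
```

Because arguments are pasted in as text, `SQUARE(i++)` would evaluate `i++` twice, which is why side effects in macro arguments are best avoided.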
- Systems Programming: Function Registration, 11/16
- Registration lets existing old code call your new code
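A minimal sketch of function registration using a table of function pointers. Everything here is hypothetical: `run_handlers` plays the "old" code, written without knowing which handlers will exist, while `add_one` and `doubler` are "new" code that plugs in through `register_handler`.

```c
#define MAX_HANDLERS 8

typedef int (*handler_fn)(int);   /* the signature old code knows how to call */

static handler_fn handlers[MAX_HANDLERS];
static int handler_count = 0;

/* New code calls this to plug itself in; returns 0 on success, -1 if full. */
int register_handler(handler_fn fn)
{
    if (handler_count >= MAX_HANDLERS)
        return -1;
    handlers[handler_count++] = fn;
    return 0;
}

/* Old code: calls every registered function without knowing their names. */
int run_handlers(int input)
{
    int total = 0;
    for (int i = 0; i < handler_count; i++)
        total += handlers[i](input);
    return total;
}

/* Example new code to register. */
int add_one(int x) { return x + 1; }
int doubler(int x) { return 2 * x; }
```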
- Systems Programming: Exploring the Linux Kernel (interactive Google Doc), 11/21
- Plain C heavily uses macros and registered functions
- Real systems are so complex you need to search effectively: you can't just read the whole thing
- Systems Programming: Context Switching User Level Threads, 11/18
- Each thread has its own stack
- Thread switching: save old registers, restore new registers
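One way to try user-level context switching without writing the register save/restore in assembly is the `<ucontext.h>` API, assuming Linux with glibc (these functions are marked obsolescent in POSIX but still provided there). `swapcontext` saves the current registers into one context and restores another; the worker gets its own stack array. All other names are illustrative.

```c
#include <ucontext.h>

static ucontext_t main_ctx, worker_ctx;
static char worker_stack[64 * 1024];   /* the worker thread's own stack */
static int trace[4];
static int trace_len = 0;

static void worker(void)
{
    trace[trace_len++] = 2;
    swapcontext(&worker_ctx, &main_ctx);   /* save worker registers, restore main's */
    trace[trace_len++] = 4;
}   /* returning here resumes uc_link, i.e. main_ctx */

/* Switches main -> worker -> main -> worker -> main and
   returns the visit order packed as a 4-digit number. */
int run_ping_pong(void)
{
    trace_len = 0;
    getcontext(&worker_ctx);
    worker_ctx.uc_stack.ss_sp = worker_stack;
    worker_ctx.uc_stack.ss_size = sizeof worker_stack;
    worker_ctx.uc_link = &main_ctx;
    makecontext(&worker_ctx, worker, 0);

    trace[trace_len++] = 1;
    swapcontext(&main_ctx, &worker_ctx);   /* first switch into worker */
    trace[trace_len++] = 3;
    swapcontext(&main_ctx, &worker_ctx);   /* resume worker where it left off */
    return trace[0] * 1000 + trace[1] * 100 + trace[2] * 10 + trace[3];
}
```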
Robotics
Weird Machine Architectures
- ARM Machine Code, 11/28
- RISC: every instruction takes the same number of bytes (4 bytes in ARM mode); x86 is CISC, with a variable number of bytes per instruction
- Link register stores the function return address (x86 stores the return address on the top of the stack)
- Many similarities to x86: scratch and preserved registers, a stack, etc.
- Segmented 16-bit x86 (worst ideas in programming), 11/30
- segment:offset addressing
- Can only access 64KB using a 16-bit offset
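In 16-bit real mode, the segment value is shifted left 4 bits and added to the 16-bit offset. A one-line C sketch of that calculation follows (the function name is illustrative). It also shows why many different segment:offset pairs name the same byte, and why one segment value only reaches a 64KB window. It does not model the 8086's wraparound above 1MB.

```c
#include <stdint.h>

/* Real-mode x86: physical address = segment * 16 + offset. */
uint32_t real_mode_address(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```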
- Nanotech and biological computing, 12/02
- Biological systems encode information using individual molecules
- Atoms are small and move fast, allowing the system to use random motion to drive complex behavior
- DNA nucleotides, read in three-letter codons, encode amino acids, which are assembled into functional proteins and enzymes