Data representation (Chapter 2.1)
- Memory, files as big arrays of bytes
- Integer representation as bits, bytes
- Binary, decimal, hex, octal, and base conversion
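
As a small illustration of base conversion, here is a minimal sketch in C that writes an unsigned integer as text in any base from 2 to 16. The name `to_base` and its signature are invented for this example, not taken from the course materials.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write `value` in the given base (2..16) into `out` as a NUL-terminated
   string. Returns the number of digits written, or 0 if `out` is too small. */
size_t to_base(uint32_t value, unsigned base, char *out, size_t out_size)
{
    static const char digits[] = "0123456789abcdef";
    char tmp[32];              /* 32 binary digits is the worst case */
    size_t n = 0;

    assert(base >= 2 && base <= 16);

    /* Peel off the least significant digit each time around. */
    do {
        tmp[n++] = digits[value % base];
        value /= base;
    } while (value != 0);

    if (n + 1 > out_size)
        return 0;

    /* Digits came out least-significant first, so reverse them. */
    for (size_t i = 0; i < n; i++)
        out[i] = tmp[n - 1 - i];
    out[n] = '\0';
    return n;
}
```

For example, the decimal value 202 comes out as 11001010 in binary, 312 in octal, and ca in hex.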
Operations
- Bitwise operations (Chapter 2.1)
- AND, OR, XOR
- "SIMD Within A Register" (SWAR): Cohen-Sutherland clipping
- Left & right shifting; finite integer range.
- Split an integer into bit fields; reassemble it from those fields.
- Arithmetic operations (Chapters 2.2 & 2.3)
- Addition: unsigned. Overflow. Wraparound. Range.
- Subtraction: two's complement addition; signed numbers
- Multiplication & acceleration via bit shifts
- Division & acceleration via bit shifts
- Modulus, implementation, acceleration via bit masks
- Relative speed of each operation on various machines
- Multiple-precision implementations of numerical operations
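
Several of the operations listed above can be sketched in a few lines of C: pulling a bit field out of an integer, reassembling an integer from fields, multiplying with shifts, and taking a modulus with a mask. The helper names (`get_bits`, `pack_rgb`, `times10`, `mod_pow2`) are hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Extract `width` bits of `x` starting at bit `lo` (bit 0 = least significant). */
uint32_t get_bits(uint32_t x, unsigned lo, unsigned width)
{
    assert(width >= 1 && width <= 31 && lo + width <= 32);
    return (x >> lo) & ((1u << width) - 1u);
}

/* Reassemble three 8-bit fields into one 24-bit value laid out as 0xRRGGBB. */
uint32_t pack_rgb(uint32_t r, uint32_t g, uint32_t b)
{
    return ((r & 0xFFu) << 16) | ((g & 0xFFu) << 8) | (b & 0xFFu);
}

/* x * 10 using shifts and one add: 10x = 8x + 2x.
   Unsigned arithmetic wraps around modulo 2^32 on overflow. */
uint32_t times10(uint32_t x)
{
    return (x << 3) + (x << 1);
}

/* x mod 2^k using a mask instead of a divide; valid for unsigned x. */
uint32_t mod_pow2(uint32_t x, unsigned k)
{
    assert(k < 32);
    return x & ((1u << k) - 1u);
}
```

The mask trick only works when the modulus is a power of two, and the shift-based multiply only pays off on machines where a multiply is noticeably slower than a shift and an add.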
Instruction encoding (Chapters 3.1-3.4)
- Tiny example encoding: use of all above features in a tiny emulator
- Concept of registers: stash stuff here
- Register hardware implementation
- Register uses: program counter, address, data, etc.
- Concept of memory: big bunch o' bytes
- Opcodes: do this now
- Hardware implementation of above encoding (preview of EE 341)
- Real examples
- PPC (clean 4-byte register-based RISC)
- Java bytecode (clean 1-byte stack-based, unboxed)
- CIL (1- or 2-byte stack-based, boxed)
- x86 (hideous variable-length CISC)
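
To make the ideas of opcode, register, and program counter concrete, here is a minimal emulator sketch in C. The encoding is hypothetical and invented for this example (it is not the course's own tiny encoding): each instruction is one byte, with the opcode in the high nibble and a 4-bit immediate in the low nibble.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding, invented for illustration:
   one byte per instruction, high nibble = opcode, low nibble = immediate. */
enum { OP_HALT = 0x0, OP_LOAD = 0x1, OP_ADD = 0x2, OP_SHL = 0x3 };

/* Run the program stored in `mem`; return the final accumulator value. */
uint8_t run(const uint8_t *mem, size_t size)
{
    size_t pc = 0;      /* program counter: index of the next instruction */
    uint8_t acc = 0;    /* the machine's single data register */

    assert(mem != NULL);
    while (pc < size) {
        uint8_t insn = mem[pc++];       /* fetch, then advance the PC */
        uint8_t opcode = insn >> 4;     /* decode with a shift ... */
        uint8_t imm = insn & 0x0F;      /* ... and a mask */

        switch (opcode) {               /* execute */
        case OP_HALT: return acc;
        case OP_LOAD: acc = imm; break;
        case OP_ADD:  acc = (uint8_t)(acc + imm); break;  /* wraps at 256 */
        case OP_SHL:  acc = (uint8_t)(acc << imm); break;
        default:      return acc;       /* unknown opcode: stop */
        }
    }
    return acc;
}
```

With this encoding, the four bytes 0x13, 0x24, 0x31, 0x00 load 3, add 4, shift left by 1, and halt, leaving 14 in the accumulator.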
Assembly & disassembly (Chapter 3.15)
- Opcode mnemonics, naming a register, immediate values
- Inline (__asm) assembly syntaxes; standalone (.S) assembly syntaxes
- Operand-order confusion: Intel syntax puts the destination first, AT&T syntax puts it last
- Labels, macros, etc.
- Win32, gcc x86 inline assembly
- AT&T .S files
Memory
- Structures (Chapter 3.9)
- In-memory layout
- Alignment & padding
- sizeof, offsetof
- Array indexing (Chapter 3.8)
- 1D, 2D, 3D, nD
- For structs
- Global variables
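
Structure layout and array indexing both reduce to byte-offset arithmetic. The following C sketch shows `sizeof` and `offsetof` on a struct with padding, plus the row-major index formula for a 2D array. The struct `sample` and the helpers `index2d` and `value_offset` are hypothetical names for illustration, and the padding noted in the comments is typical rather than guaranteed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sample {
    uint8_t  tag;     /* 1 byte at offset 0 */
    uint32_t value;   /* typically preceded by 3 bytes of padding */
    uint8_t  flag;    /* typically followed by 3 bytes of tail padding */
};

/* Row-major index of element (row, col) in an array with `cols` columns. */
size_t index2d(size_t row, size_t col, size_t cols)
{
    assert(col < cols);
    return row * cols + col;
}

/* Byte offset of the `value` field of element i in an array of struct sample:
   skip i whole structs, then step to the field inside the struct. */
size_t value_offset(size_t i)
{
    return i * sizeof(struct sample) + offsetof(struct sample, value);
}
```

On many common ABIs `sizeof(struct sample)` is 12 rather than 6, because the compiler pads the struct so that `value` stays 4-byte aligned in every array element.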
Subroutines (Chapter 3.7)
- Stack allocation: push & pop
- Program Counter push & pop: call & return
- Parameter passing, pass by reference
- Calling conventions
- Subroutine linkage and naming
Heap memory
- Allocation & free (Chapter 10.9)
- Garbage collection (Chapter 10.10)
Performance and Optimization (Chapters 4, 5, and 9)
- General optimization checklist
- Timing and profiling
- Algorithmic Optimization
- Invariant hoisting, constant propagation
- Memory Performance
- Caching
- Levels & performance of cache
- Program transformations to improve memory performance
- Concurrency
- Hardware and Software Pipelining
- Cost of branches
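
One common program transformation for memory performance is loop interchange. The sketch below, with hypothetical function names, sums the same row-major array two ways. Both return the same result, but the row-by-row version walks memory sequentially and is usually friendlier to the cache; how much faster it runs depends on the machine and the array size.

```c
#include <assert.h>
#include <stddef.h>

/* Column-by-column traversal of a row-major array: consecutive accesses
   are `cols` elements apart, so they tend to land on different cache lines. */
long sum_by_columns(const int *a, size_t rows, size_t cols)
{
    long total = 0;
    assert(a != NULL);
    for (size_t c = 0; c < cols; c++)
        for (size_t r = 0; r < rows; r++)
            total += a[r * cols + c];
    return total;
}

/* Same arithmetic with the loops interchanged: consecutive accesses are
   adjacent in memory. */
long sum_by_rows(const int *a, size_t rows, size_t cols)
{
    long total = 0;
    assert(a != NULL);
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            total += a[r * cols + c];
    return total;
}
```

Timing both versions on a large array is a reasonable first profiling exercise.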
Advanced control flow
- Function pointers, implementation
- C++ virtual method (vtable) implementation
- Dynamic linking (Chapter 7)
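
Virtual method dispatch can be imitated in plain C with a table of function pointers. This is a hand-written sketch of the general idea, not the layout any particular C++ compiler or ABI is guaranteed to use; all type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct shape;   /* forward declaration so the table can mention it */

/* Table of function pointers, one slot per "virtual method". */
struct shape_vtable {
    int (*area)(const struct shape *self);
};

struct shape {
    const struct shape_vtable *vtable;  /* each object points at its class's table */
    int w, h;
};

static int rect_area(const struct shape *s)     { return s->w * s->h; }
static int triangle_area(const struct shape *s) { return s->w * s->h / 2; }

/* One shared, read-only table per "class". */
static const struct shape_vtable rect_vtable     = { rect_area };
static const struct shape_vtable triangle_vtable = { triangle_area };

struct shape make_rect(int w, int h)
{
    struct shape s = { &rect_vtable, w, h };
    return s;
}

struct shape make_triangle(int w, int h)
{
    struct shape s = { &triangle_vtable, w, h };
    return s;
}

/* "Virtual call": load the table pointer, load the slot, call through it. */
int shape_area(const struct shape *s)
{
    assert(s != NULL && s->vtable != NULL);
    return s->vtable->area(s);
}
```

The caller of `shape_area` never names `rect_area` or `triangle_area`; the function that runs is chosen at run time by the pointer stored in the object.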
Floating point (Chapters 2.4, 3.14, and beyond)
- Instructions
- IEEE floating-point representation
- Sign, exponent, mantissa
- Normalization
- Fun bitwise hacks (fast absolute value, log-base-2, float-to-int, etc.)
- Denormalized numbers, NaNs, and their performance penalty
- Operations
- Interfaces
- PPC sensibility
- x86 stack horror
- 4-vector of floating point numbers
- x86 SSE & <xmmintrin.h> intrinsics
- PPC AltiVec
- Graphics card ARB_fragment_program
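
Two of the bitwise floating-point hacks mentioned above can be sketched in C as follows. The code assumes `float` is a 32-bit IEEE 754 single-precision value (1 sign bit, 8 exponent bits with bias 127, 23 mantissa bits), which holds on common machines but is not required by the C standard. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a float's bytes as a 32-bit integer. memcpy is the portable
   way to do this; pointer casts would break the aliasing rules. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    assert(sizeof f == sizeof u);   /* assumption: 32-bit float */
    memcpy(&u, &f, sizeof u);
    return u;
}

static float bits_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Absolute value: clear the sign bit (bit 31) and leave the rest alone. */
float fast_abs(float f)
{
    return bits_float(float_bits(f) & 0x7FFFFFFFu);
}

/* floor(log2(f)) for positive, normalized f: read the 8-bit exponent
   field and subtract the bias of 127. */
int floor_log2(float f)
{
    uint32_t exp_field = (float_bits(f) >> 23) & 0xFFu;
    assert(f > 0.0f && exp_field != 0 && exp_field != 0xFFu);
    return (int)exp_field - 127;
}
```

The assertion in `floor_log2` rejects zero, denormalized numbers, infinities, and NaNs, whose exponent fields do not follow the normalized formula.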