CS 641 - Advanced Computer Architecture

Meeting time: TR 2-3:30pm
Room 104 Chapman Building
University of Alaska Fairbanks

3.0 Credits, Spring 2012
Prerequisite: CS 441 (arch)

Instructor: Dr. O. Lawlor
lawlor@alaska.edu, 474-7678
Office: 201E Chapman
Hours: 4-5 TR, by appointment, or just drop by!

Optional Textbook:
Computer Organization and Design: The Hardware/Software Interface, David Patterson and John Hennessy, Morgan Kaufmann, 3rd or 4th Edition.

ADA Compliance: I will work with the Office of Disabilities Services (208 Whitaker Building, 474-5655) to provide reasonable accommodation to students with disabilities.

Course Website: http://www.cs.uaf.edu/2012/spring/cs641

Course Goals and Requirements

By the end of the course, you will be able to understand both the present and future of computer design for performance: parallelism. Specifically, we will cover performance analysis and modeling, and learn how to write and tune code on machines ranging from single-core embedded machines through cutting-edge graphics processors. To understand this, you will need to know at least the following topics from the course prerequisites:


You'll get better grades by attending class, doing homework, and understanding the material than by cramming before the exam. Your overall grade comes from:

  1. HW: Homeworks and machine problems, to be distributed through the semester.

  2. PROJ: Two research papers, with detailed citations, peer review, and in-class presentations. These can be either survey papers summarizing work other people have done, or research papers summarizing the results of something interesting you have done.

  3. MT: Midterm Exam

  4. FINAL: Final Exam (comprehensive)

Your overall score is then calculated as:

GRADE = 20% HW + 40% PROJ + 20% MT + 20% FINAL

This percentage score is transformed into a plus-minus letter grade via these cutoffs: A >= 93%; A- 90%; B+ 87%; B 80%; C+ 77%; C 73%; C- 70%; D+ 67%; D 63%; D- 60%; F. The grades “B-”, “F+”, and “F-” will not be given. “A+” is reserved for truly extraordinary work.

Note that each research paper has the same grading weight as an entire exam!
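As a rough illustration of the arithmetic above, here is a minimal Python sketch of the weighting formula and letter-grade cutoffs. The function names and the sample scores are hypothetical, PROJ is assumed to be the average of the two paper scores, and the sketch does not model discretionary rounding or the "A+" grade.

```python
# Weights in percent, taken from the GRADE formula in this syllabus.
WEIGHTS = {"HW": 20, "PROJ": 40, "MT": 20, "FINAL": 20}

# Cutoffs from this syllabus, highest first. There is no "B-",
# so anything from 80% up to (but not including) 87% is a "B".
CUTOFFS = [
    (93, "A"), (90, "A-"),
    (87, "B+"), (80, "B"),
    (77, "C+"), (73, "C"), (70, "C-"),
    (67, "D+"), (63, "D"), (60, "D-"),
]


def overall_score(scores):
    """Combine per-section percentages (0-100) into one weighted percentage."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def letter_grade(percent):
    """Map a percentage to a letter grade using the first cutoff it meets."""
    for cutoff, letter in CUTOFFS:
        if percent >= cutoff:
            return letter
    return "F"


if __name__ == "__main__":
    # Hypothetical student; PROJ assumed to be the average of both papers.
    example = {"HW": 95, "PROJ": 88, "MT": 90, "FINAL": 84}
    total = overall_score(example)
    print(total, letter_grade(total))  # 89.0 B+
```

Because PROJ carries 40% and there are two papers, each paper contributes 20% of the total, which matches the weight of a single exam.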

At my discretion, I may round your grade up if it is near a grading boundary.

Homeworks are due at midnight on their due date. Late homeworks will receive no credit, although at my discretion I may allow late work without penalty when it is due to circumstances beyond your control. Projects that are up to two weeks late may be accepted at a 50% grade penalty (e.g., on-time grade: 86%; late grade: 43%).

Everything you turn in must be your own work: violations of the UAF Honor Code will result in a minimum penalty equal to THAT ENTIRE SECTION OF YOUR GRADE (e.g., one plagiarized homework question will negate an otherwise perfect grade on all homeworks). However, even substantial reuse of other people's work is fine (and not plagiarism) if it is clearly cited; you'll be graded on what you've added to others' work. Group projects (NOT homeworks) are acceptable iff you clearly label who did what work, but I do expect a two-person group project to represent twice as much work as a one-person project.

Department policy does not allow tests to be taken early, but in extraordinary circumstances they may be taken late.


First day of class: Thursday, January 19.  Last day to drop: Friday, February 3.  Spring break: March 12-16. Midterm exam: Thursday, March 8.  Last day to withdraw: Friday, March 23.  Last day of class: Thursday, May 3. Final exam: 10:15am-12:15pm Tuesday, May 8.

Course Topics (Tentative)


Physical Parallelism

  • Small circuit simulation with logisim

  • CPU design & pipelining

Superscalar Parallelism

  • Operand forwarding (register file bypass)

  • Pipeline hazards and data dependencies

  • Superscalar execution (wide issue)

  • Out-of-order execution

  • Branch prediction & speculation


Vector Parallelism


  • SSE branch instructions

  • SSE performance tuning

GPU Programming and Performance

  • CUDA and non-graphics code on the GPU: GPGPU

  • Kernel startup cost

  • Coherence and Divergence

  • EPGPU (Dr. Lawlor research!)

Early March:
Project 1 presentations
Review for midterm
Spring Break

Late March: (after spring break)

Multicore Parallelism

  • SMP, SMT, multicore hardware

  • Shared-memory programming with threads, OpenMP, Intel TBB

  • Locks and race conditions


Speeding Up Memory

  • Cache hardware design, thrashing

  • Cache hit ratio, performance modeling

  • Cache coherence in a multicore world

  • False sharing and multicore cache thrashing

Distributed-memory Parallelism

  • Fork & mmap on multicore machine

  • Clusters, MPP, and cloud computing

  • Clients, Servers, and Peer-to-Peer

  • Network interfacing via sockets

  • MPI, the Message Passing Interface


Project 2 presentations
Review for final exam