CS 441 Project 0
Timeline & Deliverables
February
Su Mo Tu We Th Fr Sa
             1  2  3
 4  5  6  7  8  9 10
11 12 13 14 15 16 17   <- Project 0 topic (in class)
18 19 20 21 22 23 24   <- Project 0 rough draft
25 26 27 28

March
Su Mo Tu We Th Fr Sa
             1  2  3
 4  5  6  7  8  9 10   <- Project 0 presentations
11 12 13 14 15 16 17   <- Spring break
18 19 20 21 22 23 24   <- Project 0 final draft
25 26 27 28 29 30 31
I'd like you to describe your project topic in class.
We do these out loud so everybody can hear each other's
ideas, making it easier to form group projects if you'd like to work together.
I'd like your rough draft code,
which should work and do most of what
you want, but not necessarily do everything you want to do, or
be fully polished or tuned.
The presentation is
a short, 10-minute talk in class (time
yourself beforehand!). It should
clearly describe WHO you are, WHAT you did, HOW you did it, and
WHY you chose to do it that way. Bring a laptop to project
your code, demo, slides, and/or figures, or email me your
presentation materials the day before if you'd like to present
from my laptop.
The final code should
be fully debugged, polished, tuned, commented, and include at
least a short README explaining what it is, and what its results
mean. You'll be graded on a combination of ambition,
correctness, completeness, and comments/style. Style and
clean code count! (This is scheduled well after the
presentation, so you can follow up on any suggestions or ideas you
get during the presentation, and so the deadline falls after spring break.)
Grade breakdown: project grade = 25% rough draft + 25%
presentation + 50% final code.
- Learn about the rationale, history, and
advantages/disadvantages of any current hardware topic in depth, such as:
- Pipelining, especially the very deep pipelines of the
Pentium 4 compared to the shallower pipelines of more recent Core i7 chips.
- Out-of-order execution
- Register renaming
- Branch prediction, branch history, and execution speculation
- Cache prefetching and out-of-order loads and stores
- SIMD parallelism
- Multi-core, SMP, SMT parallelism
- Describe how the design limitations and goals of nonstandard
computing platforms differ from conventional computing, such as:
- High-performance computing systems, such as Blue Gene or
anything on the Top500
- Consumer game consoles, such as the PlayStation 4 or Xbox
- Embedded systems, such as cell phones or microwave ovens.
- Pick a hardware-related article from Ars Technica.
Explain what they're talking about in detail.
- Pick a CPU architecture from sandpile.org. Compare
this architecture's hardware design, in terms of achievable
performance, with competing architectures.
- Describe performance counters, which are useful for
understanding code performance and pipelining (see PCL)
- Describe a strange fabrication substrate or nonstandard
computing scheme, such as Biological Computing, Quantum
Computing, self-organizing polymer nanofabrication, etc.
- Describe a new or novel data storage architecture, such as
perpendicular bit recording, MLC flash, magnetoresistive memory,
or nanowire memory.
- Describe a semiconductor or PCB fabrication process in detail,
such as the problems encountered as we approach nanometer-scale
photolithography, solutions such as extreme UV lithography, or
the interrelationship between planarization and metal layers.
- Describe the historical evolution of some computer
architecture, such as SPARC.
- Explore the decline and fall of some computer architecture,
such as the 1960's Burroughs B5000, the early 1980's VAX ("All
the world's a VAX!" Or, er, it was...), or the desktop PC.
- Build an interesting circuit: extend your HW1 CPU, build a
superscalar dependency detection unit, etc.
- Hardware performance analysis: benchmark some test programs
that demonstrate some aspect of modern hardware, such as:
- Out-of-order execution (e.g., reorder instructions manually,
compare to automatic reordering)
- Branch prediction and execution speculation (e.g.,
reverse-engineer x86 branch hardware, like comparing
always-taken branch performance with an even-odd
alternating branch pattern)
- Dependency tracking (e.g., benchmark performance benefit
from decreasing dependency tree depth)
- Cache prefetching and out-of-order loads and stores (e.g.,
compare cached loads with cached loads matching a previous store)
- Define a new instruction set, with a software or circuit implementation.
- Write and benchmark some code to perform any interesting task
quickly on a particular architecture:
- Use bitwise
operations to do something simple faster, or do
something simple in a fiendishly complex way.
- Use assembly language or your knowledge of branch
prediction, caching, etc. to improve the performance of some
existing code.
- Write a dynamic
binary translator for any architecture.
- Use SSE or AVX instructions to speed up some code with the
power of SIMD.
- Use OpenMP or pthreads to speed up some code with the power
of multicore. (But you must get the right answer!)
- Use MPI or sockets to speed up some code with the power of
distributed computing.
- Use CUDA or OpenCL to speed up some code with the power of
the GPU.
The code can be something completely new, something you found on the
net (with a citation), an extension of any homework, an example from
the lecture notes, etc.