CS 441 Project 1
According to the syllabus, each of the two course projects is 15% of your
course grade, so it should have some pretty good stuff.
Conversely, the total end-to-end time for the project is only a
few weeks, so keep it manageable!
Su Mo Tu We Th Fr Sa
8 9 10 11 12 13 14
15 16 17 18 19 20 21
22 23 24 25 26 27 28 <- Project presentations
Su Mo Tu We Th Fr Sa
1 2 3 4 5 <- Final exam and final project draft due
The presentation is a very short, 7-minute talk in class (time
yourself beforehand!). Your presentation should
clearly describe WHO you are, WHAT you did, HOW you did it, and
WHY you chose to do it that way. Bring a laptop to project
your code, demo, slides, and/or figures, or email me your
presentation materials the day before, if you'd like to present
from my laptop.
The final version
should be fully debugged, polished, tuned, and commented, and
should include at least a short README explaining what it is and what
its results mean. You'll be graded on a combination of
ambition, correctness, completeness, and comments/style.
Style and clean formatting count! (This is scheduled well
after the presentation, so you can follow up on any suggestions or
ideas you get during the presentation.)
Grade breakdown: project grade = 30% presentation + 70% final version
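
As a worked example of the breakdown above, here is a minimal sketch in Python. The 30%/70% split and the 15% course weight come from this page. The function names and the sample scores (80 and 90) are made up for illustration, and the sketch assumes both parts are scored out of 100.

```python
PRESENTATION_WEIGHT = 0.30    # from the grade breakdown above
FINAL_WEIGHT = 0.70           # from the grade breakdown above
PROJECT_COURSE_WEIGHT = 0.15  # each project is 15% of the course grade


def project_grade(presentation, final):
    """Combine the two parts (each assumed to be scored 0-100)."""
    return PRESENTATION_WEIGHT * presentation + FINAL_WEIGHT * final


def course_points(presentation, final):
    """Course-grade points contributed by one project."""
    return PROJECT_COURSE_WEIGHT * project_grade(presentation, final)


if __name__ == "__main__":
    # Hypothetical scores: 80 on the presentation, 90 on the final version.
    print(f"project grade: {project_grade(80, 90):.2f}")
    print(f"course points: {course_points(80, 90):.2f}")
```

With those sample scores the project grade works out to 0.30 x 80 + 0.70 x 90 = 87, which contributes 0.15 x 87 = 13.05 points toward the course grade.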
- Learn about the rationale, history, and
advantages/disadvantages of any current hardware topic, such as:
  - Pipelining, especially the very deep pipelines of the
  - Out-of-order execution
  - Register renaming
  - Branch prediction, branch history, and execution speculation
  - Cache prefetching and out-of-order loads and stores
  - Multi-core, SMP, SMT, or SIMD parallelism (pick one!)
  - Software-defined networking (OpenFlow)
- Describe how the design limitations and goals of nonstandard
computing platforms differ from conventional computing, such as:
  - High-performance computing systems, such as Blue Gene or
    anything on the Top500
  - Consumer game consoles, such as the PlayStation 4 or Xbox
  - Embedded systems, such as cell phones or microwave ovens.
- Pick a hardware-related article from Ars Technica.
Explain what they're talking about in detail.
- Pick a CPU architecture from sandpile.org. Compare
this architecture's hardware design, in terms of achievable
performance, with competing architectures.
- Describe performance counters, which are useful for
understanding code performance and pipelining (see PCL)
- Describe a strange fabrication substrate or nonstandard
computing scheme, such as Biological Computing, Quantum
Computing, self-organizing polymer nanofabrication, etc.
- Describe a novel data storage architecture, such as
perpendicular bit recording, MLC flash, magnetoresistive memory,
or nanowire memory.
- Describe a semiconductor or PCB fabrication process in detail,
such as the problems encountered as we approach nanometer-scale
photolithography, solutions such as extreme UV lithography, or
the interrelationship between planarization and metal layers in
- Describe the historical evolution of some computer
architecture, such as SPARC or
- Explore the decline and fall of some computer architecture,
such as the 1960s Burroughs B5000, or the early 1980s VAX ("All
the world's a VAX!" Or, er, it was...), or the desktop PC.
- Build an interesting circuit: extend your HW1 CPU, build a
superscalar dependency detection unit, etc.
- Hardware performance analysis: benchmark some test programs
that demonstrate some aspect of modern hardware, such as:
  - Cloud benchmarking (e.g., how code timings change as a
    function of how much you pay to rent that CPU)
  - Out-of-order execution (e.g., reorder instructions manually,
    and compare to automatic reordering)
  - Branch prediction and execution speculation (e.g.,
    reverse-engineer x86 branch hardware, such as by comparing
    always-taken branch performance with even-odd branch
  - Dependency tracking (e.g., benchmark the performance benefit
    from decreasing dependency tree depth)
  - Cache prefetching and out-of-order loads and stores (e.g.,
    compare cached loads with cached loads matching a previous
- Define a new instruction set, with a software or circuit
- Write and benchmark some code to perform any interesting task
quickly on a particular architecture:
  - Use bitwise operations to do something simple faster, or do
    something simple in a fiendishly complex way.
  - Use assembly language or your knowledge of branch
    prediction, caching, etc. to improve the performance of some
  - Write a dynamic binary translator for any architecture.
  - Use SSE or AVX instructions to speed up some code with the
    power of SIMD.
  - Use OpenMP or pthreads to speed up some code with the power
    of multicore. (Or you can look at verifying that the
    code will always get the right answer!)
  - Use MPI or sockets to speed up code with the power of
  - Use CUDA or OpenCL to speed up some code with the power of
  - Use the power of the cloud with sockets, web sockets, web services, MPI, MapReduce, etc.
Your code can be something completely new, something you found on the
net (with a citation), an extension of any homework, an example from
the lecture notes, etc.
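
To give a feel for the "use bitwise operations" and "write and benchmark" ideas in the list above, here is a minimal sketch that counts the set bits in a 32-bit value two ways and times both. Everything in it is an illustrative assumption rather than a required approach: the function names, the choice of bit counting as the task, and the best-of-5 timing policy are all made up for this example. It is written in Python for readability, so interpreter overhead will dominate the timings; a real project would probably port the same idea to C, C++, or assembly before drawing conclusions about the hardware.

```python
import random
import timeit


def popcount_loop(x):
    """Count set bits in a 32-bit value one bit at a time."""
    count = 0
    for _ in range(32):
        count += x & 1
        x >>= 1
    return count


def popcount_swar(x):
    """Count set bits in a 32-bit value using parallel bitwise sums."""
    x = x - ((x >> 1) & 0x55555555)                 # 2-bit partial counts
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333)  # 4-bit partial counts
    x = (x + (x >> 4)) & 0x0F0F0F0F                 # 8-bit partial counts
    # Multiplying sums the four bytes into the top byte of a 32-bit word.
    return ((x * 0x01010101) & 0xFFFFFFFF) >> 24


def best_time(func, values, repeats=5):
    """Best-of-N wall-clock time (seconds) to run func over all values."""
    return min(timeit.repeat(lambda: [func(v) for v in values],
                             number=1, repeat=repeats))


if __name__ == "__main__":
    rng = random.Random(441)  # fixed seed so runs are repeatable
    values = [rng.getrandbits(32) for _ in range(100_000)]

    # Check correctness before trusting any timing numbers.
    assert all(popcount_loop(v) == popcount_swar(v) for v in values)

    print(f"bit-at-a-time loop: {best_time(popcount_loop, values):.4f} s")
    print(f"parallel bitwise:   {best_time(popcount_swar, values):.4f} s")
```

Taking the minimum over several repeats is one common way to reduce noise from other processes. Whichever policy you pick, report it alongside your numbers so the results can be reproduced.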