CS 441 Project 0
Project Timeline & Deliverables
February 2018
Su Mo Tu We Th Fr Sa
             1  2  3
 4  5  6  7  8  9 10
11 12 13 14 15 16 17  <- Project 0 topic (in class)
18 19 20 21 22 23 24  <- Project 0 rough draft
25 26 27 28
March 2018
Su Mo Tu We Th Fr Sa
             1  2  3
 4  5  6  7  8  9 10  <- Project 0 presentations
11 12 13 14 15 16 17  <- Spring break
18 19 20 21 22 23 24  <- Project 0 final draft
25 26 27 28 29 30 31
First, I'd like you to describe your project topic in class.
We do these out loud so everybody can hear each other's
ideas, which makes it easier to form project groups if you'd
like to do so.
Next, I'd like your rough draft code, which should work and do
most of what you want, but need not do everything you have
planned or be fully polished or tuned.
The presentation is a short, 10-minute talk in class (time
yourself beforehand!). Your presentation should
clearly describe WHO you are, WHAT you did, HOW you did it, and
WHY you chose to do it that way. Bring a laptop to project
your code, demo, slides, and/or figures, or, if you'd like to
present from my laptop, email me your presentation materials
the day before.
The final code should be fully debugged, polished, tuned, and
commented, and should include at least a short README explaining
what it is and what its results mean. You'll be graded on a
combination of ambition, correctness, completeness, and
comments/style. Style and clean code count! (This is scheduled
well after the presentation, so you can follow up on any
suggestions or ideas you get during the presentation, and so we
get past spring break.)
Typical grade breakdown: project grade = 25% rough draft + 25%
presentation + 50% final code
Example Project Topics
Research projects:
- Learn about the rationale, history, and
advantages/disadvantages of any current deep hardware topic, such as:
  - Pipelining, especially the very deep pipelines of the
    Pentium 4 compared to the shallower pipelines of the more
    recent Core i7.
  - Out-of-order execution
  - Register renaming
  - Branch prediction, branch history, and execution speculation
  - Cache prefetching and out-of-order loads and stores
  - SIMD parallelism
  - Multi-core, SMP, and SMT parallelism
- Describe how the design limitations and goals of nonstandard
computing platforms differ from those of conventional computing, such as:
  - High-performance computing systems, such as Blue Gene or
    anything on the Top500 list.
  - Consumer game consoles, such as the PlayStation 4 or Xbox One.
  - Embedded systems, such as cell phones or microwave ovens.
- Pick a hardware-related article from Ars Technica.
Explain what they're talking about in detail.
- Pick a CPU architecture from sandpile.org. Compare
this architecture's hardware design, in terms of achievable
performance, with competing architectures.
- Describe performance counters, which are useful for
understanding code performance and pipelining (see PCL).
- Describe a strange fabrication substrate or nonstandard
computing scheme, such as Biological Computing, Quantum
Computing, self-organizing polymer nanofabrication, etc.
- Describe a new or novel data storage architecture, such as
perpendicular bit recording, MLC flash, magnetoresistive memory,
or nanowire memory.
- Describe a semiconductor or PCB fabrication process in detail,
such as the problems encountered as we approach nanometer-scale
photolithography, solutions such as extreme UV lithography, or
the interrelationship between planarization and metal layers in
CMOS fabrication.
- Describe the historical evolution of some computer
architecture, such as SPARC or Motorola's 68000.
- Explore the decline and fall of some computer architecture,
such as the 1960's Burroughs B5000, or the early 1980's VAX
("All the world's a VAX!" Or, er, it was...), or the desktop PC.
Applied projects:
- Build an interesting circuit: extend your HW1 CPU, build a
superscalar dependency detection unit, etc.
- Hardware performance analysis: benchmark some test programs
that demonstrate some aspect of modern hardware, such as:
  - Out-of-order execution (e.g., reorder instructions manually,
    compare to automatic reordering)
  - Branch prediction and execution speculation (e.g.,
    reverse-engineer x86 branch hardware, for example by comparing
    always-taken branch performance with even-odd branch
    performance)
  - Dependency tracking (e.g., benchmark the performance benefit
    of decreasing dependency tree depth)
  - Cache prefetching and out-of-order loads and stores (e.g.,
    compare cached loads with cached loads matching a previous
    store)
- Define a new instruction set, with a software or circuit
simulator.
- Write and benchmark some code to perform any interesting task
quickly on a particular architecture:
  - Use bitwise operations to do something simple faster, or do
    something simple in a fiendishly complex way.
  - Use assembly language or your knowledge of branch
    prediction, caching, etc. to improve the performance of some
    program.
  - Write a dynamic binary translator for any architecture.
  - Use SSE or AVX instructions to speed up some code with the
    power of SIMD.
  - Use OpenMP or pthreads to speed up some code with the power
    of multicore. (But you must get the right answer!)
  - Use MPI or sockets to speed up some code with the power of
    clustering.
  - Use CUDA or OpenCL to speed up some code with the power of
    the GPU.
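As one possible starting point for the branch prediction benchmark suggested in the list above, here is a minimal C++ sketch that times a data-dependent branch under an always-taken pattern and an even-odd pattern. The names make_pattern, count_taken, and time_pattern are hypothetical, and the pseudorandom pattern is an extra baseline that the list does not mention. An optimizing compiler may turn the branch into branchless or SIMD code, or hoist repeated work, so check the generated assembly before trusting any timings.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Branch outcome patterns to compare. Pseudorandom is an added baseline (assumption).
enum class Pattern { AlwaysTaken, EvenOdd, Pseudorandom };

// Build n branch outcomes (1 = taken, 0 = not taken) for the given pattern.
std::vector<std::uint8_t> make_pattern(Pattern p, std::size_t n) {
    std::vector<std::uint8_t> taken(n);
    std::uint32_t state = 12345u;  // fixed seed so runs are repeatable
    for (std::size_t i = 0; i < n; ++i) {
        switch (p) {
        case Pattern::AlwaysTaken:
            taken[i] = 1;
            break;
        case Pattern::EvenOdd:
            taken[i] = (i % 2 == 0) ? 1 : 0;
            break;
        case Pattern::Pseudorandom:
            // Simple linear congruential generator; use one of its upper bits.
            state = state * 1664525u + 1013904223u;
            taken[i] = static_cast<std::uint8_t>((state >> 16) & 1u);
            break;
        }
    }
    return taken;
}

// Count the taken entries using a branch whose outcome depends on the data.
std::uint64_t count_taken(const std::vector<std::uint8_t>& taken) {
    std::uint64_t count = 0;
    for (std::uint8_t t : taken) {
        if (t) {  // this is the branch under test
            ++count;
        }
    }
    return count;
}

struct Result {
    std::uint64_t taken;   // total taken branches across all repeats
    double ns_per_branch;  // average wall-clock nanoseconds per loop iteration
};

// Time `repeats` passes over an n-entry pattern and report the average cost.
Result time_pattern(Pattern p, std::size_t n, int repeats) {
    assert(n > 0 && repeats > 0);
    const std::vector<std::uint8_t> data = make_pattern(p, n);
    std::uint64_t total = 0;
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) {
        total += count_taken(data);
    }
    const auto stop = std::chrono::steady_clock::now();
    const double ns = std::chrono::duration<double, std::nano>(stop - start).count();
    return { total, ns / (static_cast<double>(n) * repeats) };
}
```

To try it, call time_pattern from your own main for each pattern (for example with n = 1 << 20 and repeats = 100) and print ns_per_branch alongside the taken count, which doubles as a correctness check. Modern predictors often learn a simple alternating pattern, so the always-taken and even-odd timings may come out close; the pseudorandom baseline is there to show what a hard-to-predict branch costs on your machine.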
Your starting code can be something completely new, something
you found on the net (with a citation), an extension of any
homework, an example from the lecture notes, etc.