|Sequential Programming Model
|A sequential CPU reads a single instruction, pulls its data, and executes it. This is the model programmers have been using since 1948.|
|Here we've split the CPU's execution path into p pipeline stages. To keep every stage busy you need at least p instructions in flight, working on at least p independent data items. Only one arithmetic unit actually completes a result each clock cycle, though. The point of pipelining a CPU is to do the fetching, decoding, and operand reads in parallel with actual execution. Typical pipelines are 3 to 30 stages long, with a current typical value of about 10 stages.|
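A rough cycle count makes the pipelining payoff concrete. This is only a sketch of an idealized pipeline: it assumes every stage takes one clock cycle and that there are no stalls, and the function names are made up for illustration.

```python
def unpipelined_cycles(n_instructions, p_stages):
    """Each instruction walks through all p stages before the next one starts."""
    return n_instructions * p_stages


def pipelined_cycles(n_instructions, p_stages):
    """Idealized pipeline: p cycles to fill, then one result per cycle.

    Assumes all instructions are independent, so nothing ever stalls.
    """
    if n_instructions == 0:
        return 0
    return p_stages + (n_instructions - 1)


if __name__ == "__main__":
    n, p = 1000, 10
    print(unpipelined_cycles(n, p))  # 10000
    print(pipelined_cycles(n, p))    # 1009
```

Under these assumptions the speedup approaches p for long instruction streams, even though only one result completes per clock.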
|A superscalar CPU simultaneously fetches a set of k instructions, reads all their data, and executes them all at once. This only works if all k instructions and their data are utterly independent, which is rare, so k is typically 2 to 4 for real CPUs. In a search for more independent instructions, superscalar machines typically need register renaming, out-of-order execution, and sophisticated branch prediction.
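To see why dependencies keep k small, here is a toy model of in-order superscalar issue. It is a sketch under simplifying assumptions: each instruction is a hypothetical `(dest, sources)` tuple of register names, and there is no register renaming or out-of-order execution, so any register conflict inside a bundle forces a new bundle.

```python
def issue_bundles(instructions, k):
    """Group instructions, in program order, into bundles of at most k
    mutually independent instructions.

    Each instruction is a tuple (dest, sources). No renaming is modeled,
    so reusing a register already written or read in the current bundle
    ends that bundle.
    """
    bundles = []
    current = []
    written = set()  # registers written by the current bundle
    read = set()     # registers read by the current bundle
    for dest, sources in instructions:
        depends = (
            any(s in written for s in sources)  # read-after-write
            or dest in written                  # write-after-write
            or dest in read                     # write-after-read
        )
        if current and (len(current) == k or depends):
            bundles.append(current)
            current, written, read = [], set(), set()
        current.append((dest, sources))
        written.add(dest)
        read.update(sources)
    if current:
        bundles.append(current)
    return bundles


if __name__ == "__main__":
    program = [
        ("r1", ("r8", "r9")),
        ("r2", ("r10", "r11")),
        ("r3", ("r1", "r2")),    # needs both earlier results
        ("r4", ("r12", "r13")),
    ]
    print(len(issue_bundles(program, 1)))  # 4
    print(len(issue_bundles(program, 2)))  # 2
    print(len(issue_bundles(program, 4)))  # 2
```

In this small example, widening the machine from k=2 to k=4 buys nothing, because the third instruction has to wait for the first two either way.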
|Multithreaded Programming Model
|This is just s replicated copies of a single CPU. Unlike superscalar, multicore requires the programmer to specify s independent execution threads, but the benefit is the CPU doesn't need to do dependency analysis, so s can reach into dozens or even hundreds.
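The multicore programming model, where the programmer spells out the s independent threads, might look like the sketch below. The `parallel_sum` helper is hypothetical. Note also that in standard CPython builds the global interpreter lock keeps these threads from doing arithmetic at the same instant, so this illustrates the programming model rather than a real speedup.

```python
import threading


def parallel_sum(data, s):
    """Sum data using s threads, each handling its own independent slice."""
    partial = [0] * s  # one slot per thread, so no two threads share an entry

    def worker(thread_id):
        # Strided slice: thread i handles elements i, i+s, i+2s, ...
        partial[thread_id] = sum(data[thread_id::s])

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(s)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every thread before combining results
    return sum(partial)


if __name__ == "__main__":
    print(parallel_sum(list(range(1000)), 4))  # 499500
```

The hardware never has to discover that the slices are independent; the programmer has promised it by the way the work was divided.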
|Here we have h
replicated registers and decoders, but they all share a single set of
arithmetic units. This has the same programming model as
multicore, but the hardware is cheaper.
|SIMD Programming Model
|Single Instruction Multiple Data:
the programmer issues a single instruction, like "addps", and it runs
several data items through a set of arithmetic units. The
advantage is fewer instructions, which means less work fetching and
decoding them.
|Vector machines, such as the Cray Y-MP, could add vector registers with a single
instruction. But unlike full SIMD, the hardware only had one
arithmetic circuit, so the vector's values had to go in one at a
time. Again, this is the same programming model as SIMD, but the
hardware is cheaper.
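The difference between full SIMD and a one-unit vector machine shows up only in the cycle count, not in what the instruction computes. The sketch below is an idealized model: it ignores pipeline latency, and `lanewise_add` and `add_cycles` are illustrative names rather than real instructions or APIs.

```python
def lanewise_add(a, b):
    """What a single SIMD or vector add computes: elementwise sums."""
    return [x + y for x, y in zip(a, b)]


def add_cycles(n_elements, arithmetic_units):
    """Idealized cycles for one add over n elements (ceiling division).

    Full SIMD: as many arithmetic units as elements, so one cycle.
    One-unit vector machine: elements go through one at a time.
    """
    return -(-n_elements // arithmetic_units)


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0, 4.0]
    b = [10.0, 20.0, 30.0, 40.0]
    print(lanewise_add(a, b))  # [11.0, 22.0, 33.0, 44.0]
    print(add_cycles(4, 4))    # 1  (four units, like a 4-float "addps")
    print(add_cycles(4, 1))    # 4  (one shared arithmetic circuit)
```

Either way the program contains one instruction, which is why the two designs share a programming model.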