| T-I-D-A | Description | |
| Sequential Programming Model | ||
| 1-1-1-1: Sequential | A sequential CPU reads a single instruction, pulls its data, and executes it. This is the model programmers have been using since 1948. | |
| 1-p-p-1: Pipelined | Here we've split the CPU's execution path into p pipeline stages. You need at least p instructions, and at least p independent data items. Only one arithmetic unit actually completes a result each clock cycle, though. The point of pipelining a CPU is to do the fetching, decoding, and operand reads in parallel with actual execution. Typical pipelines are 3 to 30 stages long, with a current typical value of about 10 stages. | |
| 1-k-k-k: Superscalar | A superscalar CPU simultaneously fetches a set of k instructions, reads all their data, and executes them all at once. This only works if all k instructions and their data are completely independent, which is rare, so k is typically 2 to 4 for real CPUs. To find more independent instructions, superscalar machines typically need register renaming, out-of-order execution, and sophisticated branch prediction. | |
| Multithreaded Programming Model | ||
| s-1-s-s: SMP (Multicore) | This is just s replicated copies of a single CPU.  Unlike superscalar, multicore requires the programmer to specify s independent execution threads, but the benefit is the CPU doesn't need to do dependency analysis, so s can reach into dozens or even hundreds. | |
| h-1-h-1: SMT (Hyperthreading) | Here we have h replicated registers and decoders, but they all share a single set of arithmetic units. This has the same programming model as multicore, but the hardware is cheaper. | |
| SIMD Programming Model | ||
| 1-1-m-m: SIMD | Single Instruction Multiple Data: the programmer issues a single instruction, like "addps", and it runs several data items through a set of arithmetic units. The advantage is fewer instructions, which means less work fetching and decoding. | |
| 1-1-v-1: Vector | Vector machines, such as the Cray Y-MP, could add two vector registers with a single instruction. But unlike full SIMD, the hardware only had one arithmetic circuit, so the vectors' values had to go through it one at a time. Again, this is the same programming model as SIMD, but the hardware is cheaper. | |
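
The three programming models in the table can be compared on one small task, adding two float arrays. The following is only a rough sketch under some assumptions: the function names (`add_sequential`, `add_threaded`, `add_simd`) and the `num_threads` parameter are made up for illustration, and the SIMD version assumes an x86 compiler that defines `__SSE__`, falling back to a plain loop elsewhere. Pipelining, superscalar execution, SMT, and vector hardware do not appear because they reuse one of these three models without any change to the source code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>
#if defined(__SSE__)
#include <xmmintrin.h>  // SSE intrinsics; _mm_add_ps maps to the "addps" instruction
#endif

// Sequential model: one instruction stream, one data item per add.
// A pipelined or superscalar CPU runs this same code unchanged.
void add_sequential(const float* a, const float* b, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

// Multithreaded model: the programmer explicitly splits the work into
// independent threads. num_threads is an illustrative parameter.
void add_threaded(const float* a, const float* b, float* out,
                  std::size_t n, unsigned num_threads) {
    assert(num_threads > 0);
    std::size_t chunk = (n + num_threads - 1) / num_threads;  // ceiling division
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        std::size_t begin = t * chunk;
        std::size_t end = std::min(n, begin + chunk);
        if (begin >= end) break;  // no work left for the remaining threads
        workers.emplace_back(add_sequential, a + begin, b + begin,
                             out + begin, end - begin);
    }
    for (auto& w : workers) w.join();
}

// SIMD model: one instruction operates on several data items (4 floats for SSE).
void add_simd(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
#if defined(__SSE__)
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);  // unaligned load of 4 floats
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    // Leftover elements (or every element, when SSE is unavailable).
    for (; i < n; ++i) out[i] = a[i] + b[i];
}
```

All three functions produce the same output; what changes is how much parallelism the programmer has to spell out. With GCC or Clang, building may require the `-pthread` flag on some systems.
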
| Single threaded | ||||||||||
| One Instruction at a time | 
 | |||||||||
| Many Instructions simultaneously | 
 |