The steps must be performed in sequence for each instruction. In a pipelined CPU, each step is implemented as an independent stage in an assembly line process.
A single instruction enters the CPU at the Fetch stage, where it is fetched and the PC is incremented in one clock cycle. In the next clock cycle, the instruction moves to the Decode stage. In the third clock cycle, the instruction moves to the Access stage and the operands are loaded. In the fourth and fifth clock cycles, which are the last two stages, the instruction is executed and the result is stored.
In a five-stage pipeline, a single instruction takes 5 clock cycles to pass through the pipeline, as shown below:
Since the pipeline stages operate independently, a new instruction may enter the Fetch stage as soon as the add instruction has moved to the Decode stage. If a sw follows the add, the pipeline will appear as follows:
Note that while the first add instruction requires 5 clock cycles to complete, the sw instruction completes just one clock cycle later, in cycle 6.
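The overlap of the add and sw instructions can be sketched with a short Python script. This is only an illustration: the stage names Execute and Store are assumed labels for the last two stages, and the function names pipeline_diagram and completion_cycle are hypothetical.

```python
# Stage names; "Execute" and "Store" are assumed labels for the last two stages.
STAGES = ["Fetch", "Decode", "Access", "Execute", "Store"]


def pipeline_diagram(instructions):
    """Return one dict per clock cycle mapping each stage to the
    instruction occupying it (None marks an empty stage)."""
    total_cycles = len(STAGES) + len(instructions) - 1
    diagram = []
    for cycle in range(total_cycles):
        row = {}
        for s, stage in enumerate(STAGES):
            i = cycle - s  # instruction i enters Fetch in cycle i (0-based)
            row[stage] = instructions[i] if 0 <= i < len(instructions) else None
        diagram.append(row)
    return diagram


def completion_cycle(index):
    """1-based clock cycle in which instruction `index` (0-based)
    leaves the last stage of a hole-free pipeline."""
    return index + len(STAGES)


if __name__ == "__main__":
    program = ["add", "sw"]
    print("         " + " ".join(f"{s:<7}" for s in STAGES))
    for n, row in enumerate(pipeline_diagram(program), start=1):
        cells = " ".join(f"{(row[s] or '-'):<7}" for s in STAGES)
        print(f"cycle {n}: {cells}")
```

Running the script prints one row per clock cycle. The add occupies the Store column in cycle 5 and the sw occupies it in cycle 6, which matches the one-cycle gap between the two completions.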
Under ideal circumstances, a pipelined processor can complete one instruction on every clock cycle. Thus, the peak MIPS (millions of instructions per second) rating of the CPU equals the clock speed in MHz.
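The relationship between clock speed and peak MIPS can be checked with a small calculation. The sketch below assumes a hole-free instruction stream, so that n instructions take (stages + n - 1) clock cycles. The function names and the 100 MHz clock are illustrative assumptions, not values from the text.

```python
def total_cycles(num_instructions, num_stages=5):
    """Clock cycles for a hole-free stream: fill the pipeline once,
    then one instruction completes per cycle."""
    if num_instructions == 0:
        return 0
    return num_stages + num_instructions - 1


def achieved_mips(num_instructions, clock_mhz, num_stages=5):
    """Instructions per microsecond, which equals millions of
    instructions per second. clock_mhz is cycles per microsecond."""
    cycles = total_cycles(num_instructions, num_stages)
    return num_instructions * clock_mhz / cycles


if __name__ == "__main__":
    clock_mhz = 100  # assumed clock speed for illustration
    for n in (1, 10, 1_000, 1_000_000):
        print(n, round(achieved_mips(n, clock_mhz), 4))
```

A single instruction achieves only one fifth of the clock rate because it occupies the pipeline alone. As the stream grows, the fill time becomes negligible and the achieved MIPS approaches the clock speed in MHz.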
A pipelined CPU achieves maximum throughput only when all stages of the pipeline are filled with instructions that can be processed independently. Performance decreases when gaps or holes appear in the pipeline. A hole is an empty pipeline stage, one that is not processing an instruction.
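The cost of a hole can be illustrated with a minimal model. It assumes that each entry in a list represents what enters the Fetch stage in one clock cycle, with None standing for a hole, and that the last entry is a real instruction. The function name run_with_holes is hypothetical.

```python
def run_with_holes(issue_slots, num_stages=5):
    """Return (instructions completed, clock cycles used).

    issue_slots holds one entry per clock cycle at the Fetch stage;
    None marks a hole that moves through the pipeline doing no work.
    Assumes the final slot is a real instruction.
    """
    instructions = sum(1 for slot in issue_slots if slot is not None)
    cycles = num_stages + len(issue_slots) - 1 if issue_slots else 0
    return instructions, cycles


if __name__ == "__main__":
    for stream in (["add", "sw"], ["add", None, "sw"], ["add", None, None, "sw"]):
        done, cycles = run_with_holes(stream)
        print(stream, "->", done, "instructions in", cycles, "cycles")
```

Under this model, every hole between the add and the sw delays the completion of the sw by one clock cycle, so the same two instructions take 6, 7, and 8 cycles respectively.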