Consider the following MAL code to copy an array in memory:
Loop: beq $10, $11, End nop lw $12, 0($10) sw $12, 0($14) addi $14, $14, 4 addi $10, $10, 4 b Loop nop End:
There is a data dependency between the lw and sw instructions and no-op instructions have been inserted in the delay slots following the beq and b instructions.
The code may be reordered by moving the addi instructions into the delay slots following the branch instructions. For example, the first addi could be moved to fill the delay slot after the b instruction. By moving the second addi instruction between the lw and sw instructions, the load/store data dependency may be eliminated.
The delay slot following the beq is more difficult to fill. There are no other instructions in the loop which can be reordered to fill the delay slot without introducing new data dependencies. An addi instruction on register $0 is added to the delay slot to perform a no-op.
Loop: beq $10, $11, End addi $0, $0, 0 lw $12, 0($10) addi $10, $10, 4 sw $12, 0($14) b Loop addi $14, $14, 4 End:
The reordered code loop contains 7 instructions and no data dependencies. The original code, including delay slots, requires 8 instructions and contains a data dependency which would add 2 extra clock cycles per iteration. Therefore, the reordering provides a 30% reduction in the execution time of the loop.