||Solve with operand forwarding (register file bypass)
||Predict or flush
||May have poor branch behavior
||Breaking up the pipeline stages so they take approximately equal time.
||Solve WAW/WAR with register renaming;
Tolerate RAW with out-of-order execution.
|Predict or die!
||May not have enough instruction-level parallelism
||Keeping all execution units busy (e.g., big instruction window)
|SIMD||Between instructions: see above.
Within a register: need data shuffling.
|Take both branches, mux into one answer. Branch locality is important.
||Program needs to be rewritten to use weird SIMD datatypes and branches. See Intel's ArBB (or NVIDIA CUDA) for automatic translation.
||Not a problem, since data-parallel work is automatically balanced.
||Solve WAW/WAR with privatization;
Tolerate RAW with locks or atomics.
|Not a problem, since cores can branch independently.
||May have shared-data problems between threads (such as race conditions), making it difficult to run correctly on multicore.
||Keeping all the threads busy (e.g., dynamic scheduling)
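
As a rough illustration of the multicore row, here is a minimal C++ sketch of privatization plus an atomic. The function name `parallel_sum` and the cyclic split of the loop across threads are assumptions made for this example, not something the table prescribes. Each thread accumulates into its own private `local` variable, so threads never overwrite each other's partial results (no WAW/WAR conflicts on a shared accumulator); the one remaining true dependence, combining the partial sums, goes through a single atomic add per thread.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical example: sum the integers 0..n-1 using nthreads threads.
long parallel_sum(long n, int nthreads) {
    std::atomic<long> total{0};  // the only shared, writable data
    std::vector<std::thread> workers;
    for (int t = 0; t < nthreads; ++t) {
        workers.emplace_back([&total, n, nthreads, t] {
            long local = 0;  // privatized accumulator: one per thread
            // Cyclic split: thread t handles i = t, t+nthreads, t+2*nthreads, ...
            for (long i = t; i < n; i += nthreads) {
                local += i;
            }
            // One atomic read-modify-write per thread, not one per element.
            total.fetch_add(local);
        });
    }
    for (auto& w : workers) {
        w.join();  // wait for every thread before reading the result
    }
    return total.load();
}
```

Calling `parallel_sum(1000, 4)` should return 499500, the same as the serial loop. The static cyclic split is only reasonable here because every iteration costs about the same; with uneven iterations, the dynamic scheduling mentioned in the load balance column would be the better fit.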