Lecture 11: Pipeline Hazards II
Last Time:
• Data Hazards (intro)
Today:
• Reducing impact of data hazards
• Control Hazards
• How to speed up pipelines
Data Hazards With Bypassing
Instruction        Cycle: 1  2  3  4  5  6  7  8
ADD R1, R2, R3            F  R  X  M  W
ADD R4, R1, R5               F  R  X  M  W
SUB R5, R1, R6                  F  R  X  M  W
XOR R7, R8, R1                     F  R  X  M  W
R1 computed: X stage of the first ADD. R1 used: X stage of the second ADD, supplied through the bypass path.
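To make the effect of bypassing concrete, here is a minimal Python sketch that computes the cycle in which each instruction of the slide's sequence can be fetched, with and without bypassing. The function name `fetch_cycles` and the tuple encoding of instructions are illustrative assumptions. The sketch also assumes a single-issue, in-order F R X M W pipeline in which, without bypassing, the register file is written in W before it is read in R within the same cycle.

```python
def fetch_cycles(program, bypassing):
    """Return the fetch cycle of each instruction (single issue, in order).

    program: list of (dest, srcs) tuples for ALU instructions only.
    Stage timing for an instruction fetched in cycle f:
    F=f, R=f+1, X=f+2, M=f+3, W=f+4.
    """
    # With bypassing, a consumer's X stage may directly follow the producer's
    # X stage, so the consumer can be fetched one cycle after the producer.
    # Without bypassing (assumption: write-before-read in the same cycle),
    # the consumer's R stage must not precede the producer's W stage,
    # so the consumer is fetched at least three cycles after the producer.
    gap = 1 if bypassing else 3
    earliest = {}      # register -> earliest fetch cycle for a reader
    cycles = []
    next_fetch = 1
    for dest, srcs in program:
        f = max([next_fetch] + [earliest.get(r, 0) for r in srcs])
        cycles.append(f)
        earliest[dest] = f + gap
        next_fetch = f + 1
    return cycles


# The four instructions from the slide: (destination, sources).
program = [
    ("R1", ("R2", "R3")),  # ADD R1, R2, R3
    ("R4", ("R1", "R5")),  # ADD R4, R1, R5
    ("R5", ("R1", "R6")),  # SUB R5, R1, R6
    ("R7", ("R8", "R1")),  # XOR R7, R8, R1
]

print(fetch_cycles(program, bypassing=True))   # [1, 2, 3, 4]
print(fetch_cycles(program, bypassing=False))  # [1, 4, 5, 6]
```

Under these assumptions the sequence runs without stalls when bypassing is present, and loses two cycles waiting for R1 when it is not.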
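Memory Data Hazards
Instruction        Cycle: 1  2  3  4  5  6  7
LW R1, R2                 F  R  X  M  W
ADD R4, R1, R5               F  R  X  M  W
SUB R7, R8, R9                  F  R  X  M  W
R1 loaded: M stage of the LW. R1 used: X stage of the ADD. The load data is not available until the end of M, so even with bypassing the ADD cannot execute in the cycle directly after the LW's X stage.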
Instruction Scheduling (Load Delay Slots)
Instruction        Cycle: 1  2  3  4  5  6  7
LW R1, R2                 F  R  X  M  W
SUB R7, R8, R9               F  R  X  M  W
ADD R4, R1, R5                  F  R  X  M  W
R1 loaded: M stage of the LW. R1 used: X stage of the ADD, now one cycle after the load completes.
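The two load examples above can be compared with a small Python sketch that counts load-use stalls for a given instruction order. The function name `load_use_stalls` and the `(op, dest, srcs)` encoding are hypothetical. The sketch assumes full bypassing and a one-cycle load delay, so only an instruction that immediately follows an LW and reads its destination register stalls.

```python
def load_use_stalls(program):
    """Count one-cycle load-use stalls for an in-order instruction list.

    program: list of (op, dest, srcs) tuples.
    Assumes full bypassing and a load delay of exactly one cycle.
    """
    stalls = 0
    prev = None
    for op, dest, srcs in program:
        # A stall occurs only when the previous instruction is a load
        # whose destination is read by the current instruction.
        if prev is not None and prev[0] == "LW" and prev[1] in srcs:
            stalls += 1
        prev = (op, dest, srcs)
    return stalls


lw = ("LW", "R1", ("R2",))            # LW  R1, R2
add = ("ADD", "R4", ("R1", "R5"))     # ADD R4, R1, R5
sub = ("SUB", "R7", ("R8", "R9"))     # SUB R7, R8, R9

original = [lw, add, sub]     # ADD directly follows the load
scheduled = [lw, sub, add]    # independent SUB fills the load delay slot

print(load_use_stalls(original))   # 1
print(load_use_stalls(scheduled))  # 0
```

Moving the independent SUB between the LW and the ADD is legal here because SUB neither reads nor writes any register used by the other two instructions.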
Control Hazards (Unconditional Branch)
Instruction        Cycle: 1  2  3  4  5  6
JR R25                    F  R  X  M  W      (branch test and destination)
...
XX: ADD ...                  F  R  X  M  W   (need destination here, in F)
Reducing Control Hazards
Instruction        Cycle: 1  2  3  4  5  6
JR R25                    F  R  X  M  W      (branch test and destination)
...
XX: ADD ...                  F  R  X  M  W   (need destination here, in F)
• Move test logic into R stage
Branch Delay Slots - A Familiar Example
.LL5:
    sll  %o1,2,%g3        // i*4 bytes
    ld   [%o0],%g2        // load A[i]
    add  %o1,1,%o1        // i++
    ld   [%o2+%g3],%g3    // load B[i]
    cmp  %o1,99           // at end?
    add  %g2,%g3,%g2      // A[i]+B[i]
    st   %g2,[%o0]        // store A[i]
    ble  .LL5             // branch back
    add  %o0,4,%o0        // A++
    retl                  // return from pgm
    sub  %sp,-912,%sp     // deallocate stack space
Branch Delay Slots
• Since we need to have a dead cycle anyway, let’s put a useful instruction there
• Advantage:
  • Do more useful work
• Disadvantage:
  • Exposes microarchitecture to ISA
Before:
    ADD  R2,R3,R4
    BNEZ R5,_loop
    NOP
After:
    BNEZ R5,_loop
    ADD  R2,R3,R4
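The transformation on this slide can be sketched in Python as a very simple delay-slot filler. This is only an illustration, not a description of any real compiler: the name `fill_delay_slot` and the `(text, dest, srcs)` encoding are assumptions, and the sketch considers only the instruction immediately before the branch, moving it into the slot when the branch does not read its result.

```python
NOP = ("NOP", None, ())


def fill_delay_slot(block):
    """Fill the single branch delay slot of a basic block.

    block: list of (text, dest, srcs) tuples; the last entry is the branch.
    Returns a new list with the delay slot placed after the branch.
    """
    *body, branch = block
    if body and body[-1][1] not in branch[2]:
        # The preceding instruction does not produce a branch operand,
        # so it can execute in the delay slot instead of before the branch.
        *rest, candidate = body
        return rest + [branch, candidate]
    # Nothing safe to move: fall back to a NOP in the slot.
    return body + [branch, NOP]


add = ("ADD R2,R3,R4", "R2", ("R3", "R4"))
bnez = ("BNEZ R5,_loop", None, ("R5",))

for text, _, _ in fill_delay_slot([add, bnez]):
    print(text)
# BNEZ R5,_loop
# ADD R2,R3,R4
```

If the ADD wrote R5 instead, the branch would depend on it and the sketch would keep the original order and emit a NOP in the slot.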
Control Hazards
• Conservatively, the pipeline waits until the branch target is computed before fetching the next instruction.
• Alternatively, we can speculate on which direction the branch will go and to what address.
• The speculation must be confirmed later, and the pipeline must back up if it was wrong.
[Figure: pipeline diagrams (F R X M W) contrasting the conservative and speculative cases]
Example Speculative Conditional Branch
    BNEZ R1, LOOP
    ADD  R2, R3, R4
    SUB  R5, R6, R7
[Figure: pipelined datapath with IP, IP+4, I-Mem, IR, Reg File, and D-Mem]
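Speculative Conditional Branch (Diagram)
Instruction        Cycle: 1  2  3  4  5  6  7
BNEZ R1, LOOP             F  R  X  M  W
ADD R2, R3, R4               F  R  X  M  W
SUB R5, R6, R7                  F  R  X  M  W
Annotations: "Condition and Dest Available Here" (on the BNEZ), "Speculate Not Taken" (on the instructions fetched after it), "Confirm or Branch" (once the condition is known).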
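The benefit of speculating not-taken can be estimated with a back-of-the-envelope CPI model, sketched below in Python. The function names and all numeric inputs are illustrative assumptions, not measurements from the lecture. The model assumes a base CPI of 1, that a correct not-taken guess costs nothing, and that a wrong guess costs a fixed number of squashed cycles.

```python
def cpi_stall_on_branch(branch_fraction, penalty_cycles, base_cpi=1.0):
    """CPI when every branch waits for its outcome before the next fetch."""
    return base_cpi + branch_fraction * penalty_cycles


def cpi_predict_not_taken(branch_fraction, taken_fraction, penalty_cycles,
                          base_cpi=1.0):
    """CPI when fall-through instructions are fetched speculatively.

    Only taken branches (wrong guesses) pay the penalty.
    """
    return base_cpi + branch_fraction * taken_fraction * penalty_cycles


# Hypothetical workload: 20% branches, 60% of them taken, 2-cycle penalty.
stall = cpi_stall_on_branch(0.2, 2)
spec = cpi_predict_not_taken(0.2, 0.6, 2)
print(f"stall: {stall:.2f}  speculate not taken: {spec:.2f}")
```

With these made-up numbers the speculative pipeline pays the penalty only on the 60% of branches that are taken, so its CPI is lower than that of the pipeline that always waits.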
R4000 Pipeline
Stage   Function
IF      I$ start
IS      I$ finish
RF      decode/opfetch
EX      ALU
DS      D$ start
DF      D$ finish
TC      tag check
WB      write back
[Figure: pipeline blocks labeled Instruction Memory, Reg, Data Memory, Reg]
• How long is load delay?
• How long is branch delay?
• How many comparators are needed to implement the forwarding decisions?
• What instruction sequences will still cause stalls?
How Do We Speed up the Pipeline?
• Pipeline too long → more ALUs (exploit ILP)
• WAR/WAW hazards → register renaming
• Undetermined dependencies at compile time → dynamic scheduling
  • Object code compatibility
  • Simplify compiler
• Too many branches → better branch prediction
  • Or use predication to eliminate branches
• Unknown dependencies (control/data) → speculate
• Explicitly parallel architectures (EPIC)
Register renaming example:
    Before:  ADD R1,R2,R3    SUB R1,R4,R5
    After:   ADD R1,R2,R3    SUB R1’,R4,R5
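The register renaming idea in the ADD/SUB example can be sketched in a few lines of Python. This is a simplified illustration with hypothetical names (`rename`, physical registers `P0`, `P1`, ...), assuming an unlimited supply of physical registers and no freeing of old ones. Every destination gets a fresh name, which removes WAW and WAR hazards while true (RAW) dependencies are preserved through the mapping table.

```python
import itertools


def rename(program):
    """Rename destination registers to fresh physical registers.

    program: list of (op, dest, src1, src2) tuples.
    Returns a list of tuples in the same format with renamed registers.
    """
    mapping = {}                 # architectural register -> current physical name
    fresh = itertools.count()    # assumption: unlimited physical registers
    renamed = []
    for op, dest, *srcs in program:
        # Sources read the most recent version of each register;
        # registers never written in this program keep their own name.
        new_srcs = [mapping.get(s, s) for s in srcs]
        mapping[dest] = f"P{next(fresh)}"
        renamed.append((op, mapping[dest], *new_srcs))
    return renamed


program = [
    ("ADD", "R1", "R2", "R3"),
    ("SUB", "R1", "R4", "R5"),   # WAW hazard on R1 with the ADD
    ("XOR", "R6", "R1", "R7"),   # true dependence on the SUB result
]

for instr in rename(program):
    print(instr)
# ('ADD', 'P0', 'R2', 'R3')
# ('SUB', 'P1', 'R4', 'R5')
# ('XOR', 'P2', 'P1', 'R7')
```

After renaming, the ADD and SUB write different registers, which corresponds to the R1 and R1’ pair on the slide, and the XOR still reads the SUB result.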
Summary
• Today:
  • Reducing impact of data hazards
  • Control Hazards
  • How to speed up pipelines
• Next Time:
  • ILP
  • Dynamic Scheduling