Lecture 15: Dynamic Scheduling Denouement

Today
• Branch prediction
  • Static
  • Direction
  • Target
  • Compaq Alpha 21264
• Case Study: PowerPC 620
Branch Prediction
• Depending on use, some branches are very predictable
  • loops: TTT…TN
      for(j=0;j<30;j++) { … }
  • limit checks: almost always pass
      if(a > limit) { … }
• Some are not very predictable
  • data-dependent dispatch with equally likely cases
      switch(mode) { case 1: … case 2: … default: … }
• Types of predictors
  • static
  • history
  • multi-bit history
  • pattern
Static Prediction
• Assign a preferred direction to each branch
  • e.g., BNEZ_T (predict taken), BNEZ_N (predict not taken)
• Base the direction on
  • program analysis: loop branches tend to be taken
  • profiling of the program, but the behavior may be data dependent
[Figure: a conditional branch on A>B?]
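The profiling option can be sketched as follows. This is a minimal illustration, not a real compiler pass: the function names, the trace format, and the two sample branches ("loop" and "check") are all assumptions made for the example.

```python
from collections import Counter

def static_hints(profile):
    """Choose one fixed direction per branch from a profiling run.

    profile: iterable of (branch_id, taken) pairs, with taken a bool.
    Returns {branch_id: True for predict-taken, False for predict-not-taken}.
    Ties fall back to not taken (an arbitrary choice for this sketch).
    """
    taken = Counter()
    total = Counter()
    for branch, was_taken in profile:
        total[branch] += 1
        taken[branch] += int(was_taken)
    return {b: 2 * taken[b] > total[b] for b in total}

def static_accuracy(hints, trace):
    """Fraction of branches in trace whose outcome matches the fixed hint."""
    hits = sum(hints.get(b, False) == t for b, t in trace)
    return hits / len(trace)

# Hypothetical profile: a 30-trip loop branch and a limit check that never fires.
loop = [("loop", True)] * 29 + [("loop", False)]
check = [("check", False)] * 30
hints = static_hints(loop + check)
accuracy = static_accuracy(hints, loop + check)
```

The hint is only as good as the profile: if the real input makes "check" fire often, the fixed direction stays wrong for the whole run, which is the data-dependence caveat on the slide.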
Dynamic Predictors
• Branch history table (BHT), indexed by IP
  • stores the last direction each branch went
  • may indicate whether the last instruction at this address was a branch
  • the table is a cache of recent branches
• Buffer sizes of 4096 entries are common
• What happens if we:
  • Don’t find the IP in the BHT?
  • Run out of BHT entries?
[Figure: the IP indexes both the instruction memory (IM, feeding IR) and the BHT, which supplies the prediction]
Multi-bit predictors
• A ‘predict same as last’ strategy gets two mispredicts on each loop
    for(j=0;j<30;j++) { … }
    Predict NTTT…TTT
    Actual  TTTT…TTN
• Can do much better by adding inertia to the predictor
  • e.g., two-bit saturating counter
    Predict TTTT…TTT
  • Miss rate: 4% (FP) to 11% (int)
[Figure: two-bit counter state diagram with states N2, N1, T1, T2]
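A small simulation makes the difference concrete. It is a sketch under assumptions: the loop branch is modeled as taken 29 times and then not taken once per pass, the loop is entered 10 times, and the counter values 0..3 are mapped to the states N2, N1, T1, T2.

```python
def last_direction(outcomes, state=False):
    """Mispredict count for a 1-bit 'predict same as last' predictor."""
    misses = 0
    for taken in outcomes:
        misses += int(state != taken)
        state = taken                      # remember only the last direction
    return misses

def two_bit(outcomes, counter=3):
    """Mispredict count for a two-bit saturating counter.

    Counter values 0..3 stand for N2, N1, T1, T2 (assumed mapping);
    the prediction is 'taken' when the counter is 2 or 3.
    """
    misses = 0
    for taken in outcomes:
        misses += int((counter >= 2) != taken)
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
    return misses

# One pass of the loop branch: TTT…TN (29 taken, then the exit).
one_pass = [True] * 29 + [False]
trace = one_pass * 10                      # the loop is entered 10 times

misses_1bit = last_direction(trace)        # two per pass
misses_2bit = two_bit(trace)               # one per pass (the exit only)
```

With inertia, the single not-taken exit moves the counter from T2 to T1, so the first iteration of the next pass is still predicted taken.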
More Complex Behavior
• Pattern is TNNTNNTNN
• Correlated branches
    if(d==0) d=1;
    if(d==1)

          BNEZ R1,_L1      ; b1 (d!=0)
          ADDI R1,R0,#1
    _L1:  SUB  R3,#1,R1
          BNEZ R3,_L2      ; b2 (d!=1)
    _L2:
  • b1 and b2 are highly correlated!
• What happens when d = 2,0,2,0,2…
  • 1 bit predictor
  • 2 bit predictor
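The d = 2,0,2,0,… question can be explored with a short simulation. This is a sketch: `branch_outcomes` models the two branches of the fragment, and `mispredicts` is a generic saturating counter whose width and starting value are parameters chosen for illustration.

```python
def branch_outcomes(d):
    """Return (b1_taken, b2_taken) for one execution of the fragment."""
    b1 = d != 0            # b1: BNEZ on d, taken when d != 0
    if not b1:
        d = 1              # fall-through path: if(d==0) d=1;
    b2 = d != 1            # b2: taken when d != 1
    return b1, b2

def mispredicts(outcomes, bits, start=0):
    """Mispredict count for a saturating counter of the given width.

    The prediction is 'taken' when the counter is in its upper half,
    so bits=1 behaves as 'predict same as last'.
    """
    top = (1 << bits) - 1
    counter, misses = start, 0
    for taken in outcomes:
        misses += int((counter > top // 2) != taken)
        counter = min(top, counter + 1) if taken else max(0, counter - 1)
    return misses

ds = [2, 0] * 4                                   # d = 2,0,2,0,…
b1 = [branch_outcomes(d)[0] for d in ds]
b2 = [branch_outcomes(d)[1] for d in ds]

one_bit = mispredicts(b1, bits=1)                 # wrong every time
two_bit_strong = mispredicts(b1, bits=2, start=0) # wrong half the time
two_bit_weak = mispredicts(b1, bits=2, start=1)   # wrong every time
```

Both branches alternate T,N,T,N and always agree with each other, so a per-branch counter does poorly while the outcome of b1 predicts b2 exactly.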
Capture Global Patterns
          BNEZ R1,_L1      ; b1 (d!=0)
          ADDI R1,R0,#1
    _L1:  SUB  R3,#1,R1
          BNEZ R3,_L2      ; b2 (d!=1)
    _L2:
Branch Pattern Tables (Two-Level Predictors)
• History gives a pattern of recent branches
  • e.g., TTNTTNTTN: what comes next?
• Predict the next branch by looking up the history of branches for a particular pattern
• Two-level predictor
  • first level: find the history (pattern)
  • second level: predict the branch for that pattern
• Correlating predictors
• The BPT may be independent for each BHT entry, or shared
[Figure: the IP indexes the BHT; the resulting history state (e.g., 110110) passes through a function f to index the BPT, which supplies the prediction]
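A two-level predictor along these lines can be sketched in a few lines. The class and method names, the 6-bit history length, the use of dictionaries in place of fixed-size tables, and the weakly-not-taken initial counter value are all assumptions for illustration.

```python
class TwoLevelPredictor:
    """First level: per-branch history (BHT). Second level: counters per pattern (BPT)."""

    def __init__(self, history_bits=6, shared_bpt=True):
        self.mask = (1 << history_bits) - 1
        self.shared = shared_bpt
        self.bht = {}   # IP -> recent outcomes of that branch, packed as bits
        self.bpt = {}   # pattern (or (IP, pattern) if not shared) -> 2-bit counter

    def _key(self, ip):
        history = self.bht.get(ip, 0)
        return history if self.shared else (ip, history)

    def predict(self, ip):
        # Unseen patterns start at 1 (weakly not taken) in this sketch.
        return self.bpt.get(self._key(ip), 1) >= 2

    def update(self, ip, taken):
        key = self._key(ip)
        c = self.bpt.get(key, 1)
        self.bpt[key] = min(3, c + 1) if taken else max(0, c - 1)
        self.bht[ip] = ((self.bht.get(ip, 0) << 1) | int(taken)) & self.mask

def run(predictor, trace):
    """Feed (ip, taken) pairs through the predictor and count mispredicts."""
    misses = 0
    for ip, taken in trace:
        misses += int(predictor.predict(ip) != taken)
        predictor.update(ip, taken)
    return misses

# One branch at a hypothetical address repeating the pattern TTN.
trace = [(0x40, t) for t in [True, True, False] * 40]
p = TwoLevelPredictor()
warmup_misses = run(p, trace[:60])
steady_misses = run(p, trace[60:])     # the pattern has been learned
```

Once each of the three recurring history patterns has been seen a few times, every occurrence of the TTN pattern is predicted correctly, which a single two-bit counter cannot do.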
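Compaq Alpha 21264 Branch Predictor (1998)
• Local: previous executions of this branch
• Global: previous executions of all branches
[Figure: the IP indexes the Local History Table (1024x10), whose history selects an entry in Local Prediction (1024x3); the Path History indexes Global Prediction (4096x2) and Choice Prediction (4096x2); the choice selects between the local and global predictions to give the final prediction]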
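The structure in the figure can be sketched as a tournament predictor. Only the table sizes come from the slide; the IP bits used for indexing, the initial counter values, the 12-bit path history width (implied by 4096 entries), and the rule for training the chooser are assumptions of this sketch, not a description of the actual hardware.

```python
def bump(counter, taken, top):
    """Saturating increment on taken, decrement on not taken."""
    return min(top, counter + 1) if taken else max(0, counter - 1)

class Tournament:
    def __init__(self):
        self.local_hist = [0] * 1024    # 1024 x 10-bit local histories
        self.local_pred = [3] * 1024    # 1024 x 3-bit counters, taken if >= 4
        self.global_pred = [1] * 4096   # 4096 x 2-bit counters, taken if >= 2
        self.choice = [1] * 4096        # 4096 x 2-bit, >= 2 means trust global
        self.path = 0                   # global path history (12 bits assumed)

    def _lookup(self, ip):
        li = (ip >> 2) & 1023           # assumed index: word-aligned IP bits
        lh = self.local_hist[li]
        local = self.local_pred[lh] >= 4
        glob = self.global_pred[self.path] >= 2
        return li, lh, local, glob

    def predict(self, ip):
        _, _, local, glob = self._lookup(ip)
        return glob if self.choice[self.path] >= 2 else local

    def update(self, ip, taken):
        li, lh, local, glob = self._lookup(ip)
        if local != glob:               # train the chooser only on disagreement
            self.choice[self.path] = bump(self.choice[self.path], glob == taken, 3)
        self.local_pred[lh] = bump(self.local_pred[lh], taken, 7)
        self.global_pred[self.path] = bump(self.global_pred[self.path], taken, 3)
        self.local_hist[li] = ((lh << 1) | int(taken)) & 1023
        self.path = ((self.path << 1) | int(taken)) & 4095

def run(predictor, trace):
    """Feed (ip, taken) pairs through the predictor and count mispredicts."""
    misses = 0
    for ip, taken in trace:
        misses += int(predictor.predict(ip) != taken)
        predictor.update(ip, taken)
    return misses

# Hypothetical workload: one branch repeating TTN.
trace = [(0x1000, t) for t in [True, True, False] * 60]
t = Tournament()
run(t, trace[:90])
steady_misses = run(t, trace[90:])
```

The chooser matters when the two components disagree, for example when a branch correlates with other branches (global wins) or has its own repeating pattern that other branches disturb (local wins).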
Branch Target Tables
• Need to know where to go if the prediction is ‘taken’
  • predict the target along with the direction
• May use a different target prediction strategy for different types of branches
  • subroutine returns
[Figure: two overlapped instructions in an F R A M W pipeline; the branch is predicted taken at fetch but its target is calculated later, so the target must be guessed when the following instruction is fetched]
Branch Target Prediction (2)
• Use the current IP to index a cache of next IPs
• Use a push-down stack to record subroutine return addresses
• The ISA can give hints about where you’re going
  • Digital Alpha has 4 instructions with identical ISA behavior: JMP, JSR, RET, JSR_COROUTINE
    • specify the predictor’s use of the stack
    • include a hint of the target address
    • e.g., JMP R31, (R3), hint
[Figure: the next IP is selected among IP+4, the predicted target IP from the BTB, the return stack, and the actual target IP from the ALU, steered by the BHT prediction (taken/not); the IP also fetches from IM into IR]
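The two target mechanisms, a cache of next IPs and a return-address stack, can be sketched together. The class name, the `kind` strings (stand-ins for hints such as JSR and RET), the stack depth, and the 4-byte instruction size are assumptions for the example.

```python
class TargetPredictor:
    def __init__(self, stack_depth=8):
        self.btb = {}              # IP -> last observed target (cache of next IPs)
        self.stack = []            # push-down stack of return addresses
        self.depth = stack_depth

    def predict(self, ip, kind):
        """kind is 'call', 'ret', or 'jump'. Returns a target IP, or None on a miss."""
        if kind == "ret" and self.stack:
            return self.stack[-1]
        return self.btb.get(ip)

    def update(self, ip, kind, target):
        """Record what the branch at ip actually did."""
        if kind == "call":
            if len(self.stack) == self.depth:
                self.stack.pop(0)          # overflow: drop the oldest entry
            self.stack.append(ip + 4)      # assumes 4-byte instructions
        elif kind == "ret" and self.stack:
            self.stack.pop()
        if kind != "ret":
            self.btb[ip] = target

# A hypothetical subroutine at 0x1000, returning from 0x1010,
# called first from 0x100 and then from 0x200.
tp = TargetPredictor()
tp.update(0x100, "call", 0x1000)
first_return = tp.predict(0x1010, "ret")
tp.update(0x1010, "ret", first_return)
tp.update(0x200, "call", 0x1000)
second_return = tp.predict(0x1010, "ret")
```

A cache indexed only by the IP of the return instruction would repeat the previous return address (0x104) on the second call; the stack gives 0x204, which is why returns get their own strategy.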
Branch Performance
• Consider a modern pipeline with a long decode stage
    F D1 D2 D3 R T A1 ...
  • Pipeline annotations: predict; discover it's a branch; resolve direction and calculate target
• Penalty for mispredicted branch is _____
• If 10% of instructions are branches, what is the CPI
  • with no prediction?
  • with 70% accurate prediction (static)?
  • with 85% accurate prediction (2-bit)?
  • with 95% accurate prediction (2-level)?
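The CPI question reduces to a one-line formula once a penalty is chosen. The sketch below assumes a base CPI of 1 and that every mispredicted branch costs the full penalty. The slide leaves the penalty blank, so the value 6 used here is a placeholder for illustration, not the slide's answer; "no prediction" is modeled as every branch paying the penalty.

```python
def cpi(branch_fraction, accuracy, penalty, base_cpi=1.0):
    """CPI = base + (fraction of branches) * (mispredict rate) * (penalty in cycles)."""
    return base_cpi + branch_fraction * (1.0 - accuracy) * penalty

PENALTY = 6   # hypothetical cycle count; substitute the value derived from the pipeline

cases = {
    "no prediction": 0.0,     # assumption: every branch stalls for the full penalty
    "static (70%)": 0.70,
    "2-bit (85%)": 0.85,
    "2-level (95%)": 0.95,
}
results = {name: cpi(0.10, acc, PENALTY) for name, acc in cases.items()}

for name, value in results.items():
    print(f"{name:15s} CPI = {value:.2f}")
```

Because the branch term scales with the penalty, a deeper front end makes each point of prediction accuracy worth more.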
Alternatives to Prediction
• Predication
  • guard instructions with a predicate, cancel if false
  • when is this a good idea? What length conditional segment?
• Delay slots
  • make branch delay explicit
  • exposes implementation in the ISA
  • makes life difficult for future implementations
  • compiler tries to fill the delay slot with a useful instruction (possibly predicated)
• ISA support
  • hints to the predictor
    • let the compiler pass along the information it has
  • separate the components of the branch
    • target address calculation: prepare to branch (Tera)
    • determining direction
    • actually branching
Case Study - PowerPC 620
• Fetch/Issue/Complete up to 4 instructions per clock
  • 2 simple integer units
  • 1 complex IU (imul/idiv)
  • 1 ld/st unit
  • 1 FPU
  • 1 branch unit
• Load/Store
  • internal effective address ALU
  • 1 cycle iLD, 2 cycle fpLD
  • L/S buffer for memory reordering
  • multiple memory ops in flight
• Complex IU (MCFXU)
  • 3-20 cycles
  • some ops pipelined, Div not
• FPU
  • 2 cycles (pipelined) for MUL, ADD, MULA
  • 31 cycles (unpipelined) for FDIV
• 5-7 stage pipe
[Figure: pipeline stages F D I X C]
Instruction Issue
• Out-of-order instruction issue
• Fetch 4 instructions/cycle
• Register renaming
  • 8 extra I-Regs
  • 12 extra F-Regs
• Reservation stations
  • do the reordering
  • bypassing into stations
• Commit (reorder buffer)
  • flushed on mis-speculation
[Figure: Ideal IPC, Fetch IPC, Issue IPC, Execute IPC, Commit IPC]
Next Time
• A little fun…