Evaluation of Dynamic Branch Prediction Schemes in a MIPS Pipeline
Debajit Bhattacharya, Ali JavadiAbhari
ELE 475 Final Project, 9th May, 2012
Outline
• Motivation
• Branch Prediction
• Simulation Setup & Testing Methodology
• Dynamic Branch Prediction
  • Single Bit Saturating Counter
  • Two Bit Saturating Counter
  • Two Level Local Branch History & Single Bit Prediction
  • Two Level Local Branch History & Two Bit Prediction
• Comparison of Performances
• Conclusion
• Future Work
Why Branch Prediction?
• Branches (conditional and unconditional) redirect the instruction stream, which results in dead cycles in the front end
• Branch cost increases with:
  • Super-pipelining: delays branch resolution
    • e.g. Pentium 3 and Pentium 4 have 10- and 20-cycle penalties respectively
  • Superscalar issue: multiplies the dead instructions
    • e.g. a 6-stage MIPS pipe has 3 and 7 dead instructions in its one-way and two-way implementations respectively
Branch Prediction
• Minimizes the dead cycles generated by a “taken” branch
• Essential in modern processors to restore the IPC
• Two components of prediction:
  • Direction/outcome of the branch (applies to conditional branches only)
  • Target of the branch (applies to all branches)
Simulation Setup & Testing Methodology
• 5-stage MIPS pipeline
• Parcv2 instruction set
• Pv2byp: configuration from Lab
• Own assembly test
• Micro-benchmarks from Lab
  • Vector-vector Add
  • Complex Multiply
  • Binary Search
  • Masked Filter
Pv2Byp Pipeline
• Stages: F, D, X, M, W
• Target address of J and JAL known at D stage
• Target address of JR and JALR known at X stage
• Branch direction/outcome known at X stage
Dynamic Branch Prediction
• Performance = f(accuracy, cost of misprediction)
• One Level Predictor (Bimodal Prediction)
  • Branch History Table
  • Branch Target Buffer
• Two Level Predictor
  • Branch History Register Table
  • Pattern History Table
  • Branch Target Buffer
• All the tables are read at the F stage for prediction
• All the tables are written in either the D or the X stage (depending on where the branch is resolved and whether the prediction was correct)
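The slide leaves f unspecified. One common first-order way to make the dependence concrete is sketched below; it is a textbook-style approximation, not the model used in the project. All symbols are assumptions introduced here: CPI_base is the CPI with no branch stalls, f_branch the fraction of instructions that are branches, a the prediction accuracy, and c the misprediction cost in cycles.

```latex
\[
  \mathrm{CPI} \approx \mathrm{CPI}_{\text{base}} + f_{\text{branch}} \cdot (1 - a) \cdot c
\]
```

Under this approximation, a deeper pipeline (larger c) makes each percentage point of accuracy worth more.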
Hardware Description
• BHT
  • Indexed by the lower <bht_IndexSize> bits of PC
  • Holds the prediction bit(s) (1 or 2)
• BHR
  • Indexed by the lower <bhr_IndexSize> bits of PC
  • Holds the local branch history (<pht_IndexSize> bits)
• PHT
  • Indexed by the entries of the BHR (<pht_IndexSize> bits)
  • Holds the prediction bit(s)
• BTB
  • Indexed by the lower <btb_IndexSize> bits of PC
  • Holds the rest of the bits of PC as tag
  • Holds the branch target PC
  • Holds a valid bit for the two level predictor
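Hardware Description (table indexing)
• BHT: entries 0..0 through 1..1, indexed by PC[bht_IndexSize+1:2]
• BTB: entries 0..0 through 1..1, indexed by PC[btb_IndexSize+1:2]
• BTB hit: the tag stored in the indexed entry equals PC[31:btb_IndexSize+2]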
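The index and tag slices above can be sketched in software as follows. This is a minimal illustration and not the project's Verilog: the names `split_pc` and `BTB` are hypothetical, a 32-bit word-aligned PC is assumed, and an empty entry stands in for a cleared valid bit.

```python
def split_pc(pc, index_size):
    """Split a 32-bit PC into (index, tag) for a table with 2**index_size entries.

    PC bits [1:0] are skipped, matching the PC[index_size+1:2] slice.
    """
    index = (pc >> 2) & ((1 << index_size) - 1)   # PC[index_size+1:2]
    tag = (pc & 0xFFFFFFFF) >> (index_size + 2)   # PC[31:index_size+2]
    return index, tag


class BTB:
    """Direct-mapped branch target buffer (illustrative sketch)."""

    def __init__(self, index_size):
        self.index_size = index_size
        # Each entry is None (invalid) or a (tag, target) pair.
        self.entries = [None] * (1 << index_size)

    def lookup(self, pc):
        """Return the stored target on a hit, or None on a miss."""
        index, tag = split_pc(pc, self.index_size)
        entry = self.entries[index]
        if entry is not None and entry[0] == tag:
            return entry[1]
        return None

    def update(self, pc, target):
        """Install or overwrite the entry for this branch PC."""
        index, tag = split_pc(pc, self.index_size)
        self.entries[index] = (tag, target)
```

Two branches whose PCs share the same low index bits map to the same entry; the tag comparison is what keeps one from using the other's target.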
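One Bit Saturating Counter
• Two states: Predict NT and Predict T
• A taken (T) outcome moves the counter to Predict T; a not-taken (NT) outcome moves it to Predict NT
• Exploits temporal correlation between the two states, T and NT
• Always two mispredicts per execution of a backward-branch loop (one at loop exit, one at the first iteration of the next execution)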
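The two-mispredict behaviour can be reproduced with a small simulation. This is a sketch under stated assumptions: the class and helper names are hypothetical, the table is assumed to reset to Predict NT, and the loop branch is modelled as taken on every iteration except the last.

```python
class OneBitPredictor:
    """One prediction bit per BHT entry: predict whatever happened last time."""

    def __init__(self, index_size=4):
        self.mask = (1 << index_size) - 1
        # False = Predict NT (assumed reset state), True = Predict T.
        self.table = [False] * (1 << index_size)

    def predict(self, pc):
        return self.table[(pc >> 2) & self.mask]

    def update(self, pc, taken):
        self.table[(pc >> 2) & self.mask] = taken


def loop_outcomes(iterations, runs):
    """Outcomes of a backward loop branch: taken iterations-1 times, then not taken."""
    return ([True] * (iterations - 1) + [False]) * runs


def count_mispredicts(predictor, pc, outcomes):
    """Predict, compare with the actual outcome, then train the predictor."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses
```

For a 4-iteration loop executed 3 times, `count_mispredicts(OneBitPredictor(), 0x400010, loop_outcomes(4, 3))` counts two mispredictions per execution of the loop.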
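Two Bit Saturating Counter
• Four states: Strong Not Taken, Weak Not Taken, Weak Taken, Strong Taken
• Predicts NT in the two Not Taken states and T in the two Taken states
• A taken (T) outcome moves one state toward Strong Taken; a not-taken (NT) outcome moves one state toward Strong Not Taken
• From a strong state, needs two consecutive T/NT outcomes to change the prediction
• Tolerates one outcome in the unusual direction and still predicts the next branch correctly
• Works better than the one bit counter in a nested loop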
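A minimal sketch of the two bit counter follows, to show why it helps in a nested loop: the inner loop's exit no longer flips the prediction for its next execution. The names are hypothetical, the counter encoding (0 = Strong Not Taken through 3 = Strong Taken) and the Weak Not Taken reset state are assumptions.

```python
class TwoBitPredictor:
    """Two-bit saturating counter per BHT entry.

    Assumed encoding: 0 = Strong NT, 1 = Weak NT, 2 = Weak T, 3 = Strong T.
    """

    def __init__(self, index_size=4):
        self.mask = (1 << index_size) - 1
        self.table = [1] * (1 << index_size)  # assumed reset state: Weak NT

    def predict(self, pc):
        return self.table[(pc >> 2) & self.mask] >= 2

    def update(self, pc, taken):
        i = (pc >> 2) & self.mask
        counter = self.table[i]
        # Saturate at 0 and 3 instead of wrapping around.
        self.table[i] = min(counter + 1, 3) if taken else max(counter - 1, 0)


def loop_outcomes(iterations, runs):
    """Inner-loop branch of a nested loop: taken iterations-1 times, then not taken."""
    return ([True] * (iterations - 1) + [False]) * runs


def count_mispredicts(predictor, pc, outcomes):
    """Predict, compare with the actual outcome, then train the predictor."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses
```

With these assumptions, a 4-iteration inner loop executed 3 times costs two mispredictions on its first execution and only one (the loop exit) on each later execution.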
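Two Level Branch Predictor [Yeh & Patt, ’92]
• The BHR entry for the branch (e.g. 111..01) is used as the index into the PHT (entries 000..00 through 111..11)
• The indexed PHT entry holds the pattern history bit(s) and supplies the prediction bit
• FSM logic updates the pattern history bit(s) with the branch result from the X stage
• Many branches execute repetitive patterns
• Uses local (current branch) history patterns
• Requires initial settling of counter values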
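The BHR-then-PHT lookup can be sketched as below. This is an illustration and not the project's hardware: the names are hypothetical, a single PHT shared by all BHR entries is assumed, the PHT holds two-bit saturating counters reset to a weak not-taken value, and histories reset to all zeros.

```python
class TwoLevelLocalPredictor:
    """Per-branch history (BHR table) indexing a table of 2-bit counters (PHT)."""

    def __init__(self, bhr_index_size=4, history_bits=4):
        self.bhr_mask = (1 << bhr_index_size) - 1
        self.hist_mask = (1 << history_bits) - 1
        self.bhr = [0] * (1 << bhr_index_size)   # local histories, newest outcome in bit 0
        self.pht = [1] * (1 << history_bits)     # 2-bit counters, assumed reset to weak NT

    def predict(self, pc):
        history = self.bhr[(pc >> 2) & self.bhr_mask]
        return self.pht[history] >= 2

    def update(self, pc, taken):
        slot = (pc >> 2) & self.bhr_mask
        history = self.bhr[slot]
        counter = self.pht[history]
        # Train the counter selected by the old history, then shift in the outcome.
        self.pht[history] = min(counter + 1, 3) if taken else max(counter - 1, 0)
        self.bhr[slot] = ((history << 1) | int(taken)) & self.hist_mask


def run(predictor, pc, outcomes):
    """Return a list of booleans, one per branch: True where the prediction was correct."""
    hits = []
    for taken in outcomes:
        hits.append(predictor.predict(pc) == taken)
        predictor.update(pc, taken)
    return hits
```

Feeding the repeating pattern T, T, T, NT through `run` with `history_bits=4` settles to correct predictions after the first few repetitions, while `history_bits=2` keeps mispredicting because the history T, T is followed by both T and NT.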
Effect of BTB Size
[Figure: 1 Level, 2 Bit predictor]
Effect of PHT Size
[Figure: 2 Level, 2 Bit predictor]
Conclusion
• A larger predictor means higher hardware cost but better prediction accuracy
• Larger BHTs with smaller BTBs reduce hardware cost and reuse branch history even if the entry is not present in the BTB
• Smaller BHTs cause multiple branches to alias, which degrades prediction
• Once all branches reach a unique BHT entry, accuracy saturates
• In the two level predictor, the BHR width must capture the repetitive pattern; otherwise it performs worse than the bimodal scheme
Future Work
• Global branch prediction: data-dependent correlation, nested loops
• Gshare and Gselect
• Extending to two-way superscalar: Pv2ssc
Thank You! Q & A