Branch Predictor Design for AE64000
Lynn Choi
Department of Electronics and Computer Engineering, Korea University
lchoi@korea.ac.kr
Session: 5D, Paper: 8
Motivation
• Demand for high-performance embedded processors
  - High-end embedded applications
  - Many uses of embedded processors
• Addition of a branch predictor
  - To achieve higher performance
  - The most cost-effective method
AE64000 Characteristics
• IFU to minimize the performance loss caused by LERIs
• Two additional pipeline stages (IFU1 + IFU2) to eliminate LERIs
• 3 line buffers to store 12 instructions
• PrePC in the IFU and PC in the pipeline core
• Branch misprediction penalty: 3 cycles
Branch Predictor Design for AE64000
• Issues in branch predictor design for AE64000
• The AE64000 has two additional stages (IFU1 and IFU2) in front of the 5-stage pipeline core. At which pipeline stage should prediction be performed? In the IFU1 stage.
• Because of the line buffers in the IFU, predicted target addresses must be buffered as well so that branch predictions can be verified. This requires a buffer for predicted branch target addresses (PTAB).
• Since 4 instructions are fetched at a time, multiple branches can be fetched at once. Only the first taken branch is predicted; to do that, the TAC has the precise target address.
• Branch misprediction penalty
• Can be reduced from 3 to 2 cycles by adding a MUX in the IFU so that PPC is updated in the same cycle as PC
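The "only the first taken branch is predicted" rule can be illustrated with a small sketch. This is not the AE64000 logic itself: the function name, the list-based interface, and the 2-byte instruction size are assumptions made for illustration, and a predicted-taken branch with no known target is simply treated as fall-through here.

```python
from typing import Optional, Sequence, Tuple

FETCH_WIDTH = 4  # instructions fetched per cycle, as stated on the slide


def predict_fetch_group(
    pcs: Sequence[int],
    predicted_taken: Sequence[bool],
    targets: Sequence[Optional[int]],
    instr_bytes: int = 2,  # assumption: fixed 2-byte instructions
) -> Tuple[int, Optional[int]]:
    """Return (next_fetch_pc, index_of_predicted_branch).

    Scans the fetch group in program order and redirects fetch at the
    first instruction that is predicted taken AND has a known target.
    Later branches in the same group are ignored.
    """
    for i, (taken, target) in enumerate(zip(predicted_taken, targets)):
        if taken and target is not None:
            return target, i
    # No predicted-taken branch: continue with sequential fetch.
    return pcs[-1] + instr_bytes, None


group = [0x100, 0x102, 0x104, 0x106]
next_pc, idx = predict_fetch_group(
    group,
    predicted_taken=[False, True, True, False],
    targets=[None, 0x200, 0x300, None],
)
print(hex(next_pc), idx)
```

In this example both the second and third instructions are predicted taken, but only the second one redirects fetch.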
Branch Predictor for AE64000
• Separate BPT with TAC
• PTAB to store predicted target addresses for instructions in the line buffer
• Branch prediction verification in the ID stage
Predicted Target Address Buffer
• Predicted Target Address Buffer (PTAB)
• For branch instructions in the line buffer
• When we send a branch instruction to the pipeline core, we also send the corresponding predicted target address
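One way to picture the PTAB is as a predicted-target field that travels with each buffered instruction until the ID stage checks it. The sketch below assumes a simple FIFO of 12 entries (matching the 12 instructions held by the line buffers); the class, field, and function names are hypothetical and do not come from the AE64000 design.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class BufferedInstr:
    pc: int
    is_branch: bool
    predicted_target: Optional[int]  # None means "predicted not taken"


class LineBufferWithPTAB:
    """FIFO that keeps each instruction together with its predicted target."""

    def __init__(self, capacity: int = 12):
        self.capacity = capacity
        self.entries = deque()

    def push(self, instr: BufferedInstr) -> bool:
        if len(self.entries) >= self.capacity:
            return False  # buffer full, fetch must stall
        self.entries.append(instr)
        return True

    def issue(self) -> Optional[BufferedInstr]:
        # The predicted target leaves the buffer with the instruction.
        return self.entries.popleft() if self.entries else None


def verify_in_id(instr: BufferedInstr, actual_taken: bool,
                 actual_target: Optional[int]) -> bool:
    """Return True if the prediction carried with the instruction was correct."""
    if not instr.is_branch:
        return True
    predicted_taken = instr.predicted_target is not None
    if predicted_taken != actual_taken:
        return False
    return (not actual_taken) or instr.predicted_target == actual_target


buf = LineBufferWithPTAB()
buf.push(BufferedInstr(pc=0x102, is_branch=True, predicted_target=0x200))
instr = buf.issue()
print(verify_in_id(instr, actual_taken=True, actual_target=0x200))
```

Carrying the target with the instruction means the verification step does not have to look the prediction up again when the branch finally reaches the core.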
Simulation Environment
• Developed a cycle-accurate AE64000 simulator
• Simulated 1 billion instructions
  - 30 minutes on a P4 1.6 GHz with 512 MB RAM
• Indirect branches are not predicted in the simulation
• Input: AE64000 compiler binary, memory & predictor configuration parameters
• Output: IPC, BPT/TAC hit ratios, etc.
• Benchmarks
  - SPECint95 (compress, go)
  - Dhrystone
  - Whetstone
• Predictors tested
  - Last-time predictor
  - Bimodal predictor
  - G-share predictor
[Figure: Simulator Block Diagram]
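For readers unfamiliar with the three predictor types, the following is a minimal textbook-style sketch of each, not the simulator's code. The table size (256 entries, a power of two), the 8-bit global history, the initial counter value, and indexing by the raw PC are all assumptions chosen for illustration.

```python
class LastTime:
    """1 bit per entry: predict whatever the branch did last time."""

    def __init__(self, entries=256):
        self.mask = entries - 1
        self.table = [False] * entries

    def predict(self, pc):
        return self.table[pc & self.mask]

    def update(self, pc, taken):
        self.table[pc & self.mask] = taken


class Bimodal:
    """2-bit saturating counter per entry (0..3, taken if >= 2)."""

    def __init__(self, entries=256):
        self.mask = entries - 1
        self.table = [1] * entries  # start at "weakly not taken"

    def predict(self, pc):
        return self.table[pc & self.mask] >= 2

    def update(self, pc, taken):
        i = pc & self.mask
        self.table[i] = min(3, self.table[i] + 1) if taken else max(0, self.table[i] - 1)


class GShare:
    """2-bit counters indexed by PC XOR global branch history."""

    def __init__(self, entries=256, history_bits=8):
        self.mask = entries - 1
        self.hist_mask = (1 << history_bits) - 1
        self.history = 0
        self.table = [1] * entries

    def _index(self, pc):
        return (pc ^ self.history) & self.mask

    def predict(self, pc):
        return self.table[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        self.table[i] = min(3, self.table[i] + 1) if taken else max(0, self.table[i] - 1)
        self.history = ((self.history << 1) | int(taken)) & self.hist_mask


def accuracy(predictor, trace):
    """Fraction of (pc, taken) records in the trace predicted correctly."""
    correct = 0
    for pc, taken in trace:
        correct += int(predictor.predict(pc) == taken)
        predictor.update(pc, taken)
    return correct / len(trace)


# Synthetic loop branch: taken 9 times, then not taken, repeated 10 times.
trace = ([(0x40, True)] * 9 + [(0x40, False)]) * 10
for p in (LastTime(), Bimodal(), GShare()):
    print(type(p).__name__, accuracy(p, trace))
```

The synthetic trace is only a toy input; it says nothing about the benchmark results reported in the slides.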
Simulation Results • Without branch predictor (IPC)
Simulation Results • Last-time branch predictor
Simulation Results (cont’d) • Bimodal Branch Predictor
Simulation Results (cont’d) • G-share Branch Predictor
Conclusion
• Simulation result analysis
  - Consider both performance and area
  - The additional performance gain from the g-share and bimodal predictors is negligible relative to their size and complexity
• Final design
  - Last-time predictor with a 4-way set-associative, 8-entry TAC with LRU replacement
  - IPC improves by 10% when the branch misprediction penalty is reduced from 3 to 2 cycles
  - An additional 15% IPC improvement from the branch predictor
  - About 11,500 gates (about 2.64% area) in the Verilog HDL model
• Thus, we can improve the performance of the AE64000 by 25% at a cost of less than 3% in area
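A 4-way set-associative, 8-entry structure with LRU replacement has two sets of four ways. The sketch below shows that organization in software; the class and method names, the use of the full PC as the tag, and the set-index function (which assumes 2-byte instructions) are illustrative assumptions rather than details of the Verilog model.

```python
from collections import OrderedDict
from typing import Optional


class TAC:
    """Target address cache: set-associative with LRU replacement."""

    def __init__(self, entries: int = 8, ways: int = 4):
        self.ways = ways
        self.num_sets = entries // ways  # 8 entries / 4 ways = 2 sets
        # Each set maps branch PC -> target, ordered from LRU to MRU.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]

    def _set(self, pc: int) -> OrderedDict:
        # Assumption: 2-byte instructions, so drop the lowest PC bit.
        return self.sets[(pc >> 1) % self.num_sets]

    def lookup(self, pc: int) -> Optional[int]:
        s = self._set(pc)
        if pc in s:
            s.move_to_end(pc)  # mark as most recently used
            return s[pc]
        return None  # TAC miss

    def insert(self, pc: int, target: int) -> None:
        s = self._set(pc)
        if pc in s:
            s.move_to_end(pc)
        elif len(s) >= self.ways:
            s.popitem(last=False)  # evict the least recently used way
        s[pc] = target


tac = TAC()
for pc in (0x00, 0x04, 0x08, 0x0C):  # all four map to set 0
    tac.insert(pc, pc + 0x100)
tac.lookup(0x00)         # touch 0x00 so it becomes most recently used
tac.insert(0x10, 0x110)  # set 0 is full, so the LRU entry (0x04) is evicted
print(tac.lookup(0x04), hex(tac.lookup(0x00)))
```

With only two sets, conflict behavior is dominated by the replacement policy, which is why the example exercises the LRU path explicitly.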