Branch Predictor Design for AE64000

Lynn Choi
Department of Electronics and Computer Engineering, Korea University
lchoi@korea.ac.kr
Session: 5D, Paper: 8

Motivation
• Demand for high-performance embedded processors
  - High-end embedded applications
  - Many uses of embedded processors
• Addition of a branch predictor
  - To achieve higher performance
  - The most cost-effective method

AE64000 Characteristics
• An IFU minimizes the performance loss caused by LERIs
• Two additional pipeline stages (IFU1 + IFU2) eliminate LERIs
• 3 line buffers store 12 instructions
• PrePC in the IFU and PC in the pipeline core
• Branch misprediction penalty: 3 cycles

Branch Predictor Design for AE64000
• Issues in branch predictor design for the AE64000
  - The AE64000 has two additional stages (IFU1, IFU2) in front of the 5-stage pipeline core. At which pipeline stage should prediction be performed? → The IFU1 stage.
  - Because the IFU holds instructions in line buffers, predicted target addresses must be buffered as well so that branch prediction results can be verified. → Buffers for predicted branch target addresses (PTAB) are needed.
  - Since 4 instructions are fetched at a time, multiple branches can be fetched at a time as well. → Only the first taken branch is predicted. To do that, the TAC holds the precise target address.
• Branch misprediction penalty
  - Can be reduced from 3 to 2 cycles by adding a MUX in the IFU so that PPC is updated in the same cycle as PC
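The "only the first taken branch is predicted" rule can be illustrated with a short Python sketch. It is not the AE64000 logic itself: the function name, the 4-byte instruction size, and the use of a plain lookup callback in place of the BPT and TAC hardware are all assumptions made for illustration.

```python
from typing import Callable, Optional, Tuple

FETCH_WIDTH = 4  # from the slides: 4 instructions are fetched at a time
INSTR_BYTES = 4  # assumption for illustration; not stated in the slides


def predict_fetch_group(
    fetch_pc: int,
    predicted_taken: Callable[[int], bool],
    tac_lookup: Callable[[int], Optional[int]],
) -> Tuple[int, Optional[int]]:
    """Return (next fetch address, slot of the predicted branch or None).

    predicted_taken stands in for the direction predictor (BPT) and
    tac_lookup for the TAC; both are hypothetical callbacks.
    """
    for slot in range(FETCH_WIDTH):
        pc = fetch_pc + slot * INSTR_BYTES
        target = tac_lookup(pc)
        # A TAC hit marks this slot as a branch with a known target.
        if target is not None and predicted_taken(pc):
            return target, slot  # first predicted-taken branch wins
    # No predicted-taken branch in the group: fetch the next sequential group.
    return fetch_pc + FETCH_WIDTH * INSTR_BYTES, None
```

Later branches in the same fetch group are ignored once an earlier one is predicted taken, because the instructions after a taken branch are not on the predicted path.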

Branch Predictor for AE64000
• Separate BPT with TAC
• PTAB stores predicted target addresses for instructions in the line buffers
• Branch prediction is verified in the ID stage

Predicted Target Address Buffer (PTAB)
• Holds predicted target addresses for branch instructions in the line buffers
• When a branch instruction is sent to the pipeline core, the corresponding predicted target address is sent with it
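A minimal Python sketch of the PTAB idea follows, assuming the buffer behaves like a FIFO that is filled at prediction time and drained when a branch leaves the line buffers. The class name, method names, default capacity, and the `verify_in_id` helper are hypothetical; the slides only state that the predicted target travels with the branch and that verification happens in the ID stage.

```python
from collections import deque
from typing import Optional


class PTAB:
    """FIFO of (branch PC, predicted target) pairs awaiting verification."""

    def __init__(self, capacity: int = 12) -> None:
        # Assumption: at most one entry per buffered instruction (12 in total).
        self.capacity = capacity
        self.entries: deque = deque()

    def push(self, branch_pc: int, predicted_target: int) -> None:
        """Record a prediction made while the branch enters the line buffers."""
        if len(self.entries) >= self.capacity:
            raise OverflowError("PTAB is full")
        self.entries.append((branch_pc, predicted_target))

    def pop_for(self, pc: int) -> Optional[int]:
        """Called when the instruction at pc is sent to the pipeline core."""
        if self.entries and self.entries[0][0] == pc:
            return self.entries.popleft()[1]
        return None  # this instruction was not predicted taken


def verify_in_id(predicted_target: Optional[int],
                 actual_next_pc: int,
                 fallthrough_pc: int) -> bool:
    """True if the address fetched after the branch matches the real outcome."""
    expected = predicted_target if predicted_target is not None else fallthrough_pc
    return expected == actual_next_pc
```

Keeping the predicted target next to the branch lets the comparison be made when the branch is resolved, even though the prediction itself was made several instructions earlier.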

Simulation Environment
• Developed a cycle-accurate AE64000 simulator
  - Simulated 1 billion instructions
  - 30 minutes on a P4 1.6 GHz with 512 MB RAM
  - Indirect branches are not predicted in the simulation
  - Input: AE64000 compiler binary, memory and predictor configuration parameters
  - Output: IPC, BPT/TAC hit ratios, etc.
• Benchmarks
  - SPECint95 (compress, go)
  - Dhrystone
  - Whetstone
• Predictors tested
  - Last-time predictor
  - Bimodal predictor
  - G-share predictor
[Figure: Simulator block diagram]
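For readers unfamiliar with the three predictors tested, the Python sketch below shows their textbook forms. Table sizes, the PC indexing (dropping two low bits), initial counter values, and the `count_correct` helper are assumptions for illustration and are not taken from the AE64000 simulator.

```python
from typing import Iterable, Tuple


class LastTime:
    """1-bit predictor: predict whatever the branch did last time."""

    def __init__(self, index_bits: int = 6) -> None:
        self.mask = (1 << index_bits) - 1
        self.table = [0] * (1 << index_bits)  # 0 = not taken, 1 = taken

    def _index(self, pc: int) -> int:
        return (pc >> 2) & self.mask  # assumption: drop 2 low PC bits

    def predict(self, pc: int) -> bool:
        return self.table[self._index(pc)] == 1

    def update(self, pc: int, taken: bool) -> None:
        self.table[self._index(pc)] = int(taken)


class Bimodal(LastTime):
    """2-bit saturating counters: 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, index_bits: int = 6) -> None:
        super().__init__(index_bits)
        self.table = [1] * (1 << index_bits)  # start weakly not taken

    def predict(self, pc: int) -> bool:
        return self.table[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


class GShare(Bimodal):
    """Bimodal counters indexed by PC XOR global branch history."""

    def __init__(self, index_bits: int = 6) -> None:
        super().__init__(index_bits)
        self.history = 0

    def _index(self, pc: int) -> int:
        return ((pc >> 2) ^ self.history) & self.mask

    def update(self, pc: int, taken: bool) -> None:
        super().update(pc, taken)  # uses the history seen at prediction time
        self.history = ((self.history << 1) | int(taken)) & self.mask


def count_correct(predictor, trace: Iterable[Tuple[int, bool]]) -> int:
    """Replay (pc, taken) pairs and count correct predictions."""
    correct = 0
    for pc, taken in trace:
        correct += predictor.predict(pc) == taken
        predictor.update(pc, taken)
    return correct
```

On a single loop branch that is taken three times and then falls through, the last-time predictor mispredicts twice per loop (the exit and the re-entry), while the 2-bit counter mispredicts only the exit once warmed up. G-share additionally captures patterns that depend on recent history, at the cost of a history register and a larger effective table.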

Simulation Results • Without branch predictor (IPC)

Simulation Results • Last-time branch predictor

Simulation Results (cont’d) • Bimodal Branch Predictor

Simulation Results (cont’d) • G-share Branch Predictor

Conclusion
• Simulation result analysis
  - Consider both performance and area
  - The additional performance gain from the g-share and bimodal predictors is negligible relative to their size and complexity
• Final design
  - Last-time predictor with a 4-way set-associative, 8-entry TAC with LRU replacement
  - IPC improves by 10% when the branch misprediction penalty is reduced from 3 to 2 cycles
  - An additional 15% IPC improvement from the branch predictor
  - About 11,500 gates (about 2.64% of area) in the Verilog HDL model
  - Thus, the performance of the AE64000 improves by 25% at an area cost of less than 3%
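The TAC organization chosen in the final design (8 entries, 4-way set-associative, LRU replacement) implies 2 sets of 4 ways. The Python sketch below models that organization only; the class name, the set-index function, and the use of the full branch PC as the tag are assumptions, not details from the Verilog model.

```python
from collections import OrderedDict
from typing import Optional


class TargetAddressCache:
    """Set-associative cache of branch targets with LRU replacement."""

    def __init__(self, entries: int = 8, ways: int = 4) -> None:
        self.ways = ways
        self.num_sets = entries // ways  # 8 entries / 4 ways = 2 sets
        # Each set keeps its entries ordered from least to most recently used.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]

    def _set_for(self, pc: int) -> OrderedDict:
        # Assumption: the set index comes from the PC bits above a 4-byte offset.
        return self.sets[(pc >> 2) % self.num_sets]

    def lookup(self, pc: int) -> Optional[int]:
        """Return the cached target for pc, or None on a miss."""
        entries = self._set_for(pc)
        if pc in entries:
            entries.move_to_end(pc)  # mark as most recently used
            return entries[pc]
        return None

    def insert(self, pc: int, target: int) -> None:
        """Install or refresh a target, evicting the LRU way if the set is full."""
        entries = self._set_for(pc)
        if pc in entries:
            entries.move_to_end(pc)
        elif len(entries) >= self.ways:
            entries.popitem(last=False)  # evict the least recently used entry
        entries[pc] = target
```

With only two sets, most conflict handling falls on the LRU policy within each set, which keeps recently executed branches resident in a very small structure.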
