
Exploring Correlation for Indirect Branch Prediction

Nikunj Bhansali, Chintan Panirwala, Huiyang Zhou. Department of Electrical and Computer Engineering, North Carolina State University. Baseline: ITTAGE Indirect Branch Predictor [A. Seznec and P. Michaud, JILP 2006].


Presentation Transcript


  1. Exploring Correlation for Indirect Branch Prediction. Nikunj Bhansali, Chintan Panirwala, Huiyang Zhou. Department of Electrical and Computer Engineering, North Carolina State University.

  2. Baseline: ITTAGE Indirect Branch Predictor [A. Seznec and P. Michaud, JILP 2006] • A PPM-based predictor contains multiple Markov predictors, each capturing a different history length; the one with the longest match is used to make the prediction.
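The longest-match rule can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the `Entry` class, the `longest_match` function, and the idea of passing in already-indexed entries are assumptions made for the example, and table indexing and tag hashing are left out.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Entry:
    """Hypothetical tagged entry: a partial tag and a predicted target."""
    tag: int
    target: int


def longest_match(entries: Sequence[Optional[Entry]],
                  tags: Sequence[int]) -> Optional[int]:
    """Return the target from the matching table with the longest history.

    entries[i] is the entry read from table T(i+1) (None if empty) and
    tags[i] is the tag computed for that table. Tables are ordered from
    the shortest history (T1) to the longest (Tn).
    """
    for entry, tag in zip(reversed(entries), reversed(tags)):
        if entry is not None and entry.tag == tag:
            return entry.target
    return None  # no table matched


# T1 and T2 match, T3 does not: T2 has the longest matching history.
example = longest_match(
    [Entry(0x11, 0x1000), Entry(0x22, 0x2000), Entry(0x99, 0x3000)],
    [0x11, 0x22, 0x33],
)
```

In this example `example` is the target stored in T2, because T3's tag does not match.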

  3. Our Main Idea: • Longest history length vs. adaptive history lengths. • Address-target correlation.

  4. Predictor Structure – Main Predictor [Figure: tagged tables T1 … Tn with entry fields Tag, u, Alt, and Target. The per-table match signals (T1_Match … Tn_Match), the HBT hit signal, and hlen select the Target Prediction.]

  5. Main Predictor at Fetch Stage: • ITTAGE as the baseline predictor (no T0). • Two ways to adaptively select the proper table (or history length): 1. An alt bit in each entry (except T1). 2. A separate table for hard-to-predict branches. [Figure: entry layout with tag, u, alt, and target fields.]

  6. Using Alt Bits to Select a Table: • Alt = 0: the target from the current entry is preferred for the prediction. • Alt = 1: a table with a shorter history is used to make the final prediction. • Table T1 has no alt bit. • The alt field is initially set to zero. • Update mechanism: if the table with the longest match fails to make the correct prediction while another table does, the alt field is set for those entries with longer history lengths.
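The alt-bit rules above can be sketched as follows. This is an illustrative reading of the slide, not the authors' implementation: the names `AltEntry`, `select_target`, and `update_alt` are hypothetical. Two details are assumptions: the "other table" is taken to be the longest-history table that predicted correctly, and the function returns `None` when every matching entry has its alt bit set, a case the slides do not describe.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AltEntry:
    """Hypothetical tag-matching entry; alt starts at zero (False)."""
    target: int
    alt: bool = False


def select_target(matches: List[Optional[AltEntry]]) -> Optional[int]:
    """matches[i] is the tag-matching entry of T(i+1), or None on a miss.

    Walk from the longest history down and use the first entry whose alt
    bit is clear. T1 (index 0) has no alt bit, so it is always usable.
    """
    for i in range(len(matches) - 1, -1, -1):
        entry = matches[i]
        if entry is not None and (i == 0 or not entry.alt):
            return entry.target
    return None  # case not covered by the slides


def update_alt(matches: List[Optional[AltEntry]], actual: int) -> None:
    """Set alt bits when the longest match is wrong but another table is right."""
    hits = [i for i, e in enumerate(matches) if e is not None]
    if not hits or matches[hits[-1]].target == actual:
        return
    correct = [i for i in hits if matches[i].target == actual]
    if not correct:
        return
    best = correct[-1]  # assumption: longest table that was correct
    for i in hits:
        if i > best:
            matches[i].alt = True


# T1, T3 and T4 match; only T1 holds the correct target 0xA.
tables = [AltEntry(0xA), None, AltEntry(0xB), AltEntry(0xC)]
before = select_target(tables)   # longest match, T4
update_alt(tables, actual=0xA)   # T3 and T4 get alt = 1
after = select_target(tables)    # now falls through to T1
```

After the update, the same lookup skips T4 and T3 and takes the target from T1.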

  7. Hard-to-predict Branch Table (HBT): • A cache-like set-associative structure in which each entry contains a tag, a misprediction counter (mc), and a history length (hlen). • The HBT is updated based on the prediction provided by the longest history. • The mc field is used for replacement, so that hard-to-predict branches are captured by the HBT. • hlen is used to select the hlen-th longest history. [Figure: entry layout with tag, mc, and hlen fields.]

  8. Hard-to-predict Branch Table (HBT): • For example, if hlen = 2, tables T2, T4, and T5 have tag matches, and their corresponding alt fields are false, then T2 is selected for the prediction. • The main predictor provides its prediction at the fetch stage. • The main predictor is updated at the retire stage of an indirect branch.
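The example on this slide can be reproduced with a small sketch. It assumes one particular reading: hlen names the table number to use, which is consistent with T2 being chosen when hlen = 2. The function name `select_table` and the fallback to the longest usable match on an HBT miss are assumptions for illustration, not details taken from the slides.

```python
from typing import Dict, Optional


def select_table(matching: Dict[int, bool],
                 hlen: Optional[int]) -> Optional[int]:
    """Pick the table number that provides the prediction.

    matching maps the number of each table with a tag match to its alt
    bit (pass False for T1, which has no alt bit). hlen is the value
    read from the HBT, or None on an HBT miss.
    """
    usable = sorted(t for t, alt in matching.items() if not alt)
    if hlen is not None and hlen in usable:
        return hlen  # assumption: hlen directly names table T<hlen>
    # Assumed fallback: longest usable match, as in the baseline.
    return usable[-1] if usable else None


# Slide example: T2, T4 and T5 match, all alt bits false, hlen = 2.
with_hbt = select_table({2: False, 4: False, 5: False}, hlen=2)
without_hbt = select_table({2: False, 4: False, 5: False}, hlen=None)
```

With the HBT hit the sketch picks T2, as on the slide; without it the longest match, T5, is used.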

  9. Auxiliary Predictor at AGEN Stage: • Correlation between the producer load address and the consumer branch target, e.g., Load R19 = Mem[R3] // Address: 0x60848100, 0x60846ec8; Br R19 // Target: 0x60751a64, 0x607691c9. • The producer load accesses two addresses, and each address provides a different branch target. • As long as the data structures at these addresses do not change frequently, the addresses are sufficient to predict the target of the consumer indirect branch.

  10. Auxiliary Predictor Design [Figure: a table tagged by the branch PC (Br pc); each entry holds several <addr,target> pairs, and the hashed load address selects the matching pair.]

  11. Auxiliary Predictor Design: • Address Target Correlation (ATC) is captured using an Address Target Table (ATT). • The ATT is accessed at the AGEN stage of the load instruction. • The PC of the indirect branch is used for the tag match. • The hashed load address is used to find the matching address-target pair. • The ATT is updated at the EXE stage of an indirect branch. • LRU replacement policy. • Reduces the misprediction penalty when the prediction differs from the one provided at the fetch stage.
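A software model of the ATT might look like the sketch below. It is only an illustration under stated assumptions: the class name, the XOR-fold hash, the number of pairs per entry, and the oldest-first eviction of pairs inside an entry are all hypothetical. The slides specify only the branch-PC tag, the hashed load address, the address-target pairs, and LRU replacement.

```python
from collections import OrderedDict
from typing import Optional


class AddressTargetTable:
    """Hypothetical model: branch PC -> {hashed load address -> target}."""

    def __init__(self, entries: int = 26, pairs: int = 10,
                 addr_bits: int = 10) -> None:
        self.entries = entries          # 26 entries, per the storage slide
        self.pairs = pairs              # assumption: pairs per entry
        self.addr_bits = addr_bits      # 10-bit hashed address
        self.table: "OrderedDict[int, OrderedDict[int, int]]" = OrderedDict()

    def _hash(self, addr: int) -> int:
        # Assumed hash: XOR-fold the address down to addr_bits bits.
        mask = (1 << self.addr_bits) - 1
        return (addr ^ (addr >> 10) ^ (addr >> 20)) & mask

    def predict(self, br_pc: int, load_addr: int) -> Optional[int]:
        """Lookup at the AGEN stage of the producer load."""
        pairs = self.table.get(br_pc)
        if pairs is None:
            return None
        self.table.move_to_end(br_pc)   # mark entry most recently used
        return pairs.get(self._hash(load_addr))

    def update(self, br_pc: int, load_addr: int, target: int) -> None:
        """Update at the EXE stage of the indirect branch."""
        if br_pc not in self.table:
            if len(self.table) >= self.entries:
                self.table.popitem(last=False)   # evict the LRU entry
            self.table[br_pc] = OrderedDict()
        self.table.move_to_end(br_pc)
        pairs = self.table[br_pc]
        key = self._hash(load_addr)
        if key not in pairs and len(pairs) >= self.pairs:
            pairs.popitem(last=False)            # assumed: drop oldest pair
        pairs[key] = target


# Addresses and targets from the slide example; the branch PC is made up.
att = AddressTargetTable()
BR_PC = 0x400100
att.update(BR_PC, 0x60848100, 0x60751A64)
att.update(BR_PC, 0x60846EC8, 0x607691C9)
first = att.predict(BR_PC, 0x60848100)
second = att.predict(BR_PC, 0x60846EC8)
```

Once both pairs are trained, each load address retrieves its own branch target, which is the behaviour the address-target correlation relies on.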

  12. Storage Cost (1/2): • Tagged table entry: U ctr 2 bits; Target 32 bits; Alt 1 bit (except T1); Tag: partial tag. • HBT (1,216 bits): 32 entries; Tag 32 bits; mc 2 bits; hlen 4 bits. • ATT (11,882 bits): 26 entries; Tag 32 bits; LRU 5 bits; <target,address>: <32,10> bits.
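The HBT and ATT totals can be checked with simple arithmetic. In the sketch below, the HBT figure follows directly from the listed field widths. The number of <target,address> pairs per ATT entry is not stated on the slide; the value 10 is an inference, chosen because it is the count that reproduces the quoted 11,882 bits.

```python
def table_bits(entries: int, *field_bits: int) -> int:
    """Total storage of a table: entries times the sum of field widths."""
    return entries * sum(field_bits)


# HBT: 32 entries of tag (32) + mc (2) + hlen (4).
HBT_BITS = table_bits(32, 32, 2, 4)

# ATT: 26 entries of tag (32) + LRU (5) + several <target,address> pairs.
PAIR_BITS = 32 + 10          # one <target,address> pair
PAIRS_PER_ENTRY = 10         # assumption inferred from the quoted total
ATT_BITS = table_bits(26, 32, 5, PAIRS_PER_ENTRY * PAIR_BITS)

print(HBT_BITS, ATT_BITS)
```

Both values agree with the totals on the slide: 1,216 bits for the HBT and 11,882 bits for the ATT.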

  13. Storage Cost (2/2): • Global history: 640 * 2 bits • Path history: 16 bits • Other counters: 39 bits • Total: 64.97 KB

  14. Experimental Results: • Overall performance improvement (ATT of 11,882 bits): 15.6% • Performance improvement with a small ATT (1,624 bits): 14.8%

  15. Discussion: Why we may not win: 1. Other contestants are doing superb! 2. Our baseline ITTAGE is not well tuned. The code and the predictor structure are modified based on L-TAGE.

  16. Discussion: Why we can win: • Our main ideas, adaptive history length and address-target correlation, can further improve well-tuned predictors.

  17. Conclusions: • Although control flow history carries correlation to targets, the strength of the correlation may either increase or decrease for different indirect branches when we increase the history length. • There exists strong correlation between producer load addresses and consumer branch targets.

  18. Thank You
