
A 64 Kbytes ITTAGE indirect branch predictor

A 64 Kbytes ITTAGE indirect branch predictor. André Seznec, INRIA/IRISA. Builds on ITTAGE, which was introduced at the same time as TAGE (JILP 2006) and is derived directly from the TAGE predictor: target prediction instead of direction prediction.


Presentation Transcript


  1. A 64 Kbytes ITTAGE indirect branch predictor. André Seznec, INRIA/IRISA

  2. Build on ITTAGE • ITTAGE: • Introduced at the same time as TAGE (JILP 2006) • Derived directly from the TAGE predictor: • Target prediction instead of direction prediction

  3. ITTAGE: multiple tables, global history predictor • The set of history lengths forms a geometric series, e.g. {0, 2, 4, 8, 16, 32, 64, 128} • Captures correlation on very long histories • Most of the storage goes to short histories !! • What is important: L(i) - L(i-1) increases drastically with i
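The geometric series on this slide can be reproduced with a few lines of Python. This is only a sketch: the function name and its parameters (number of tagged tables, shortest and longest history) are illustrative assumptions, not taken from the presentation.

```python
def geometric_history_lengths(num_tagged, l_min, l_max):
    """Return [0, L(1), ..., L(n)] where L(1)..L(n) form a geometric series.

    The leading 0 stands for the tagless base predictor (no history).
    num_tagged, l_min and l_max are hypothetical parameter names.
    """
    if num_tagged == 1:
        return [0, l_min]
    # Common ratio chosen so that L(1) = l_min and L(n) = l_max.
    ratio = (l_max / l_min) ** (1.0 / (num_tagged - 1))
    lengths = [0]
    for i in range(num_tagged):
        lengths.append(int(round(l_min * ratio ** i)))
    return lengths


lengths = geometric_history_lengths(7, 2, 128)
# Gap between consecutive lengths, i.e. L(i) - L(i-1).
gaps = [b - a for a, b in zip(lengths, lengths[1:])]
print(lengths)
print(gaps)
```

With these inputs the lengths match the set shown on the slide, and the gaps grow with i, which is the property the slide calls important.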

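  4. The ITTAGE predictor [Figure: a tagless base predictor indexed by pc, plus tagged tables indexed by pc combined with h[0:L1], h[0:L2] and h[0:L3]; each tagged table delivers a 32-bit target and a 1-bit tag-match (=?) signal, and the matches select the final 32-bit prediction]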

  5. Prediction computation • General case: • Longest matching component provides the prediction • Special case: • Many mispredictions on newly allocated entries: weak Ctr • Sometimes Altpred is (slightly) more accurate than Pred • Property dynamically monitored through a single 4-bit counter • -2 % MPPKI
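The selection rule on this slide could look roughly like the sketch below. Several details are assumptions and not stated in the presentation: the `Hit` record, encoding "weak" as counter value 0, treating the 4-bit counter as signed (-8..7) with "use Altpred when it is non-negative", and training it only when Pred and Altpred differ.

```python
from dataclasses import dataclass

WEAK = 0  # assumed encoding of a weak 2-bit hysteresis counter


@dataclass
class Hit:
    table: int   # tagged table number; a higher number means a longer history
    target: int  # target stored in the matching entry
    ctr: int     # 2-bit hysteresis counter of the matching entry


def choose(hits, base_target):
    """Return (provider, pred, altpred) for the current branch."""
    ordered = sorted(hits, key=lambda h: h.table, reverse=True)
    if not ordered:
        return None, base_target, base_target
    provider = ordered[0]  # longest matching component
    alt = ordered[1].target if len(ordered) > 1 else base_target
    return provider, provider.target, alt


def predict(hits, base_target, use_alt):
    """use_alt is the single 4-bit monitoring counter (assumed signed)."""
    provider, pred, alt = choose(hits, base_target)
    if provider is not None and provider.ctr == WEAK and use_alt >= 0:
        return alt  # weak provider entry: trust Altpred instead
    return pred


def train_use_alt(use_alt, hits, base_target, actual):
    """Update the monitoring counter once the real target is known."""
    provider, pred, alt = choose(hits, base_target)
    if provider is None or provider.ctr != WEAK or pred == alt:
        return use_alt  # nothing to learn from this branch
    if alt == actual:
        return min(7, use_alt + 1)
    if pred == actual:
        return max(-8, use_alt - 1)
    return use_alt
```

For example, with a weak hit in table 3 and a confident hit in table 1, `predict` returns the table 3 target when `use_alt` is negative and the table 1 target otherwise.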

  6. A tagged table entry • Layout: Target | Tag | Ctr | U • Ctr: 2-bit hysteresis counter • U: 1-bit useful counter • Was the entry recently useful? • Tag: partial tag • Target: the target, 32 bits or some way to reconstruct it

  7. Allocate entries on mispredictions • Allocate entries in longer history length tables • On tables with U unset • Set Ctr to Weak and U to 0 • HUGE STORAGE BUDGET: • Up to 3 entries allocated in different tables • Fast warming
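A minimal sketch of the entry format and the allocation rule from the two slides above. The `Entry` class, the `allocate` signature and the choice to take the first eligible tables in order of increasing history length are assumptions; the slides do not say how candidates are ranked when more than three are free.

```python
from dataclasses import dataclass

WEAK = 0  # assumed encoding of a weak 2-bit hysteresis counter


@dataclass
class Entry:
    tag: int = 0     # partial tag
    target: int = 0  # 32-bit target (or a way to reconstruct it)
    ctr: int = WEAK  # 2-bit hysteresis counter
    u: int = 0       # 1-bit useful counter


def allocate(tables, indices, tags, provider, target, max_alloc=3):
    """Allocate entries after a misprediction.

    tables[t] is a list of Entry; indices[t] and tags[t] are the index and
    partial tag computed for this branch in table t. provider is the number
    of the table that supplied the prediction, or -1 for the base predictor.
    Returns the list of tables in which an entry was allocated.
    """
    allocated = []
    # Only tables with a longer history than the provider are candidates.
    for t in range(provider + 1, len(tables)):
        entry = tables[t][indices[t]]
        if entry.u == 0:  # only overwrite entries not marked useful
            entry.tag = tags[t]
            entry.target = target
            entry.ctr = WEAK
            entry.u = 0
            allocated.append(t)
            if len(allocated) == max_alloc:
                break
    return allocated
```

Calling `allocate` with `provider=-1` on empty tables fills entries in the three shortest-history tagged tables, which matches the "up to 3 entries" rule.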

  8. Managing the (U)seful bit • Set when the entry avoids a misprediction: • (Pred = target) & (Alt ≠ target) • Global reset when there are « difficulties » allocating • Dynamically monitor whether there are more failures than successes on allocations
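The useful-bit policy could be sketched as follows. The slide only says that failures and successes on allocations are monitored; the specific mechanism here (a counter of failures minus successes, floored at 0, with a hypothetical threshold of 255) is an assumption chosen for illustration.

```python
def update_useful(u, pred, alt, actual):
    """Set U when the entry avoided a misprediction, otherwise keep it."""
    if pred == actual and alt != actual:
        return 1
    return u


class UsefulMonitor:
    """Tracks allocation failures against successes (assumed mechanism)."""

    def __init__(self, threshold=255):
        self.threshold = threshold  # hypothetical value
        self.balance = 0            # failures minus successes, floored at 0

    def record_allocation(self, succeeded):
        """Return True when a global reset of the U bits is due."""
        if succeeded:
            self.balance = max(0, self.balance - 1)
        else:
            self.balance += 1
        if self.balance >= self.threshold:
            self.balance = 0
            return True
        return False


def reset_useful_bits(u_bits_per_table):
    """Global reset: clear every U bit in every tagged table."""
    for bits in u_bits_per_table:
        for i in range(len(bits)):
            bits[i] = 0
```

A caller would invoke `record_allocation` once per attempted allocation and call `reset_useful_bits` whenever it returns True.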

  9. Most of the storage space is for targets • 32 bits per entry !! • More than 12K (PC, target) pairs on CLIENT05 • But only a maximum of 4038 different targets • Use 12-bit pointers + a 4K table
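The storage trade-off behind the 12-bit pointer idea is simple arithmetic, sketched below. The function names and the example entry count are illustrative; only the 32-bit target, the 12-bit pointer and the 4K-entry target table come from the slide.

```python
TARGET_BITS = 32       # full target stored per entry
POINTER_BITS = 12      # pointer into the shared target table
TARGET_TABLE_ENTRIES = 4096


def direct_bits(num_entries):
    """Target storage when every predictor entry holds a full target."""
    return num_entries * TARGET_BITS


def indirect_bits(num_entries):
    """Target storage with 12-bit pointers plus the shared 4K table."""
    return num_entries * POINTER_BITS + TARGET_TABLE_ENTRIES * TARGET_BITS


def break_even_entries():
    """Smallest entry count for which the pointer scheme uses fewer bits."""
    saved_per_entry = TARGET_BITS - POINTER_BITS
    table_cost = TARGET_TABLE_ENTRIES * TARGET_BITS
    return -(-table_cost // saved_per_entry)  # ceiling division


print(TARGET_TABLE_ENTRIES * TARGET_BITS // 8, "bytes for the target table")
print(break_even_entries(), "entries to break even")
```

The shared table costs 16 Kbytes on its own, so the pointers only pay off once the predictor has several thousand tagged entries.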

  10. Let us be realistic: leverage target locality • All targets in at most 90 256KB regions • Use a 128-entry region table: • Fully associative, 240 bytes • Saves 7 bits per ITTAGE entry • Would have saved 39 bits on a 64-bit architecture !!

  11. Entry layout with the region table: Target (Region pointer | Region offset) | Tag | Ctr | U
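The figures on the region-table slide can be checked with a short sketch. A 256KB region means an 18-bit offset, so a 32-bit target leaves a 14-bit region number, which a 7-bit pointer into a 128-entry table replaces (14 - 7 = 7 bits saved; 46 - 7 = 39 on 64 bits). The 240 bytes correspond to 15 bits per region-table entry; the 14-bit region number accounts for all but one of them, and the slide does not say what the remaining bit holds. The class below and its round-robin replacement are assumptions made for illustration.

```python
OFFSET_BITS = 18   # 256KB regions
POINTER_BITS = 7   # 128-entry region table


class RegionTable:
    """Fully associative table of region numbers (replacement is assumed)."""

    def __init__(self):
        self.regions = []
        self.next_victim = 0  # round-robin victim, a hypothetical policy

    def encode(self, target):
        """Split a target into (region pointer, region offset)."""
        region = target >> OFFSET_BITS
        offset = target & ((1 << OFFSET_BITS) - 1)
        if region in self.regions:
            pointer = self.regions.index(region)
        elif len(self.regions) < (1 << POINTER_BITS):
            self.regions.append(region)
            pointer = len(self.regions) - 1
        else:
            # Table full: overwrite a victim. Entries still pointing at the
            # victim would then reconstruct a wrong target.
            pointer = self.next_victim
            self.regions[pointer] = region
            self.next_victim = (pointer + 1) % (1 << POINTER_BITS)
        return pointer, offset

    def decode(self, pointer, offset):
        """Rebuild the full target from the stored pointer and offset."""
        return (self.regions[pointer] << OFFSET_BITS) | offset


def bits_saved_per_entry(address_bits):
    """Region-number bits replaced by the 7-bit pointer."""
    return (address_bits - OFFSET_BITS) - POINTER_BITS


print(bits_saved_per_entry(32), bits_saved_per_entry(64))
```

Since the slide reports at most 90 regions per trace, the replacement path would rarely if ever be exercised with 128 entries.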

  12. The global history • Conventional global branch history • 10 bits for indirect jumps, 5 bits for calls • Mixing target and PC • -16 % MPPKI
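One way the history update might be written is shown below. The bit counts (10 per indirect jump, 5 per call) come from the slide and the 3881-bit cap is the longest history length listed later in the talk; the mixing function itself (XOR of the PC with the target shifted right by 2) is purely an assumption, since the presentation does not give the hash.

```python
MAX_HISTORY_BITS = 3881  # longest history length listed in the talk

# Bits inserted per branch kind, as stated on the slide.
BITS_PER_KIND = {"indirect": 10, "call": 5}


def update_history(history, pc, target, kind):
    """Shift new bits into the global history (newest bits are lowest).

    history is a Python int used as a bit vector. Branch kinds without an
    entry in BITS_PER_KIND leave the history unchanged.
    """
    bits = BITS_PER_KIND.get(kind, 0)
    if bits == 0:
        return history
    # Assumed hash: mix PC and target, then keep the low-order bits.
    mixed = (pc ^ (target >> 2)) & ((1 << bits) - 1)
    return ((history << bits) | mixed) & ((1 << MAX_HISTORY_BITS) - 1)


h = 0
h = update_history(h, 0x1234, 0x4000, "indirect")
h = update_history(h, 0x10, 0x80, "call")
h = update_history(h, 0x20, 0x90, "conditional")
print(hex(h))
```

The last call leaves the history untouched because conditional branches contribute no bits in this sketch.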

  13. The global history (2) • Including all branches? • Only indirect jumps and calls: -2.5 % MPPKI • But no firm conclusion: • Without 2 branches on INT05 and INT06, the result is just the other way around

  14. + the other tricks (for TAGE) • Immediate Update Mimicker • Storage space interleaving • Picking the best set of history lengths • -1 % MPPKI

  15. The Immediate Update Mimicker • Issue: • Some mispredictions are due to late updates at retirement • Immediate Update Mimicker: • Try to catch these cases

  16. The Immediate Update Mimicker [Figure: the in-flight branches since fetch, each recorded as P(rediction), T(able), A(ddress in the table); two of the records are shown as E T A instead of P T A; a misprediction is linked to a later in-flight record that uses the same table and the same entry]
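A rough sketch of how such a mechanism might behave: each in-flight branch records which table and entry provided its prediction, and once a branch has executed, its actual target overrides later predictions that come from the same table and entry before the predictor itself is updated at retirement. The class, its method names and the dictionary records are all assumptions made for illustration, not the author's implementation.

```python
class ImmediateUpdateMimicker:
    """Tracks in-flight branches between fetch and retirement (a sketch)."""

    def __init__(self):
        self.inflight = []  # oldest first

    def on_fetch(self, table, addr, predicted):
        """Record a new branch; return its record with the final prediction."""
        final = predicted
        # Look for the youngest executed branch that used the same entry.
        for rec in reversed(self.inflight):
            if rec["executed"] and rec["table"] == table and rec["addr"] == addr:
                final = rec["target"]
                break
        rec = {"table": table, "addr": addr, "target": final, "executed": False}
        self.inflight.append(rec)
        return rec

    def on_execute(self, rec, actual):
        """The branch resolved: remember its actual target."""
        rec["target"] = actual
        rec["executed"] = True

    def on_retire(self):
        """The oldest branch retires; the predictor tables are updated then."""
        return self.inflight.pop(0)


ium = ImmediateUpdateMimicker()
first = ium.on_fetch(table=2, addr=17, predicted=0x100)
ium.on_execute(first, actual=0x180)            # mispredicted, not yet retired
second = ium.on_fetch(table=2, addr=17, predicted=0x100)
print(hex(second["target"]))                   # uses the executed target
```

Here the second branch hits the same table and entry as the first, so it is steered to the target the first branch actually took.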

  17. For the competition: interleaving [Figure: tables indexed with h[0,L1] and accessed through crossbars (Xbar); the tag comparisons (=?) select the prediction]

  18. For the competition • Guided selection of the best set of history lengths: • 4K entries: 0 • 4K entries: 0, 10 • 4K entries: 16, 27, 44, 60, 96, 109, 219, 449 • 2K entries: 487, 714, 1313, 2146, 3881 • Remember: 10 bits per indirect, 5 per call

  19. Where is the limit? • Less than 3 % MPPKI • Why did you not use the « 12-bit pointer » trick? • Just winning 0.5 % MPPKI

  20. Summary • ITTAGE directly derived from TAGE • History should include (PC + target) for indirect jumps and calls • Locality on targets can be leveraged • Marginal tricks not really worth it
