Merging Path, Global and Local Indexing in Perceptron Branch Prediction

David Tarjan

Published in:
• An Ahead Pipelined Alloyed Perceptron with Single Cycle Access Time. D. Tarjan, K. Skadron and M. Stan. Workshop on Complexity Effective Design (WCED), June 2004
• Merging path and gshare indexing in perceptron branch prediction. D. Tarjan and K. Skadron. ACM Transactions on Architecture and Code Optimization, 2(3), Sep. 2005

Why Yet Another Branch Predictor?
• Single-thread performance growth stalling
• Pipeline length still slowly increasing
• Buffer sizes also increasing
• No more “free” clock scaling
• Power budget goes to more cores
-> If we want more single-thread performance, we have to go for efficiency!

Outline
• What is a perceptron?
• Ahead-pipelining
• Precomputing local sums
• Results for ahead pipelined alloyed perceptron
• Hashed Indexing
• Results of a hashed perceptron
• Conclusion

Main Contributions
• Reduced latency of perceptron predictors to one cycle
• Showed how to reduce the number of weights/adders by a factor of N (N between 6 and 12) for a given history length
• Reduced mispredictions by up to 27.2% over the path-based perceptron

Global Perceptron
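The diagram for this slide did not survive extraction. As a rough stand-in, here is a minimal Python sketch of a global-history perceptron predictor as it is commonly described in the literature: the branch address selects a row of weights, the prediction is the sign of the dot product of that row with the global history (encoded as +1/-1), and the row is trained on a misprediction or when the sum is within a threshold. The class name, table size, history length, and the lack of weight saturation are assumptions made for illustration, not details from the slides.

```python
class GlobalPerceptron:
    """Toy global-history perceptron predictor (sizes are illustrative assumptions)."""

    def __init__(self, num_rows=64, hist_len=8):
        self.num_rows = num_rows
        # One row per table entry: a bias weight plus one weight per history bit.
        self.weights = [[0] * (hist_len + 1) for _ in range(num_rows)]
        # Global history as +1 (taken) / -1 (not taken), newest outcome first.
        self.history = [-1] * hist_len
        # Training threshold; 1.93 * h + 14 is a commonly quoted heuristic.
        self.theta = int(1.93 * hist_len + 14)

    def _sum(self, pc):
        row = self.weights[pc % self.num_rows]
        return row[0] + sum(w * h for w, h in zip(row[1:], self.history))

    def predict(self, pc):
        # Predict taken when the weighted sum is non-negative.
        return self._sum(pc) >= 0

    def update(self, pc, taken):
        y = self._sum(pc)
        t = 1 if taken else -1
        # Train on a misprediction or when the sum is not confident enough.
        if (y >= 0) != taken or abs(y) <= self.theta:
            row = self.weights[pc % self.num_rows]
            row[0] += t
            for i, h in enumerate(self.history):
                row[i + 1] += t * h
        # Shift the resolved outcome into the global history.
        self.history = [t] + self.history[:-1]
```

A real design would use small saturating weights and an adder tree rather than a Python loop; the sketch only shows the data flow of one prediction and one training step.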

Path-based Perceptron

Main Problems in Branch Prediction:
• Accuracy (larger tables, more logic)
• Latency (smaller tables, less logic)
• Multiple Branch/Trace/Stream/etc. per cycle
This work addresses the first two points, which trade off against each other.

Normal pipelined perceptron

Ahead pipelined perceptron
p_addr(x) = addr(x) + direction(x)

Impact of ahead-pipelining on accuracy

Precomputing a local history perceptron

Impact of adding local history

Hashed Perceptron: Motivation
• Want longer history for accuracy
• But that means more adders
• Also need more bits per weight for very long history
• With ahead-pipelining we effectively have two bits of history for each weight…
• But more ahead-pipelining means fewer address bits to select a weight…
We have been here before!

We want gshare for perceptrons!

Hashed Perceptron

Benefits?
• We can reduce the number of tables and adders by a factor of n, where n is the number of history bits per table
• We can accurately predict linearly inseparable branches (e.g. branches whose outcomes follow an XOR pattern)
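Both benefits can be illustrated with a small Python sketch. It assumes a simplified hashed perceptron in which each table contributes one weight, selected by XORing the branch address with that table's segment of the global history (gshare style), and the prediction is the sign of the sum of the selected weights. The class names, table sizes, thresholds, hash, and the synthetic XOR workload are all assumptions chosen for illustration; this is not the exact design or evaluation from the talk. A plain per-bit perceptron is included as a baseline.

```python
import random


class HashedPerceptron:
    """Toy hashed perceptron: one weight per table, chosen by pc XOR history segment."""

    def __init__(self, num_tables=2, bits_per_table=2, table_size=256, theta=10):
        self.num_tables = num_tables
        self.bits = bits_per_table
        self.size = table_size
        self.theta = theta
        self.tables = [[0] * table_size for _ in range(num_tables)]
        self.history = 0  # global history as an integer, newest outcome in bit 0

    def _indices(self, pc):
        mask = (1 << self.bits) - 1
        # Table i hashes the pc with history bits [i * bits, (i + 1) * bits).
        return [(pc ^ ((self.history >> (i * self.bits)) & mask)) % self.size
                for i in range(self.num_tables)]

    def _sum(self, pc):
        return sum(t[i] for t, i in zip(self.tables, self._indices(pc)))

    def predict(self, pc):
        return self._sum(pc) >= 0

    def update(self, pc, taken):
        y = self._sum(pc)
        if (y >= 0) != taken or abs(y) <= self.theta:
            step = 1 if taken else -1
            for t, i in zip(self.tables, self._indices(pc)):
                t[i] += step
        self.push(taken)

    def push(self, taken):
        keep = (1 << (self.num_tables * self.bits)) - 1
        self.history = ((self.history << 1) | int(taken)) & keep


class PlainPerceptron:
    """Baseline: one weight per history bit, multiplied by that bit as +1/-1.

    A single weight row is used, so the pc argument is ignored.
    """

    def __init__(self, hist_len=4, theta=21):
        self.w = [0] * (hist_len + 1)
        self.h = [-1] * hist_len  # newest outcome first
        self.theta = theta

    def _sum(self):
        return self.w[0] + sum(w * h for w, h in zip(self.w[1:], self.h))

    def predict(self, pc):
        return self._sum() >= 0

    def update(self, pc, taken):
        y = self._sum()
        t = 1 if taken else -1
        if (y >= 0) != taken or abs(y) <= self.theta:
            self.w[0] += t
            for i, h in enumerate(self.h):
                self.w[i + 1] += t * h
        self.push(taken)

    def push(self, taken):
        self.h = [1 if taken else -1] + self.h[:-1]


def xor_accuracy(predictor, rounds=3000, measured=1000, seed=0):
    """Accuracy on a branch C that is taken iff two earlier random branches differ."""
    rng = random.Random(seed)
    correct = 0
    for n in range(rounds):
        a, b = rng.random() < 0.5, rng.random() < 0.5
        predictor.push(a)   # branch A resolves
        predictor.push(b)   # branch B resolves
        taken = a != b      # branch C follows an XOR pattern
        if n >= rounds - measured:
            correct += predictor.predict(0x30) == taken
        predictor.update(0x30, taken)
    return correct / measured
```

Calling `xor_accuracy(HashedPerceptron())` and `xor_accuracy(PlainPerceptron())` compares the two on the same synthetic trace. The hashed version needs only two weights and one two-input add per prediction for four bits of history, while the plain version needs four weights plus a bias, and no setting of its per-bit weights can represent the XOR outcome exactly.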

Comparison of misprediction rates

Performance Results

Related Work
• O-GEHL: Optimized GEometric History Length Predictor [Seznec2004]
• gDAC: global Divide And Conquer [Loh2005]
• PWLB: Piecewise Linear Branch Predictor [Jimenez2004]
• TAGE: TAgged GEometric History Branch Predictor [Seznec&Michaud2006]

Conclusion
• A perceptron predictor can be built with single-cycle latency
• Assigning multiple history bits to a single weight helps both accuracy and power
• More accuracy is only good with low latency

Q & A

Why Yet Another Branch Predictor? (ca. 2003)
[Figure: pipeline stage diagrams]
• P5 Microarchitecture: PREF, DEC, DEC, EXEC, WB
• P6 Microarchitecture: IFU1, IFU2, IFU3, DEC1, DEC2, RAT, ROB, DIS, EX, RET1, RET2
• NetBurst Microarchitecture (Willamette): TC NextIP, TC Fetch, Drive, Alloc, Rename, Queue, Schedule, Dispatch, Reg File, Exec, Flags, Br Ck, Drive
• NetBurst Microarchitecture (Prescott): ~30 stages ???
Graphics from Prof. Hsien-Hsin Sean Lee's presentation on the Pentium Pro/Pentium 4 microarchitecture

Let’s start with the familiar predictors

Just to remind you

CBP Accuracy Comparison
