
A Simple Divide-and-Conquer Approach for Neural-Class Branch Prediction


Presentation Transcript


  1. A Simple Divide-and-Conquer Approach for Neural-Class Branch Prediction Gabriel H. Loh College of Computing Georgia Tech

  2. aren’t we done with branch predictors yet? • Branch predictors still important • Performance for large instruction windows • e.g., CPR [Akkary et al./MICRO’03], CFP [Srinivasan et al./ASPLOS’04] • Power • better branch prediction reduces wrong-path instructions • Throughput • wrong-path instructions steal resources from other threads in SMT/SOEMT PACT2005 - Loh - A Simple Divide-and-Conquer Approach for Neural-Class Branch Prediction

  3. recent bpred research • “neural-inspired” predictors • perceptron, piecewise-linear, O-GEHL, … • very high accuracy • relatively high complexity • barrier to industrial adoption

  4. outline • quick synopsis of neural techniques • gDAC predictor • idea • specifics • ahead-pipelining • results • why gDAC works

  5. gshare [figure: the PC and the branch history register (BHR) are hashed into the Pattern History Table (PHT), yielding a taken/not-taken prediction] • Records previous outcomes given a branch identifier (PC) and a context (BHR) • Different contexts may lead to different predictions for the same branch • Assumes correlation between context and the outcome
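To make the mechanism on this slide concrete, here is a minimal functional model of gshare. The table size, counter initialization, and example PC are assumptions for the sketch, not parameters from the talk.

```python
class Gshare:
    """Functional sketch of gshare: the PC is XOR-hashed with the global
    branch history register (BHR) to index a table of 2-bit counters."""

    def __init__(self, hist_bits=12):
        self.mask = (1 << hist_bits) - 1
        self.bhr = 0                        # global branch history register
        self.pht = [1] * (1 << hist_bits)   # 2-bit counters, start weakly not-taken

    def _index(self, pc):
        return (pc ^ self.bhr) & self.mask  # hash of branch identifier and context

    def predict(self, pc):
        return self.pht[self._index(pc)] >= 2   # True means predict taken

    def update(self, pc, taken):
        i = self._index(pc)
        self.pht[i] = min(self.pht[i] + 1, 3) if taken else max(self.pht[i] - 1, 0)
        self.bhr = ((self.bhr << 1) | int(taken)) & self.mask
```

Note how the same branch (same PC) can reach different counters under different history contexts, which is exactly the correlation assumption the slide describes.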

  6. gshare pros and cons • simple to implement! • variants exist in multiple real processors • not scalable to longer history lengths • # PHT entries grows exponentially • learning time increases • even if correlated to only one previous branch, still need to train 2^h PHT counters

  7. perceptron • explicitly locate the source(s) of correlation • weights (w0, w1, w2) track correlation [figure: the table-based approach indexes on the full history pattern h2 h1 h0, while the perceptron approach assigns a weight wi to each history bit; xi = hi ? 1 : -1, and f(X) = (0*x0 – 1*x1 + 0*x2) ≥ 0 uses only the correlated bit h1]

  8. perceptron predictor [figure: PC and BHR select a weight vector; an adder sums the weights and the sign test (≥0) gives the final prediction] • Updating the weights: if the branch outcome agrees with hi, increment wi; if it disagrees, decrement wi • The magnitude of a weight reflects its degree of correlation; no correlation drives wi toward 0 • Downsides: • latency (SRAM lookup, adder tree) • few entries in the table → aliasing • linearly separable functions only
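The update rule on this slide can be sketched as a single perceptron entry. The threshold follows the commonly used θ = ⌊1.93h + 14⌋ heuristic, which is an assumption of this sketch rather than something stated on the slide.

```python
class Perceptron:
    """Sketch of one perceptron predictor entry: a dot product of weights
    with the +1/-1 history decides the prediction; weights are trained
    toward agreement with the branch outcome."""

    def __init__(self, hist_len=8):
        self.n = hist_len
        self.theta = int(1.93 * hist_len + 14)  # common training threshold
        self.w = [0] * (hist_len + 1)           # w[0] is the bias weight
        self.h = [-1] * hist_len                # history: +1 taken, -1 not taken

    def output(self):
        return self.w[0] + sum(wi * xi for wi, xi in zip(self.w[1:], self.h))

    def predict(self):
        return self.output() >= 0

    def update(self, taken):
        t = 1 if taken else -1
        y = self.output()
        # Train on a misprediction, or while confidence is below threshold.
        if (y >= 0) != taken or abs(y) <= self.theta:
            self.w[0] += t
            for i in range(self.n):
                self.w[i + 1] += t * self.h[i]  # agree: increment; disagree: decrement
        self.h = [t] + self.h[:-1]              # shift outcome into history
```

Training on an alternating outcome stream, for example, drives the weight on the most recent history bit strongly negative, exactly the "weights track correlation" behavior of the previous slide.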

  9. path-based neural predictor [figure: weights selected along the path of PCs feed a pipeline of adders; the sign test (≥0) gives the final prediction] • Perceptron: all weights chosen by PC0 • PBNP: wi selected by PCi (the ith oldest PC) • naturally leads to pipelined access • different indexing reduces aliasing • Downsides: • latency (SRAM lookup, one adder) • complexity (30-50 stage bpred pipe) • linearly separable functions only
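A functional (unpipelined) model of the path-based indexing idea follows; real hardware accumulates the partial sums across a deep pipeline as older branches flow through, while this sketch computes the sum in one step. Table sizes and the threshold are illustrative assumptions.

```python
class PathNeural:
    """Functional model of the path-based neural predictor: the i-th weight
    is selected by the i-th oldest PC on the path, not by the current PC."""

    def __init__(self, hist_len=8, rows=64, theta=20):
        self.n, self.rows, self.theta = hist_len, rows, theta
        self.w = [[0] * rows for _ in range(hist_len + 1)]  # one table per position
        self.path = [0] * hist_len      # recent PCs, newest first
        self.h = [-1] * hist_len        # history: +1 taken, -1 not taken

    def output(self, pc):
        y = self.w[0][pc % self.rows]   # bias weight chosen by the current PC
        for i in range(self.n):
            y += self.w[i + 1][self.path[i] % self.rows] * self.h[i]
        return y

    def predict(self, pc):
        return self.output(pc) >= 0

    def update(self, pc, taken):
        t = 1 if taken else -1
        y = self.output(pc)
        if (y >= 0) != taken or abs(y) <= self.theta:
            self.w[0][pc % self.rows] += t
            for i in range(self.n):
                self.w[i + 1][self.path[i] % self.rows] += t * self.h[i]
        self.path = [pc] + self.path[:-1]   # shift this PC into the path
        self.h = [t] + self.h[:-1]
```

Because each position indexes its own row with a different PC, two branches that alias in one table are unlikely to alias in all of them, which is the aliasing-reduction point on the slide.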

  10. piecewise-linear predictor [figure: m copies of the path-based adder pipeline, with a final mux selecting among them] • compute m different linear functions in parallel • some linearly inseparable functions can be learned • Downsides: • latency (SRAM lookup, one adder, one mux) • complexity (m copies of a 50+ stage bpred pipe)

  11. goal/scope • Neural predictors are very accurate • we want the same level of performance • Neural predictors are complex • large number of adders • very deep pipelines • We want to avoid adders • we want to keep the pipe short • preferable to use PHTs only

  12. idea [figure: a single neural predictor trying to swallow a very long branch history (Google Images “hot dog kobayashi” – 2004 world record, 53½ hot dogs)]

  13. idea [figure: the very long branch history divided among several small predictors, combined by a meta-predictor] • make “digesting” a very long branch history easier by dividing up the responsibility! (Google Images “hot dog eating”)

  14. unoptimized gDAC [figure: the PC and the history segments BHR[1:s1], BHR[s1+1:s2], BHR[s2+1:s3], BHR[s3+1:s4] index gshare-styled predictors PHT1–PHT4; a meta-predictor (Meta) selects the final prediction] • global-history Divide And Conquer • utilizes correlation from only a single history segment

  15. fusion gDAC [figure: the PC and the history segments BHR[1:s1], BHR[s1+1:s2], BHR[s2+1:s3], BHR[s3+1:s4] index PHT1–PHT4; their predictions, together with the PC, index a fusion table that produces the final prediction] • can combine correlations from multiple segments
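The fusion organization can be sketched as follows. Segment lengths, index widths, and the fusion-table hash here are illustrative assumptions, not the paper's tuned configuration.

```python
class GDACFusion:
    """Sketch of fusion-style gDAC: the long global history is split into
    segments, each segment indexes its own gshare-style PHT, and a fusion
    table combines the per-segment predictions with the PC."""

    def __init__(self, seg_lens=(6, 6, 6), idx_bits=8):
        self.seg_lens = seg_lens
        self.mask = (1 << idx_bits) - 1
        self.phts = [[1] * (1 << idx_bits) for _ in seg_lens]
        self.fusion = [1] * (1 << idx_bits)
        self.hist = 0                    # global history, newest outcome in bit 0

    @staticmethod
    def _sat(counter, taken):            # 2-bit saturating counter update
        return min(counter + 1, 3) if taken else max(counter - 1, 0)

    def _seg_indices(self, pc):
        idxs, lo = [], 0
        for n in self.seg_lens:          # carve one history segment out of the BHR
            seg = (self.hist >> lo) & ((1 << n) - 1)
            idxs.append((pc ^ seg) & self.mask)
            lo += n
        return idxs

    def _fuse_index(self, pc, bits):     # mix the PC with the per-segment predictions
        v = 0
        for b in bits:
            v = (v << 1) | int(b)
        return ((pc << len(bits)) | v) & self.mask

    def predict(self, pc):
        bits = [self.phts[k][i] >= 2 for k, i in enumerate(self._seg_indices(pc))]
        return self.fusion[self._fuse_index(pc, bits)] >= 2

    def update(self, pc, taken):
        idxs = self._seg_indices(pc)
        bits = [self.phts[k][i] >= 2 for k, i in enumerate(idxs)]
        fi = self._fuse_index(pc, bits)
        self.fusion[fi] = self._sat(self.fusion[fi], taken)
        for k, i in enumerate(idxs):
            self.phts[k][i] = self._sat(self.phts[k][i], taken)
        self.hist = ((self.hist << 1) | int(taken)) & ((1 << sum(self.seg_lens)) - 1)
```

Because the fusion table sees the whole vector of per-segment predictions rather than choosing one of them, it can learn combinations, which is the "combine correlations from multiple segments" point on the slide.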

  16. gDAC [figure: each history segment indexes a Bi-Mode style predictor (BM1–BM4) with a shared choice PHT, feeding the fusion table] • better per-segment predictions lead to a better final prediction

  17. ahead-pipelined gDAC [figure: pipelined PHT access across four cycles, shown for segments 1-3] • Cycle t-3: initial hashing and PHT bank select using PC-3; branch history from cycles t, t-1 and t-2 does not exist yet • Cycle t-2: row decoder • Cycle t-1: SRAM array access; use PC-1 for SRAM column MUX selection • Cycle t: branch history from cycles t, t-1 and t-2 now available; each PHT SRAM is organized to output multiple counters (think “cache line”), and the current PC selects one • Prediction

  18. comment on ahead-pipelining • Branch predictors composed of only PHTs are simple SRAMs, easily ahead-pipelined • Seznec showed ahead-pipelining of 2bcgskew [ISCA’02] and of fetch in general [ISCA’03] • Jiménez showed the AP-like gshare.fast [HPCA’03]

  19. simulation/configs • Standard stuff • SimpleScalar/Alpha (MASE), SPEC2k-INT, SimPoint • CPU config similar to the PWL study [Jiménez/ISCA’05] • gDAC vs. gshare, perceptron, PBNP, PWL • gDAC configs vary • 2-3 segments • history length of 21 @ 2KB to 86 @ 128KB • neural advantage: gDAC tables constrained to power-of-two entries, neural can use arbitrary sizes

  20. misprediction rates [chart] • 2KB: about as accurate as the original perceptron (the piecewise linear predictor just does really well) • 8KB: beats the original perceptron • 32KB: as accurate as the path-based neural predictor

  21. performance [chart] • as accurate as the perceptron, but better latency → higher IPC • gDAC is less accurate than path-based neural @ 16KB, but latency is starting to matter • the latency difference lets gDAC even catch up with PWL in performance • Goal achieved: neural-class performance, PHT-only complexity

  22. so it works, but why? • correlation locality • correlation redundancy • correlation recovery • use the perceptron as the vehicle of analysis – it explicitly assigns a correlation strength to each branch

  23. correlation locality: parser [figure: perceptron correlation strengths across history positions] • distinct clusters/bands of correlation • segmenting (at the right places) should not disrupt clusters of correlation

  24. correlation locality: gcc [figure]

  25. correlation redundancy • using only the correlation from a few branches yields almost as much information as using all branches • therefore the correlations detected in the other weights are redundant!

  26. correlation recovery [figure: selection-based meta (M2,3 then M1,(2,3)) picks one of the segment predictions P1, P2, P3, while fusion combines all of them] • cross-segment correlation may exist • selection-based meta can only use correlation from one segment • fusion can (indirectly) use correlation from all segments • fusion gDAC beats selection gDAC by 4%

  27. orthogonality • could use these ideas in other predictors • segmented-history PPM predictor • segmented, geometric history lengths • some “segments” could use local history, prophet “future” history, or anything else • may be other ways to exploit the general phenomena • correlation locality, redundancy and recovery

  28. summary • contributions • PHT-based long-history predictor • achieves goals of neural accuracy, PHT complexity • ahead-pipelined organization • analysis of segmentation+fusion on correlation • Contact: • loh@cc.gatech.edu • http://www.cc.gatech.edu/~loh

  29. BACKUP SLIDES

  30. Power • neural predictor update: lots of separate small tables; extra decoders; harder to bank • all of the adders • timing-critical for the perceptron – power hungry • not as bad for PBNP (uses small ripple-carry adders) • PWL multiplies the number of adders considerably • checkpointing overhead for PBNP, PWL: need to store 30-50+ partial sums, per branch!

  31. power density/thermals [figure: pipeline floorplan (Fetch, Decode, Rename, Commit, …) with the predictor tables physically separated] • gDAC can break up its tables into prediction bits and hysteresis bits (like EV8); similar for O-GEHL, PPM • neural must use all bits together • physical separation reduces power density/thermals

  32. linear (in)separability [figure: classes of branch functions – linearly separable, linearly separable between segments, linearly separable within segments, linearly inseparable; annotation: “this does the best”]
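As a small illustrative check of the inseparable case (this demo is mine, not from the slides): XOR of two ±1 history bits cannot be computed by any single perceptron, while a PHT indexed by both bits represents it trivially with one counter per pattern. The integer weight search range here is an arbitrary assumption.

```python
def xor_linearly_separable(limit=8):
    """Brute-force search for integer perceptron weights (w0, w1, w2) that
    compute XOR of two +/-1 history bits; returns True if any are found."""
    # (x1, x2, target): XOR is taken exactly when the two bits disagree.
    cases = [(-1, -1, -1), (-1, 1, 1), (1, -1, 1), (1, 1, -1)]
    for w0 in range(-limit, limit + 1):
        for w1 in range(-limit, limit + 1):
            for w2 in range(-limit, limit + 1):
                if all((1 if w0 + w1 * x1 + w2 * x2 >= 0 else -1) == t
                       for x1, x2, t in cases):
                    return True
    return False
```

No weight vector in the searched range works (and summing the four constraints shows none can exist for any real weights), which is why a single linear predictor needs help, via multiple linear functions (PWL) or per-segment tables plus fusion (gDAC), on such branches.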

  33. per-benchmark accuracy (128KB) [chart]
