A Simple Divide-and-Conquer Approach for Neural-Class Branch Prediction Gabriel H. Loh College of Computing Georgia Tech
aren’t we done with branch predictors yet?
• Branch predictors still important
• Performance for large windows
  • e.g. CPR [Akkary et al., MICRO’03] / CFP [Srinivasan et al., ASPLOS’04]
• Power
  • better bpred reduces wrong-path instructions
• Throughput
  • wrong-path insts steal resources from other threads in SMT/SOEMT
PACT2005 - Loh - A Simple Divide-and-Conquer Approach for Neural-Class Branch Prediction
recent bpred research
• “neural-inspired” predictors
  • perceptron, piecewise-linear, O-GEHL, …
• very high accuracy
• relatively high complexity
  • a barrier to industrial adoption
outline
• quick synopsis of neural techniques
• gDAC predictor
  • idea
  • specifics
  • ahead-pipelining
• results
• why gDAC works
gshare
[Figure: the branch PC and the branch history register (BHR) are hashed together to index a Pattern History Table (PHT) of 2-bit counters, producing a taken/not-taken prediction]
• records previous outcomes given a branch identifier (PC) and a context (BHR)
• different contexts may lead to different predictions for the same branch
• assumes correlation between the context and the outcome
gshare pros and cons
• simple to implement!
  • variants exist in multiple real processors
• not scalable to longer history lengths
  • # of PHT entries grows exponentially with history length
  • learning time increases
  • even if correlated to only one previous branch, still need to train 2^h PHT counters
perceptron
• explicitly locate the source(s) of correlation
[Figure: table-based approach vs. perceptron approach for history bits h2 h1 h0; the weights w0, w1, w2 track the correlation with each history bit]
• xi = hi ? 1 : -1
• f(X) = (0·x0 – 1·x1 + 0·x2) ≥ 0, i.e. only h1 carries correlation in this example
perceptron predictor
[Figure: the PC selects a row of weights; each weight is added or subtracted according to the corresponding BHR bit, and the sign of the sum (≥ 0) gives the final prediction]
• updating the weights: if the branch outcome agrees with hi, increment wi; if it disagrees, decrement wi
• the magnitude of a weight reflects the degree of correlation; with no correlation, wi → 0
• downsides:
  • latency (SRAM lookup, adder tree)
  • few entries in the table → aliasing
  • linearly separable functions only
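The prediction and the agree/disagree update rule on this slide can be sketched as follows. The table size, history length, and the training threshold THETA are illustrative assumptions (the published perceptron predictor derives its threshold from the history length).

```python
# Minimal perceptron-predictor sketch.
HIST = 8                             # history bits = weights per entry
THETA = 15                           # keep training while |sum| is small
table = [[0] * (HIST + 1) for _ in range(64)]   # bias + HIST weights per row

def output(pc: int, history: list[bool]) -> int:
    w = table[pc % len(table)]
    # x_i = +1 if history bit i was taken, -1 otherwise
    return w[0] + sum(wi if h else -wi for wi, h in zip(w[1:], history))

def predict(pc: int, history: list[bool]) -> bool:
    return output(pc, history) >= 0

def train(pc: int, history: list[bool], taken: bool) -> None:
    y = output(pc, history)
    if (y >= 0) != taken or abs(y) <= THETA:
        w = table[pc % len(table)]
        w[0] += 1 if taken else -1
        for i, h in enumerate(history):
            # increment w_i when the outcome agrees with h_i, else decrement
            w[i + 1] += 1 if h == taken else -1
```

Uncorrelated history bits agree and disagree equally often, so their weights stay near zero, which is the "magnitude reflects correlation" property above.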
path-based neural predictor
[Figure: weights selected along the path of recent PCs feed a pipeline of adders; the sign of the final sum (≥ 0) gives the prediction]
• perceptron: all weights chosen by PC0; PBNP: wi selected by PCi (the ith-oldest PC)
• naturally leads to pipelined access
• different indexing reduces aliasing
• downsides:
  • latency (SRAM lookup, one adder)
  • complexity (30–50-stage bpred pipe)
  • linearly separable functions only
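The path-based indexing can be sketched as below (table sizes and names are hypothetical). The point is only that weight i is read with PC_i rather than the current PC, which is what allows the sum to be accumulated ahead of time in a pipeline.

```python
# Path-based weight selection sketch: one table column per history
# position; column i is indexed by the i-th oldest branch PC.
HIST, ROWS = 8, 64
weights = [[0] * ROWS for _ in range(HIST + 1)]

def predict(path: list[int], history: list[bool]) -> bool:
    # path[0] = current PC, path[i] = i-th oldest branch PC
    y = weights[0][path[0] % ROWS]               # bias selected by current PC
    for i, (pc_i, h) in enumerate(zip(path[1:], history), start=1):
        w = weights[i][pc_i % ROWS]              # weight i indexed by PC_i
        y += w if h else -w
    return y >= 0
```

In hardware, each older branch would have already contributed its term to a partial sum when it was fetched; only the final add happens at prediction time.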
piecewise-linear predictor
[Figure: m pipelined adder trees compute m linear functions in parallel; the current PC selects one via a mux, and the sign (≥ 0) gives the prediction]
• compute m different linear functions in parallel
• some linearly inseparable functions can be learned
• downsides:
  • latency (SRAM lookup, one adder, one mux)
  • complexity (m copies of a 50+-stage bpred pipe)
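A highly simplified sketch of the piecewise-linear idea, with all names and sizes illustrative: the weight at history position i is selected by both the current PC and the path PC at that position, so each branch effectively gets its own path-dependent set of linear separators, which is how some linearly inseparable functions become learnable.

```python
# Piecewise-linear lookup sketch: a 3-D weight array indexed by
# (current PC, path PC at position i, position i).
N, M, HIST = 16, 16, 8
W = [[[0] * (HIST + 1) for _ in range(M)] for _ in range(N)]

def predict(pc: int, path: list[int], history: list[bool]) -> bool:
    row = W[pc % N]
    y = row[pc % M][0]                          # bias term
    for i, (ppc, h) in enumerate(zip(path, history), start=1):
        w = row[ppc % M][i]                     # weight chosen by path PC_i
        y += w if h else -w
    return y >= 0
```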
goal/scope
• neural predictors are very accurate
  • we want the same level of performance
• neural predictors are complex
  • large number of adders → we want to avoid adders
  • very deep pipelines → we want to keep the pipe short
• preferable to use PHTs only
idea
[Figure: a single Neural Predictor “digesting” a very long branch history (Google Images “hot dog kobayashi” – 2004 world record, 53½ hot dogs)]
idea
[Figure: the very long branch history divided among several Predictors plus a Meta predictor (random picture from Google Images “hot dog eating”)]
• make “digesting” a very long branch history easier by dividing up the responsibility!
unoptimized gDAC
• global-history Divide-And-Conquer
[Figure: one gshare-styled predictor per history segment; BHR[1:s1], BHR[s1+1:s2], BHR[s2+1:s3], BHR[s3+1:s4] each index PHT1–PHT4 together with the PC, and a Meta table selects one prediction]
• utilizes correlation from only a single history segment
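The selection-based organization can be sketched as follows. Segment widths, table sizes, and names here are illustrative assumptions, not the configurations evaluated in the paper.

```python
# Unoptimized (selection-based) gDAC sketch: each history segment drives
# its own gshare-style PHT; a meta table picks whose prediction to use.
SEG_BITS = [10, 10, 10]              # three history segments
TBL = 1 << 12                        # entries per PHT / meta table

def gdac_select(pc: int, bhr: int, phts, meta) -> bool:
    preds, lo = [], 0
    for bits, pht in zip(SEG_BITS, phts):
        seg = (bhr >> lo) & ((1 << bits) - 1)     # extract this segment
        preds.append(pht[(pc ^ seg) % TBL] >= 2)  # per-segment gshare lookup
        lo += bits
    # the meta counter selects a single segment's prediction
    return preds[meta[pc % TBL] % len(preds)]
```

The limitation the slide names is visible here: whatever the meta table chooses, the final prediction comes from exactly one segment's correlation.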
fusion gDAC
[Figure: the per-segment predictions from PHT1–PHT4 and the PC index a Fusion Table, which produces the final prediction]
• can combine correlations from multiple segments
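One plausible way to realize the fusion table is to concatenate the per-segment prediction bits with PC bits and use that as an index into a table of 2-bit counters; all sizes and names below are illustrative assumptions.

```python
# Prediction-fusion sketch: the final table sees all segment predictions
# at once, so it can learn combinations rather than picking one segment.
FUSE_BITS = 10
fusion = [1] * (1 << FUSE_BITS)      # 2-bit counters, init weakly not-taken

def fuse(pc: int, seg_preds: list[bool]) -> bool:
    bits = 0
    for p in seg_preds:              # pack the segment prediction bits
        bits = (bits << 1) | int(p)
    idx = ((pc << len(seg_preds)) | bits) & ((1 << FUSE_BITS) - 1)
    return fusion[idx] >= 2
```

Because every combination of segment predictions maps to its own counter, the fused predictor can, for example, learn "taken only when segments 1 and 3 agree", which no selection scheme can express.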
gDAC
[Figure: each history segment BHR[1:s1] … BHR[s3+1:s4] uses a Bi-Mode-style predictor (BM1–BM4 with a shared choice PHT), all feeding the Fusion Table]
• better per-segment predictions lead to a better final prediction
ahead-pipelined gDAC
[Figure: three segment pipelines, each spanning four cycles]
• Cycle t−3: initial hashing, PHT bank select, using PC−3; branch history from cycles t, t−1 and t−2 does not exist yet
• Cycle t−2: row decoder
• Cycle t−1: SRAM array access; use PC−1 for SRAM column-mux selection
• Cycle t: prediction; branch history from cycles t, t−1 and t−2 is now available
• each PHT SRAM is organized to output multiple counters (think “cache line”); the current PC selects one
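The ahead-pipelining trick can be illustrated in two steps (names and sizes hypothetical): the SRAM row is fetched early using only information available several cycles before the prediction, and the bits that arrive late merely pick one counter out of that row.

```python
# Ahead-pipelining sketch: early bits start the array access,
# late-arriving bits only drive the column mux.
ROW_WORDS = 8                               # counters fetched per SRAM row
pht = [[1] * ROW_WORDS for _ in range(1 << 10)]

def read_row(old_pc: int, old_bhr: int) -> list[int]:
    # cycles t-3 .. t-1: start the access with early-available information
    return pht[(old_pc ^ old_bhr) % len(pht)]

def late_select(row: list[int], cur_pc: int, new_hist: int) -> bool:
    # cycle t: the last-arriving PC/history bits pick one counter
    return row[(cur_pc ^ new_hist) % ROW_WORDS] >= 2
```

This mirrors a cache access: the row decode uses the early index bits, and the newest bits act like the offset within a fetched line.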
comment on ahead-pipelining
• branch predictors composed of only PHTs are simple SRAMs, easily ahead-pipelined
• Seznec showed ahead-pipelining of 2bc-gskew [ISCA’02] and of fetch in general [ISCA’03]
• Jiménez showed the AP-like gshare.fast [HPCA’03]
simulation/configs
• standard stuff
  • SimpleScalar/Alpha (MASE), SPEC2k-INT, SimPoint
  • CPU config similar to the PWL study [Jiménez, ISCA’05]
• gDAC vs. gshare, perceptron, PBNP, PWL
• gDAC configs vary
  • 2–3 segments
  • history length of 21 @ 2KB up to 86 @ 128KB
• neural advantage: gDAC tables constrained to power-of-two entries; neural predictors can use arbitrary sizes
misprediction rates
• 2KB: about as accurate as the original perceptron
• 8KB: beats the original perceptron
• 32KB: as accurate as the path-based neural predictor
• the piecewise-linear predictor just does really well
performance
• as accurate as the perceptron, but better latency → higher IPC
• gDAC is less accurate than path-based neural @ 16KB, but latency is starting to matter
• the latency difference allows gDAC to even catch up with PWL in performance
• goal achieved: neural-class performance, PHT-only complexity
so it works, but why?
• correlation locality
• correlation redundancy
• correlation recovery
• use the perceptron as the vehicle of analysis – it explicitly assigns a correlation strength to each branch
correlation locality (parser)
[Figure: per-position perceptron weights for parser show distinct clusters/bands of correlation]
• segmenting (at the right places) should not disrupt clusters of correlation
correlation locality (gcc)
correlation redundancy
• using only the correlation from a few branches yields almost as much info as using all branches
• therefore the correlations detected in the other weights are redundant!
correlation recovery
• cross-segment correlation may exist
[Figure: selection-based meta (predictions P1, P2, P3 combined through meta tables M2,3 and M1,(2,3)) vs. fusion of P1, P2, P3]
• selection-based meta can only use correlation from one segment
• fusion can (indirectly) use correlation from all segments
• fusion gDAC beats selection gDAC by 4%
orthogonality
• could use these ideas in other predictors
  • segmented-history PPM predictor
  • segmented, geometric history lengths
  • some “segments” could use local history, prophet “future” history, or anything else
• there may be other ways to exploit the general phenomena
  • correlation locality, redundancy and recovery
summary
• contributions
  • a PHT-based long-history predictor
  • achieves the goals of neural accuracy at PHT complexity
  • an ahead-pipelined organization
  • an analysis of the effect of segmentation + fusion on correlation
• contact:
  • loh@cc.gatech.edu
  • http://www.cc.gatech.edu/~loh
BACKUP SLIDES
Power
• neural predictor update: lots of separate small tables; extra decoders; harder to bank
• all of the adders
  • timing-critical for the perceptron – power hungry
  • not as bad for PBNP (uses small ripple-carry adders)
  • PWL multiplies the # of adders considerably
• checkpointing overhead for PBNP, PWL
  • need to store 30–50+ partial sums
  • per branch!
Power Density/Thermals
[Figure: pipeline stages Fetch, Decode, Rename, …, Commit, with the prediction-bit and hysteresis-bit tables physically separated]
• gDAC can break up its tables between prediction bits and hysteresis bits (like EV8); similar for O-GEHL, PPM
• neural predictors must use all bits
• physical separation reduces power density/thermals
linear (in)separability
[Chart: accuracy on branches that are linearly separable, linearly separable within segments, linearly separable between segments, and linearly inseparable; the “linearly separable between segments” case does the best]
per-benchmark accuracy (128KB)