A 256 Kbits L-TAGE branch predictor André Seznec IRISA/INRIA/HIPEAC
Directly derived from: "A case for (partially) tagged branch predictors", A. Seznec and P. Michaud, JILP, Feb. 2006 + tricks: loop predictor, kernel/user histories
TAGE: TAgged GEometric history length predictors. The genesis
Back around 2003 • 2bcgskew was state-of-the-art, but: • was lagging behind neural-inspired predictors on a few benchmarks • Just wanted to get the best of both behaviors and maintain: • Reasonable implementation cost: • Use only global history • Medium number of tables • In-time response
The basis: a multiple length global history predictor • [Figure: tables T0 through T4, indexed with histories of lengths L(0) through L(4)]
GEometric History Length predictor • The set of history lengths forms a geometric series, e.g. {0, 2, 4, 8, 16, 32, 64, 128} • Captures correlation on very long histories, while most of the storage is dedicated to short histories !! • What is important: L(i) - L(i-1) is drastically increasing
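Below is a minimal C++ sketch of how such a geometric series of history lengths can be generated. The values Lmin = 4, Lmax = 640 and 12 tagged tables echo the design point mentioned later in these slides; the rounding is illustrative, not the exact submitted lengths.

```cpp
// Sketch: computing a geometric series of history lengths L(i).
#include <cmath>
#include <cstdio>

int main() {
    const int nTables = 12;        // number of tagged components (illustrative)
    const double Lmin = 4.0;       // shortest non-zero history length
    const double Lmax = 640.0;     // longest history length
    // Geometric ratio so that L(1) = Lmin and L(nTables) = Lmax.
    const double alpha = std::pow(Lmax / Lmin, 1.0 / (nTables - 1));

    for (int i = 1; i <= nTables; i++) {
        // L(i) = Lmin * alpha^(i-1), rounded to the nearest integer.
        int L = (int)(Lmin * std::pow(alpha, i - 1) + 0.5);
        std::printf("L(%d) = %d\n", i, L);
    }
    return 0;
}
```

Note how L(i) - L(i-1) grows quickly: the last few tables see hundreds of additional history bits while the first few differ by only a handful.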
Combining multiple predictions? • Classical solution: use of a meta-predictor: "wasting" storage !?! choosing among 5 or 10 predictions ?? • Neural-inspired predictors, Jimenez and Lin 2001: use an adder tree instead of a meta-predictor • Partial matching: use tagged tables and the longest matching history, Chen et al. 1996, Michaud 2005
CBP-1 (2004): OGEHL • Final computation through a sum: prediction = sign of the sum • 12 components, 3.670 misp/KI • [Figure: tables T0 through T4, indexed with histories of lengths L(0) through L(4), feed an adder tree Σ]
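As a rough illustration of the adder-tree combination, here is a hedged C++ sketch of a GEHL-style prediction: each component supplies a signed counter and the prediction is the sign of their sum. The index_hash function and the data layout are placeholders for illustration, not the submitted OGEHL hash functions.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct GehlComponent {
    std::vector<int8_t> ctr;   // signed saturating counters, one per entry
    int historyLength;         // L(i): history bits used by this component
};

// Illustrative hash of pc and the L(i) most recent history bits
// (a real predictor folds much longer histories than 64 bits).
static int index_hash(uint64_t pc, uint64_t history, int length, std::size_t tableSize) {
    uint64_t h = (length >= 64) ? history : (history & ((1ULL << length) - 1));
    return (int)((pc ^ h ^ (h >> 17)) % tableSize);
}

// GEHL-style combination: an adder tree instead of a meta-predictor.
bool gehl_predict(const std::vector<GehlComponent>& comps,
                  uint64_t pc, uint64_t history) {
    int sum = 0;
    for (const auto& c : comps) {
        int idx = index_hash(pc, history, c.historyLength, c.ctr.size());
        sum += c.ctr[idx];
    }
    return sum >= 0;   // prediction = sign of the sum
}
```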
TAGE: geometric history length + PPM-like + optimized update policy • Tagless base predictor • [Figure: tagged components indexed with a hash of pc and h[0:L1], h[0:L2], h[0:L3]; each tagged entry holds a tag, a ctr and a u field; tag comparison (=?) drives the prediction multiplexer]
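The "hash" boxes must reduce very long histories to index- and tag-sized values. One way to do this, used in publicly available TAGE simulators, is incremental XOR folding; the sketch below illustrates the idea and is not the exact hash of the 256 Kbits submission.

```cpp
#include <cstdint>
#include <vector>

// Folding a long global history into an index- or tag-sized value by XOR,
// updated incrementally at each branch. Requires ghist.size() > origLen.
struct FoldedHistory {
    unsigned folded = 0;   // current folded value
    int compressedLen;     // output width in bits (index or tag width)
    int origLen;           // history length L(i) being folded

    // Call after a new outcome bit has been shifted into ghist (newest at index 0).
    void update(const std::vector<uint8_t>& ghist) {
        folded = (folded << 1) | ghist[0];                      // insert the newest bit
        folded ^= ghist[origLen] << (origLen % compressedLen);  // cancel the bit leaving the L(i) window
        folded ^= folded >> compressedLen;                      // fold the overflow back in
        folded &= (1u << compressedLen) - 1;
    }
};
```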
[Figure: tag comparisons (=?) across the components; the longest matching component provides Pred (Hit), the next matching component provides Altpred, components without a match signal Miss]
Prediction computation • General case: the longest matching component provides the prediction • Special case: many mispredictions on newly allocated entries (weak Ctr); on many applications, Altpred is more accurate than Pred • This property is dynamically monitored through a single 4-bit counter
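A hedged sketch of this selection logic, with illustrative names: useAltOnNewAlloc stands for the single 4-bit monitoring counter, and its threshold is an assumption.

```cpp
#include <cstdint>

// Choosing between the provider prediction (Pred) and the alternate prediction (Altpred).
bool final_prediction(bool pred, bool altpred,
                      int8_t providerCtr,     // 3-bit signed counter of the provider entry
                      uint8_t providerU,      // 2-bit useful counter of the provider entry
                      int useAltOnNewAlloc) { // 4-bit counter monitoring Altpred vs Pred
    // A newly allocated entry has a weak counter and u == 0.
    bool weakCtr = (providerCtr == 0) || (providerCtr == -1);
    bool newlyAllocated = weakCtr && (providerU == 0);
    // When the monitor says Altpred is usually more accurate, prefer it
    // for newly allocated entries.
    if (newlyAllocated && useAltOnNewAlloc >= 8)   // 4-bit threshold (assumption)
        return altpred;
    return pred;
}
```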
TAGE update policy • General principle: minimize the footprint of the prediction • Just update the longest matching history component and allocate at most one entry on a misprediction
A tagged table entry: [Ctr | U | Tag] • Ctr: 3-bit prediction counter • U: 2-bit useful counter (was the entry recently useful?) • Tag: partial tag
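For concreteness, a C++ rendering of such an entry; fields are stored in full bytes for simplicity, whereas the real entry is bit-packed.

```cpp
#include <cstdint>

struct TaggedEntry {
    int8_t   ctr;   // 3-bit signed prediction counter (sign gives the direction)
    uint8_t  u;     // 2-bit "useful" counter: was the entry recently useful?
    uint16_t tag;   // partial tag (7 bits on T1 up to 15 bits on T12)
};
```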
Updating the U counter • If (Altpred ≠ Pred) then • Pred = branch outcome: U = U + 1 • Pred ≠ branch outcome: U = U - 1 • Graceful aging: • Periodic aging of all U counters • implemented through the reset of a single bit
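A sketch of both rules; the aging period is left unmodeled and the bit chosen for reset is an assumption.

```cpp
#include <cstdint>

// U-counter update rule from the slide; u is a 2-bit saturating counter.
void update_useful(uint8_t& u, bool pred, bool altpred, bool outcome) {
    if (pred != altpred) {
        if (pred == outcome) {
            if (u < 3) u++;     // the provider was useful
        } else {
            if (u > 0) u--;     // Altpred was right, the provider was not
        }
    }
}

// Graceful aging: periodically clear a single bit of every u counter
// (e.g. alternately the MSB and the LSB) rather than the whole counter.
void age_useful(uint8_t& u, bool resetMsb) {
    u &= resetMsb ? 0x1u : 0x2u;   // keep only the other bit
}
```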
Allocating a new entry on a misprediction • Find a single "useless" (U = 0) entry in a component with a longer history: • Privilege the smallest possible history, to minimize footprint • But not too much, to avoid ping-pong phenomena • Initialize Ctr as weak and U as zero
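A possible rendering of this allocation rule in C++; the random skip that implements "but not too much" is one plausible interpretation, and the tag handling is simplified (a real predictor recomputes the tag per component).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

struct Entry { int8_t ctr; uint8_t u; uint16_t tag; };

// 'candidates' holds, from shortest to longest history, the entries indexed
// in the components with a history longer than the provider's.
void allocate_on_mispredict(std::vector<Entry*>& candidates,
                            uint16_t newTag, bool taken) {
    for (std::size_t i = 0; i < candidates.size(); i++) {
        Entry* e = candidates[i];
        if (e->u == 0) {                         // a "useless" entry
            // Privilege the smallest possible history (to minimize footprint),
            // but sometimes skip it (to avoid ping-pong phenomena).
            if (i + 1 < candidates.size() && (std::rand() & 1))
                continue;
            e->tag = newTag;
            e->ctr = taken ? 0 : -1;             // weak counter in the right direction
            e->u   = 0;
            return;                              // allocate at most one entry
        }
    }
}
```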
Improve the global history • Address + conditional branch history: path confusion on short histories • Address + path: direct hashing leads to path confusion • Represent all branches in the branch history • Also use a path history (1 bit per branch, limited to 16 bits)
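A sketch of the resulting history update; the bit-insertion convention used for non-conditional branches is an assumption.

```cpp
#include <cstdint>
#include <vector>

// Every branch (not only conditional ones) is inserted in the branch history,
// and a separate path history keeps 1 address bit per branch, limited to 16 bits.
struct GlobalHistory {
    std::vector<uint8_t> ghist;   // branch history, newest bit at index 0
    uint16_t path = 0;            // 16-bit path history

    void update(uint64_t branchPC, bool taken) {
        // One possible convention: the direction bit for conditional branches,
        // here simply 'taken' for every branch.
        ghist.insert(ghist.begin(), taken ? 1 : 0);
        if (ghist.size() > 640) ghist.pop_back();          // keep up to the longest L(i)
        path = (uint16_t)((path << 1) | (branchPC & 1));   // 1 address bit per branch
    }
};
```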
Design tradeoff for CBP2 (1) • 13 components: • Brings the best accuracy on the distributed traces • 8 components not very far ! • History length: • Min = 4, Max = 640; could use any Min in [2,6] and any Max in [300, 2000]
Design tradeoff for CBP2 (2) • Tag width tradeoff: • a (destructive) false match is better tolerated on shorter histories • 7 bits on T1 up to 15 bits on T12 • Tuning the number of table entries: • Smaller number of entries for very long histories • Smaller number of entries for very short histories
Adding a loop predictor • The loop predictor captures the number of iterations of a loop • When the same number of iterations has been encountered 4 consecutive times, the loop predictor provides the prediction • Advantages: • Very reliable • Small storage budget: 256 52-bit entries • Complexity ? • Might be difficult to manage speculative iteration numbers on deep pipelines
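A hedged sketch of a loop predictor entry and its prediction rule; the field widths are illustrative and do not reproduce the exact 52-bit layout of the submission.

```cpp
#include <cstdint>

struct LoopEntry {
    uint16_t tripCount = 0;     // learned number of iterations of the loop
    uint16_t currentIter = 0;   // iterations already seen in the current execution
    uint8_t  confidence = 0;    // consecutive executions seen with the same trip count
    uint16_t tag = 0;           // partial tag identifying the loop branch
};

// The loop predictor takes over only after the same iteration count has been
// observed 4 consecutive times (valid == true). Assuming a loop-closing branch
// that is taken while the loop continues, predict taken until the trip count.
bool loop_predict(const LoopEntry& e, bool& valid) {
    valid = (e.confidence >= 4);
    return e.currentIter + 1 < e.tripCount;
}
```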
Using a kernel history and a user history • Traces mix user and kernel activities: • Kernel activity after exception • Global history pollution • Solution: use two separate global histories • User history is updated only in user mode • Kernel history is updated in both modes
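A minimal sketch of the two histories and their update rule, truncated to 64 bits each for illustration.

```cpp
#include <cstdint>

// The kernel history is updated in both modes; the user history only in user mode.
struct DualHistory {
    uint64_t userHist = 0;
    uint64_t kernelHist = 0;

    void update(bool taken, bool inKernelMode) {
        kernelHist = (kernelHist << 1) | (taken ? 1u : 0u);   // both modes
        if (!inKernelMode)
            userHist = (userHist << 1) | (taken ? 1u : 0u);   // user mode only
    }

    // History actually used for prediction in the current mode.
    uint64_t active(bool inKernelMode) const {
        return inKernelMode ? kernelHist : userHist;
    }
};
```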
L-TAGE submission accuracy (distributed traces): 3.314 misp/KI
Reducing L-TAGE complexity • The included 241.5 Kbits TAGE predictor alone: 3.368 misp/KI • Loop predictor beneficial only on gzip: might not be worth the extra complexity
Using fewer tables • An 8-component 256 Kbits TAGE predictor: 3.446 misp/KI
TAGE prediction computation time ? • 3 successive steps: • Index computation • Table read • Partial match + multiplexor • Does not fit on a single cycle: • But can be ahead pipelined !
Ahead pipelining a global history branch predictor (principle) • Initiate branch prediction X+1 cycles in advance to provide the prediction in time • Use information available: • X-block ahead instruction address • X-block ahead history • To ensure accuracy: • Use intermediate path information
Ahead pipelined TAGE in practice: 4 parallel prediction computations • [Figure: blocks A, B, C preceding branch bc; the prediction is initiated with the ahead block address A and ahead history Ha]
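A hedged sketch of the final selection step: with 4 predictions computed in parallel, 2 bits of intermediate path information pick the one to use. The 2-bit width is an assumption consistent with 4 candidates; the way those bits are gathered is not modeled here.

```cpp
// Select among the predictions computed ahead of time using path information
// collected between prediction initiation and prediction use.
bool ahead_select(const bool predictions[4], unsigned intermediatePathBits) {
    return predictions[intermediatePathBits & 0x3];   // 2 bits -> 4 candidates
}
```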
3-branch ahead pipelined, 8-component, 256 Kbits TAGE: 3.552 misp/KI
A final case for the Geometric History Length predictors • Delivers state-of-the-art accuracy • Uses only global information: very long history, 300+ bits !! • Can be ahead pipelined • Many effective design points: OGEHL or TAGE, number of tables, history lengths