Explore how clustered indexing optimizes conditional branch predictors. Understand the impact of branches on pipelined architectures, learn about prediction solutions, and discover how clusters and subtables in prediction tables improve the accuracy and efficiency of predicting branch outcomes.
Clustered Indexing for Conditional Branch Predictors — Veerle Desmet, Ghent University, Belgium
Conditional Branches

How frequently do conditional branches occur? Roughly one instruction in eight is a conditional branch. Examples:

for (i=0; i<50; i++) { /* a loop... */ }
/* next statements */

if (i > 0) /* something */
else /* something else */
Program Execution

- Fetch = take the next instruction
- Decode = analyze the type and read the operands
- Execute = perform the computation
- Write Back = write the result

Example: for R1=R2+R3, decode identifies an addition and reads the operands 4 and 3; execute performs the computation; write back stores 7 in R1.
Pipelined Architectures

(Pipeline diagram: independent instructions such as R1=R2+R3, R4=R3-1, R5=R2+1, R7=2*R1 and R5=R6 occupy the Fetch, Decode, Execute and Write Back stages simultaneously.)

Parallel versus sequential execution:
- a constant flow of instructions is possible
- applications run faster
- limited by conditional branches
Problem: Branches

(Pipeline diagram: after "if R1>0" enters the pipeline, fetch cannot know whether to continue with the then-path (R2=R2-1, R7=0) or the else-path (R4=R3-1), leaving "?" bubbles in the pipeline until the condition is resolved.)

- Branches introduce bubbles
- This reduces pipeline throughput
Solution: Prediction

(Pipeline diagram: the predicted path, e.g. the then-path R2=R2-1, is fetched immediately after "if R1>0", so the pipeline stays full.)

- Fetch those instructions that are likely to be executed
- Correct prediction = gain; misprediction = penalty
Today's Architecture

(Diagram of a superscalar pipeline: instruction cache → fetch → decode → register rename → dispatch → instruction window → functional units, with a register file and reorder logic; the branch predictor steers the fetch stage. Throughput is measured in IPC, instructions per cycle.)
Bimodal Branch Predictor

- Predicts the outcome of a condition (e.g. if vs. else) based on the branch address alone
- The low-order k bits of the branch address index the prediction table
- After the branch resolves, the prediction table is updated
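As an illustrative sketch (not the paper's implementation; all names here are invented), a bimodal predictor with one 2-bit saturating counter per table entry can be simulated like this:

```python
class BimodalPredictor:
    """Bimodal predictor: one 2-bit saturating counter (0..3) per entry,
    indexed by the low-order k bits of the branch address."""
    def __init__(self, k):
        self.k = k
        self.table = [1] * (1 << k)  # start weakly not-taken

    def index(self, branch_addr):
        return branch_addr & ((1 << self.k) - 1)

    def predict(self, branch_addr):
        return self.table[self.index(branch_addr)] >= 2  # True = taken

    def update(self, branch_addr, taken):
        i = self.index(branch_addr)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)

# A loop branch taken 49 times, then not taken once at loop exit:
p = BimodalPredictor(k=3)
outcomes = [True] * 49 + [False]
correct = 0
for taken in outcomes:
    correct += (p.predict(0x40) == taken)
    p.update(0x40, taken)
print(correct)  # 48 of 50: one warm-up misprediction plus the loop exit
```

The saturating counter is what lets the predictor tolerate the single not-taken outcome per loop execution without flipping its prediction.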
Global History Branch Predictor

- Predicts the outcome of a condition (e.g. a for loop) based on the global history: a shift register of recent branch outcomes such as 111101111011110
- The most recent k history bits index the prediction table
- After the branch resolves, both the prediction table and the global history are updated
Gshare Branch Predictor [McFarling]

- The original index is the XOR of the global history and the branch address, truncated to k bits
- This combines per-branch and history information in a single prediction table
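The gshare index computation itself is one line; as a sketch (function name invented):

```python
# Gshare index: XOR the global history with the branch address,
# then keep the low-order k bits (McFarling's scheme).
def gshare_index(branch_addr, global_history, k):
    return (branch_addr ^ global_history) & ((1 << k) - 1)

# The same static branch uses different table entries
# when it is reached with a different global history:
print(gshare_index(0b101100, 0b000000, 4))  # 12 (binary 1100)
print(gshare_index(0b101100, 0b001010, 4))  # 6  (binary 0110)
```

XOR-ing rather than concatenating lets the full k bits carry both address and history information.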
Misprediction Rate: Gshare (SPEC INT 2000)

(Figure: misprediction rate (%) of gshare versus predictor size, 10 B to 1 MB on a log scale; lower is better.)
Aliasing

- Resource limitations: with 8 entries, the index is only 3 bits
- Two different branches A and B can map to the same index (e.g. both to 101) and then share the same prediction information
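A quick sketch of the collision (the two addresses are made up for illustration):

```python
# With a 3-bit index, any two addresses whose low 3 bits agree alias:
k = 3
mask = (1 << k) - 1
a = 0x1A5  # binary ...101 — hypothetical branch address A
b = 0x2BD  # binary ...101 — hypothetical branch address B
print(a & mask, b & mask)  # both map to entry 5 (binary 101)
```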
Aliasing (SPEC INT 2000)

(Figure: alias rate (%) versus predictor size, 16 B to 512 KB, split into destructive, constructive and neutral aliasing; the alias rate drops as the predictor grows.)
Basic Observations

- Branches with similar behavior can share prediction information, e.g. two branches that both produce the outcome stream 1 1 1 1 0 0 0 0 1 1 1 1 0 1 0 1 over time
- Such branches can use the same table entry, e.g. one with outcomes 1 1 1 1 0 0 0 0 and one with 1 1 1 1 0 1 0
Time-Varying Behavior

Execution is divided into phases; per phase, each branch's taken rate is recorded (NE = not executed in that phase):

A: 100%  0%   100%  50%
B: 100%  0%   100%  60%
C: 100%  25%  0%    NE
D: NE    NE   100%  33%
Branch Clustering

- Each branch's per-phase taken rates make it a point in N-dimensional space (one dimension per phase)
- Clusters of similarly behaving branches are formed by the k-means algorithm
k-Means Clustering Algorithm

1. Choose initial centers
2. Assign each point to its nearest center
3. Recompute each center as the mean of its assigned points
4. Restart with the new centers

Iterating steps 2 and 3 converges to a stable solution in which the assignments no longer change.
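The steps above can be sketched in a few lines (a plain k-means, not the paper's exact setup; the NE encoding of -1 is one of several possible choices and an assumption here):

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=100):
    """Plain k-means on tuples of per-phase taken rates."""
    centers = random.sample(points, k)        # step 1: initial centers
    clusters = []
    for _ in range(iters):
        # step 2: assign each point to its nearest center
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist2(p, centers[c]))
            clusters[nearest].append(p)
        # step 3: recompute each center as the mean of its points
        new_centers = [
            tuple(sum(x) / len(cl) for x in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:            # stable solution reached
            break
        centers = new_centers                 # step 4: restart
    return centers, clusters

# Branches A-D as points of per-phase taken rates (NE encoded as -1):
points = [(1.0, 0.0, 1.0, 0.5), (1.0, 0.0, 1.0, 0.6),
          (1.0, 0.25, 0.0, -1.0), (-1.0, -1.0, 1.0, 0.33)]
random.seed(0)
centers, clusters = kmeans(points, 2)
```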
Determining k of k-Means

- k is chosen by the BIC score (Bayesian Information Criterion)
- This trades off k against the goodness of the clustering; e.g. given stable solutions with k=2 and k=3, BIC decides which is best
Branch Clustering (SPEC INT 2000)

- Between 8 and 33 clusters per benchmark: mcf has 8; gcc and parser have 33
- Each branch belongs to exactly one cluster
Subtables

- The cluster ID replaces the top bits of the prediction-table index, so each cluster gets its own subtable
- Example: 8 entries → 3 index bits; 4 clusters → 2 cluster bits; for original index 101, the cluster ID supplies the upper 2 bits and the remaining bit comes from the original index
- On SPEC INT 2000, 3 to 6 bits suffice for the cluster ID
- Subtabling can be used in every predictor scheme
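The subtable index construction, sketched (function name invented):

```python
# The cluster ID replaces the top bits of the original index, so each
# cluster gets its own contiguous region (subtable) of the table.
def subtable_index(original_index, cluster_id, k, cluster_bits):
    offset_bits = k - cluster_bits
    offset = original_index & ((1 << offset_bits) - 1)  # keep low bits only
    return (cluster_id << offset_bits) | offset

# 8 entries (k=3), 4 clusters (2 bits), original index 101, cluster 1:
print(bin(subtable_index(0b101, cluster_id=1, k=3, cluster_bits=2)))
# '0b11' — entry 3: cluster 1's subtable, offset 1
```

Branches in the same cluster now alias only with each other, which is exactly the constructive sharing the clustering was built to create.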
Subtables for Bimodal

(Figure: misprediction rate versus predictor size, 10 B to 1 MB, for the original bimodal predictor and the clustered variant whose index concatenates cluster bits with branch-address bits.)
Subtables for Gshare

(Figure: misprediction rate versus predictor size, 10 B to 1 MB, for the original gshare predictor and the clustered variant indexed by cluster bits, global history and branch address.)

Clustered gshare is 19% better for SMALL predictors.
Why Clustered Indexing Works

- Subtabling effectively uses smaller per-cluster predictors
- More aliasing is therefore expected... but the extra aliasing is more often constructive
Hashing: an Alternative to Subtables

- Keeps the original global-history length (no index bits are sacrificed for the cluster ID)
- The cluster ID is hashed into the gshare index together with the global history and the branch address
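One simple way to fold the cluster ID into the index is a further XOR; the paper's actual hash function may differ, so treat this as an assumed illustration:

```python
# Hashed clustered index sketch: the cluster ID is XOR-folded into the
# gshare index, so the full k-bit history still participates.
def hashed_index(branch_addr, global_history, cluster_id, k):
    return (branch_addr ^ global_history ^ cluster_id) & ((1 << k) - 1)

print(hashed_index(0b101100, 0b001010, cluster_id=0b11, k=4))  # 5
```

Unlike subtabling, no index bits are reserved for the cluster, which is why this variant pays off at large predictor sizes where long histories matter most.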
Hashing for Gshare

(Figure: misprediction rate versus predictor size for the original gshare, clustered with subtables, and clustered with hashing; a detail plot covers 1 KB to 1 MB.)

Hashing is 5% better for LARGE predictors.
Self Profile-Based Clustering

- Limit study: the identified clusters are optimal for the given execution, since clustering and evaluation use the same run

(Diagram: branches A-D with their per-phase taken rates, each assigned to a cluster.)
Cross Profile-Based Clustering

- Clusters are identified on the SPEC train inputs and then applied to the evaluated run
- An additional cluster collects branches unseen during profiling

(Diagram: branches A-E profiled on the train inputs; profiled branches map to clusters matching the self-clustering, while the unseen branch E falls into the additional cluster.)
Cross Profile-Based Clustering: Results

(Figures: misprediction rate versus predictor size, 10 B to 1 MB, for bimodal and gshare, comparing the original, self-clustered and cross-clustered variants; a gshare detail plot also separates subtables from hashing.)

Cross-clustered indexing remains effective. For gshare at small budgets, subtables give 12.3% fewer mispredictions (19% with self clustering); at large budgets, hashing is 3% better (5% with self clustering).
Conclusion

- Small branch predictors suffer from aliasing, which is frequently destructive
- Clustering branches exploits constructive aliasing instead
- Implementation: subtables (usable in all branch-prediction schemes) or hashing (specific to gshare)
- Gshare misprediction rate: reduced by 19% (self) and 12.3% (cross) at 1 KiB; by 5% (self) and 3% (cross) at 256 KiB