Dynamic Predication

ACAL Group Seminar, Alok Garg

What is Predicated Execution?
• Conditional instruction
  • Executed: if condition is true
  • NOP: if condition is false
• Eliminates simple branches
  • if (A == 0) { S = T }
• Converts control dependencies into data dependencies

Branch code:
    BNEZ  R1, L
    ADDU  R2, R3, R0
L:

Predicated code:
    CMOVZ R2, R3, R1
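To make the if-conversion concrete, here is a minimal Python sketch that models both versions of `if (A == 0) { S = T }` on a tiny register file. The register assignment (R1 = A, R2 = S, R3 = T, R0 = 0) is read off the assembly on the slide; the function names and the dictionary-based register file are assumptions made purely for illustration.

```python
def branch_version(regs):
    """BNEZ R1, L ; ADDU R2, R3, R0 ; L:   (control dependence on R1)."""
    regs = dict(regs)
    if regs["R1"] != 0:                    # BNEZ R1, L: skip the move when A != 0
        return regs
    regs["R2"] = regs["R3"] + regs["R0"]   # ADDU R2, R3, R0 (R0 holds 0)
    return regs


def cmov_version(regs):
    """CMOVZ R2, R3, R1   (data dependence on R1, no branch)."""
    regs = dict(regs)
    # The instruction always issues; it behaves as a NOP when R1 != 0.
    regs["R2"] = regs["R3"] if regs["R1"] == 0 else regs["R2"]
    return regs


# Both versions must leave the register file in the same state.
for a in (0, 5):
    start = {"R0": 0, "R1": a, "R2": 11, "R3": 42}
    assert branch_version(start) == cmov_version(start)
```

The point of the comparison is that the second version has no control flow for the predictor to get wrong: R2 simply depends on R1 as data.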

Simple Example
[Figure: control-flow graph in which block A branches (T / NT) to blocks B and C, followed by blocks D and E.]
• Normal execution: A [B D E] C D E (pipeline flush due to misprediction)
• Predicated execution: A [C[!p] B[p]] D E (conditional instructions)
• Limitations of software predication:
  • If the branch is NT 98% of the time
  • Delayed execution of blocks B or C
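The trade-off on this slide can be sketched with a back-of-the-envelope cost model. This is an illustrative assumption, not a model taken from the papers: it charges one cycle per instruction, a fixed flush penalty per misprediction, and assumes block B is the taken path. All parameter names and numbers are hypothetical.

```python
def branch_cost(p_taken, len_b, len_c, mispredict_rate, flush_penalty):
    """Expected cycles for the hammock when the branch is predicted."""
    useful = p_taken * len_b + (1 - p_taken) * len_c
    return useful + mispredict_rate * flush_penalty


def predicated_cost(len_b, len_c):
    """Cycles when both blocks are always fetched and executed."""
    return len_b + len_c


# Branch that is NT 98% of the time and therefore easy to predict:
# predication only adds the work of the rarely needed block.
biased = branch_cost(p_taken=0.02, len_b=4, len_c=4,
                     mispredict_rate=0.02, flush_penalty=20)
assert biased < predicated_cost(4, 4)

# Hard-to-predict branch: flushes dominate, so predication wins.
hard = branch_cost(p_taken=0.5, len_b=4, len_c=4,
                   mispredict_rate=0.3, flush_penalty=20)
assert hard > predicated_cost(4, 4)
```

Under these assumed numbers, always predicating a heavily biased branch is a loss, which is the motivation for deciding dynamically.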

Limitations of Predication
• ISA support
  • Predicate registers
  • Predicated instructions
• Performance overhead
  • Instruction fetch from both paths
  • Cannot execute predicated instructions until the predicate value is resolved
  • Ideal predication speedup: 16.4%
• Only a small subset of the control-flow graph is covered
  • Compiler cannot if-convert complex control flow
  • Ideal predication for all conditional branches: 37.4%

Motivation
• Some branches are still very hard to predict with conventional branch predictors
• Mispredictions lead to costly pipeline flushes
  • Performance
  • Energy
• Predication is used to avoid pipeline flushes for those hard-to-predict branches

Papers Covered
• Dynamic Hammock Predication for Non-Predicated Instruction Set Architectures. Artur Klauser, Todd Austin, Dirk Grunwald, and Brad Calder. PACT 1998
• Wish Branches: Combining Conditional Branching with Predication for Adaptive Predicated Execution. Hyesoon Kim, Onur Mutlu, Jared Stark, and Yale N. Patt. MICRO 2005, IEEE Micro Top Picks 2006
• Diverge-Merge Processor (DMP): Dynamic Predicated Execution of Complex Control-Flow Graphs Based on Frequently Executed Paths. Hyesoon Kim, Jose A. Joao, Onur Mutlu, and Yale N. Patt. MICRO 2006

Types of Control-flow Graphs
[Figure: three example control-flow graphs: simple hammock, nested hammock, frequently-hammock.]

Types of Control-flow Graphs (continued)
[Figure: two further example control-flow graphs: loop, non-merging control flow.]

Distribution of mispredicted branches
• Simple + nested: 16% of all mispredictions
• All except non-merging: 66% of all mispredictions

Dynamic Hammock Predication
• Targets the first limitation of software predication
  • Removes the need for ISA support
• Dynamic predication for simple hammocks
  • 11% of all mispredictions
• Compiler support to mark simple hammock boundaries
• Predication decision
  • Dynamic decision
  • Static, profile based

Support for Dynamic Predication
Fork Context
• R1 := …
• R2 := …
• R3 := …
• R4 := …
• B-cc (i)
Then Context (cc is false)
• R1 := R1 + R2
• R3 := R1 x 2
• BR (k)
Else Context (cc is true)
• R2 := R1 - R2
• R3 := R2 x 2
Join Context
• RA := R1
• RB := R2
• RC := R3
• RD := R4

Support for Dynamic Predication
[Figure: rename tables for the fork, then, and else contexts, mapping R1 to R4 to the physical tags used below.]
Each original instruction is shown with its renamed form after the arrow.
Fork Context
• R1 := … → R1.a := …
• R2 := … → R2.b := …
• R3 := … → R3.c := …
• R4 := … → R4.d := …
• B-cc (i) → PL.e
Then Context (cc is false, predicate value = 0)
• R1 := R1 + R2 → R1.f := R1.a + R2.b
• R3 := R1 x 2 → R3.g := R1.f x 2
• BR (k) → removed
Else Context (cc is true, predicate value = 1)
• R2 := R1 - R2 → R2.i := R1.a - R2.b
• R3 := R2 x 2 → R3.j := R2.i x 2
Join Context
• Inserted selects (PL.e : x : y yields x when the predicate is 1, y when it is 0):
  • R1.k := PL.e : R1.a : R1.f
  • R2.l := PL.e : R2.i : R2.b
  • R3.m := PL.e : R3.j : R3.g
• RA := R1 → RA.n := R1.k
• RB := R2 → RB.o := R2.l
• RC := R3 → RC.p := R3.m
• RD := R4 → RD.q := R4.d
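The renaming on this slide can be reproduced with a small Python sketch. It is a simplified illustration, not the hardware algorithm from the paper: `rename_hammock` and its parameters are hypothetical names, and the sketch assumes that a select is inserted at the join for every register whose then and else mappings differ.

```python
def rename_hammock(fork_map, then_writes, else_writes, fresh):
    """Return (then_map, else_map, join_map, selects) for one simple hammock.

    fork_map:    architectural register -> physical tag at the fork point
    then_writes: registers written in the then context, in program order
    else_writes: registers written in the else context, in program order
    fresh:       iterator yielding unused physical tags
    """
    then_map = dict(fork_map)
    for reg in then_writes:
        then_map[reg] = next(fresh)
    else_map = dict(fork_map)
    for reg in else_writes:
        else_map[reg] = next(fresh)

    join_map = dict(fork_map)
    selects = {}
    for reg in fork_map:
        if then_map[reg] != else_map[reg]:
            tag = next(fresh)
            # (value if predicate == 1, value if predicate == 0)
            selects[tag] = (else_map[reg], then_map[reg])
            join_map[reg] = tag
    return then_map, else_map, join_map, selects


# Tags a..d come from the fork context; e is taken by the predicate PL.e.
fork = {"R1": "a", "R2": "b", "R3": "c", "R4": "d"}
then_map, else_map, join_map, selects = rename_hammock(
    fork, ["R1", "R3"], ["R2", "R3"], iter("fgijklm"))
print(join_map)  # {'R1': 'k', 'R2': 'l', 'R3': 'm', 'R4': 'd'}
print(selects)   # {'k': ('a', 'f'), 'l': ('i', 'b'), 'm': ('j', 'g')}
```

R4 is not written in either context, so it keeps tag d and needs no select, matching `RD.q := R4.d` on the slide.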

Wish Branches
• Targets the second and third limitations of software predication
  • Dynamic decision based on confidence estimator
  • Improved coverage by predicating loops
• Uses compiler-generated predicated blocks
  • Adds "wish" code for dynamic decision
  • Defines how to include simple loops for predication
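The dynamic decision can be sketched as follows. The slides do not specify the confidence estimator, so this sketch assumes a simple per-branch resetting counter (consecutive correct predictions); the class name, threshold, and counter width are hypothetical.

```python
class ConfidenceEstimator:
    """Assumed estimator: counts consecutive correct predictions per branch."""

    def __init__(self, threshold=4, max_count=15):
        self.threshold = threshold
        self.max_count = max_count
        self.counters = {}

    def is_high_confidence(self, pc):
        return self.counters.get(pc, 0) >= self.threshold

    def update(self, pc, prediction_correct):
        if prediction_correct:
            count = self.counters.get(pc, 0) + 1
            self.counters[pc] = min(count, self.max_count)
        else:
            self.counters[pc] = 0  # a misprediction resets the confidence


def wish_branch_mode(estimator, pc):
    """High confidence: follow the branch prediction.
    Low confidence: execute the predicated code instead."""
    return "branch" if estimator.is_high_confidence(pc) else "predicated"


est = ConfidenceEstimator(threshold=2)
pc = 0x400                       # hypothetical address of a wish branch
modes = [wish_branch_mode(est, pc)]
for correct in (True, True, False):
    est.update(pc, correct)
    modes.append(wish_branch_mode(est, pc))
print(modes)  # ['predicated', 'predicated', 'branch', 'predicated']
```

The same compiled code serves both modes; only the hardware's treatment of the wish branch changes from one dynamic instance to the next.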

Wish Jumps and Wish Joins Code
[Figure: the same code in three forms: branch code, predicated code, wish jump/join code.]

Wish Loops Code
[Figure: the same loop in two forms: normal code, wish loop code.]

Dynamic Number of Wish Branches
• Performance improvement: 10.7% over predicated code

Dynamic Number of Wish Loops
• Performance improvement: 13.3% over predicated code

Diverge-Merge Processor (DMP)
• Targets all 3 limitations of software predication
  • Dynamic predication: little compiler support
  • Dynamic decision based on confidence estimation
  • Only on frequently executed control-flow paths
• Software support
  • Compiler marks all diverge and merge points
• Hardware support: similar to Dynamic Hammock Predication
  • Enters predication mode at diverge point
  • Predicates only frequently executed paths

Frequently Executed Control-Flow Paths
• Dynamically predicate: blocks B, C, E
• Reduces predication overhead
• Improves predication coverage by including complex control-flow graphs
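One way to picture how only the frequent paths end up predicated is the sketch below. It is illustrative only: the control-flow graph, the edge counts, and both function names are invented for this example, and the "follow the hottest successor" rule is an assumption rather than the selection procedure used by DMP. It also assumes the graph between the diverge and merge points is acyclic.

```python
def frequent_path(cfg, start, merge):
    """Follow the most frequently taken successor from start until merge."""
    path = []
    block = start
    while block != merge:
        path.append(block)
        successors = cfg[block]           # successor -> profiled edge count
        block = max(successors, key=successors.get)
    return path


def blocks_to_predicate(cfg, diverge, merge):
    """Blocks on the frequent path leaving each side of the diverge branch."""
    blocks = []
    for side in cfg[diverge]:
        for block in frequent_path(cfg, side, merge):
            if block not in blocks:
                blocks.append(block)
    return blocks


# Hypothetical CFG: A is the diverge point, H the merge point.
cfg = {
    "A": {"B": 50, "C": 50},
    "B": {"D": 5, "E": 45},
    "C": {"E": 40, "G": 10},
    "D": {"F": 5},
    "E": {"H": 80, "F": 5},
    "F": {"H": 10},
    "G": {"H": 10},
}
print(sorted(blocks_to_predicate(cfg, "A", "H")))  # ['B', 'C', 'E']
```

The rarely executed blocks D, F, and G stay outside the predicated region, which is what keeps the overhead low while still covering a graph that is not a simple hammock.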

Comparison of Various Predication Schemes
[Figure: the five control-flow graph types (simple hammock, nested hammock, frequently-hammock, loop, non-merging control flow).]

Performance
• 19.3% average performance improvement
• 38% reduction in pipeline flushes
• Consumes 9% less energy

Conclusion
• Most of the hard-to-predict branches (66%) have a convergence point
• Dynamic predication is more effective than software predication in terms of:
  • Number of mispredicted branches covered
  • Accuracy of coverage
• Effectively reduces a large number of pipeline flushes
