
Dynamic Predication



Presentation Transcript


  1. Dynamic Predication • ACAL Group Seminar • Alok Garg

  2. What is Predicated Execution? • Conditional instruction • Executed: if condition is true • NOP: if condition is false • Eliminate simple branches • if (A == 0) { S = T } • Convert control dependencies into data dependencies • Branch code: BNEZ R1, L; ADDU R2, R3, R0; L: • Predicated code: CMOVZ R2, R3, R1
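As a rough illustration of the equivalence on this slide, the sketch below models both instruction sequences in Python. The function names and the dictionary-based register file are assumptions made for the example, as is the MIPS-style convention that R0 always holds zero.

```python
def branch_version(regs):
    """BNEZ R1, L / ADDU R2, R3, R0 / L: -- copies R3 into R2 only when R1 == 0."""
    regs = dict(regs)                      # work on a copy of the register file
    if regs["R1"] != 0:                    # BNEZ R1, L: skip the ADDU when R1 is non-zero
        return regs
    regs["R2"] = regs["R3"] + regs["R0"]   # ADDU R2, R3, R0 (R0 assumed to be 0)
    return regs


def cmov_version(regs):
    """CMOVZ R2, R3, R1 -- R2 := R3 if R1 == 0, otherwise R2 keeps its old value."""
    regs = dict(regs)
    condition = regs["R1"] == 0            # the control dependence becomes a data input
    regs["R2"] = regs["R3"] if condition else regs["R2"]
    return regs
```

Both functions return the same register state for any input; the second simply has no control-flow decision for a branch predictor to get wrong.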

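  3. Simple Example • [Figure: control-flow graph in which A branches (T/NT) to B or C, which merge at D and then E] • Normal execution: A [B D E] C D E, with a pipeline flush due to misprediction • Predicated execution: A [C[!p] B[p]] D E, using conditional instructions • Limitations of software predication: • If the branch is NT 98% of the time, fetching both paths is mostly wasted work • Delayed execution of blocks B or C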
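The trade-off behind these limitations can be sketched with a back-of-the-envelope cost model. Everything here is an assumption chosen for illustration (the function names, the 20-cycle flush penalty, the 4-cycle blocks); none of it comes from the slides.

```python
def branch_cost(p_mispredict, flush_penalty, path_cycles):
    """Expected cycles when the branch is predicted: one path plus the expected flush cost."""
    return path_cycles + p_mispredict * flush_penalty


def predicated_cost(then_cycles, else_cycles):
    """Cycles when both paths are fetched and executed under predication."""
    return then_cycles + else_cycles


# Hypothetical numbers: 20-cycle flush, 4-cycle blocks B and C.
easy_branch = branch_cost(0.02, 20, 4)   # branch goes one way 98% of the time
hard_branch = branch_cost(0.30, 20, 4)   # hard-to-predict branch
both_paths = predicated_cost(4, 4)
```

With these made-up numbers the well-predicted branch is cheaper left as a branch, while the hard-to-predict one is cheaper predicated, which is why a static, always-predicate decision can hurt.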

  4. Limitations of Predication • ISA support • Predicate registers • Predicated instructions • Performance overhead • Instruction fetch from both paths • Cannot execute predicated instructions until the predicate value is resolved • Ideal predication speedup: 16.4% • Only a small subset of the control-flow graph is covered • Compiler cannot if-convert complex control flow • Ideal predication for all conditional branches: 37.4%

  5. Motivation • Some branches are still very hard to predict with conventional branch predictors • Mispredictions lead to costly pipeline flushes • Performance • Energy • Predication is used to avoid pipeline flushes for those hard-to-predict branches

  6. Papers Covered • Dynamic Hammock Predication for Non-predicated Instruction Set Architectures. Artur Klauser, Todd Austin, Dirk Grunwald, and Brad Calder – PACT 1998 • Wish Branches: Combining Conditional Branching with Predication for Adaptive Predicated Execution. Hyesoon Kim, Onur Mutlu, Jared Stark, and Yale N. Patt – MICRO 2005, IEEE Micro Top Picks 2006 • Diverge-Merge Processor (DMP): Dynamic Predicated Execution of Complex Control-Flow Graphs Based on Frequently Executed Paths. Hyesoon Kim, Jose A. Joao, Onur Mutlu, and Yale N. Patt – MICRO 2006

  7. Types of Control-Flow Graphs • [Figure: three example control-flow graphs] • Simple hammock • Nested hammock • Frequently-hammock

  8. Types of Control-Flow Graphs • [Figure: two example control-flow graphs] • Loop • Non-merging control flow

  9. Distribution of Mispredicted Branches • Simple + nested: 16% of all mispredictions • All except non-merging: 66% of all mispredictions

  10. Dynamic Hammock Predication • Targets the first limitation of software predication • Removes the need for ISA support • Dynamic predication for simple hammocks • 11% of all mispredictions • Compiler support to mark simple hammock boundaries • Predication decision • Dynamic decision • Static, profile-based

  11. Support for Dynamic Predication • Fork context: R1 := …; R2 := …; R3 := …; R4 := …; B - cc (i) • Then context (cc is false): R1 := R1 + R2; R3 := R1 x 2; BR (k) • Else context (cc is true): R2 := R1 – R2; R3 := R2 x 2 • Join context: RA := R1; RB := R2; RC := R3; RD := R4

  12. Support for Dynamic Predication • [Figure: rename tables for the fork, then, and else contexts, mapping R1 to R4 onto renamed registers] • Fork context: R1.a := …; R2.b := …; R3.c := …; R4.d := …; PL.e • Then context (cc is false, predicate value = 0): R1 := R1 + R2 is renamed to R1.f := R1.a + R2.b; R3 := R1 x 2 is renamed to R3.g := R1.f x 2; BR (k) is removed • Else context (cc is true, predicate value = 1): R2 := R1 – R2 is renamed to R2.i := R1.a – R2.b; R3 := R2 x 2 is renamed to R3.j := R2.i x 2 • Join context: R1.k := PL.e : R1.a : R1.f; R2.l := PL.e : R2.i : R2.b; R3.m := PL.e : R3.j : R3.g; then RA := R1 is renamed to RA.n := R1.k; RB := R2 to RB.o := R2.l; RC := R3 to RC.p := R3.m; RD := R4 to RD.q := R4.d
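The renamed example can be checked with a small simulation. This is a sketch under two assumptions: the join-context entries of the form `PL.e : x : y` are read as "select x if the predicate is true, else y", and the `…` initial values are supplied as function arguments. The function names are hypothetical.

```python
def run_branching(r1, r2, r3, r4, cc):
    """Reference semantics: execute only the path selected by cc."""
    if not cc:              # then context: cc is false
        r1 = r1 + r2
        r3 = r1 * 2
    else:                   # else context: cc is true
        r2 = r1 - r2
        r3 = r2 * 2
    return r1, r2, r3, r4   # join context reads R1..R4


def run_dynamic_predication(r1, r2, r3, r4, cc):
    """Execute both paths on renamed registers, then merge with selects at the join."""
    phys = {"R1.a": r1, "R2.b": r2, "R3.c": r3, "R4.d": r4, "PL.e": cc}
    # Then context, renamed (the trailing BR is removed).
    phys["R1.f"] = phys["R1.a"] + phys["R2.b"]
    phys["R3.g"] = phys["R1.f"] * 2
    # Else context, renamed; it reads the fork-context values, not the then-context ones.
    phys["R2.i"] = phys["R1.a"] - phys["R2.b"]
    phys["R3.j"] = phys["R2.i"] * 2

    def select(if_true, if_false):
        # Assumed meaning of "PL.e : x : y": x when the predicate is true, else y.
        return phys[if_true] if phys["PL.e"] else phys[if_false]

    phys["R1.k"] = select("R1.a", "R1.f")
    phys["R2.l"] = select("R2.i", "R2.b")
    phys["R3.m"] = select("R3.j", "R3.g")
    return phys["R1.k"], phys["R2.l"], phys["R3.m"], phys["R4.d"]
```

For either value of `cc`, both functions return the same four values, which is the property the select operations at the join are there to guarantee.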

  13. Wish Branches • Targets the second and third limitations of software predication • Dynamic decision based on a confidence estimator • Improved coverage by predicating loops • Uses compiler-generated predicated blocks • Adds “wish” code for the dynamic decision • Defines how to include simple loops for predication
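The dynamic decision can be pictured with a toy confidence estimator. This is a minimal sketch, assuming a per-branch resetting counter with a fixed threshold; the class name, the threshold of 3, and the counter width are illustrative choices, not details taken from the wish-branch paper.

```python
class ConfidenceEstimator:
    """Toy per-branch confidence estimator: counts consecutive correct predictions."""

    def __init__(self, threshold=3, max_count=7):
        self.threshold = threshold    # hypothetical confidence threshold
        self.max_count = max_count    # saturation limit of each counter
        self.counters = {}            # branch PC -> consecutive correct predictions

    def is_confident(self, pc):
        return self.counters.get(pc, 0) >= self.threshold

    def update(self, pc, prediction_correct):
        count = self.counters.get(pc, 0)
        # Saturating increment on a correct prediction, reset on a misprediction.
        self.counters[pc] = min(count + 1, self.max_count) if prediction_correct else 0


def wish_branch_mode(estimator, pc):
    """High confidence: treat the wish branch as a normal branch; otherwise predicate."""
    return "branch-predict" if estimator.is_confident(pc) else "predicate"
```

A branch that has recently been mispredicted falls back to the predicated code, so the hardware pays the predication overhead only when a flush looks likely.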

  14. Wish Jumps and Wish Joins • [Figure: the same code shown as branch code, predicated code, and wish jump/join code]

  15. Wish Loops • [Figure: normal code compared with wish loop code]

  16. Dynamic Number of Wish Branches • Performance improvement: 10.7% over predicated code

  17. Dynamic Number of Wish Loops • Performance improvement: 13.3% over predicated code

  18. Diverge-Merge Processor (DMP) • Targets all three limitations of software predication • Dynamic predication with little compiler support • Dynamic decision based on confidence estimation • Only on frequently executed control-flow paths • Software support • Compiler marks all diverge and merge points • Hardware support, similar to dynamic hammock predication • Enters predication mode at a diverge point • Predicates only frequently executed paths

  19. Frequently Executed Control-Flow Paths • Dynamically predicate: blocks B, C, E • Reduces predication overhead • Improves predication coverage by including complex control-flow graphs
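One way to picture how frequently executed paths define a merge point is the sketch below. It assumes edge execution counts from profiling are available and follows the hottest successor from each side of the diverge branch until the two paths meet. The graph, its counts, and the function names are hypothetical and do not reproduce the slide's figure or the DMP compiler's actual algorithm.

```python
# Hypothetical profiled CFG: block -> {successor: execution count}.
EXAMPLE_CFG = {
    "A": {"B": 60, "C": 40},
    "B": {"D": 5, "E": 95},
    "C": {"E": 90, "F": 10},
    "D": {"G": 1},
    "E": {"H": 1},
    "F": {"H": 1},
}


def hot_path(cfg, start, limit=16):
    """Follow the most frequently taken successor from start, up to limit blocks."""
    path = [start]
    node = start
    while cfg.get(node) and len(path) < limit:
        node = max(cfg[node], key=cfg[node].get)
        path.append(node)
    return path


def frequent_merge_point(cfg, diverge):
    """Return the first block shared by the hot paths from both sides of the branch."""
    side_a, side_b = list(cfg[diverge])     # assumes exactly two successors
    blocks_b = set(hot_path(cfg, side_b))
    for block in hot_path(cfg, side_a):
        if block in blocks_b:
            return block
    return None                             # the hot paths never merge
```

In this made-up graph, `frequent_merge_point(EXAMPLE_CFG, "A")` yields block E even though the rarely executed blocks D and F bypass it, which is the sense in which such a graph behaves like a hammock only on its frequent paths.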

  20. Comparison of Various Predication Schemes • [Figure: the schemes compared across control-flow graph types: simple hammock, nested hammock, frequently-hammock, loop, and non-merging control flow]

  21. Performance • 19.3% average performance improvement • 38% reduction in pipeline flushes • Consumes 9% less energy

  22. Conclusion • Most of the hard-to-predict branches (66%) have a convergence point • Dynamic predication is more effective than software predication in terms of: • Number of mispredicted branches covered • Accuracy of coverage • Effectively reduces a large number of pipeline flushes
