Wish Branches: Combining Conditional Branching and Predication for Adaptive Predicated Execution. Hyesoon Kim, Onur Mutlu, Jared Stark*, Yale N. Patt. Electrical and Computer Engineering, The University of Texas at Austin. *Oregon Microarchitecture Lab, Intel Corporation.
Talk Outline • Problem • Wish Branches • Experimental Methodology • Results • Conclusion
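Predicated Execution
• Source code: if (cond) { b = 0; } else { b = 1; }
• Normal branch code (block A branches to C when taken and falls through to B when not taken; both paths join at D):
    A: p1 = (cond)
       branch p1, TARGET
    B: mov b, 1
       jmp JOIN
    C: TARGET: mov b, 0
• Predicated code (blocks A, B, C, and D are fetched in sequence):
    A: p1 = (cond)
    B: (!p1) mov b, 1
    C: (p1) mov b, 0
    D: add x, b, 1
• Predicated execution converts a control flow dependency into a data dependency
• Pro: eliminates hard-to-predict branches
• Cons: (1) blocks B and C are both fetched all the time; (2) predicated instructions wait until p1 is resolved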
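The conversion from a control dependency to a data dependency can be mimicked in a few lines of Python. This is only a sketch of the idea on this slide, not a model of any real ISA: the function names are made up for illustration, and the conditional expressions stand in for predicated mov instructions.

```python
def branch_version(cond: bool) -> int:
    """Normal branch code: only one of blocks B or C executes."""
    if cond:
        b = 0          # block C (TARGET): mov b, 0
    else:
        b = 1          # block B: mov b, 1
    return b + 1       # block D: add x, b, 1


def predicated_version(cond: bool) -> int:
    """Predicated code: both moves are always fetched; p1 selects which one takes effect."""
    p1 = cond                    # block A: p1 = (cond)
    b = None                     # b has no defined value before the predicated moves
    b = 1 if not p1 else b       # block B: (!p1) mov b, 1
    b = 0 if p1 else b           # block C: (p1) mov b, 0
    return b + 1                 # block D: add x, b, 1 (data-dependent on p1 via b)
```

Both functions return the same x for either value of cond; the difference is that the predicated version has no branch to predict but always performs the work of both blocks.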
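The Overhead of Predicated Execution
• Predicated code:
    A: p1 = (cond)
    B: (!p1) mov b, 1
    C: (p1) mov b, 0
    D: add x, b, 1
• Overhead-free variant shown on the slide, with the predicates replaced by constants: p1 = (cond); (0) mov b, 1; (1) mov b, 0
[Figure: execution-time chart with non-predicated code as the baseline; annotations: -2%, 16%, 13%]
• If all overhead were ideally eliminated, predicated execution would provide a 16% improvement in average execution time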
The Problem • Due to the predication overhead, predicated execution sometimes reduces performance • Branch misprediction characteristics depend on run-time behavior: input set, control-flow path, and phase behavior • The compiler cannot accurately estimate the run-time behavior of branches
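A back-of-the-envelope cost model makes the trade-off concrete. This is a toy sketch, not the evaluation used in the talk: the function names and the formula are assumptions. The 30-cycle penalty and the 8-wide machine match the baseline configuration given later in the talk; the misprediction rates, the two extra fetched instructions, and the one-cycle dependency stall are made-up illustration values.

```python
def branch_cost(mispredict_rate: float, flush_penalty: int) -> float:
    """Expected cycles lost per dynamic branch to mispredictions (toy model)."""
    return mispredict_rate * flush_penalty


def predication_cost(extra_fetched_instrs: int, fetch_width: int,
                     dependency_stall: float) -> float:
    """Expected cycles lost per predicated region: wasted fetch slots plus predicate wait."""
    return extra_fetched_instrs / fetch_width + dependency_stall


def prefer_predication(mispredict_rate: float, flush_penalty: int = 30,
                       extra_fetched_instrs: int = 2, fetch_width: int = 8,
                       dependency_stall: float = 1.0) -> bool:
    """True when the toy model says predication is cheaper than branching."""
    return (predication_cost(extra_fetched_instrs, fetch_width, dependency_stall)
            < branch_cost(mispredict_rate, flush_penalty))


# An easy-to-predict branch (2% mispredicted) favors normal branch code;
# a hard-to-predict branch (20% mispredicted) favors predicated code.
easy = prefer_predication(0.02)
hard = prefer_predication(0.20)
```

The misprediction rate is the input the compiler cannot know reliably, since it changes with input set and program phase, which is why a compile-time decision can go wrong in either direction.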
Talk Outline • Problem • Wish Branches • Experimental Methodology • Results • Conclusion
Wish Branches • A new type of control flow instruction, in 3 types: wish jump, wish join, and wish loop • The compiler generates code (with wish branches) that can be executed either as predicated code or as non-predicated code (normal branch code) • The hardware decides whether to execute predicated code or normal branch code at run-time, based on the confidence of the branch prediction • Easy to predict (high confidence): normal branch code • Hard to predict (low confidence): predicated code
Wish Jump/Join
• Normal branch code:
    A: p1 = (cond)
       branch p1, TARGET
    B: mov b, 1
       jmp JOIN
    C: TARGET: mov b, 0
• Predicated code:
    A: p1 = (cond)
    B: (!p1) mov b, 1
    C: (p1) mov b, 0
• Wish jump/join code:
    A: p1 = (cond)
       wish.jump p1 TARGET
    B: (!p1) mov b, 1
       wish.join !p1 JOIN
    C: TARGET: (p1) mov b, 0
    D: JOIN:
• High confidence: the wish jump and wish join are predicted taken or not-taken like normal branches, and the predicates are treated as true: (1) mov b, 1; wish.join (1) JOIN; TARGET: (1) mov b, 0
• Low confidence: the wish jump and wish join act as nops, so the code executes as predicated code: (!p1) mov b, 1; (p1) mov b, 0
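The two execution modes of the wish jump/join code can be sketched as a small Python function. It is a simplified illustration under stated assumptions: the function name, the block-list representation of what the front end fetches, and the flush flag are all invented here, and timing is not modeled.

```python
from typing import List, Tuple


def run_wish_jump_join(cond: bool, high_confidence: bool,
                       predicted_taken: bool) -> Tuple[int, List[str], bool]:
    """Return (b, fetched blocks, pipeline flushed?) for the if/else example."""
    p1 = cond                                   # block A: p1 = (cond)
    if high_confidence:
        # Normal-branch mode: follow the predicted direction of the wish jump.
        fetched = ["A", "C" if predicted_taken else "B", "D"]
        flushed = predicted_taken != p1         # wrong direction => flush
        if flushed:
            fetched += ["C" if p1 else "B", "D"]  # refetch the correct path
        b = 0 if p1 else 1
        return b, fetched, flushed
    # Predicated mode: wish jump and wish join act as nops, B and C are both fetched.
    b = None
    if not p1:
        b = 1                                   # (!p1) mov b, 1
    if p1:
        b = 0                                   # (p1) mov b, 0
    return b, ["A", "B", "C", "D"], False       # no flush, regardless of the prediction
```

In this sketch the architectural result b is the same in every mode; only the fetched blocks and the need for a flush differ, which is the property that lets the hardware pick the mode per dynamic branch.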
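Wish Loop
• Source code: do { a++; i++; } while (i<N);
• Normal backward branch code:
    X: LOOP: add a, a, 1
             add i, i, 1
             p1 = (i<N)
             branch p1, LOOP
    Y: EXIT:
• Wish loop code:
    H: mov p1, 1
    X: LOOP: (p1) add a, a, 1
             (p1) add i, i, 1
             (p1) p1 = (cond)
             wish.loop p1, LOOP
    Y: EXIT:
• High confidence: the three (p1) predicates in the loop body are treated as true, (1), and the wish loop is predicted like a normal backward branch
• Low confidence: the loop body executes as predicated code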
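A short Python sketch shows why the predicated loop body stays correct even if the front end fetches too many iterations. The function names and the fetched_iterations parameter are illustrative assumptions; the predicate p1 and the loop body follow the wish loop code on this slide, with (cond) taken to be (i<N).

```python
from typing import Tuple


def do_while_reference(a: int, i: int, n: int) -> Tuple[int, int]:
    """Reference semantics of: do { a++; i++; } while (i < N);"""
    while True:
        a += 1
        i += 1
        if not i < n:
            return a, i


def wish_loop_predicated(a: int, i: int, n: int,
                         fetched_iterations: int) -> Tuple[int, int]:
    """Execute the loop body fetched_iterations times under predicate p1."""
    p1 = True                      # H: mov p1, 1
    for _ in range(fetched_iterations):
        if p1:
            a += 1                 # (p1) add a, a, 1
        if p1:
            i += 1                 # (p1) add i, i, 1
        if p1:
            p1 = i < n             # (p1) p1 = (cond)
    return a, i


# With N = 3 the loop body runs three times; two extra fetched iterations are nops.
result = wish_loop_predicated(0, 0, 3, fetched_iterations=5)
```

Once p1 becomes false it can never become true again, because the instruction that writes p1 is itself predicated on p1. That is what turns extra iterations into nops.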
Mispredicted Case 1: Early-Exit
• Correct execution: H X1 X2 X3 Y (wish loop outcomes: T T N)
• Early-exit (low confidence): H X1 X2 Y ... (predicted outcomes: T N); the loop is exited too early, so the pipeline is flushed and X3 and Y are then fetched
• Compared to normal branch code: cons: predicate data dependency and one extra instruction (-)
Mispredicted Case 2: Late-Exit
• Correct execution: H X1 X2 X3 Y (wish loop outcomes: T T N)
• Late-exit (low confidence): H X1 X2 X3 X4 X5 Y ... (predicted outcomes: T T T T N); the extra iterations X4 and X5 become nops
• Compared to normal branch code: pro: reduced flush penalty (+++); cons: predicate data dependency and one extra instruction (-)
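Mispredicted Case 3: No-Exit
• Correct execution: H X1 X2 X3 Y (wish loop outcomes: T T N)
• No-exit (low confidence): H X1 X2 X3 X4 X5 X6 ... (predicted outcomes: T T T T T T); the front end keeps fetching loop iterations, so the pipeline is flushed before Y is fetched
• Compared to normal branch code: cons: predicate data dependency and one extra instruction (-)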
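The three mispredicted cases for a low-confidence wish loop can be summarized as a small classifier. This is a sketch under assumptions: the function name and return shape are invented here, iteration counts stand in for the T/N sequences on the slides, and None is used to mean that the predictor never predicts the loop exit.

```python
from typing import Optional, Tuple


def classify_wish_loop_exit(actual_iterations: int,
                            predicted_iterations: Optional[int]
                            ) -> Tuple[str, bool, int]:
    """Return (case, pipeline flush needed?, number of nop iterations)."""
    if predicted_iterations is None:
        return "no-exit", True, 0            # Case 3: exit never predicted, flush
    if predicted_iterations < actual_iterations:
        return "early-exit", True, 0         # Case 1: left the loop too soon, flush
    if predicted_iterations > actual_iterations:
        # Case 2: extra iterations are fetched but their predicate is false.
        return "late-exit", False, predicted_iterations - actual_iterations
    return "correct", False, 0


# The slide examples: the loop really runs 3 iterations (X1 X2 X3).
early = classify_wish_loop_exit(3, 2)        # H X1 X2 Y
late = classify_wish_loop_exit(3, 5)         # H X1 X2 X3 X4 X5 Y
never = classify_wish_loop_exit(3, None)     # H X1 X2 X3 X4 X5 X6 ...
```

Only the late-exit case avoids the flush, which is where the wish loop gains over a normal backward branch.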
Advantages/Disadvantages of Wish Branches • Advantages compared to predicated execution • Reduce the overhead of predication • Increase the benefits of predicated code by allowing the compiler to generate more aggressively-predicated code • Provide a mechanism to exploit predication to reduce the branch misprediction penalty for backward branches (wish loops) • Make predicated code less dependent on machine configuration (e.g., the branch predictor)
Advantages/Disadvantages of Wish Branches • Disadvantages compared to predicated execution • Extra branch instructions use machine resources • Extra branch instructions increase the contention for branch predictor table entries • May constrain the compiler’s scope for code optimizations
Wish Branch Support • ISA Support • Predicated execution, wish branch instructions • Compiler Support • Wish branch generation algorithms: the compiler needs to decide which branches are predicated, which are converted to wish branches, and which stay as normal branches • Hardware Support • Confidence estimator • Front-end and branch misprediction detection/recovery module
Talk Outline • Problem • Wish Branches • Experimental Methodology • Results • Conclusion
Experimental Infrastructure • IA-64 provides full support for predication • IA-64 traces are converted to micro-ops to simulate an out-of-order superscalar processor model • Tool flow: Source Code → IA-64 Compiler (ORC) → IA-64 Binary → Trace generation module → IA-64 Trace → Micro-op Translator → µops → Micro-op Simulator
Simulation Methodology • Nine SPEC 2000 integer benchmarks • Baseline Processor Configuration • Front End • Large and accurate branch predictor (64KB hybrid branch predictor: gshare + local) • Minimum 30-cycle branch misprediction penalty • 64KB, 2-cycle latency I-cache • Execution Core • 8-wide out-of-order processor • 512-entry instruction window • Confidence Estimator • 1KB tagged 16-bit history JRS confidence estimator (Jacobsen et al., MICRO-29)
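To give a feel for how a confidence estimator can drive the mode decision, here is a deliberately simplified Python sketch in the spirit of a resetting-counter estimator. It is not the configuration simulated in the talk: the class name, the untagged table, the 1024 entries, the counter maximum, the threshold, and the index hash are all assumptions; only the 16-bit history length is taken from the slide.

```python
class ResettingConfidenceEstimator:
    """Table of resetting counters indexed by PC xor global branch history."""

    def __init__(self, entries: int = 1024, history_bits: int = 16,
                 counter_max: int = 15, threshold: int = 15) -> None:
        self.entries = entries
        self.history_mask = (1 << history_bits) - 1
        self.counter_max = counter_max
        self.threshold = threshold
        self.table = [0] * entries
        self.history = 0

    def _index(self, pc: int) -> int:
        return (pc ^ self.history) % self.entries

    def is_high_confidence(self, pc: int) -> bool:
        """High confidence after enough consecutive correct predictions."""
        return self.table[self._index(pc)] >= self.threshold

    def update(self, pc: int, prediction_correct: bool, taken: bool) -> None:
        """Count up on a correct prediction, reset on a misprediction."""
        idx = self._index(pc)
        if prediction_correct:
            self.table[idx] = min(self.counter_max, self.table[idx] + 1)
        else:
            self.table[idx] = 0
        self.history = ((self.history << 1) | int(taken)) & self.history_mask


def select_mode(estimator: ResettingConfidenceEstimator, pc: int) -> str:
    """Wish branch policy: normal branch code when confident, predicated code otherwise."""
    return "normal branch" if estimator.is_high_confidence(pc) else "predicated"
```

A branch starts in predicated mode, moves to normal branch mode after a run of correct predictions, and drops back to predicated mode as soon as one misprediction resets its counter.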
Talk Outline • Problem • Wish Branches • Experimental Methodology • Results • Conclusion
Performance Improvement
[Figure: execution-time chart with non-predicated code as the baseline; annotations: -4%, 14%, 2.02, 8%, 24%]
• Summary figures shown on the slide:
  • 16% over conditional branch prediction, 11% over selective-predication, 7% over aggressive-predication (w/o mcf)
  • 14% over conditional branch prediction, 13% over selective-predication, 16% over aggressive-predication
  • 12% over conditional branch prediction, 11% over selective-predication, 13% over aggressive-predication
• AGGRESSIVE-PREDICATION: all branches that are suitable for if-conversion are predicated
• SELECTIVE-PREDICATION: branches are selectively predicated using compile-time cost-benefit analysis
Talk Outline • Problem • Wish Branches • Experimental Methodology • Results • Conclusion
Conclusion • New control flow instructions: wish branches (jump/join/loop) • Wish branches improve performance by dividing the work of predication between the compiler and the microarchitecture • Compiler: analyzes the control-flow graph and generates code • Microarchitecture: makes the run-time decision to use predication • Wish branches provide significant performance benefits • 16% compared to conditional branch prediction • 13% compared to selectively predicated code • Wish branches can make predicated execution more viable and effective in high performance processors • By enabling adaptive and aggressive predicated execution