Exploiting Postdominance for Speculative Parallelization
Mayank Agarwal, Kshitiz Malik, Kevin Woley, Sam Stone, Matthew Frank
Implicitly Parallel Architectures Group, University of Illinois at Urbana-Champaign
Originally in HPCA-13
Modified and presented by: Borys Bradel
Outline
• Motivation
• Introduction
• PolyFlow Architecture
• Evaluations
• Conclusions
CARG March 14, 2007
Speculative Parallelization
• Parallelize single-threaded applications
• Dynamically break execution into concurrent tasks
• Multi-threaded and multi-core systems
• Maintain sequential semantics
[Figure: tasks A, B, C, D and processing units PU1, PU2, PU3, PU4]
Task Extraction Policies
• Identify possible points for task creation
• Critical to successful parallelization
• Desirable features
  • Large set of possible tasks
  • Restrict amount of speculation
  • Exploit different kinds of parallelism
  • Work for varying application behaviors
Limitations of Branch Prediction
• Branch mispredicts limit the exploitable amount of ILP
• Superscalars discard all instructions fetched after a mispredicted branch
  • Not all need to be discarded
• Immediate postdominator
  • Earliest control-equivalent point
  • Control flow guaranteed to reconverge at E
[Figure: control-flow graph with blocks A, B, C, D, E, F]
Control-Equivalent Spawning
• Start new task at the immediate postdominator of a branch
• Spawn E as a new task at B
  • Control-equivalent to B
• Main thread can speculate past B
• Spawned thread is as (control) speculative as branch B
[Figure: control-flow graph with blocks A-F; B is the spawner, E is the spawnee]
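The immediate postdominator that drives control-equivalent spawning can be computed with a standard iterative dataflow pass over the control-flow graph. The sketch below is illustrative only and is not the method used in the talk. The edge list for blocks A-F is an assumption (the slides show only the block labels and that B's branch reconverges at E), and the function name `immediate_postdominators` is hypothetical. It also assumes a single exit block and that every other block has at least one successor.

```python
def immediate_postdominators(succ, exit_node):
    """Return {block: immediate postdominator} for a single-exit CFG.

    Assumes every block other than exit_node has at least one successor
    and can reach exit_node.
    """
    nodes = set(succ) | {s for targets in succ.values() for s in targets}
    # pdom[n] = blocks that lie on every path from n to the exit (including n).
    pdom = {n: set(nodes) for n in nodes}
    pdom[exit_node] = {exit_node}

    changed = True
    while changed:
        changed = False
        for n in nodes - {exit_node}:
            # A block postdominates n if it postdominates all of n's successors.
            new = {n} | set.intersection(*(pdom[s] for s in succ[n]))
            if new != pdom[n]:
                pdom[n] = new
                changed = True

    ipdom = {}
    for n in nodes - {exit_node}:
        strict = pdom[n] - {n}
        # Strict postdominators form a chain; the closest one to n is the one
        # that is itself postdominated by all the others (largest pdom set).
        ipdom[n] = max(strict, key=lambda d: len(pdom[d]))
    return ipdom


# Assumed edges for the A-F example: B branches to C or D, both rejoin at E.
cfg = {"A": ["B"], "B": ["C", "D"], "C": ["E"], "D": ["E"], "E": ["F"], "F": []}
ipdom = immediate_postdominators(cfg, "F")
spawn_target = ipdom["B"]  # the task to spawn when the main thread reaches B
```

Under these assumed edges, `spawn_target` is E, matching the slide: whichever way branch B resolves, control reaches E, so a task started there is no more control-speculative than B itself.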
Control-Equivalent Spawning
[Figure: control-flow graph with blocks A-F and execution timelines on PU1, PU2, PU3, showing the task spawn at B, the spawned task starting at E, mispredict resolution, and reconnect]
Managing Data Dependences .. … … Branch … Prod1 … Prod2 … • Spawned tasks • Control-equivalent to spawner • Data dependent • Restrict data speculation • Delay dependent instructions • Register and memory • Until data becomes available • Independent instructions can execute in parallel Spawner Spawned Task ... Cons1 … Cons2 … Cons3 … CARG March 14, 2007
Control-Equivalent Parallelization
• Spawn immediate postdominator of branch
• Task control-equivalent to spawner
• Benefits
  • Subsumes heuristics based on program structures
  • Better performance than hybrid heuristic policies
  • Amenable to dynamic implementations
Outline
• Motivation
• Introduction
• PolyFlow Architecture
• Evaluations
• Conclusions
Immediate Postdominator Spawns
• Broad classification into 4 categories:
  • Hammocks
  • Loop fall-throughs
  • Procedure fall-throughs
  • Others
Imm PDom Hammocks
• A ends in if-then-else branch
• D postdominates A
• Upon reaching A
  • Spawn new task starting at D
  • Main task resolves branch
• Merits
  • Spawns across mispredicts
  • Finds useful work beyond mispredicts
  • Parallelize inner loops
• Not directly exploited in most systems
[Figure: control-flow graph with blocks A, B, C, D, E, labeled Main Task and Spawned Task]
Imm PDom Loop Fall-Throughs
• D ends in a loop branch
• Upon reaching D
  • Start new task at E
  • Main task executes loop
  • New task executes fall-through
• Merits:
  • Exploit parallelism in outer loops
  • Reduce wastage from mispredicted loop branch
[Figure: control-flow graph with blocks A, B, C, D, E, labeled Main Task and Spawned Task]
Imm PDom Procedure Fall-Throughs
• C postdominates call instruction
• Upon reaching B
  • Spawn new task at C
  • Main task executes procedure
  • New task executes fall-through
• Merits
  • Spawns tasks in distant regions
  • Warms up ICache
[Figure: blocks A, B (call x), C and procedure X, labeled Main Task and Spawned Task]
Others
• Remaining immediate postdominators
  • Postdominators of indirect calls and jumps
  • Complex control flow
• ~5-10% of static postdominators
• Important in several programs
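The four spawn categories can be told apart, to a first approximation, from the instruction that ends the spawning block. The sketch below is a simplification and not the classification procedure from the talk: the instruction kinds are hypothetical labels, and treating a backward conditional branch as a loop branch is an assumption that real control flow can violate.

```python
def classify_spawn(kind, pc=0, target=0):
    """Classify a spawn point by the instruction that ends the spawning block.

    kind: "call", "cond_branch", or anything else (hypothetical labels).
    pc, target: address of the branch and of its taken target.
    """
    if kind == "call":
        # The task starts at the instruction following the call.
        return "procedure fall-through"
    if kind == "cond_branch":
        # Assumption: a backward conditional branch closes a loop,
        # a forward one opens an if-then-else hammock.
        return "loop fall-through" if target <= pc else "hammock"
    # Indirect calls and jumps, and other complex control flow.
    return "other"


categories = [
    classify_spawn("cond_branch", pc=0x100, target=0x120),  # forward branch
    classify_spawn("cond_branch", pc=0x140, target=0x100),  # backward branch
    classify_spawn("call", pc=0x160),
    classify_spawn("indirect_jump", pc=0x180),
]
```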
Dynamic Spawn Distribution
• Hammocks and Others constitute ~65% of dynamic spawns
• Not captured by most speculative parallelization systems
Twolf new_dbox_a
[Figure: execution of new_dbox_a across Processors 1-5, with spawns at 9dbc, 9dc8, 9dd8, and 9dec]
Outline
• Motivation
• Introduction
• PolyFlow Architecture
• Evaluations
• Conclusions
PolyFlow
[Figure: PolyFlow pipeline with Task Spawn Unit (if (nextPC==x) spawn y), Fetch with PC 1-8, Unified Scheduler, Divert Queue, Execute 1-8, Flush, Retire]
The PolyFlow Architecture
• Speculative parallelization system
• Current evaluations on wide SMT core
• Extend SMT system with task spawn unit
  • Manage task spawn, reconnection
  • Learn dependence and handle misspeculation
• Use compiler-generated postdominators
  • Passed as hints to dynamic system
  • Stored in a separate “spawn hint cache”
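The spawn rule "if (nextPC==x) spawn y" amounts to a table lookup keyed by the next fetch PC. The following is a minimal software model of such a spawn hint cache, offered as a sketch rather than a description of the actual hardware. The class and method names are hypothetical, the addresses are made up, and the limit of 8 concurrent tasks is an assumption chosen to match the 8 program counters in the pipeline diagram.

```python
class SpawnHintCache:
    """Toy model: maps a trigger PC to the PC where a new task should start."""

    def __init__(self, hints, max_tasks=8):
        self.hints = dict(hints)    # trigger PC -> spawn target PC
        self.max_tasks = max_tasks  # assumed number of hardware contexts
        self.active = 1             # the main task is always running

    def on_fetch(self, next_pc):
        """Return the PC to spawn at, or None if no spawn should happen."""
        target = self.hints.get(next_pc)
        if target is None or self.active >= self.max_tasks:
            return None
        self.active += 1
        return target


# Hypothetical hint: when fetch reaches the branch at 0x100, spawn at 0x140.
cache = SpawnHintCache({0x100: 0x140}, max_tasks=2)
first = cache.on_fetch(0x100)   # hint hit, a context is free
second = cache.on_fetch(0x100)  # hint hit, but no free context remains
miss = cache.on_fetch(0x104)    # no hint for this PC
```

In this model the compiler-generated postdominators populate `hints` ahead of time; a dynamic scheme could fill the same table at run time.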
Evaluation Environment
• Baseline superscalar
  • 8-wide fetch/issue OOO core
  • 64-entry scheduler, 512-entry ROB
  • 8K 2-way assoc L1 ICache, 16K 4-way assoc L1 DCache
  • 512K 8-way assoc L2 cache
• Speculative parallelization system
  • 8-context SMT
Limitations
• Each thread can spawn one successor
  • Only the outermost branch in an if-else nest
• 512 entries in reorder buffer
  • Cannot reclaim resources
  • Limits parallelism
• Superscalar: fetch 1 taken branch per cycle
• PolyFlow: fetch from 2 tasks per cycle, 1 taken branch per cycle
Outline
• Motivation
• Introduction
• PolyFlow Architecture
• Evaluations
• Conclusions
Individual Spawn Heuristics
• No single heuristic suitable for all applications
• Control-equivalent spawning performs well overall
FT = fall-through
Hybrid Spawn Policies
Dynamic Implementation
• Dynamic Reconvergence Analysis*
  • Learns immediate postdominators dynamically
  • Trains quickly
• Can drive control-equivalent spawning
  • Spawn reconvergence point of branches
  • Alternative to compiler hints
* J. D. Collins et al., "Control Flow Optimization Via Dynamic Reconvergence Prediction," MICRO 2004
Outline
• Motivation
• Introduction
• PolyFlow Architecture
• Evaluations
• Conclusions
Conclusions
• Control-equivalent spawning
  • Reduces control speculation in spawned tasks
  • Generalizes common heuristics
• For an SMT-based system
  • Over twice the speedups of best heuristics
  • Better than an aggressive hybrid policy
• Amenable to dynamic implementations