Static Translation of Stream Programs
S. M. Farhad
School of Information Technology, The University of Sydney
Parallel Programming Gap
• Uniprocessors have hit their physical limits
• Multicore and many-core systems
• Programming challenges now
  • Multiple flows of control and multiple memories
  • Finding algorithms that can run in parallel
  • Correctness of the program
  • Synchronization issues
  • Debugging the program
Stream Programming Paradigm
• General-purpose programming paradigm
• Suitable for high-performance applications
• Inherent parallelism
• Two types of elements in a program:
  • Actors: computation elements
  • Streams: channels to ship data
• Stream graph: composition of actors and streams
• Input and output: stream
[Figure: an actor with its input channels and output channels]
StreamIt Language
• StreamIt: stream programming language
• Architecture independent
• Modular
[Figure: the StreamIt constructs: filter; pipeline (each element may be any StreamIt language construct); splitjoin (parallel computation between a splitter and a joiner); feedback loop (splitter and joiner)]
A Simple Filter in StreamIt
    int->int filter A {
        init {
            // empty
        }
        work pop 1 peek 2 push 1 {
            push(peek(0) + peek(1));
            pop();
        }
    }
[Figure: actor A]
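The rates declared on the work function (pop 1, peek 2, push 1) can be illustrated with a small simulation. This is only a sketch in Python, not StreamIt tooling: the function name `run_filter_a` and the list-based input are assumptions made for illustration.

```python
def run_filter_a(items):
    """Simulate filter A on a finite input list (illustrative only).

    Each firing needs two items visible on the input (peek 2), pushes the
    sum of the first two (push 1), and consumes one item (pop 1).
    """
    output = []
    pos = 0  # index of the current head of the input channel
    while len(items) - pos >= 2:                  # firing rule: peek 2
        output.append(items[pos] + items[pos + 1])  # push(peek(0) + peek(1))
        pos += 1                                  # pop()
    return output


print(run_filter_a([1, 2, 3, 4]))  # [3, 5, 7]
```

Because the filter peeks further than it pops, the last input item is never consumed: four inputs give three outputs.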
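StreamIt Example
    float->float pipeline Example() {
        add A();
        add B();
        add splitjoin {
            split duplicate;
            add D();
            add E();
            join roundrobin;
        }
        add G();
        add H();
    }
[Figure: stream graph of the pipeline with input bandwidth z: A, B, a split with two parallel branches, a join, and two further actors; every channel is labelled 1]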
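Research Question?
[Figure: the stream graph (input z; actors A, B, split, D, E, join, G, H; channel weights 1) next to a parallel system with processors Proc-1, Proc-2, ..., Proc-N, with a question mark between the graph and the system]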
Static Translation of Stream Programs
• We propose
  • A novel steady-state analysis describing the output bandwidth of an actor as a function of the input bandwidth of the stream program
  • A quantitative analysis to resolve bottlenecks in stream programs
  • An algorithm for mapping actors to a parallel system
  • Scheduling of the mapped actors on the processors
• Our goal
  • To statically optimize the throughput of a stream program on a parallel system
Overview of the Proposed System
• Finding a Closed Form for Steady State
• Find Mapping by ILP or Approximation
• Resolve Bottleneck
• Schedule Actors for Processors
Bandwidth Transfer Model
• Synchronous model
• Fundamental assumption:
  • Actors deliver a constant output bandwidth given a constant input bandwidth
• The output bandwidth of an actor is normalized (scaled to between 0 and 1)
• Bandwidth of channel $(i, j)$: the output bandwidth $x_i$ of actor $i$ multiplied by the weight $w_{ij}$, that is, $w_{ij} x_i$
[Figure: actors i, j and k; the channel from i to j carries bandwidth $w_{ij} x_i$]
[Stage: Finding a Closed Form for Steady State]
Simultaneous Linear Functional Equations System
• Bandwidth transfer functions: [equation missing in source]
• Resulting in a simultaneous functional equation system: [equation missing in source]
[Stage: Finding a Closed Form for Steady State]
Solving Equation System
• Solve with a Gaussian-style equation solver
• Bandwidth variables are successively eliminated
  • Substitution
  • Loop-breaking
• The solution of the system is a linear function for each actor, which
  • does not depend on the predecessors
  • depends only on the input bandwidth "z"
[Stage: Finding a Closed Form for Steady State]
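The elimination idea can be sketched on a toy graph with a feedback edge. The slides do not preserve the actual transfer functions, so this sketch assumes the simplest possible one: an actor's output bandwidth is the sum of its incoming channel bandwidths $w_{ij} x_i$, and the source actor additionally receives the program input z. The function name, the edge-list encoding and the example graph are all hypothetical.

```python
from fractions import Fraction


def steady_state_coefficients(actors, edges, source):
    """Return c such that x_a = c[a] * z for every actor a.

    Assumed model (not taken from the slides):
        x_j = sum_i w_ij * x_i  (+ z if j is the source)
    Solved by Gauss-Jordan elimination on (I - W^T) x = e_source * z.
    """
    n = len(actors)
    idx = {a: k for k, a in enumerate(actors)}
    # Augmented matrix: n rows, n coefficient columns plus one column for z.
    m = [[Fraction(0)] * (n + 1) for _ in range(n)]
    for k in range(n):
        m[k][k] = Fraction(1)
    for src, dst, w in edges:
        m[idx[dst]][idx[src]] -= Fraction(w)
    m[idx[source]][n] = Fraction(1)

    for col in range(n):
        # Pick a row with a non-zero pivot (raises StopIteration if singular).
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        # Eliminate this variable from every other equation (substitution).
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return {a: m[idx[a]][n] for a in actors}


# Toy graph: A -> J -> B -> S, with a feedback edge S -> J.
edges = [
    ("A", "J", Fraction(1, 2)),
    ("J", "B", 1),
    ("B", "S", 1),
    ("S", "J", Fraction(1, 4)),  # feedback loop
]
coeffs = steady_state_coefficients(["A", "J", "B", "S"], edges, "A")
print(coeffs)
```

Under these assumptions every actor ends up with a coefficient that multiplies z alone (here 1 for A and 2/3 for J, B and S), which mirrors the claim that the closed form depends only on the input bandwidth and not on the predecessors.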
Mapping Actors to Processors
• An NP-hard problem
  • Solving it exactly takes too long
• Approximation algorithm
  • Much faster: O(n log n)
  • Non-optimal solution
• We formulate an Integer Linear Programming (ILP) problem considering
  • Input, output and processing bandwidths of each actor
  • Processing capacity of each processor
  • Inter-processor communication bandwidths
[Figure: the stream graph with input z (actors A, B, split, D, E, join, G, H) partitioned across processors P1, P2 and P3; channel bandwidths are labelled 1.0 and 0.5]
[Stage: Find Mapping by ILP or Approximation]
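The slides do not describe the approximation algorithm itself. As a stand-in with the same O(n log n) flavour, the sketch below uses a well-known greedy heuristic (largest load first, always onto the currently least-loaded processor). It is not the author's algorithm, it ignores communication bandwidth entirely, and the names `greedy_mapping` and `work` are hypothetical.

```python
import heapq


def greedy_mapping(work, num_procs):
    """Assign each actor to the least-loaded processor, heaviest actor first.

    `work` maps an actor name to an assumed processing load.
    Returns (mapping, loads). Sorting costs O(n log n); each heap
    operation costs O(log p).
    """
    heap = [(0.0, j) for j in range(num_procs)]  # (current load, processor)
    heapq.heapify(heap)
    mapping = {}
    for actor in sorted(work, key=work.get, reverse=True):
        load, j = heapq.heappop(heap)
        mapping[actor] = j
        heapq.heappush(heap, (load + work[actor], j))
    loads = [0.0] * num_procs
    for actor, j in mapping.items():
        loads[j] += work[actor]
    return mapping, loads


mapping, loads = greedy_mapping({"A": 4, "B": 3, "C": 3, "D": 2}, 2)
print(mapping, loads)
```

A heuristic of this kind is fast but carries no optimality guarantee, which matches the trade-off listed on the slide.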
Find Optimal Solution by Binary Search
• Develop a test for a given input bandwidth "z" of the stream program
• Binary search
  • If the solution is not feasible at the midpoint, the midpoint becomes the new upper bound for z
  • If the solution is feasible at the midpoint, the midpoint becomes the new lower bound for z
[Figure: the solution space for z on a scale from 0 to 1, with markers left, mid, right and $z_{sys}$]
[Stage: Find Mapping by ILP or Approximation]
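The search loop can be sketched as follows. The feasibility test is passed in as a callback, `is_feasible`, which is a hypothetical placeholder for the ILP or approximation test; the sketch also assumes feasibility is monotone (if some z is feasible, every smaller z is too) and that z is normalized to the range 0 to 1.

```python
def max_feasible_bandwidth(is_feasible, tol=1e-6):
    """Binary search for the largest feasible input bandwidth z in [0, 1].

    `is_feasible(z)` is an assumed callback returning True when a valid
    mapping exists for input bandwidth z.
    """
    left, right = 0.0, 1.0
    if is_feasible(right):
        return right  # the whole range is feasible
    while right - left > tol:
        mid = (left + right) / 2
        if is_feasible(mid):
            left = mid   # feasible: new lower bound
        else:
            right = mid  # infeasible: new upper bound
    return left


# Toy test: pretend mappings exist only up to z = 0.4.
best = max_feasible_bandwidth(lambda z: z <= 0.4)
print(best)
```

Each iteration halves the interval, so reaching a tolerance of 1e-6 on the unit interval takes about 20 feasibility tests.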
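Test for a given z: Simple Model
• [symbol missing in source] is a 0-1 binary variable, $1 \le i \le n$, $1 \le j \le p$
• [constraint missing in source] for all $i$, $1 \le i \le n$ …… (1)
• [constraint missing in source] for all $j$, $1 \le j \le p$ …… (2)
• where [definitions missing in source]
[Stage: Find Mapping by ILP or Approximation]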
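The constraint bodies did not survive on this slide, so the following is only one plausible reading, stated as an assumption: every actor is placed on exactly one processor, and the total load placed on a processor must not exceed its capacity. The sketch checks this by brute force rather than with an ILP solver; the names `coeff` (steady-state coefficient per actor), `cost` (processing cost per unit of bandwidth) and `capacity` are hypothetical.

```python
from itertools import product


def feasible_assignment(z, coeff, cost, capacity):
    """Return one actor->processor assignment that fits, or None.

    Assumed load of actor a at input bandwidth z: z * coeff[a] * cost[a].
    Exhaustive search over p**n assignments, so only for tiny examples.
    """
    actors = list(coeff)
    for choice in product(range(len(capacity)), repeat=len(actors)):
        load = [0.0] * len(capacity)
        for actor, j in zip(actors, choice):
            load[j] += z * coeff[actor] * cost[actor]
        # Small tolerance guards against floating-point rounding.
        if all(l <= c + 1e-12 for l, c in zip(load, capacity)):
            return dict(zip(actors, choice))
    return None


coeff = {"A": 1.0, "B": 1.0, "C": 1.0}
cost = {"A": 0.5, "B": 0.3, "C": 0.2}
print(feasible_assignment(1.0, coeff, cost, [0.5, 0.5]))
print(feasible_assignment(1.2, coeff, cost, [0.5, 0.5]))
```

In this toy instance a mapping exists at z = 1.0 (A alone on one processor, B and C together on the other) but not at z = 1.2, where A alone already exceeds the capacity of a processor.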
Bottleneck Analysis
• Constraints limit the input bandwidth of the whole program
  • Input, output and processing bandwidth constraints
• The system is bottleneck-free if SystemBW < ActorBW
• If SystemBW > ActorBW, there is a bottleneck in the system
[Figure: a bandwidth scale from 0 to 1 marking SystemBW, with 0 < ActorBW]
[Stage: Resolve Bottleneck]
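The comparison can be sketched as a per-actor check. How ActorBW is derived is not shown on the slide, so the limits are simply given here as a hypothetical dictionary; the slide also does not say what happens when the two bandwidths are equal, and this sketch treats equality as not a bottleneck.

```python
def find_bottlenecks(system_bw, actor_bw):
    """Return the actors whose bandwidth limit is below the system bandwidth.

    `actor_bw` maps an actor name to its assumed bandwidth limit (ActorBW).
    An actor is reported when SystemBW > ActorBW.
    """
    return sorted(a for a, bw in actor_bw.items() if system_bw > bw)


limits = {"A": 1.0, "D": 0.5, "E": 0.8}
print(find_bottlenecks(0.6, limits))  # ['D']
print(find_bottlenecks(0.4, limits))  # []
```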
Region Duplication
[Figure missing in source]
[Stage: Resolve Bottleneck]
Summary
• Synchronous model for stream programs
• Novel steady-state model
• Statically optimize the throughput of stream programs
• Resolving bottlenecks by simple quantitative analysis
• Finding an approximation for the mapping problem
Related Works
[1] Static Scheduling of SDF Programs for DSP [Lee '87]
[2] StreamIt: A language for streaming applications [Thies '02]
[3] Phased Scheduling of Stream Programs [Thies '03]
[4] Exploiting Coarse Grained Task, Data, and Pipeline Parallelism in Stream Programs [Thies '06]
[5] Orchestrating the Execution of Stream Programs on Cell [Scott '08]
[6] Software Pipelined Execution of Stream Programs on GPUs [Udupa '09]
[7] Synergistic Execution of Stream Programs on Multicores with Accelerators [Udupa '09]