Static Scheduling of Synchronous Data Flow Programs for Digital Signal Processing. S. M. Farhad, PhD Student. Supervisor: Dr. Bernhard Scholz. Programming Language Group, School of Information Technology, University of Sydney.
Abstract • Synchronous data flow (SDF) differs from traditional data flow • SDF nodes can be scheduled at compile time (statically) • Contribution of this paper: • Develops the theory for static scheduling of SDF programs on single or multiple processors
Introduction • Need to depart from the simplicity of the von Neumann computer architecture • Programming signal processors using large grain data flow (LGDF) languages [W. B. Ackerman 82] • Eases programming • Enhances the modularity of code • Describes algorithms more naturally • Makes concurrency immediately evident from the program description
Data Flow Analysis [W. B. Ackerman 82] • (1) P = X + Y • (2) Q = P/Y • (3) R = X*P • (4) S = R - Q • (5) T = R*P • (6) RESULT = S/T • Many of these instructions can run in parallel as long as some constraints are met • These constraints can be represented by a graph • Nodes represent instructions • Arrows between nodes represent constraints • So permissible computation sequences include, for example, (1, 3, 5, 2, 4, 6) and (1, 2, 3, 5, 4, 6).
[Figure: Sequencing constraints among (1) P = X + Y, (2) Q = P/Y, (3) R = X*P, (4) S = R - Q, (5) T = R*P, (6) RESULT = S/T]
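The sequencing constraints can be checked mechanically. The sketch below is only an illustration: the dependency table is read off the six assignments (an instruction depends on whichever earlier instruction defines a variable it uses), and the names `DEPS` and `is_permissible` are made up for this example.

```python
# Data dependencies read off the six assignments: each instruction maps
# to the set of instructions whose results it consumes.
DEPS = {
    1: set(),     # P = X + Y
    2: {1},       # Q = P / Y
    3: {1},       # R = X * P
    4: {2, 3},    # S = R - Q
    5: {1, 3},    # T = R * P
    6: {4, 5},    # RESULT = S / T
}


def is_permissible(sequence):
    """Return True if every instruction runs after all of its dependencies."""
    done = set()
    for instr in sequence:
        if not DEPS[instr] <= done:
            return False
        done.add(instr)
    # Every instruction must have been executed.
    return done == set(DEPS)


print(is_permissible((1, 3, 5, 2, 4, 6)))  # True
print(is_permissible((1, 2, 3, 5, 4, 6)))  # True
print(is_permissible((2, 1, 3, 4, 5, 6)))  # False: Q needs P first
```

Any ordering that respects the arrows of the constraint graph passes the check, which is why several permissible sequences exist.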
The Data Flow Paradigm • A program is divided into pieces (nodes or blocks) which can execute whenever input data are available • An algorithm can be described as a data flow graph • Nodes represent functions • Arcs represent data paths • Signal processing algorithms can also be described as data flow graphs • A node is an atomic or non-atomic function • An arc is a signal path
The Data Flow Paradigm Contd. • The complexity of the functions (granularity) determines the amount of parallelism available • No attempt is made to exploit concurrency inside a block • The functions within the blocks can be specified using von Neumann programming techniques • A block can itself represent another data flow graph (hierarchical) • Large grain data flow (LGDF) is ideally suited for signal processing
Synchronous Data Flow Graphs • A block is invoked when its input is available • When it is invoked, it consumes a fixed number of input samples on each input path and produces a fixed number of output samples on each output path • A block is synchronous if the number of input and output samples per invocation can be specified a priori • The signal processing system is assumed to apply an algorithm repetitively to an infinite sequence of data
A synchronous data flow graph • An SDF graph requires buffering the data samples passed between blocks and scheduling blocks so that they run when data are available • This can be done statically (at compile time) or dynamically (by a runtime supervisor, which is costly) [Figure: SDF graph with blocks A, B, C and arc labels b, c, d, e, f, g, h, i, j]
A synchronous data flow graph • SDF graphs can be scheduled statically (at compile time) regardless of the number of processors • No dynamic control is needed • Communication between nodes and processors is set up by the compiler, so no runtime control is required • Thus the LGDF paradigm gives the programmer a natural way of programming with evident concurrency
Scheduling an SDF graph • Schedule blocks onto processors in such a way that each block's input data are available when it is invoked • Assumptions • The SDF graph is nonterminating (it runs without deadlock) • The SDF graph is connected • The goal is to find a periodic admissible parallel schedule (PAPS) or, for a single processor, a periodic admissible sequential schedule (PASS)
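Construction of a PASS • Topology matrix Γ: one row per arc and one column per node; entry (i, j) is the number of samples node j produces on arc i per invocation, negative if node j consumes from arc i, and zero if node j is not connected to arc i [Figure: SDF graph with nodes 1, 2, 3 and arcs 1, 2, 3, with sample counts c, d, e, f, g, i on the arc ends]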
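A minimal sketch of building a topology matrix from an arc list follows. The three-node graph used here is hypothetical (it is not the graph in the slide figure), and the arc tuple layout and the name `topology_matrix` are assumptions made for illustration. Nodes 0, 1, 2 in the code stand for blocks 1, 2, 3.

```python
# Hypothetical graph, NOT taken from the slide figure.
# Each arc: (source node, samples produced, destination node, samples consumed)
ARCS = [
    (0, 1, 1, 2),  # block 1 produces 1 sample, block 2 consumes 2
    (1, 2, 0, 1),  # block 2 produces 2 samples, block 1 consumes 1
    (2, 2, 0, 1),  # block 3 produces 2 samples, block 1 consumes 1
]
NUM_NODES = 3


def topology_matrix(arcs, num_nodes):
    """One row per arc, one column per node.

    A positive entry is the number of samples the node produces on the
    arc per invocation; a negative entry is the number it consumes.
    """
    gamma = [[0] * num_nodes for _ in arcs]
    for row, (src, produced, dst, consumed) in enumerate(arcs):
        gamma[row][src] += produced
        gamma[row][dst] -= consumed
    return gamma


for row in topology_matrix(ARCS, NUM_NODES):
    print(row)
# [1, -2, 0]
# [-1, 2, 0]
# [-1, 0, 2]
```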
Construction of a PASS • Replace each arc with a FIFO queue that passes data from one block to another (the queue sizes vary over time) • Vector b(n) contains the queue sizes of all the buffers at time n • For a sequential schedule only one block can be invoked at a time • v(n) indicates which block is invoked at time n: it has a one in the position of the invoked block and zeros elsewhere
Construction of a PASS [Figure: example SDF graph with nodes 1, 2, 3; two arcs carry delays D and 2D] • The change in buffer sizes caused by invoking a node is b(n + 1) = b(n) + Γv(n) • A unit delay on an arc from A to B means that the n-th sample consumed by node B is the (n-1)-th sample produced by node A • So the first sample consumed by the destination block is not produced by the source; it is part of the initial state of the arc buffer
Construction of a PASS (same example graph as the previous slide) • Because of this initial condition, block 2 can be invoked once and block 3 can be invoked twice before block 1 is invoked at all • Delays therefore affect the way the system starts up
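The buffer update b(n + 1) = b(n) + Γv(n), with delays entering as initial buffer contents b(0), can be sketched as below. The matrix `GAMMA` and the initial state `B0` describe a hypothetical three-node graph with two samples of delay on two of its arcs; they are not the graph from the slide, and `fire` and `run` are illustrative names. Nodes 0, 1, 2 stand for blocks 1, 2, 3.

```python
# Hypothetical topology matrix (rows = arcs, columns = nodes) and
# initial buffer contents; the two 2s model delays on arcs 1 and 2.
GAMMA = [
    [1, -2, 0],
    [-1, 2, 0],
    [-1, 0, 2],
]
B0 = [0, 2, 2]


def fire(gamma, b, node):
    """Apply b(n+1) = b(n) + Gamma v(n), where v(n) selects `node`."""
    return [b_i + row[node] for b_i, row in zip(b, gamma)]


def run(gamma, b0, schedule):
    """Return the final buffer state, or None if a buffer would go negative."""
    b = list(b0)
    for node in schedule:
        b = fire(gamma, b, node)
        if min(b) < 0:
            return None
    return b


print(run(GAMMA, B0, [0, 2, 0, 1]))  # [0, 2, 2]: buffers return to b(0)
print(run(GAMMA, B0, [1, 0, 2, 0]))  # None: node 1 has no input samples yet
```

In this sketch the initial samples are what allow node 0 to fire before its predecessors have produced anything, which mirrors how delays shape start-up.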
Construction of a PASS • Given this computation model (eqns. 1 - 4) • Find necessary and sufficient conditions for the existence of a PASS, and hence a PAPS • Find practical algorithms that provably find a PASS if one exists • Find practical algorithms that construct a reasonable (not necessarily optimal) PAPS, if a PASS exists
Necessary condition for the existence of a PASS • rank(Γ) = s - 1, where s is the number of nodes or blocks in the graph • Definition 1: An admissible sequential schedule is a non-empty ordered list of nodes such that, if the nodes are executed in the sequence given by the list, the amount of data in the buffers remains non-negative and bounded. Each node must appear in the list at least once
Necessary condition for the existence of a PASS • Theorem 1: For a connected SDF graph with s nodes and topology matrix Γ, rank(Γ) = s - 1 is a necessary condition for a PASS to exist. • For a PASS of period p, applying eqn. (3) p times gives b(p) = b(0) + Γq, where q = v(0) + v(1) + ... + v(p - 1) [Figure: example SDF graph with nodes 1, 2, 3]
Necessary condition for the existence of a PASS • Since the PASS is periodic, we can write b(np) = b(0) + nΓq • Since the PASS is admissible, the buffers must remain bounded, by Definition 1. The buffers remain bounded if and only if Γq = O, where O is a vector full of zeros • For q ≠ O, this implies that rank(Γ) < s, where s is the dimension of q. But rank(Γ) can only be s or s - 1, so it must be s - 1 [Lemma 3]
Necessary condition for the existence of a PASS • Theorem 1 indicates that if we have an SDF graph with a topology matrix of rank s, then the graph is somehow defective and no PASS can be found for it [Figure: example SDF graph with nodes 1, 2, 3]
Necessary condition for the existence of a PASS • Theorem 2: For a connected SDF graph with s nodes and topology matrix Γ, and with rank(Γ) = s - 1, we can find a positive integer vector q ≠ O such that Γq = O, where O is the zero vector. • Definition 2: A predecessor to a node x is a node feeding data to x.
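One way to find the smallest positive integer q with Γq = O is to note that each arc contributes one balance equation, q[src] × produced = q[dst] × consumed, and to propagate firing ratios across the connected graph. The sketch below takes that approach rather than a general null-space computation; the arc list is a hypothetical graph, and `repetition_vector` is an illustrative name.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

# Hypothetical graph: (source, samples produced, destination, samples consumed)
ARCS = [(0, 1, 1, 2), (1, 2, 0, 1), (2, 2, 0, 1)]


def repetition_vector(arcs, num_nodes):
    """Smallest positive integer q with Gamma q = 0, or None if only q = 0 works."""
    q = [None] * num_nodes
    q[0] = Fraction(1)
    changed = True
    while changed:  # propagate ratios along arcs until nothing new is learned
        changed = False
        for src, produced, dst, consumed in arcs:
            if q[src] is not None and q[dst] is None:
                q[dst] = q[src] * produced / consumed
                changed = True
            elif q[dst] is not None and q[src] is None:
                q[src] = q[dst] * consumed / produced
                changed = True
    if any(x is None for x in q):
        raise ValueError("graph is not connected")
    # Every arc must balance; otherwise the sample rates are inconsistent.
    for src, produced, dst, consumed in arcs:
        if q[src] * produced != q[dst] * consumed:
            return None
    # Clear denominators, then divide out the common factor.
    scale = reduce(lambda a, b: a * b // gcd(a, b), (x.denominator for x in q), 1)
    ints = [int(x * scale) for x in q]
    common = reduce(gcd, ints)
    return [x // common for x in ints]


print(repetition_vector(ARCS, 3))  # [2, 1, 1]
# An inconsistent graph (its topology matrix has rank s) yields None:
print(repetition_vector([(0, 1, 1, 1), (1, 1, 2, 1), (0, 2, 2, 1)], 3))  # None
```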
Necessary condition for the existence of a PASS • Definition 3: (Class S algorithm) Given a positive integer vector q such that Γq = O and an initial state for the buffers b(0), the i-th node is runnable at a given time if it has not yet been run q_i times and running it will not cause a buffer size to go negative. A class S algorithm is any algorithm that schedules a node if it is runnable, updates b(n), and stops only when no more nodes are runnable. If a class S algorithm terminates before it has scheduled each node the number of times specified in the q vector, then it is said to be deadlocked.
Necessary condition for the existence of a PASS • Theorem 3: Given an SDF graph with topology matrix Γ and a positive integer vector q such that Γq = O, if a PASS of period p = 1ᵀq exists, where 1ᵀ is a row vector full of ones, any class S algorithm will find such a PASS.
Necessary condition for the existence of a PASS [Figure: (a), (b) Two SDF graphs with consistent sample rates but no admissible schedule]
Necessary condition for the existence of a PASS • Theorem 4: Given an SDF graph with topology matrix Γ and a positive integer vector q such that Γq = O, a PASS of period p = 1ᵀq exists if and only if a PASS of period Np exists for any positive integer N. • Theorem 4 tells us that it does not matter which positive integer vector we use from the null space of the topology matrix, so we can simplify our system by using the smallest such vector, thus obtaining a PASS with minimum period.
Class S algorithm given the theorems • (1) Solve for the smallest positive integer vector q in the null space of Γ • (2) Form an arbitrary ordered list L of all nodes in the system • (3) For each node α in L, schedule α if it is runnable, trying each node once • (4) If each node α has been scheduled q_α times, STOP • (5) If no node in L can be scheduled, indicate a deadlock • (6) Else go to step 3 and repeat
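The steps above can be sketched as follows. This is a minimal illustration under stated assumptions: `GAMMA`, the initial buffers, and q belong to a hypothetical three-node graph (not the one on the slides), the vector q is assumed to have been computed already (step 1), and `class_s_schedule` is an illustrative name. Nodes 0, 1, 2 stand for blocks 1, 2, 3.

```python
# Hypothetical topology matrix (rows = arcs, columns = nodes).
GAMMA = [
    [1, -2, 0],
    [-1, 2, 0],
    [-1, 0, 2],
]


def class_s_schedule(gamma, b0, q):
    """Return one period of a sequential schedule, or None on deadlock."""
    b = list(b0)
    runs = [0] * len(q)
    schedule = []
    order = list(range(len(q)))  # step 2: an arbitrary ordered list L
    while True:
        progressed = False
        for node in order:  # step 3: try each node once
            if runs[node] >= q[node]:
                continue  # already run q[node] times
            nxt = [b_i + row[node] for b_i, row in zip(b, gamma)]
            if min(nxt) >= 0:  # runnable: no buffer goes negative
                b = nxt
                runs[node] += 1
                schedule.append(node)
                progressed = True
        if runs == list(q):  # step 4: every node scheduled q times
            return schedule
        if not progressed:  # step 5: nothing runnable, deadlock
            return None
        # step 6: otherwise repeat from step 3


print(class_s_schedule(GAMMA, [0, 2, 2], [2, 1, 1]))  # [0, 2, 0, 1]
print(class_s_schedule(GAMMA, [0, 0, 2], [2, 1, 1]))  # None (deadlock)
```

The second call removes the initial samples from one arc of the cycle, so neither node on that cycle can ever fire; this is the kind of graph that has consistent sample rates but no admissible schedule.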
Constructing a PAPS • If a workable schedule for a single processor can be generated, then a workable schedule for a multiprocessor system can also be generated • The first step is to construct an acyclic precedence graph for J periods of the PASS found by the class S algorithm
Construct an acyclic precedence graph by example [Figure: cyclic SDF graph with nodes 1, 2, 3 and delays D and 2D] • This graph is neither acyclic nor a precedence graph • Possible minimum PASSs are {1, 3, 1, 2}, {3, 1, 1, 2}, and {1, 1, 3, 2}, each with period 4 • {2, 1, 3, 1} is not a PASS because node 2 is not immediately runnable
Construct an acyclic precedence graph [Figure: acyclic precedence graphs for J = 1 and J = 2]
Next step: constructing a parallel schedule • By the critical path method [Adam 74] or by the Hu-level scheduling algorithm [T. C. Hu 61] • A level is determined for each node in the acyclic precedence graph, where the level of a given node is the worst case of the total runtime of the nodes on a path from the given node to a terminal node of the graph • A terminal node is a node with no successor • If there is no terminal node, one can be created with zero runtime
Hu-level scheduling algorithm [Figure: acyclic precedence graphs for J = 1 and J = 2, with each node annotated with its level]
Constructing a parallel schedule • The Hu-level scheduling algorithm simply schedules available nodes with the highest level first • When there are more available nodes with the same highest level than there are processors, a reasonable heuristic is to schedule the ones with the longest runtime first
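A rough sketch of level computation and highest-level-first scheduling is given below. The precedence graph is hypothetical (two invocations of node 1, named "1a" and "1b", both preceding node 2, with node 3 independent); it is not the graph from the slide figures. Runtimes follow the slide example (nodes 1, 2, 3 take 1, 2, 3 time units). The names `hu_levels` and `list_schedule` are illustrative.

```python
def hu_levels(runtime, succs):
    """Level = node runtime plus the largest level among its successors."""
    levels = {}

    def level(n):
        if n not in levels:
            levels[n] = runtime[n] + max((level(s) for s in succs[n]), default=0)
        return levels[n]

    for n in runtime:
        level(n)
    return levels


def list_schedule(runtime, succs, num_procs):
    """Map each node to (processor, start time), highest level first."""
    levels = hu_levels(runtime, succs)
    preds_left = {n: 0 for n in runtime}
    for n in succs:
        for s in succs[n]:
            preds_left[s] += 1
    ready = [n for n in runtime if preds_left[n] == 0]
    proc_free = [0] * num_procs  # time at which each processor becomes idle
    running = []                 # (finish time, node) pairs
    start = {}
    time = 0
    while ready or running:
        # Highest level first; ties broken by longest runtime, then by name.
        ready.sort(key=lambda n: (-levels[n], -runtime[n], n))
        for p in range(num_procs):
            if ready and proc_free[p] <= time:
                n = ready.pop(0)
                start[n] = (p, time)
                proc_free[p] = time + runtime[n]
                running.append((proc_free[p], n))
        # Advance to the next completion and release successors.
        time = min(f for f, _ in running)
        finished = [n for f, n in running if f == time]
        running = [(f, n) for f, n in running if f > time]
        for done in finished:
            for s in succs[done]:
                preds_left[s] -= 1
                if preds_left[s] == 0:
                    ready.append(s)
    return start


# Hypothetical precedence graph for one period.
RUNTIME = {"1a": 1, "1b": 1, "2": 2, "3": 3}
SUCCS = {"1a": ["2"], "1b": ["2"], "2": [], "3": []}

print(hu_levels(RUNTIME, SUCCS))
# {'1a': 3, '2': 2, '1b': 3, '3': 3}
print(list_schedule(RUNTIME, SUCCS, 2))
# {'3': (0, 0), '1a': (1, 0), '1b': (1, 1), '2': (1, 2)}
```

Three nodes tie at level 3 with only two processors, so the longest-runtime heuristic places node 3 first; the resulting schedule finishes at time 4.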
Constructing a parallel schedule [Figure: two-processor schedules (PROC 1, PROC 2) for J = 1 and J = 2; the runtimes of nodes 1, 2, and 3 are 1, 2, and 3 time units respectively]
Limitations of the Model • Does not support large-scale conditional control flow as general-purpose languages do • Asynchronous graphs • Connecting to the outside world • Data-dependent runtime of blocks
Summary • This paper describes the theory necessary to develop a signal processing programming methodology that offers • Programmer convenience • A natural way to describe signal processing • Ready use of the available concurrency
Questions? Thank you