Embedded Computer Architecture 2 (BOCA: Bijzondere Onderwerpen Computer Architectuur)
Block C: From DG to SFG
Dependency graphs and Signal Flow Graphs
Dependency Graph: All communicated values are scalars and the processing elements are functions on scalars. Each arrow carries only one value. Time does not play a role. V is the value domain; number of inputs = number of outputs = N.
[Figure: dependency graph drawn as a grid of processing elements (PE)]
Signal Flow Graph: The communicated values are streams, i.e. functions of time, and the processing elements are functions on streams. Z represents time.
Unfolding time
[Figure: an add element with signals x_t, y_t and z_t, drawn once and then unfolded into copies for t-1, t and t+1; each copy lies in its own equi-time zone]
This holds for any t. add is a combinatorial system: no delays, no state. So: the scheme is re-instantiated for each time instance.
Finite state machines
• Any synchronous machine can be modelled by a Mealy machine or a Moore machine.
• Such a machine consists of:
  • Delays (registers, D-flipflops), modeled by a delay operator
  • Delay-less functions (combinatorial logic, algebraic functions), modeled by a function
[Figure: Mealy machine, with labels clock, old state, new state, Fs, Fo, input, output]
[Figure: Moore machine, with labels clock, old state, new state, Fs, Fo, input, output]
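Finite state machines: Mealy
[Figure: Mealy machine with state function Fs and output function Fo, signals s_{t-1}, s_t, x_t and y_t; or, drawn as a single function F with the same signals]
This holds for any t.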
Unfolding time
[Figure: the block F with signals s_{t-1}, s_t, x_t, y_t is equal to a chain of copies of F for t-1, t, t+1 and t+2, in which the state s_t produced by one copy is consumed by the next]
This holds for any t. The scheme is re-instantiated for each time instance.
Unfolding time
[Figure: the same chain of copies of F, with states s_{t-2} ... s_{t+2}, inputs x_{t-1} ... x_{t+2} and outputs y_{t-1} ... y_{t+2}, drawn as a dependency graph]
Sketched in DG style we obtain a 1-dimensional index space. This suggests folding a DG into a scheme. Such a scheme is called a signal flow graph.
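The unfolding described above can be sketched in Python. This is only an illustration: the combined function F(s_{t-1}, x_t) -> (s_t, y_t) is taken from the slides, while the names `unfold` and `accumulator` and the running-sum behaviour are assumptions made for the example.

```python
def unfold(F, s_init, xs):
    """Re-instantiate F once per time instance: (s_t, y_t) = F(s_{t-1}, x_t)."""
    s = s_init
    ys = []
    for x in xs:  # one iteration corresponds to one copy of F in the unfolded chain
        s, y = F(s, x)
        ys.append(y)
    return s, ys


def accumulator(s_prev, x):
    """Hypothetical F: the new state is the running sum, which is also the output."""
    s = s_prev + x
    return s, s


final_state, outputs = unfold(accumulator, 0, [1, 2, 3, 4])
print(final_state, outputs)  # 10 [1, 3, 6, 10]
```

The loop variable `s` plays the role of the unit delay in the folded scheme: it carries the state from one time instance to the next.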
Signal flow graphs
• A signal flow graph is a graph consisting of
  • Processing elements (PEs)
  • Edges representing communications. The edges are colored (labeled) with the number of unit delays a communication takes, written D^n.
• An SFG may contain loops as long as there is at least one unit delay in the loop. This is different from DGs.
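The loop rule can be checked mechanically: an SFG is acceptable if the sub-graph formed by its zero-delay edges contains no cycle. A minimal sketch, assuming edges are given as hypothetical triples (source, destination, number of unit delays):

```python
def has_delay_free_loop(nodes, edges):
    """Return True if some loop consists of zero-delay edges only.

    edges: list of (src, dst, n), with n the number of unit delays on the edge.
    """
    # Keep only the edges without delay; any cycle among them violates the rule.
    succ = {v: [] for v in nodes}
    for src, dst, n in edges:
        if n == 0:
            succ[src].append(dst)

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {v: WHITE for v in nodes}

    def visit(v):
        color[v] = GREY
        for w in succ[v]:
            if color[w] == GREY:  # back edge: a delay-free loop
                return True
            if color[w] == WHITE and visit(w):
                return True
        color[v] = BLACK
        return False

    return any(color[v] == WHITE and visit(v) for v in nodes)


# The folded state machine: one PE with a state loop through one unit delay.
print(has_delay_free_loop(["F"], [("F", "F", 1)]))  # False
```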
Signal flow graphs
[Figure: folding turns the dependency graph (the unfolded chain of copies of F) into a signal flow graph: a single F with input x_t, output y_t and a state loop through one unit delay D1. Unfolding is the reverse step.]
Both in the dependency graph and the signal flow graph we distinguish between
• input edges
• output edges and
• intermediate edges
Example of folding the sorting algorithm
[Figure: the triangular DG of the sorter (ten PEs) folded into a linear SFG of four PEs connected through unit delays D]
Algebraic description of a SFG
A signal flow graph is described in the same way as a dependency graph. The structure of a signal flow graph is fully described by:
• The set of nodes
• The set of intermediate edges
• The set of input edges
• The set of output edges
[Formulas defining these sets and the virtual nodes: not recoverable]
Example (SFG sorter)
[Figure: SFG of the sorter: four PEs, indexed 0 to 3, connected through unit delays D]
nodes: Vn = { (0), (1), (2), (3) }
intermediate edges: Eint = { ((0),(0)), ((1),(0)), ((2),(0)), ((0),(1)), ((1),(1)), ((2),(1)) }
input edges: Ein = { ((0),(0)), ((1),(0)), ((2),(0)), ((3),(0)), ((-1),(1)) }
output edges: Eout = { ((0),(0)), ((1),(0)), ((2),(0)), ((3),(0)), ((0),(1)), ((1),(1)), ((2),(1)), ((3),(1)) }
Algebraic description of the mapping DG → SFG
Only the mapping of the structure is considered. Folding along the first coordinate is described by: $w = M^T v$ with $M^T = (0 \mid I_{N-1})$, in which v is any vector in the DG description and w the vector in the resulting SFG.
Example: The SFG of the sorter is obtained from its DG by: $M^T = (0, 1)$
Example of folding the sorting algorithm
[Figure: the triangular DG of the sorter folded into a linear SFG of four PEs with unit delays D]
Node mapping:
(0,1)(0,0)^T = 0
(0,1)(1,1)^T = 1
(0,1)(1,0)^T = 0
Edge mapping:
(0,1)((-1,0)^T,(1,0)^T) = (0,0)
(0,1)((0,0)^T,(0,1)^T) = (0,1)
(0,1)((0,-1)^T,(0,1)^T) = (-1,1)
(0,1)((3,0)^T,(1,0)^T) = (0,0)
(^T = transpose)
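The node and edge mappings of this example can be reproduced with a few lines of Python. The sketch assumes that an edge is modeled as a pair (source node, displacement), as the worked mappings suggest, and that the sorter DG has nodes (i, j) with 0 <= j <= i <= 3; the function names are made up for the illustration.

```python
M_T = (0, 1)  # folding a 2-D DG along its first coordinate


def apply_row(row, v):
    """Product of a row vector with a column vector."""
    return sum(r * c for r, c in zip(row, v))


def fold_node(v):
    return apply_row(M_T, v)


def fold_edge(edge):
    v, e = edge  # (source node, displacement), an assumed edge model
    return (apply_row(M_T, v), apply_row(M_T, e))


# Nodes of the sorter DG (assumed triangular index space).
dg_nodes = [(i, j) for i in range(4) for j in range(i + 1)]
sfg_nodes = sorted({fold_node(v) for v in dg_nodes})

print(sfg_nodes)                       # [0, 1, 2, 3]
print(fold_edge(((0, -1), (0, 1))))    # (-1, 1)
```

All ten DG nodes collapse onto the four SFG nodes 0 to 3, which matches the node set listed for the SFG sorter.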
Annotation of a SFG
• The structural information of a SFG only describes the structure of the hardware that is needed for executing the algorithm, i.e.:
  • processing elements
  • internal communication channels
  • input channels and
  • output channels
• No information is given about the usage of these structural elements and their behavior.
• This information is given by the annotation of the SFG.
Annotation of a SFG
The annotation of an SFG consists of:
• The number of unit delays n by which a message (value) is delayed on an edge, written D^n.
• The time t at which a communication of a variable x takes place, written x(t).
These annotations follow from the time coordinate in the DG. The annotation of the DG maps onto the SFG according to the structural item to which it belongs.
The number of unit delays Dn
• The number of unit delays implied by an edge in the SFG follows from the difference between the time coordinate of v_dest and the time coordinate of v_source of the edge in the DG.
[Figure: DG edges spanning the time steps t-2, t-1, t and t+1, and the corresponding SFG edges labeled D0, D1 and D2]
The number of unit delays Dn
So, if we fold in the direction of the first dimension, an edge in the DG maps onto an edge in the SFG whose delay D^n is given by $n = (1, 0, \dots, 0)\,(v_{dest} - v_{source})$. This also clarifies the choice we made for modeling edges.
The time t at which a communication takes place at an edge in the SFG follows from the time coordinate of the corresponding edge in the DG. Hence, for an edge with source node v: $t = (1, 0, \dots, 0)\,v$
Example of folding the sorting algorithm
[Figure: the triangular DG of the sorter folded into a linear SFG of four PEs with unit delays D]
Delays:
(1,0).((1,1)^T - (1,0)^T) = (1,0).(0,1)^T = 0
(1,0).((1,0)^T - (0,0)^T) = (1,0).(1,0)^T = 1
(1,0).((3,0)^T - (2,0)^T) = (1,0).(1,0)^T = 1
Example of folding the sorting algorithm
[Figure: the triangular DG of the sorter folded into a linear SFG of four PEs with unit delays D]
Communication time instances:
(1,0).(1,1)^T = 1
(1,0).(0,-1)^T = 0
(1,0).(2,-1)^T = 2
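The delay and time annotations of the two sorter examples can be recomputed as follows. This is a small sketch: the row vector (1,0) and the sample vectors come from the slides, while the function names are chosen for the illustration.

```python
TIME_ROW = (1, 0)  # selects the first (time) coordinate of a 2-D DG vector


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def delay(v_source, v_dest):
    """Number of unit delays of the SFG edge that a DG edge folds onto."""
    diff = tuple(d - s for s, d in zip(v_source, v_dest))
    return dot(TIME_ROW, diff)


def comm_time(v):
    """Time instance annotated on the SFG edge, taken from the DG vector v."""
    return dot(TIME_ROW, v)


print(delay((1, 0), (1, 1)), delay((0, 0), (1, 0)))  # 0 1
print(comm_time((2, -1)))                            # 2
```

An edge that stays within one time step gets D0; an edge that advances the first coordinate by one gets D1.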
The DG and the SFG fully annotated
[Figure: DG of the sorter with nodes (0,0), (1,0), (1,1), (2,0), (2,1), (2,2) and edges labeled x_{i,j} and m_{i,j}, from m_{-1,0} and x_{0,-1} up to m_{2,2} and x_{2,2}]
[Figure: SFG with PEs 0, 1, 2 and unit delays D, with full annotation: DG edge x_{i,j} appears as x_j(i) and DG edge m_{i,j} as m_j(i), e.g. x_{0,-1} becomes x_{-1}(0) and m_{-1,0} becomes m_0(-1)]
Summary
Folding along the first coordinate
DG: N dimensions; SFG: N - 1 dimensions
Mapping matrix: $M^T = (0 \mid I_{N-1})$; M^T is an (N-1) x N matrix
Node mapping: $w = M^T v$
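Summary (continued)
Edge mapping (intermediate and I/O):
Mapping matrix: $M^T$, applied to both components of an edge: $(v, e) \mapsto (M^T v, M^T e)$
Annotation mapping (edge annotation):
Delay D^n: $n = (1, 0, \dots, 0)\,e$
Communication time instances (t): $t = (1, 0, \dots, 0)\,v$
Notice: $\begin{pmatrix} 1\ 0\ \cdots\ 0 \\ M^T \end{pmatrix}$ is the (N x N) identity matrix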
Combining index transformation and time folding
Index transformation of a DG has been described by: $v' = A v$. We neglected shifting the origin because this was just cosmetic.
Folding along the first dimension has been described by:
Edge and node mapping: $w = M^T v'$
Annotation of time instance: $t = (1, 0, \dots, 0)\,v'$
Hence, combined index transformation and time folding is described by: $w = M^T A\,v$ and $t = (1, 0, \dots, 0)\,A\,v$
Combining index transformation and time folding
The combination of index transformation and time folding is called:
• processor assignment
• scheduling
Processor assignment describes which operation (node in DG) is mapped on which processing element (node in SFG). Scheduling determines the time at which the data values are produced, i.e. the order of execution of the operations. In general, processor assignment and scheduling are not divided into index transformation and folding.
Combining index transformation and time folding
The processor assignment matrix P is determined by: $P = M^T A$
M^T is an (N - 1) x N matrix and A is non-singular. Hence P is an (N - 1) x N matrix consisting of (N - 1) independent row vectors that span an (N-1)-dimensional hyper-plane through the origin. Such a hyper-plane can also be described by a vector d perpendicular to the hyper-plane, in which d is defined by: $P\,d = 0$
d is called the projection vector. All nodes and edges in the DG are mapped by P on the hyper-plane through the origin that is perpendicular to d.
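For a 2-dimensional DG, P is a single row (p0, p1) and a projection vector can be written down directly. The sketch below assumes that d only has to satisfy P.d = 0; the function names are illustrative.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def projection_vector(P):
    """For a 1 x 2 matrix P = (p0, p1), return one vector d with P.d = 0."""
    p0, p1 = P
    return (p1, -p0)


P = (-1, 1)
d = projection_vector(P)
print(d)  # (1, 1)

# Nodes that differ by a multiple of d land on the same processing element.
v = (2, 1)
shifted = (v[0] + 3 * d[0], v[1] + 3 * d[1])
print(dot(P, v), dot(P, shifted))  # -1 -1
```

Any non-zero multiple of the returned vector is an equally valid projection vector, since only its direction matters for P.d = 0.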
Combining index transformation and time folding
Scheduling has been described by the scheduling matrix $S = s^T = (1, 0, \dots, 0)\,A$; $s^T$ is the transpose of s. S is a matrix consisting of one row. Therefore, it is also referred to as the scheduling vector.
$t = s^T v$ determines the time instance at which an operation takes place. Operations v and v' are scheduled at the same time if $s^T v = s^T v'$, i.e. $s^T (v - v') = 0$.
So nodes in the DG that are in the same hyper-plane perpendicular to s are scheduled for the same time.
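Grouping DG nodes by their scheduled time makes the hyper-planes perpendicular to s visible. The sketch assumes the time of a node v is the inner product of s and v, and uses the triangular sorter index space (nodes (i, j) with 0 <= j <= i <= 3) as sample data.

```python
from collections import defaultdict


def schedule(nodes, s):
    """Map each time instance t to the list of nodes v with s.v == t."""
    slots = defaultdict(list)
    for v in nodes:
        t = sum(si * vi for si, vi in zip(s, v))
        slots[t].append(v)
    return dict(slots)


dg_nodes = [(i, j) for i in range(4) for j in range(i + 1)]

by_time = schedule(dg_nodes, (1, 1))
print(by_time[2])    # [(1, 1), (2, 0)]
print(max(by_time))  # 6
```

With s = (1,0) instead, all nodes of one row (i fixed) share a time instance.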
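Combining index transformation and time folding
Recall: $P = M^T A$ and $s^T = (1, 0, \dots, 0)\,A$. Hence, $\begin{pmatrix} s^T \\ P \end{pmatrix} = \begin{pmatrix} 1\ 0\ \cdots\ 0 \\ M^T \end{pmatrix} A$, and thus, because the first factor is the identity matrix: $\begin{pmatrix} s^T \\ P \end{pmatrix} = A$
So with A non-singular we obtain: $\begin{pmatrix} s^T \\ P \end{pmatrix}$ is non-singular.
We will use this result later for determining the constraints on the projection vector d and the scheduling vector s.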
Example 1 (sorter)
s = d = (1,1), P = (-1,1)
[Figure: triangular DG of the sorter with nodes (0,0) to (3,3), inputs x_{0,-1}, x_{1,-1}, x_{2,-1} and outputs m_{3,0} ... m_{3,3}. The hyper-plane perpendicular to s and d is drawn in: operations in this plane are mapped on the same time, and all nodes in the DG are mapped on this plane.]
[Figure: resulting SFG with PEs 0, -1, -2, -3 and unit delays D on the edges, annotated with delays D^n and communication time instances (t): inputs x(-1)_{-1}, x(0)_{-2}, x(1)_{-3}, x(2)_{-4} and outputs m(6)_0, m(5)_{-1}, m(4)_{-2}, m(3)_{-3}]
The result is a systolic array.
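Example 2 (sorter)
s = (1,0), d = (1,1), P = (-1,1)
[Figure: triangular DG of the sorter with nodes (0,0) to (3,3), inputs x_{0,-1}, x_{1,-1}, x_{2,-1} and outputs m_{3,0} ... m_{3,3}. The hyper-plane perpendicular to s is drawn in: operations in this plane are mapped on the same time.]
[Figure: resulting SFG with PEs 0, -1, -2, -3 and unit delays D, annotated with delays D^n and communication time instances (t): inputs x(0)_{-1}, x(1)_{-2}, x(2)_{-3}, x(3)_{-4} and outputs m(3)_0, m(3)_{-1}, m(3)_{-2}, m(3)_{-3}]
The result is not systolic.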
Restrictions on s and d
Let P describe the null-space of d, i.e. P.d = 0, then: $\begin{pmatrix} s^T \\ P \end{pmatrix}$ is non-singular if and only if $s^T d \neq 0$
Proof (if part): Any vector that is a linear combination of the row-vectors of P is perpendicular to d. s^T is not perpendicular to d, hence it cannot be written as a linear combination of row-vectors of P. Therefore $\begin{pmatrix} s^T \\ P \end{pmatrix}$ consists of a set of independent row-vectors, and thus it is non-singular.
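Restrictions on s and d
Proof (only if part): Suppose $s^T d = 0$, then because $P\,d = 0$ and $d \neq 0$ we obtain $\begin{pmatrix} s^T \\ P \end{pmatrix} d = 0$. And thus a non-zero vector is mapped onto the zero vector. So $\begin{pmatrix} s^T \\ P \end{pmatrix}$ is singular, and thus it cannot be equal to a non-singular matrix A. Hence s and d may not be perpendicular.
Combining both parts of the proof, we obtain: $\begin{pmatrix} s^T \\ P \end{pmatrix}$ non-singular $\Leftrightarrow$ $s^T d \neq 0$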
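In two dimensions the restriction can be verified numerically. The sketch assumes the statement being proved is that the matrix with rows s^T and P is non-singular exactly when s and d are not perpendicular; P is built as (-d1, d0), one possible row with P.d = 0, and the helper names are invented for the example.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def det2(row0, row1):
    """Determinant of the 2 x 2 matrix with the given rows."""
    return row0[0] * row1[1] - row0[1] * row1[0]


def is_valid_pair(s, d):
    """True if the matrix with rows s^T and P is non-singular (2-D case only)."""
    P = (-d[1], d[0])  # one choice of P with P.d = 0
    return det2(s, P) != 0


d = (1, 1)
for s in [(1, 1), (1, 0), (1, -1)]:
    print(s, is_valid_pair(s, d), dot(s, d) != 0)
```

For this choice of P the determinant equals the inner product of s and d, so both tests agree for every s.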
Restriction on s and the edges in the DG
The edges in the DG express dependencies. The scheduling vector s expresses the order in which the operations will be executed in the processing elements of the SFG. Clearly, an operation may not depend on an operation in the future. Therefore the angle between the edges e in the DG and the scheduling vector s must be less than or equal to 90 degrees. So for any dependency edge e in the DG the following must hold: $s^T e \geq 0$
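The angle condition translates into a non-negative inner product between s and every dependency edge. A minimal sketch, assuming the sorter DG has the two edge directions (1,0) and (0,1) seen in the earlier folding examples:

```python
def is_causal(s, dependency_edges):
    """True if no operation depends on a later one: s.e >= 0 for every edge e."""
    return all(sum(si * ei for si, ei in zip(s, e)) >= 0
               for e in dependency_edges)


sorter_edges = [(1, 0), (0, 1)]  # assumed edge directions of the sorter DG

print(is_causal((1, 1), sorter_edges))   # True
print(is_causal((1, 0), sorter_edges))   # True, edges along (0,1) get delay 0
print(is_causal((-1, 1), sorter_edges))  # False
```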
Summary
Processor assignment and scheduling may start from either
• a re-indexing matrix A followed by folding along the first coordinate, i.e.: $P = M^T A$ and $s^T = (1, 0, \dots, 0)\,A$
or
• a projection vector d and a scheduling vector s such that $s^T d \neq 0$ and $s^T e \geq 0$ for all e in the DG
Some remarks on the processor assignment matrix
Only the direction of d counts, i.e. d and a.d (a non-zero) give the same choice of processor assignment matrices. Example: s = (1,0) with d = (1,1) or d = (-2,-2).
[Figure: DG of the sorter with nodes (0,0) to (3,3), inputs x_{0,-1}, x_{1,-1}, x_{2,-1} and outputs m_{3,0} ... m_{3,3}]
[Figure: SFG for P = (1,-1): PEs 0, 1, 2, 3 with unit delays D, inputs x(0)_1, x(1)_2, x(2)_3, x(3)_4 and outputs m(3)_0, m(3)_1, m(3)_2, m(3)_3]
[Figure: SFG for P = (-1,1): PEs 0, -1, -2, -3 with unit delays D, inputs x(0)_{-1}, x(1)_{-2}, x(2)_{-3}, x(3)_{-4} and outputs m(3)_0, m(3)_{-1}, m(3)_{-2}, m(3)_{-3}]
A different processor assignment matrix only results in different indices in the SFG, i.e. re-indexing the SFG.
Some remarks on the processor assignment matrix
An N-dimensional DG results in an (N-1)-dimensional SFG. Re-indexing the SFG thus can be done with a non-singular (N-1) x (N-1) matrix Q.
Let SFG be obtained from DG by means of the processor assignment matrix P, so: $w = P\,v$ with $P\,d = 0$, and let SFG' be obtained from SFG by re-indexing, so: $w' = Q\,w$; then $w' = Q\,P\,v$
So, SFG' could have been obtained immediately by means of the processor assignment matrix Q.P. However, $P\,d = 0$ implies $Q\,P\,d = 0$, so Q.P belongs to the same projection vector d. And thus a different processor assignment matrix only results in different indices in the SFG, i.e. re-indexing the SFG.
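For a 2-dimensional DG the re-indexing matrix Q is a single non-zero number. The following sketch uses Q = -1 to turn P = (-1,1) into Q.P = (1,-1), the two matrices shown for the sorter, and assumes the sorter nodes are (i, j) with 0 <= j <= i <= 3.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


dg_nodes = [(i, j) for i in range(4) for j in range(i + 1)]
d = (1, 1)

P = (-1, 1)
Q = -1                        # non-singular 1 x 1 re-indexing "matrix"
QP = tuple(Q * p for p in P)  # (1, -1)

pe_index = [dot(P, v) for v in dg_nodes]
pe_index_reindexed = [dot(QP, v) for v in dg_nodes]

# Same grouping of operations onto PEs, only the PE indices change sign.
print(sorted(set(pe_index)))            # [-3, -2, -1, 0]
print(sorted(set(pe_index_reindexed)))  # [0, 1, 2, 3]
print(dot(P, d), dot(QP, d))            # 0 0
```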
Some remarks on projection direction (1)
Example: Band matrix
Specification: [formula not recoverable]
Recurrent relations: [formulas not recoverable; the surviving fragment reads c_i = s_{i,7}]
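Band matrix
s = d = (1,0), P = (0,1)
[Figure: DG of the band-matrix computation with inputs d_{-1,0}, d_{-1,1}, d_{-1,2}, d_{0,3}, d_{1,4}, d_{2,5}, d_{3,6}, d_{4,7}, coefficients -2, 1, 2 in each row and outputs s_{0,2}, s_{1,3}, s_{2,4}, s_{3,5}, s_{4,6}, s_{5,7}]
[Figure: resulting SFG with inputs d(-1)_0, d(-1)_1, d(-1)_2, d(0)_3, d(1)_4, d(2)_5, d(3)_6, d(4)_7, a network of unit delays D and outputs s(0)_2, s(1)_3, s(2)_4, s(3)_5, s(4)_6, s(5)_7]
• Disadvantages:
  • Irregular SFG
  • Function of the PE is time-dependent
  • Size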
Band matrix
s = d = (1,1), P = (-1,1)
[Figure: DG of the band-matrix computation with inputs d_{-1,0}, d_{-1,1}, d_{-1,2}, d_{0,3}, d_{1,4}, d_{2,5}, d_{3,6}, d_{4,7}, coefficients -2, 1, 2 in each row and outputs s_{0,2} ... s_{5,7}]
[Figure: resulting SFG with three PEs (coefficients 2, 1, -2) and unit delays D; inputs d(-1)_1, d(0)_2 and d(1)_3, d(3)_3, d(5)_3, d(7)_3, d(9)_3, d(11)_3; outputs s(2)_2, s(4)_2, s(6)_2, s(8)_2, s(10)_2, s(12)_2]
• Advantages:
  • Regular SFG
  • Function of the PE is time-independent
  • Size is small
Conclusion: The size and properties of the SFG strongly depend on the projection direction.
Infinite Dependency Graphs
Convolution
[Figure: system H with input x and output y]
$y_i = \sum_j h_j\,x_{i-j}$, in which h is the impulse response defining the system H.
Let h be a bounded sequence, i.e. $h_j = 0$ outside a finite index range, and x is an unbounded sequence.
Recurrent relations: [formulas not recoverable]
• Projection direction: only one choice (we will see why)
• Scheduling direction: multiple choices
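As a reference for the SFGs that follow, the convolution can be written as a direct sum. The sketch assumes the usual definition y_i = sum over j of h_j x_{i-j}, a three-tap impulse response (h0, h1, h2) as in the later slides, and x equal to zero before the stream starts; that last convention is an assumption, not something the slides state.

```python
def convolve_stream(h, xs):
    """Direct-form convolution: y_i = sum_j h[j] * x[i - j].

    Samples before the start of the stream (negative indices) count as 0.
    """
    ys = []
    for i in range(len(xs)):
        acc = 0
        for j, hj in enumerate(h):
            if i - j >= 0:
                acc += hj * xs[i - j]
        ys.append(acc)
    return ys


h = [1, 2, 3]                            # h0, h1, h2 (sample values)
print(convolve_stream(h, [1, 0, 0, 2]))  # [1, 2, 3, 2]
```

For i = 2 the sum is h2.x0 + h1.x1 + h0.x2, the expression that the SFG traces arrive at for y2.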
Infinite Dependency Graphs (SFG 1)
s = d = (1,1), P = (1,-1)
DG: [Figure: unbounded DG with one row of coefficients h2, h1, h0 per sample; inputs d_{-1,0} = x0, d_{0,1} = x1, d_{1,2} = x2, d_{2,3} = x3, d_{3,4} = x4, ... and outputs s_{0,0} = y0, s_{1,1} = y1, s_{2,2} = y2, s_{3,3} = y3, s_{4,4} = y4, ...]
SFG: [Figure: three PEs h0, h1, h2 connected through unit delays D; input stream d(-1)_{-1} = x0, d(1)_{-1} = x1, d(3)_{-1} = x2, d(5)_{-1} = x3, d(7)_{-1} = x4, ... and output stream s(0)_0 = y0, s(2)_0 = y1, s(4)_0 = y2, s(6)_0 = y3, s(8)_0 = y4, ...]
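Infinite Dependency Graphs (SFG 1)
[Figure: SFG with PEs h0, h1, h2, each a multiplier followed by an adder, connected through unit delays D]
Input values moving through the array:
t = -1: d(-1)_{-1} = x0
t = 0: d(0)_0 = x0
t = 1: d(1)_{-1} = x1, d(1)_1 = x0
t = 2: d(2)_0 = x1, d(2)_2 = x0
t = 3: d(3)_{-1} = x2, d(3)_1 = x1
t = 4: d(4)_0 = x2, d(4)_2 = x1
Partial sums:
t = 2: s(2)_2 = h2.x0
t = 3: s(3)_1 = h2.x0 + h1.x1
t = 4: s(4)_0 = h2.x0 + h1.x1 + h0.x2 = y2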
Infinite Dependency Graphs (SFG 2)
d = (1,1), s = (1,0), P = (1,-1)
DG: [Figure: unbounded DG with one row of coefficients h2, h1, h0 per sample; inputs d_{-1,0} = x0, d_{0,1} = x1, d_{1,2} = x2, d_{2,3} = x3, d_{3,4} = x4, ... and outputs s_{0,0} = y0, s_{1,1} = y1, s_{2,2} = y2, s_{3,3} = y3, s_{4,4} = y4, ...]
SFG: [Figure: three PEs h0, h1, h2 connected through unit delays D; input stream d(-1)_{-1} = x0, d(0)_{-1} = x1, d(1)_{-1} = x2, d(2)_{-1} = x3, d(3)_{-1} = x4, ... and output stream s(0)_0 = y0, s(1)_0 = y1, s(2)_0 = y2, s(3)_0 = y3, s(4)_0 = y4, ...]
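Infinite Dependency Graphs (SFG 2)
[Figure: SFG with PEs h0, h1, h2, each a multiplier followed by an adder, connected through unit delays D]
Input values moving through the array:
t = -1: d(-1)_{-1} = x0
t = 0: d(0)_{-1} = x1, d(0)_0 = x0
t = 1: d(1)_{-1} = x2, d(1)_0 = x1, d(1)_1 = x0
t = 2: d(2)_{-1} = x3, d(2)_0 = x2, d(2)_1 = x1, d(2)_2 = x0
Partial sums, all at t = 2:
s(2)_2 = h2.x0
s(2)_1 = h2.x0 + h1.x1
s(2)_0 = h2.x0 + h1.x1 + h0.x2 = y2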
Infinite Dependency Graphs (SFG 3)
d = (1,1), s = (0,1), P = (1,-1)
DG: [Figure: unbounded DG with one row of coefficients h2, h1, h0 per sample; inputs d_{-1,0} = x0, d_{0,1} = x1, d_{1,2} = x2, d_{2,3} = x3, d_{3,4} = x4, ... and outputs s_{0,0} = y0, s_{1,1} = y1, s_{2,2} = y2, s_{3,3} = y3, s_{4,4} = y4, ...]
SFG: [Figure: three PEs h0, h1, h2 with unit delays D between them; input stream d(0)_{-1} = x0, d(1)_{-1} = x1, d(2)_{-1} = x2, d(3)_{-1} = x3, d(4)_{-1} = x4, ... and output stream s(0)_0 = y0, s(1)_0 = y1, s(2)_0 = y2, s(3)_0 = y3, s(4)_0 = y4, ...]
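Infinite Dependency Graphs (SFG 3)
[Figure: SFG with PEs h0, h1, h2, each a multiplier followed by an adder, with unit delays D between the PEs]
Input values, available at all PEs in the same time step:
t = 0: d(0)_{-1} = d(0)_0 = d(0)_1 = d(0)_2 = x0
t = 1: d(1)_{-1} = d(1)_0 = d(1)_1 = d(1)_2 = x1
t = 2: d(2)_{-1} = d(2)_0 = d(2)_1 = d(2)_2 = x2
Partial sums:
t = 0: s(0)_2 = h2.x0
t = 1: s(1)_1 = h2.x0 + h1.x1
t = 2: s(2)_0 = h2.x0 + h1.x1 + h0.x2 = y2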
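The timing of SFG 3 can be imitated by a small cycle-by-cycle simulation. The sketch follows the trace above: each input sample reaches all three PEs in the same time step and the partial sums move from h2 via h1 to h0 through one unit delay each. The register names `r2` and `r1` and the zero initial state are assumptions made for the illustration.

```python
def sfg3_convolution(h, xs):
    """Simulate SFG 3: broadcast input, partial sums delayed by one step per PE."""
    h0, h1, h2 = h
    r2 = 0  # delay register holding s(t-1)_2
    r1 = 0  # delay register holding s(t-1)_1
    ys = []
    for x in xs:           # one loop iteration = one time instance t
        s2 = h2 * x        # PE h2 starts a new partial sum
        s1 = r2 + h1 * x   # PE h1 extends the sum started one step earlier
        s0 = r1 + h0 * x   # PE h0 completes the sum started two steps earlier
        ys.append(s0)
        r2, r1 = s2, s1    # clock edge: registers take over the new values
    return ys


h = [1, 2, 3]    # h0, h1, h2 (sample values)
xs = [5, 7, 11]  # x0, x1, x2
print(sfg3_convolution(h, xs))  # [5, 17, 40]
```

The third output, 40, equals h2.x0 + h1.x1 + h0.x2 = 15 + 14 + 11, in line with the value of y2 in the trace.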
The DG of the convolution
[Figure: DG with one row per input sample x0 ... x5; each row holds the coefficients h0, h1, h2 and delivers the output y_i = s_{i,2}, for i = 0 ... 5]
The values h_i are local constants in the nodes of the DG.
Recall that this DG can be obtained from the previous version of the convolution algorithm by changing the summing order and re-indexing.
The DG of the convolution
d = (1,0), P = (0,1)
[Figure: DG with one row per input sample x0 ... x5; each row holds the coefficients h0, h1, h2 and delivers the output y_i = s_{i,3}; the directions of the x and s edges are indicated]
[Figure: the SFGs obtained for the candidate scheduling vectors s = (1,0), s = (1,1), s = (0,1) and s = (-1,1); each has PEs h0, h1, h2 whose edges carry the delays D0, D1 or D2, depending on s; the figure also contains a question mark]
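The DG of the convolution
The most interesting solution that we find here is: d = (1,0), s = (1,0), P = (0,1)
[Figure: SFG with PEs h0, h1, h2 and edges labeled D0, D0, D0 and D1]
With hardware interpretation:
[Figure: three multipliers with coefficients h0, h1, h2, two adders, input x, output s and delay elements D1]
Notice that this solution differs from all the previous solutions.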
The scheduling vector in a streaming environment
If the number of inputs and outputs is finite, the inputs and outputs in the DG may be mapped in any order on various inputs and outputs in the SFG.
In a streaming environment, the number of input and output variables is infinite, and thus the corresponding input (output) variables in the DG are to be mapped on a single input (output) variable in the SFG. This is determined by the projection vector d.
Increasing input indices should correspond to increasing time. Recall that only the direction of the projection vector counts (if d is a correct projection vector, then a.d is also a correct one). We choose to have the direction of d in the direction of increasing input and output indices. Then, for streaming environments we additionally require: $s^T d \geq 0$. And because $s^T d \neq 0$ we get $s^T d > 0$
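A short sketch of this selection step, assuming the streaming requirement amounts to a strictly positive inner product of s and d (time increases along the projection direction). The candidate list repeats the four scheduling vectors named on the earlier convolution slide with d = (1,0); the function name is illustrative.

```python
def streaming_ok(s, d):
    """Assumed streaming condition: s.d > 0."""
    return sum(si * di for si, di in zip(s, d)) > 0


d = (1, 0)
candidates = [(1, 0), (1, 1), (0, 1), (-1, 1)]

usable = [s for s in candidates if streaming_ok(s, d)]
print(usable)  # [(1, 0), (1, 1)]
```

Under this assumption s = (0,1) is rejected because it is perpendicular to d, and s = (-1,1) because it would let later inputs arrive earlier in time.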