Data Dependences CS 524 – High-Performance Computing
Motivating Example: Superscalar Execution CS 524 (Au 05-06)- Asim Karim @ LUMS

Data Dependences
• Fundamental execution ordering constraints:
    S1: a = b + c
    S2: d = a * 2
    S3: a = c + 2
    S4: e = d + c + 2
• S1 must execute before S2 (flow-dependence)
• S2 must execute before S3 (anti-dependence)
• S1 must execute before S3 (output-dependence)
• But S3 and S4 can execute concurrently
[Figure: S1, S2, S3, S4]

Types of Dependences (1)
• Three types are usually defined:
  • Flow-dependence occurs when a variable that is assigned a value in one statement, say S1, is used in a later statement, say S2. Written as S1 δf S2
  • Anti-dependence occurs when a variable that is used in one statement, say S1, is assigned a value in a later statement, say S2. Written as S1 δa S2
  • Output-dependence occurs when a variable that is assigned a value in one statement, say S1, is reassigned in a later statement, say S2. Written as S1 δo S2

Types of Dependences (2)
• The type of dependence is found by using IN and OUT sets for each statement
  • IN(S): the set of memory locations read by statement S
  • OUT(S): the set of memory locations written by statement S
  • A memory location may be included in both IN(S) and OUT(S)
• If S1 is executed before S2 in sequential execution, then
  • OUT(S1) ∩ IN(S2) ≠ {} ==> S1 δf S2
  • IN(S1) ∩ OUT(S2) ≠ {} ==> S1 δa S2
  • OUT(S1) ∩ OUT(S2) ≠ {} ==> S1 δo S2
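
The IN/OUT rules can be tried on the four-statement example from the Data Dependences slide. The following is a minimal Python sketch, not part of the original slides; the names `STATEMENTS`, `classify`, and `all_dependences` are hypothetical, and the IN/OUT sets were written out by hand from the statements.

```python
# Statements in sequential order: (name, IN set, OUT set).
# The sets are transcribed by hand from the example statements.
STATEMENTS = [
    ("S1", {"b", "c"}, {"a"}),   # a = b + c
    ("S2", {"a"}, {"d"}),        # d = a * 2
    ("S3", {"c"}, {"a"}),        # a = c + 2
    ("S4", {"d", "c"}, {"e"}),   # e = d + c + 2
]

def classify(in1, out1, in2, out2):
    """Dependence types from an earlier statement (1) to a later one (2)."""
    kinds = []
    if out1 & in2:       # OUT(S1) and IN(S2) overlap
        kinds.append("flow")
    if in1 & out2:       # IN(S1) and OUT(S2) overlap
        kinds.append("anti")
    if out1 & out2:      # OUT(S1) and OUT(S2) overlap
        kinds.append("output")
    return kinds

def all_dependences(statements):
    """Apply the set rules to every (earlier, later) statement pair."""
    deps = {}
    for idx, (n1, in1, out1) in enumerate(statements):
        for n2, in2, out2 in statements[idx + 1:]:
            kinds = classify(in1, out1, in2, out2)
            if kinds:
                deps[(n1, n2)] = kinds
    return deps

DEPS = all_dependences(STATEMENTS)
for (src, dst), kinds in DEPS.items():
    print(src, "->", dst, kinds)
```

The pair (S3, S4) does not appear in `DEPS`, which matches the observation that S3 and S4 can execute concurrently.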

Data Dependence in Loops (1)
• Associate an instance with each statement and determine dependences between the instances
• For example, we write S1(10) to mean the instance of S1 when i = 10
    do i = 1, 50
    S1:   A(i) = B(i-1) + C(i)
    S2:   B(i) = A(i+2) + C(i)
    end do

Data Dependence in Loops (2)
    do i = 1, 50
    S1:   A(i) = B(i-1) + C(i)
    S2:   B(i) = A(i+2) + C(i)
    end do
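
One way to find the dependences between statement instances in this loop is to trace it serially and record which instance last wrote or read each array element. The sketch below is an illustrative brute-force tracer in Python, not the method used in the course; `loop_dependences` and its helpers are hypothetical names.

```python
def loop_dependences(lo=1, hi=50):
    """Trace the loop serially; return a set of (type, source, sink, distance)."""
    last_write = {}    # location -> (statement, iteration) of the latest write
    reads_since = {}   # location -> reads recorded since that latest write
    found = set()

    def read(stmt, i, loc):
        if loc in last_write:
            src, k = last_write[loc]
            found.add(("flow", src, stmt, i - k))
        reads_since.setdefault(loc, []).append((stmt, i))

    def write(stmt, i, loc):
        for src, k in reads_since.get(loc, []):
            found.add(("anti", src, stmt, i - k))
        if loc in last_write:
            src, k = last_write[loc]
            found.add(("output", src, stmt, i - k))
        last_write[loc] = (stmt, i)
        reads_since[loc] = []

    for i in range(lo, hi + 1):
        # S1: A(i) = B(i-1) + C(i)
        read("S1", i, ("B", i - 1))
        read("S1", i, ("C", i))
        write("S1", i, ("A", i))
        # S2: B(i) = A(i+2) + C(i)
        read("S2", i, ("A", i + 2))
        read("S2", i, ("C", i))
        write("S2", i, ("B", i))
    return found

for dep in sorted(loop_dependences()):
    print(dep)
```

The trace reports a flow-dependence from S2 to S1 with distance 1 (B(i) is written by S2(i) and read by S1(i+1)) and an anti-dependence from S2 to S1 with distance 2 (A(i+2) is read by S2(i) and written by S1(i+2)).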

Data Flow Dependence
• Data written by some statement instance is later read by some statement instance, in the serial execution of the program. This is written as S1 δf S2.
    do i = 3, 50
    S1:   A(i+1) = ...
    S2:   ... = A(i-2) ...
    end do
• Here S1(i) writes A(i+1), which S2(i+3) later reads, so S1 δf S2 with distance 3
[Figure: S1, S2]
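
Data Anti-Dependence
• Data read by some statement instance is later written by some statement instance, in the serial execution of the program. With the read in S1 and the write in S2, this is written as S1 δa S2.
    do i = 1, 50
    S1:   A(i-1) = ...
    S2:   ... = A(i+1) ...
    end do
• In this loop the read is in S2 and the write is in S1: S2(i) reads A(i+1), which S1(i+2) later writes, so the anti-dependence runs from S2 to S1 with distance 2
[Figure: S1, S2]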

Iteration Space Graph (1)
• Nested loops define an iteration space:
    do i = 1, 4
      do j = 1, 4
        A(i,j) = B(i,j) + C(j)
      end do
    end do
• Sequential execution (traversal order):
[Figure: iteration space with axes i and j, showing the traversal order]

Iteration Space Graph (2)
• Dimensionality of iteration space = loop nest level
• Not restricted to be rectangular
• Triangular iteration spaces are common in scientific codes
    do i = 1, 5
      do j = i, 5
        A(i,j) = B(i,j) + C(j)
      end do
    end do
• Sequential execution (traversal order):
[Figure: triangular iteration space with axes i and j, showing the traversal order]
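
The traversal order of the triangular loop nest can be listed directly. This is a small Python sketch with a hypothetical helper `triangular_iterations`; it only enumerates the iteration points and does not model the array accesses.

```python
def triangular_iterations(n=5):
    """Iteration points (i, j) of 'do i = 1, n / do j = i, n' in execution order."""
    return [(i, j) for i in range(1, n + 1) for j in range(i, n + 1)]

points = triangular_iterations()
print(len(points))   # number of iterations executed
print(points[:6])    # the first row, then the start of the second row
```

Each row i starts at j = i, so the rows get shorter as i grows, which is what makes the space triangular rather than rectangular.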

Sequential Execution
• Ordering of execution
• PCD = precedes = ordering symbol for vectors
• Given two iterations (i1, j1) and (i2, j2) (with positive loop steps), we say (i1, j1) PCD (i2, j2) if and only if either (i1 < i2) or [(i1 = i2) AND (j1 < j2)]
• This rule can be extended to multi-dimensional iteration spaces
• A vector (d1, d2) is positive if (0, 0) PCD (d1, d2), i.e., its first (leading) non-zero component is positive
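
The ordering rule and the definition of a positive vector can be written for any nesting depth. The following Python sketch uses hypothetical names `precedes` and `is_positive`; it assumes positive loop steps, as the slide does.

```python
def precedes(p, q):
    """True if iteration p executes before iteration q (lexicographic order)."""
    for a, b in zip(p, q):
        if a != b:
            return a < b      # the first differing index decides
    return False              # identical iterations: neither precedes the other

def is_positive(d):
    """True if the first (leading) non-zero component of d is positive."""
    for x in d:
        if x != 0:
            return x > 0
    return False              # the zero vector is not positive

print(precedes((1, 4), (2, 1)))   # outer index decides
print(precedes((2, 1), (2, 3)))   # outer indices equal, inner index decides
print(is_positive((0, 3)), is_positive((1, -2)), is_positive((-1, 2)))
```

For tuples of equal length, `precedes(p, q)` agrees with Python's built-in tuple comparison `p < q`, which is also lexicographic.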

Dependences in Loop Nests
    do i1 = L1, U1
      do i2 = L2, U2
        ...
          do in = Ln, Un
            BODY(i1, i2,..., in)
          end do
        ...
      end do
    end do
• There is a dependence in a loop nest if there are iterations I = (i1, i2,...,in) and J = (j1, j2,...,jn) and some memory location M such that
  • I PCD J
  • BODY(I) and BODY(J) reference M
  • There is no intervening iteration K that accesses M, that is, no K with I PCD K PCD J

Distance and Direction Vectors
• Assume a dependence from BODY(I = (i1, i2,...,in)) to BODY(J = (j1, j2,...,jn))
• The distance vector d = (j1 – i1, j2 – i2,…, jn – in)
• Define the sign function sgn(x) of a scalar x:
    sgn(x) = -  if x < 0
    sgn(x) = 0  if x = 0
    sgn(x) = +  if x > 0
• The direction vector = (sgn(d1), sgn(d2),…, sgn(dn)), where dk = jk – ik for k = 1,…,n
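
These definitions translate almost line for line into code. This is a minimal Python sketch with hypothetical function names; the source and sink iterations used in the example call are made up for illustration.

```python
def distance_vector(source, sink):
    """d = (j1 - i1, ..., jn - in) for source I and sink J."""
    return tuple(j - i for i, j in zip(source, sink))

def sgn(x):
    """Sign symbol of a scalar: '-', '0', or '+'."""
    if x < 0:
        return "-"
    if x > 0:
        return "+"
    return "0"

def direction_vector(source, sink):
    """Component-wise sign of the distance vector."""
    return tuple(sgn(d) for d in distance_vector(source, sink))

# Illustrative dependence from iteration (2, 5) to iteration (3, 3).
print(distance_vector((2, 5), (3, 3)))
print(direction_vector((2, 5), (3, 3)))
```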

Example of Dependence Vectors
    do i = 1, N
      do j = 1, N
        A(i, j) = A(i, j-3) + A(i-2, j) + A(i-1, j+2) + A(i+1, j-1)
      end do
    end do
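
The dependence vectors for this loop nest can be derived from the subscript offsets of the four reads. Iteration (i, j) writes A(i, j), so a read of A(i+a, j+b) touches the element written by iteration (i+a, j+b). If the offset (a, b) is lexicographically negative, that write happened earlier and the dependence is a flow-dependence with distance (-a, -b); if it is positive, the write happens later and the dependence is an anti-dependence with distance (a, b). The Python sketch below encodes this reasoning; the names are hypothetical, and it assumes N is large enough that loop bounds do not eliminate any dependence.

```python
# Offsets (a, b) of the reads A(i+a, j+b) on the right-hand side.
READ_OFFSETS = [(0, -3), (-2, 0), (-1, 2), (1, -1)]

def leading_sign(v):
    """Sign (-1, 0, +1) of the first non-zero component of v."""
    for x in v:
        if x != 0:
            return 1 if x > 0 else -1
    return 0

def dependence_for_read(offset):
    """Return (type, distance vector) for one read of A(i+a, j+b)."""
    s = leading_sign(offset)
    if s < 0:   # element was written in an earlier iteration
        return ("flow", tuple(-x for x in offset))
    if s > 0:   # element will be written in a later iteration
        return ("anti", tuple(offset))
    return ("same-iteration", tuple(offset))

RESULT = [dependence_for_read(o) for o in READ_OFFSETS]
for offset, dep in zip(READ_OFFSETS, RESULT):
    print(offset, "->", dep)
```

Under these assumptions the reads give flow-dependences with distances (0, 3), (2, 0), and (1, -2), and an anti-dependence with distance (1, -1); the corresponding direction vectors are (0, +), (+, 0), (+, -), and (+, -).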

Validity of Loop Permutation (1)
Before interchange:
    do i = 1, N
      do j = 1, N
        ...
      end do
    end do
After interchange:
    do j = 1, N
      do i = 1, N
        ...
      end do
    end do
• A dependence with direction vector (+, -) prevents interchange: after interchange it becomes (-, +), which is lexicographically negative
[Figure: iteration space with axes i and j]

Validity of Loop Permutation (2)
• Loop permutation is valid if all dependences are still satisfied after the interchange
  • Geometric view: the source (of the dependent statements) is still executed before the sink
  • Algebraic view: every permuted dependence direction vector is lexicographically non-negative
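
The algebraic test can be sketched as a short check over distance vectors. This Python example is illustrative only; the function names are hypothetical and the sample vectors are assumptions chosen to show one valid and one invalid interchange.

```python
def permute(vector, order):
    """Reorder vector components; order[k] is the old position of new loop k."""
    return tuple(vector[k] for k in order)

def lex_nonnegative(d):
    """True if d is the zero vector or its leading non-zero component is positive."""
    for x in d:
        if x != 0:
            return x > 0
    return True

def permutation_is_valid(distance_vectors, order):
    """Valid if every permuted dependence vector stays lexicographically non-negative."""
    return all(lex_nonnegative(permute(d, order)) for d in distance_vectors)

INTERCHANGE = (1, 0)   # swap the two loops of a doubly nested loop

print(permutation_is_valid([(0, 1), (1, 0)], INTERCHANGE))   # both stay non-negative
print(permutation_is_valid([(1, -1)], INTERCHANGE))          # (1, -1) becomes (-1, 1)
```

The vector (1, -1) has direction (+, -), the case that prevents interchange.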

Iteration Space Blocking (Tiling)
• A tile in an n-dimensional iteration space is an n-dimensional subset of the iteration space
• A tile is defined by a set of regularly spaced boundaries
• Each tile boundary is an (n – 1)-dimensional plane
[Figure: tiled iteration space with axes i and j]

Validity of Loop Blocking
• Loop blocking is valid if all dependences are still satisfied in the tiled execution order of the iteration space graph (i.e., source before sink)
  • Geometric view: no two dependences cross any tile boundary in opposite directions
  • Algebraic view: the dot products of all dependence distance vectors with the normal to a tile boundary have the same sign
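
The algebraic view suggests a simple check: for each family of tile boundaries, take the dot product of its normal with every distance vector and look for mixed signs. The Python sketch below is an illustration with hypothetical names. It assumes rectangular tiles, so the boundary normals are the unit vectors, and it treats a zero dot product (a dependence that does not cross that boundary) as compatible with either sign, which is an interpretation rather than something the slide states.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def blocking_is_valid(distance_vectors, normals):
    """False if some boundary is crossed in both directions by dependences."""
    for n in normals:
        products = [dot(d, n) for d in distance_vectors]
        if any(p > 0 for p in products) and any(p < 0 for p in products):
            return False
    return True

# Normals of the boundaries of rectangular tiles in a 2-D iteration space.
RECT_NORMALS = [(1, 0), (0, 1)]

print(blocking_is_valid([(1, 0), (0, 1), (1, 1)], RECT_NORMALS))
print(blocking_is_valid([(1, -1), (0, 1)], RECT_NORMALS))
```

In the second call, (1, -1) and (0, 1) cross the j-boundaries in opposite directions, so rectangular blocking is rejected.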

Summary
• Data dependence analysis is essential to avoid changing the meaning of a program when performance-optimizing transformations are applied
• Data dependence analysis is essential in the design of parallel algorithms from sequential code
• Data dependences must be maintained in all loop transformations; otherwise the transformation is illegal
• Every dependence in a loop nest has a lexicographically non-negative dependence vector (source before sink), and a transformation is legal only if this still holds afterwards