Basic Loop Parallelization Taewook Eom RTOS Lab., School of EECS, SNU
Outline • Data dependences • Loop parallelization • How to extract data dependence information
Loop Parallelization Goal • DoAll loops • Loops whose iterations can execute in parallel • Example for (i = 0; i < n; i++) A[i] = B[i] + C[i]; • Statically assign each iteration to a processor
Data Dependence of Scalars • Data Dependences • True dependence: write → read • Anti dependence: read → write • Output dependence: write → write • Data dependence exists from a dynamic instance i to i’ iff • either i or i’ is a write operation • i and i’ refer to the same variable • i executes before i’
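The three conditions above can be sketched as a small classifier. This is an illustrative helper (the name `dependence` and the access tuples are assumptions, not from the slides): each dynamic access is a tuple of (execution time, variable, read/write kind), and the function returns which kind of dependence, if any, runs from the first access to the second.

```python
# Classify the dependence (if any) from dynamic access a to access b.
# Each access is (time, variable, kind) with kind "read" or "write".
def dependence(a, b):
    """Return "true", "anti", "output", or None (no dependence)."""
    (t1, v1, k1), (t2, v2, k2) = a, b
    if v1 != v2 or t1 >= t2:           # must be same variable, a before b
        return None
    if k1 == "write" and k2 == "read":
        return "true"                  # write -> read
    if k1 == "read" and k2 == "write":
        return "anti"                  # read -> write
    if k1 == "write" and k2 == "write":
        return "output"                # write -> write
    return None                        # read -> read: no dependence

print(dependence((0, "x", "write"), (1, "x", "read")))   # true
print(dependence((0, "x", "read"), (1, "x", "read")))    # None
```

Note that a read followed by a read falls through to `None`: it satisfies neither of the "either is a write" cases, matching the first condition on the slide.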
Data Dependences for Arrays (1) • DoAll loop vs. DoAcross loop • Loop-carried vs. loop-independent dependence • Dependence exists across iterations vs. within one iteration • Recognize DoAll loops (intuitively) • Find data dependences in loops • No dependences crossing iteration boundaries ⇒ Parallelizable for (i = 0; i < 3; i++) A[i] = A[i] + 1; for (i = 2; i < 5; i++) A[i] = A[i-2] + 1;
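For small 1-D loops like the two examples above, loop-carried dependences can be found by brute force: record which iteration writes each array element, then check whether any other iteration touches that element. The helper name `loop_carried` and the lambda encoding of index expressions are assumptions for illustration.

```python
# Brute-force loop-carried dependence check for a 1-D loop that,
# in iteration i, writes A[write_idx(i)] and reads A[read_idx(i)].
def loop_carried(lo, hi, write_idx, read_idx):
    """True if a dependence crosses an iteration boundary."""
    writes = {i: write_idx(i) for i in range(lo, hi)}
    for i in range(lo, hi):
        for j in range(lo, hi):
            if i != j and (writes[i] == read_idx(j) or writes[i] == writes[j]):
                return True            # two different iterations share an element
    return False

# for (i = 0; i < 3; i++) A[i] = A[i] + 1    -> each iteration is independent
print(loop_carried(0, 3, lambda i: i, lambda i: i))       # False
# for (i = 2; i < 5; i++) A[i] = A[i-2] + 1  -> iteration i reads i-2's write
print(loop_carried(2, 5, lambda i: i, lambda i: i - 2))   # True
```

The first loop is a DoAll loop; the second carries a dependence of distance 2 across iterations and is not.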
Data Dependences for Arrays (2) • Assumptions • A loop is in canonical form • Index runs from 1 to N by increments of 1 • Perfectly nested loops • Only the innermost loop has statements other than for statements • Example for (i1 = 1; i1 <= n1; i1++) for (i2 = 1; i2 <= n2; i2++) for (i3 = 1; i3 <= n3; i3++) A[i1, i2, i3] = A[i1, i2, i3] + 1;
Iteration Space (1) • An iteration is represented as coordinates in the iteration space • n-dimensional discrete space for n-deep loops • Iteration space diagrams show dependences between all iterations of a loop • vs. a dependence graph, which shows all dependences between two statements in a loop • Example (iteration space traversal over axes i and j) for (i = 0; i < 5; i++) for (j = 0; j < 7; j++) A[i, j] = A[i-1, j-1] + 1
Iteration Space (2) • Sequential execution order of iterations • Lexicographic order • [0,0], [0,1], … , [0,6], [1,0], … , [1,6], [2,0], … • Iteration i = (i1, … , in) is lexicographically less than i’ = (i’1, … , i’n) iff there exists k such that ik < i’k and ij = i’j for all j < k • Examples • (1,6,0) < (2,0,0) • (0,3,1) < (0,5,0)
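Python's built-in tuple comparison is exactly this lexicographic order, so both the definition and the slide's examples can be checked directly; the nested-loop traversal order and sorted lexicographic order coincide.

```python
# Tuple comparison in Python is lexicographic: it matches the
# sequential execution order of iteration vectors.
print((1, 6, 0) < (2, 0, 0))   # True
print((0, 3, 1) < (0, 5, 0))   # True

# The nested-loop traversal visits iterations in lexicographic order.
space = [(i, j) for i in range(2) for j in range(3)]
print(space)                   # [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
print(sorted(space) == space)  # True
```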
Distance Vectors (1) • A data dependence from iteration i to a later iteration i’ has distance vector d = i’ - i • Indicates how many iterations apart the source and sink of the dependence are • Example for (i = 0; i < 5; i++) for (j = 0; j < 7; j++) A[i, j] = A[i-1, j-1] + 1 • Distance vector d = [1,1]
Distance Vectors (2) • Since i executes before i’, d = i’ - i is lexicographically greater than or equal to 0 • All dependences must point forward in time • Distance vectors (infinitely large set) • [0,0], [0,1], … , [0,n], [1,-n], … , [1,0], … , [1,n], … , [n,-n], … , [n,0], … , [n,n] • Summary: direction vectors • 0 or lexicographically positive • [0,0], [0,+], [+,-], [+,0], [+,+]
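The distance vector of the running example can be recovered by simulation: walk the iteration space in lexicographic order, remember which iteration wrote each element, and subtract whenever an iteration reads an element written earlier. The helper name `distance_vectors` is an assumption for illustration.

```python
# Brute-force distance vectors for A[i, j] = A[i-1, j-1] + 1.
def distance_vectors(n_i, n_j):
    writes = {}                      # array element -> iteration that wrote it
    dists = set()
    for i in range(n_i):             # lexicographic traversal of the space
        for j in range(n_j):
            src = writes.get((i - 1, j - 1))   # iteration that wrote the read element
            if src is not None:
                dists.add((i - src[0], j - src[1]))   # sink minus source
            writes[(i, j)] = (i, j)
    return dists

D = distance_vectors(5, 7)
print(D)                             # {(1, 1)}
print(all(d >= (0, 0) for d in D))   # True: lexicographically non-negative
```

The single distance vector (1, 1) matches the slide, and the final check confirms the rule that every distance vector is lexicographically ≥ 0.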
Direction Vectors (1) • The sign of the distance • There are different notations • < or +: dependence from earlier to later iteration • = or 0: dependence within the same iteration • > or -: dependence from later to earlier iteration • Example for (i = 0; i < n; i++) for (j = 0; j < n; j++) A[i, j] = A[i-2, j+3] • i = (1, 4) • i’ = (3, 1) • d = i’ - i = (2, -3) • Direction = (<, >) or (+, -)
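Mapping a distance vector to its direction vector is a component-wise sign; a minimal sketch (the name `direction` is an assumption), replayed on the slide's example:

```python
def direction(d):
    """Direction vector of a distance vector, in +/0/- notation."""
    return tuple("+" if x > 0 else "-" if x < 0 else "0" for x in d)

i, i2 = (1, 4), (3, 1)               # source and sink iterations
d = tuple(b - a for a, b in zip(i, i2))
print(d)                             # (2, -3)
print(direction(d))                  # ('+', '-')
```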
Direction Vectors (2) • What can the direction vectors for 3-D loops be like? • [? ? ?] splits into [- ? ?], [0 ? ?], [+ ? ?] • [0 ? ?] splits into [0 - ?], [0 0 ?], [0 + ?] • [0 0 ?] splits into [0 0 -], [0 0 0], [0 0 +] • Vectors whose first non-zero component is - ([- ? ?], [0 - ?], [0 0 -]) point in the backward direction and cannot occur
Dependence Vectors • Dependence vector • Set of distance vectors • Example for (i = 0; i < n; i++) for (j = 0; j < n; j++) { A[i][j] = A[i][j] + 1; B[i][j] = B[i][j-1] + 1; } • Distance vectors: [0, 0], [0, 1] • Dependence vector: [0, [0, 1]] • Test for parallelization: • A single loop with data dependences is parallelizable if every distance vector d is 0, i.e. all dependences are loop-independent
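Applying that test component-wise to the example's distance vectors shows why the i loop is a DoAll loop while the j loop is not; a sketch using the slide's distance vectors:

```python
D = [(0, 0), (0, 1)]   # distance vectors of the example loop nest

# Loop i: no distance vector has a non-zero first component,
# so no dependence is carried by the i loop.
print(all(d[0] == 0 for d in D))                       # True -> loop i parallelizable

# Loop j: (0, 1) is carried by the j loop (first non-zero
# component is the second), so iterations of j must run in order.
print(all(not (d[0] == 0 and d[1] != 0) for d in D))   # False -> loop j not parallelizable
```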
n-Dimensional Loops for (i = 0; i < n; i++) for (j = 0; j < n; j++) A[i, j] = A[i, j-1] + 1 • Loop i is parallelizable for (i = 0; i < n; i++) for (j = 0; j < n; j++) A[i, j] = A[i-1, j] + 1 • Loop j is parallelizable for (i = 0; i < n; i++) for (j = 0; j < n; j++) A[i, j] = A[i-1, j-1] + 1 • Loop j is parallelizable
Test for Parallelization • A dependence d is loop-carried at level i if di is the first non-zero element of d ⇒ d does not affect the parallelizability of inner loops • Ex) (0,0,2,0,1, …): level-3 dependence • The i th loop of an n-dimensional loop is parallelizable if there does not exist any level-i dependence • Equivalently, the i th loop is parallelizable if for all dependences d, either • (d1, … , di-1) > 0 (the dependence is carried by an outer loop) • or (d1, … , di) = 0
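The level test translates directly into code; the helper names `level` and `loop_parallelizable` are assumptions for illustration, and the checks replay the slide's level-3 example plus the loops from the previous slide.

```python
def level(d):
    """Level of a dependence: 1-based index of the first non-zero
    component, or None if d is all zero (loop-independent)."""
    for k, x in enumerate(d, start=1):
        if x != 0:
            return k
    return None

def loop_parallelizable(i, D):
    """The i-th loop is parallelizable iff no dependence in D
    is carried at level i."""
    return all(level(d) != i for d in D)

print(level((0, 0, 2, 0, 1)))            # 3
print(loop_parallelizable(1, [(0, 1)]))  # True:  A[i,j] = A[i,j-1] + 1
print(loop_parallelizable(2, [(1, 1)]))  # True:  A[i,j] = A[i-1,j-1] + 1
print(loop_parallelizable(1, [(1, 1)]))  # False: the i loop carries (1,1)
```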
Improving Parallelizability • Scalar privatization • The loop-carried dependence is an anti dependence • No value is really carried across iterations • Data is local to each loop iteration • Eliminates the loop-carried dependence • Scalar privatization should be applied before any serious parallelization attempt float t; for (i = 0; i < n; i++) { t = A[i]; b[i] = t * t; } • Anti dependence on t (loop-carried) ⇒ Not parallelizable!! for (i = 0; i < n; i++) { float t; t = A[i]; b[i] = t * t; } • True dependence (loop-independent) ⇒ Parallelizable!!
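A minimal sketch of the same transformation (the variable names mirror the slide, and the per-iteration function `body` is an assumption): giving each iteration its own copy of t leaves only the loop-independent true dependence t → t*t inside one iteration, so the iterations become independent; here we just check both versions compute the same result.

```python
A = [1.0, 2.0, 3.0, 4.0]

# Shared t: the next iteration's write of t creates a
# loop-carried anti dependence with this iteration's reads.
b1, t = [0.0] * len(A), 0.0
for i in range(len(A)):
    t = A[i]
    b1[i] = t * t

# Privatized t: each iteration gets a fresh local copy,
# so iterations could be assigned to processors in any order.
def body(a):
    t = a          # private copy of t
    return t * t

b2 = [body(a) for a in A]
print(b1 == b2)    # True
```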
Summary • Data Dependence • read/write • same variable • in direction of sequential ordering • Representation • Iteration space • Dependence vectors • Distance vectors • Direction vectors • Parallelization Testing