Control Dependences Chapter 7
Control Dependences • Roadmap • If-conversion • Control dependence
Control Dependences
• Constraints posed by control flow:
        DO 100 I = 1, N
    S1    IF (A(I-1) .GT. 0.0) GO TO 100
    S2    A(I) = A(I) + B(I)*C
    100 CONTINUE
• If we vectorize to...
    S2  A(1:N) = A(1:N) + B(1:N)*C
        DO 100 I = 1, N
    S1    IF (A(I-1) .GT. 0.0) GO TO 100
    100 CONTINUE
  ...we get the wrong answer
• We are missing dependences
• There is a dependence from S1 to S2: a control dependence
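The failure can be checked on a small input. The following is a minimal Python model of the two Fortran versions, not code from the slides: lists stand in for the arrays, index 0 plays the role of A(0), and the function names and sample values are made up for illustration.

```python
def sequential(a, b, c):
    """Original loop: S2 runs only when A(I-1) <= 0."""
    a = a[:]
    for i in range(1, len(a)):
        if a[i - 1] > 0.0:        # S1: branch around S2
            continue
        a[i] = a[i] + b[i] * c    # S2
    return a


def naive_vector(a, b, c):
    """Incorrect vector form: S2 is applied to every element."""
    return [a[0]] + [a[i] + b[i] * c for i in range(1, len(a))]


# Hypothetical sample data; element 0 is A(0).
a = [1.0, 2.0, -3.0, 4.0]
b = [0.0, 1.0, 1.0, 1.0]
c = 10.0

print(sequential(a, b, c))    # [1.0, 2.0, -3.0, 14.0]
print(naive_vector(a, b, c))  # [1.0, 12.0, 7.0, 14.0]
```

Only A(3) is updated by the original loop, because S1 skips S2 whenever the previous element is positive; the vector statement ignores that condition.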
Control Dependences • Two strategies to deal with control dependences: • If-conversion: expose by converting to data dependences. Used for vectorization • Explicitly expose as control dependences. Used for automatic parallelization
If-conversion
• Underlying idea: convert statements affected by branches to conditionally executed statements
        DO 100 I = 1, N
    S1    IF (A(I-1) .GT. 0.0) GO TO 100
    S2    A(I) = A(I) + B(I)*C
    100 CONTINUE
• can be converted to:
        DO I = 1, N
          IF (A(I-1) .LE. 0.0) A(I) = A(I) + B(I)*C
        ENDDO
If-conversion
        DO 100 I = 1, N
    S1    IF (A(I-1) .GT. 0.0) GO TO 100
    S2    A(I) = A(I) + B(I) * C
    S3    B(I) = B(I) + A(I)
    100 CONTINUE
• can be converted to:
        DO 100 I = 1, N
    S2    IF (A(I-1) .LE. 0.0) A(I) = A(I) + B(I) * C
    S3    IF (A(I-1) .LE. 0.0) B(I) = B(I) + A(I)
    100 CONTINUE
• vectorize using the Fortran WHERE statement:
        DO 100 I = 1, N
    S2    IF (A(I-1) .LE. 0.0) A(I) = A(I) + B(I) * C
    100 CONTINUE
    S3  WHERE (A(0:N-1) .LE. 0.0) B(1:N) = B(1:N) + A(1:N)
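As a sanity check, the original loop and the if-converted form with S3 pulled out as a masked vector statement can be compared on random data. This is a Python sketch of the Fortran, under the assumption that lists model the arrays and a list comprehension models WHERE; the helper names are hypothetical.

```python
import random


def original(a, b, c):
    a, b = a[:], b[:]
    for i in range(1, len(a)):
        if a[i - 1] > 0.0:           # S1
            continue
        a[i] = a[i] + b[i] * c       # S2
        b[i] = b[i] + a[i]           # S3
    return a, b


def if_converted(a, b, c):
    a, b = a[:], b[:]
    # S2 stays sequential: it reads A(I-1), written by the previous iteration.
    for i in range(1, len(a)):
        if a[i - 1] <= 0.0:
            a[i] = a[i] + b[i] * c
    # S3 as a masked vector statement (models WHERE (A(0:N-1) .LE. 0.0) ...).
    mask = [a[i - 1] <= 0.0 for i in range(1, len(a))]
    tail = [b[i] + a[i] if mask[i - 1] else b[i] for i in range(1, len(a))]
    return a, [b[0]] + tail


def agree(trials=200, seed=0):
    """Compare both versions on random inputs of random length."""
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(1, 8)
        a = [rng.uniform(-5, 5) for _ in range(n + 1)]
        b = [rng.uniform(-5, 5) for _ in range(n + 1)]
        if original(a, b, 2.0) != if_converted(a, b, 2.0):
            return False
    return True
```

The mask can be evaluated after the S2 loop because iteration I only writes A(I), so A(I-1) already holds its final value when S3 needs it.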
If-conversion
• If-conversion assumes a target notation of guarded execution in which each statement implicitly contains a logical expression controlling its execution
    S1    IF (A(I-1) .GT. 0.0) GO TO 100
    S2    A(I) = A(I) + B(I)*C
    100 CONTINUE
• with guards in place:
    S1    M = A(I-1) .GT. 0.0
    S2    IF (.NOT. M) A(I) = A(I) + B(I)*C
    100 CONTINUE
Branch Classification • Forward Branch: transfers control to a target that occurs lexically after the branch but at the same level of nesting • Backward Branch: transfers control to a statement occurring lexically before the branch but at the same level of nesting • Exit Branch: terminates one or more loops by transferring control to a target outside a loop nest
If-conversion • If-conversion is a composition of two different transformations: 1. Branch relocation 2. Branch removal
Branch removal • Basic idea: • Make a pass through the program. • Maintain a Boolean expression cc that represents the condition that must be true for the current statement to be executed • On encountering a branch, conjoin the controlling expression into cc • On encountering the target of a branch, disjoin its controlling expression into cc
Branch Removal: Forward Branches
• Remove forward branches by inserting appropriate guards
        DO 100 I = 1,N
    C1    IF (A(I).GT.10) GO TO 60
    20    A(I) = A(I) + 10
    C2    IF (B(I).GT.10) GO TO 80
    40    B(I) = B(I) + 10
    60    A(I) = B(I) + A(I)
    80    B(I) = A(I) - 5
        ENDDO
• becomes:
        DO 100 I = 1,N
          m1 = A(I).GT.10
    20    IF (.NOT.m1) A(I) = A(I) + 10
          IF (.NOT.m1) m2 = B(I).GT.10
    40    IF (.NOT.m1.AND..NOT.m2) B(I) = B(I) + 10
    60    IF (.NOT.m1.AND..NOT.m2.OR.m1) A(I) = B(I) + A(I)
    80    IF (.NOT.m1.AND..NOT.m2.OR.m1.OR..NOT.m1.AND.m2) B(I) = A(I) - 5
        ENDDO
Branch Removal: Forward Branches
• We can simplify to:
        DO 100 I = 1,N
          m1 = A(I).GT.10
    20    IF (.NOT.m1) A(I) = A(I) + 10
          IF (.NOT.m1) m2 = B(I).GT.10
    40    IF (.NOT.m1.AND..NOT.m2) B(I) = B(I) + 10
    60    IF (m1.OR..NOT.m2) A(I) = B(I) + A(I)
    80    B(I) = A(I) - 5
        ENDDO
• vectorize to:
          m1(1:N) = A(1:N).GT.10
    20    WHERE (.NOT.m1(1:N)) A(1:N) = A(1:N) + 10
          WHERE (.NOT.m1(1:N)) m2(1:N) = B(1:N).GT.10
    40    WHERE (.NOT.m1(1:N).AND..NOT.m2(1:N)) B(1:N) = B(1:N) + 10
    60    WHERE (m1(1:N).OR..NOT.m2(1:N)) A(1:N) = B(1:N) + A(1:N)
    80    B(1:N) = A(1:N) - 5
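The simplified guards can be checked exhaustively for one loop iteration. The sketch below is a Python model of a single iteration, not part of the slides: the GO TO structure is rewritten with nested conditionals in `original`, and `m2` is initialized to False only because Python needs a value (the guard on statement 60 never depends on it when `m1` is true).

```python
def original(a, b):
    """One iteration of the loop with its two forward branches."""
    if not a > 10:               # C1 not taken
        a = a + 10               # 20
        if b > 10:               # C2: GO TO 80
            return a, a - 5      # 80
        b = b + 10               # 40
    a = b + a                    # 60
    b = a - 5                    # 80
    return a, b


def guarded(a, b):
    """The same iteration after branch removal and simplification."""
    m1 = a > 10
    m2 = False                   # placeholder; only meaningful when not m1
    if not m1:
        a = a + 10               # 20
    if not m1:
        m2 = b > 10
    if not m1 and not m2:
        b = b + 10               # 40
    if m1 or not m2:
        a = b + a                # 60
    b = a - 5                    # 80
    return a, b


def agree():
    """Exhaustive check over a small integer grid covering all branch cases."""
    return all(original(a, b) == guarded(a, b)
               for a in range(0, 25) for b in range(0, 25))
```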
Branch Removal: Forward Branches • To show correctness we must establish: • The guard for a statement instance in the new program is true if and only if the corresponding statement in the old program is executed. The exception is a statement introduced to capture a guard variable value, which must be executed at the point where the conditional expression would have been evaluated • The order of execution of statements with true guards in the new program is the same as the order of execution of those statements in the original program • Any expression with side effects is evaluated exactly as many times in the new program as in the old program
Exit Branches
        DO J = 1, M
          DO I = 1, N
            A(I,J) = B(I,J) + X
    S       IF (L(I,J)) GO TO 200
            C(I,J) = A(I,J) + Y
          ENDDO
          D(J) = A(N,J)
    200   F(J) = C(10,J)
        ENDDO
• More complicated because they terminate a loop
• Solution: relocate exit branches and convert them to forward branches
Exit Branches
        DO J = 1, M
          DO I = 1, N
            A(I,J) = B(I,J) + X
    S       IF (L(I,J)) GO TO 200
            C(I,J) = A(I,J) + Y
          ENDDO
          D(J) = A(N,J)
    200   F(J) = C(10,J)
        ENDDO
• After relocating the exit branch:
        DO J = 1, M
          DO I = 1, N
            IF (C1) A(I,J) = B(I,J) + X
    Sa      Code to set C1 and C2
            IF (C2) C(I,J) = A(I,J) + Y
          ENDDO
    Sb    IF (.NOT.C1.OR..NOT.C2) GO TO 200
          D(J) = A(N,J)
    200   F(J) = C(10,J)
        ENDDO
• What should C1 and C2 be?
Exit Branches
• Statements in the inner loop should be executed only if the exit branch was not taken on any previous iteration
• For the ith iteration, C1 should be
    lm = AND( .NOT. L(k,J) ), 1 ≤ k ≤ i-1
  and C2 is the same conjunction extended to k = i
        DO J = 1, M
          lm = .TRUE.
          DO I = 1, N
            IF (lm) A(I,J) = B(I,J) + X
            IF (lm) m1 = .NOT. L(I,J)
            lm = lm .AND. m1
            IF (lm) C(I,J) = A(I,J) + Y
          ENDDO
          m2 = lm
          IF (m2) D(J) = A(N,J)
    200   F(J) = C(10,J)
        ENDDO
Exit Branches
• After forward substitution and expansion of lm, we get:
        DO J = 1, M
          lm(0,J) = .TRUE.
          DO I = 1, N
            IF (lm(I-1,J)) A(I,J) = B(I,J) + X
            IF (lm(I-1,J)) m1 = .NOT.L(I,J)
            lm(I,J) = lm(I-1,J) .AND. m1
            IF (lm(I,J)) C(I,J) = A(I,J) + Y
          ENDDO
          IF (lm(N,J)) D(J) = A(N,J)
    200   F(J) = C(10,J)
        ENDDO
• codegen will produce four vectorized loops...
Exit Branches
• After running codegen:
        DO J = 1, M
          lm(0,J) = .TRUE.
          DO I = 1, N
            IF (lm(I-1,J)) m1 = .NOT.L(I,J)
            lm(I,J) = lm(I-1,J) .AND. m1
          ENDDO
        ENDDO
        WHERE (lm(0:N-1,1:M)) A(1:N,1:M) = B(1:N,1:M) + X
        WHERE (lm(1:N,1:M)) C(1:N,1:M) = A(1:N,1:M) + Y
        WHERE (lm(N,1:M)) D(1:M) = A(N,1:M)
    200 F(1:M) = C(10,1:M)
• Procedure relocate_branches()
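The effect of the exit-branch guard can be modeled for a single column J. The Python sketch below is an illustration under stated assumptions: one column only, lists indexed 1..N, the unconditional statement 200 omitted, and the recurrence simplified to `lm[i] = lm[i-1] and not l[i]` (equivalent to the `m1` form because a stale `m1` is always masked by a false `lm(I-1,J)`). As in the scalar loop, A(I,J) is guarded by lm(I-1,J) and C(I,J) by lm(I,J).

```python
import random


def original(a, b, c, l, d, x, y):
    """Inner loop with the exit branch; returns updated A, C and D(J)."""
    a, c = a[:], c[:]
    n = len(b) - 1
    for i in range(1, n + 1):
        a[i] = b[i] + x
        if l[i]:
            return a, c, d       # GO TO 200 skips D(J) = A(N,J)
        c[i] = a[i] + y
    return a, c, a[n]


def converted(a, b, c, l, d, x, y):
    """Guarded form: one sequential recurrence, then masked statements."""
    a, c = a[:], c[:]
    n = len(b) - 1
    lm = [True] * (n + 1)
    for i in range(1, n + 1):    # the recurrence that stays sequential
        lm[i] = lm[i - 1] and not l[i]
    for i in range(1, n + 1):    # models WHERE (lm(0:N-1)) A = B + X
        if lm[i - 1]:
            a[i] = b[i] + x
    for i in range(1, n + 1):    # models WHERE (lm(1:N)) C = A + Y
        if lm[i]:
            c[i] = a[i] + y
    if lm[n]:
        d = a[n]
    return a, c, d


def agree(trials=300, seed=1):
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(1, 6)
        a = [rng.uniform(-1, 1) for _ in range(n + 1)]
        b = [rng.uniform(-1, 1) for _ in range(n + 1)]
        c = [rng.uniform(-1, 1) for _ in range(n + 1)]
        l = [False] + [rng.random() < 0.3 for _ in range(n)]
        args = (a, b, c, l, 0.0, 1.5, 2.5)
        if original(*args) != converted(*args):
            return False
    return True
```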
Backward Branches
• Problems:
• Create implicit loops. Backward control flow cannot be simulated by simple guards
• Complicate removal of forward branches: may create loops into which forward branches jump
        IF (P) GO TO 200
        ...
    100 S1
        ...
    200 S2
        ...
        IF (Q) GO TO 100
• Applying forward if-conversion:
        m1 = .NOT. P
        ...
    100 IF (m1) S1
        ...
    200 S2
        ...
        IF (Q) GO TO 100
Backward Branches • Solutions? • Avoid region within a backward control flow edge • Eliminate backward branches through a variant of if-conversion • Note that: • S1 is executed on the first pass through the code only if P is false • S1 is always executed when the backward branch is taken • Use a backward branch guard!
Backward Branches
• Using a backward branch guard:
        IF (P) GO TO 200
        ...
    100 S1
        ...
    200 S2
        ...
        IF (Q) GO TO 100
• converted to:
        m = P
        ...
        bb = .FALSE.
    100 IF (.NOT.m .OR. (m.AND.bb)) S1
        ...
    200 S2
        ...
        IF (Q) THEN
          bb = .TRUE.
          GO TO 100
        ENDIF
Backward Branches • In general, two ways a target of a backward branch can be reached: • Fall through • Branch around the statement but reach it via a backward branch • Thus, if current condition just prior to target y is cc, the branch condition is m, and the backward branch condition is bb, the guard at y should be: cc OR (m AND bb)
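A small Python model can confirm that the backward branch guard reproduces the original statement order. This is a sketch with made-up names: the statements are recorded as strings in a trace, the elided code is dropped, and `qs` is a hypothetical list giving the value of Q each time it is tested (False once the list runs out).

```python
from itertools import product


def original(p, qs):
    """IF (P) GO TO 200 / 100 S1 / 200 S2 / IF (Q) GO TO 100."""
    trace = []
    qs = iter(qs)
    if not p:
        trace.append("S1")        # fall through to 100
    while True:
        trace.append("S2")        # 200
        if not next(qs, False):
            break
        trace.append("S1")        # reached via the backward branch
    return trace


def converted(p, qs):
    """Same program with S1 guarded by .NOT.m .OR. (m .AND. bb)."""
    trace = []
    qs = iter(qs)
    m, bb = p, False
    while True:                   # label 100
        if (not m) or (m and bb):
            trace.append("S1")
        trace.append("S2")        # 200
        if next(qs, False):
            bb = True
            continue              # GO TO 100
        break
    return trace


def agree():
    """Check every P and every Q sequence of length 0 to 3."""
    for p in (False, True):
        for n in range(4):
            for qs in product((False, True), repeat=n):
                if original(p, qs) != converted(p, qs):
                    return False
    return True
```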
Complete Forward Branch Removal • Statement is branch target: combine (disjoin) set of conditions associated with branches to that target with the current condition passed from the lexical predecessor • Statement is any type except DO, ENDDO, CONTINUE: the current condition is conjoined to the guard for the current statement • Statement is a DO: invoke relocate_branches to remove exit branches. Recur on body of the loop. May generate some statements before the loop which should be guarded by the current condition
Complete Forward Branch Removal • Statement is a conditional branch: 2 copies of the current condition cc are made. • The compiler generated variable associated with the new condition is conjoined with cc and the result is appended to the list associated with the branch target • The negation of the variable is conjoined to cc and is the current condition for the next statement • Statement is an unconditional branch: current condition, cc, is appended to the list of conditions for the branch target. Current condition for the next statement is set to false • Continue processing at step 1 for next statement
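The bookkeeping described above can be sketched in a few lines of Python. This is an illustrative model, not the procedure from the text: it handles only plain statements, conditional forward branches and unconditional forward branches, ignores DO loops and simplification, and represents each condition in disjunctive normal form as a set of conjunctions of `(mask, value)` literals. The program encoding and all names are assumptions.

```python
from itertools import product

TRUE = frozenset([frozenset()])   # one empty conjunction
FALSE = frozenset()               # no conjunctions


def conjoin(dnf, var, value):
    """AND a literal into every conjunction, dropping contradictions."""
    out = set()
    for term in dnf:
        if (var, not value) in term:
            continue
        out.add(term | {(var, value)})
    return frozenset(out)


def evaluate(dnf, env):
    return any(all(env[v] == val for v, val in term) for term in dnf)


def remove_forward_branches(program):
    """Return the guard (as DNF) for each labeled statement."""
    pending = {}                  # branch target -> accumulated conditions
    guards = {}
    cc = TRUE                     # current condition
    for label, kind, *args in program:
        cc = cc | pending.pop(label, FALSE)      # statement is a branch target
        if kind == "stmt":
            guards[label] = cc
        elif kind == "cbr":                      # conditional branch
            mask, target = args
            guards[label] = cc                   # guard on the mask assignment
            pending[target] = pending.get(target, FALSE) | conjoin(cc, mask, True)
            cc = conjoin(cc, mask, False)
        elif kind == "br":                       # unconditional branch
            (target,) = args
            pending[target] = pending.get(target, FALSE) | cc
            cc = FALSE
    return guards


# Hypothetical encoding of a loop body with two forward branches:
# C1 branches to 60 under m1, C2 branches to 80 under m2.
PROGRAM = [
    ("C1", "cbr", "m1", "60"),
    ("20", "stmt"),
    ("C2", "cbr", "m2", "80"),
    ("40", "stmt"),
    ("60", "stmt"),
    ("80", "stmt"),
]
```

For this encoding the computed guard on statement 80 is true for every assignment of the masks, and the guard on 60 is equivalent to `m1 or not m2`, which is what a Boolean simplifier would be expected to discover.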
Simplification • Boolean simplification is NP-complete • Use Simplify, an O(N^2) algorithm that tailors the simplification process to the expressions produced by if-conversion
Iterative Dependences
• Iterative statements can also create control dependences:
    20  DO I = 1, 100
    40    L = 2*I
    60    DO J = 1, L
    80      A(I,J) = 0
          ENDDO
        ENDDO
• If we vectorize as:
    20  DO I = 1, 100
    40    L = 2*I
    100 ENDDO
    80  A(1:100,1:L) = 0
• Incorrect!
• Must capture the notion that the DO statement controls the number of times a particular statement is executed.
Iterative Dependences
• Notation used:
• A(I, J) (irange)
• where irange is a compiler generated scalar which holds the iteration range
• Using this notation, the example will be converted to:
    20  irange1 = (1,100)
        DO I = irange1
    40    L = 2*I              (irange1)
    60    irange2 = (1,L)      (irange1)
          DO J = irange2
    80      A(I,J) = 0         (irange2)
          ENDDO
        ENDDO
Iterative Dependences
• Forward substituting constants and loop-independent variables:
    20  DO I = 1,100
    40    L = 2*I              (1,100)
    60    DO J = 1,L           (1,100)
    80      A(I,J) = 0         (1,L) (1,100)
          ENDDO
        ENDDO
• which vectorizes to:
    20  DO I = 1, 100
    40    L = 2*I
    80    A(I,1:L) = 0
        ENDDO
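The difference between the incorrect and the correct vectorization shows up on a small instance. The Python sketch below is an illustration with assumed sizes: a list of lists filled with ones stands in for A, `n` replaces the bound 100, and slice assignment models the vector statement.

```python
def fresh(n):
    """(n+1) x (2n+1) array of ones; row 0 and column 0 are unused padding."""
    return [[1] * (2 * n + 1) for _ in range(n + 1)]


def sequential(n):
    a = fresh(n)
    for i in range(1, n + 1):
        l = 2 * i
        for j in range(1, l + 1):
            a[i][j] = 0
    return a


def wrong_vector(n):
    """Inner statement hoisted out: every row uses the final value of L."""
    a = fresh(n)
    l = 0
    for i in range(1, n + 1):
        l = 2 * i
    for i in range(1, n + 1):
        a[i][1:l + 1] = [0] * l          # models A(1:n,1:L) = 0
    return a


def right_vector(n):
    """Only the inner loop is vectorized, so L is still per-row."""
    a = fresh(n)
    for i in range(1, n + 1):
        l = 2 * i
        a[i][1:l + 1] = [0] * l          # models A(I,1:L) = 0
    return a
```

With `n = 4`, row 1 of the sequential result has only columns 1 and 2 cleared, while the hoisted version clears columns 1 through 8 in every row.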
If-reconstruction
• If-conversion may degrade performance when vectorization is not possible
        DO 100 I = 1, N
          IF (A(I) .GT. 0) GOTO 100
          B(I) = A(I) * 2.0
          A(I+1) = B(I) + 1
    100 CONTINUE
• After if-conversion:
        DO 100 I = 1, N
          m1 = (A(I) .GT. 0)
          IF (.NOT. m1) B(I) = A(I) * 2.0
          IF (.NOT. m1) A(I+1) = B(I) + 1
    100 CONTINUE
If-reconstruction
• On a machine without predicated execution:
        DO 100 I = 1, N
          m1 = (A(I) .GT. 0)
          IF (m1) GOTO 10
          B(I) = A(I) * 2.0
    10    IF (m1) GOTO 20
          A(I+1) = B(I) + 1
    20    CONTINUE
    100 CONTINUE
• Overheads!
• If-reconstruction: replace sections of guarded code with a minimal set of branches that enforce the guarded execution
Control Dependence • Disadvantages of if-conversion: • Unnecessarily complicates code when code cannot be vectorized • Cannot analyze code a priori to decide whether if-conversion will lead to parallel code. • Alternate approach: explicitly expose constraints due to control flow as control dependences
Control Dependence • A node x in a directed graph G with a single exit node postdominates node y in G if any path from y to the exit node of G must pass through x. • A statement y is said to be control dependent on another statement x if: • there exists a non-trivial path from x to y such that every statement z ≠ x in the path is postdominated by y, and • x is not postdominated by y. • In other words, a control dependence exists from S1 to S2 if one branch out of S1 forces execution of S2 and another doesn't • Note that control dependences can be viewed as a property of basic blocks
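The definition can be turned into a small computation over a control flow graph. The Python sketch below is not the ConstructCD procedure from the text; it uses the standard iterative postdominator dataflow and then the usual edge-based formulation (y is control dependent on x if y postdominates some successor of x but does not strictly postdominate x). It assumes every node except the exit has at least one successor, and the graph encoding and names are made up for illustration.

```python
def postdominators(succ, exit_node):
    """Map each node to the set of nodes that postdominate it (reflexive)."""
    nodes = set(succ)
    pdom = {n: set(nodes) for n in nodes}
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:
        changed = False
        for n in nodes - {exit_node}:
            new = {n} | set.intersection(*(pdom[s] for s in succ[n]))
            if new != pdom[n]:
                pdom[n] = new
                changed = True
    return pdom


def control_dependences(succ, exit_node):
    """Return the set of pairs (x, y) such that y is control dependent on x."""
    pdom = postdominators(succ, exit_node)
    cd = set()
    for x in succ:
        for s in succ[x]:
            for y in pdom[s]:
                # y postdominates successor s; keep it unless y also
                # strictly postdominates x itself.
                if y == x or y not in pdom[x]:
                    cd.add((x, y))
    return cd


# Hypothetical CFG for:  S1 IF (...) GO TO 100 / S2 ... / 100 CONTINUE
CFG = {
    "S1": ["S2", "100"],
    "S2": ["100"],
    "100": ["exit"],
    "exit": [],
}

print(control_dependences(CFG, "exit"))   # {('S1', 'S2')}
```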
Control Dependence: Example • The example has n nodes and O(n^2) control dependences. • Control dependence graphs can thus get much larger than the corresponding CFG • procedure ConstructCD constructs the control dependence relation
Control Dependence: Loops
• Loops can be converted to a CFG and then ConstructCD can be applied
• Want to treat loops as special cases to help in transforming loops
• Use a loop control node to represent the loop
    10  DO I = 1, 100
    20    A(I) = A(I) + B(I)
    30    IF (A(I).GT.0) GO TO 50
    40    A(I) = -A(I)
    50    B(I) = A(I) + C(I)
        ENDDO
Execution Model
• In Chapter 2, we annotated each statement S with the corresponding iteration vector i
• S(i) could execute whenever every statement instance that it depended on had already executed
• However...
        DO I = 1, N
    S0    IF (P) GO TO S2
    S1    ...
    S2    ...
        ENDDO
Execution Model • Solution: Use a doit flag for each statement: S(i).doit • Statement instances that are not control dependent on any other statement: doit = True • For all other statements: doit = False • How does doit get set to True? • All those statements that are control dependent on the conditional and whose execution is forced by the sense of the condition: doit = true • Execute statement instance S(i) if its doit flag is set to True and every statement instance it depends on either has a false doit flag or has been executed
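The doit mechanism can be illustrated with a toy interpreter. The following Python sketch makes several simplifying assumptions that are not in the text: a single iteration, only forward control dependences, lexical order standing in for a dependence-respecting schedule, and a hypothetical encoding in which each controlled statement names its controller and the branch label that forces it.

```python
def run(statements, control_deps, conditions):
    """Execute statements according to their doit flags.

    statements:   statement names in lexical order
    control_deps: {stmt: (controller, label)} for control dependent statements
    conditions:   {controller: bool}, the value each conditional evaluates to
    """
    # Statements with no control dependence start with doit = True.
    doit = {s: s not in control_deps for s in statements}
    executed = []
    for s in statements:
        if not doit[s]:
            continue
        executed.append(s)
        if s in conditions:
            # Executing a conditional sets doit on the statements whose
            # execution is forced by the sense of the condition.
            for t, (ctrl, label) in control_deps.items():
                if ctrl == s and label == conditions[s]:
                    doit[t] = True
    return executed


# S0 IF (P) GO TO S2 / S1 ... / S2 ...
# S1 is control dependent on S0 with label False (branch not taken).
STATEMENTS = ["S0", "S1", "S2"]
DEPS = {"S1": ("S0", False)}

print(run(STATEMENTS, DEPS, {"S0": True}))    # ['S0', 'S2']
print(run(STATEMENTS, DEPS, {"S0": False}))   # ['S0', 'S1', 'S2']
```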
Execution Model • Note that if doit is true for S, then there is a sequence of control statements S0, S1, ..., Sm = S such that S0 is executed unconditionally and the decision taken at Sk forces the execution of Sk+1, 0 ≤ k < m • Sequence of control dependences defines a unique execution path
Execution Model • Behavior of loop control nodes under this model: • Case 1: Evaluation of iteration range does not depend on quantities computed in loop: • Set doit for loop node to True • Range of iteration can be completely evaluated • Create collection of statement instances for the loop body, one for each iteration of the loop • Set doit flags of statements control dependent on loop header to true, all other doit flags to False
Execution Model • Case 2: Evaluation of iteration range depends on quantities computed in loop: • If range is non-empty, create new instance of loop header, adjusting range to the remainder of the iterations • DO.doit = True if dependence back to DO is a data dependence and False if it is a control dependence • Set doit flags of statements control dependent on loop header to true, all other doit flags to False
Execution Model Theorem 7.1. Dependence graphs that are executed according to the execution model are equivalent in meaning to the programs from which they are created. • Proof: • Show that doit flag of statement is true iff it is executed in the original program • Proof by contradiction: Consider the shortest sequence S0, S1, …,Sm-1, Sm s.t. Sm is the first statement to get the wrong doit flag • Focus on Sm-1: • All statements executed leading to Sm-1 in the original program must be executed in this model • Statements that are not executed leading to Sm-1 in the original program cannot be executed in this model
Control Dependence and Parallelization • For simplicity, we shall only consider: • Forward branches: they create loop-independent control dependences • Control dependences due to loops • From Chapter 2: Most loop transformations are unaffected by loop-independent dependences • Loop reversal, loop skewing, strip mining, index-set splitting, and loop interchange do not affect loop-independent dependences • Might be problematic: loop fusion, loop distribution • However, since exit branches are excluded, loop fusion is not a problem
Loop Distribution
        DO I = 1, N
    S1    IF (A(I).LT.B(I)) GOTO 20
    S2    B(I) = B(I) + C(I)
    20    CONTINUE
        ENDDO
• Control dependence from S1 to S2
• Distributing...
        DO I = 1, N
    S1    IF (A(I).LT.B(I)) GOTO 20
        ENDDO
        DO I = 1, N
    S2    B(I) = B(I) + C(I)
        ENDDO
    20  CONTINUE
• Incorrect!
Loop Distribution
• Problem: control dependences crossing between distributed loops
• Solution: Keep a history of the evaluated conditions (similar to if-conversion).
        DO I = 1, N
    S1    IF (A(I).LT.B(I)) GOTO 20
    S2    B(I) = B(I) + C(I)
    20    CONTINUE
        ENDDO
• Convert to:
        DO I = 1, N
    S1    e(I) = A(I).LT.B(I)
        ENDDO
        DO I = 1, N
    S2    IF (e(I).EQ..FALSE.) B(I) = B(I) + C(I)
        ENDDO
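The conversion can be checked against the original loop on random data. This is a Python sketch of the two Fortran versions, with lists standing in for the arrays and a Boolean list `e` playing the role of the execution variable e(I); the helper names are assumptions.

```python
import random


def original(a, b, c):
    b = b[:]
    for i in range(len(a)):
        if a[i] < b[i]:          # S1: branch around S2
            continue
        b[i] = b[i] + c[i]       # S2
    return b


def distributed(a, b, c):
    b = b[:]
    n = len(a)
    e = [False] * n
    for i in range(n):           # first loop: record the branch condition
        e[i] = a[i] < b[i]
    for i in range(n):           # second loop: guarded by the recorded value
        if e[i] is False:
            b[i] = b[i] + c[i]
    return b


def agree(trials=200, seed=3):
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(0, 8)
        a = [rng.randint(-5, 5) for _ in range(n)]
        b = [rng.randint(-5, 5) for _ in range(n)]
        c = [rng.randint(-5, 5) for _ in range(n)]
        if original(a, b, c) != distributed(a, b, c):
            return False
    return True
```

Recording the condition in the first loop is safe here because iteration I reads B(I) in S1 before any iteration writes it.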
Loop Distribution
• More complex example:
        DO I = 1, N
    1     IF (A(I).NE.0) THEN
    2       IF (B(I)/A(I).GT.1) GOTO 4
          ENDIF
    3     A(I) = B(I)
          GOTO 8
    4     IF (A(I).GT.T) THEN
    5       T = (B(I) - A(I)) + T
          ELSE
    6       T = (T + B(I)) - A(I)
    7       B(I) = A(I)
          ENDIF
    8     C(I) = B(I) + C(I)
        ENDDO
Loop Distribution • Fusion into "like" regions • Needs two execution variables E2(I) and E4(I) to hold result of branches at statement 2 and 4 respectively
Loop Distribution • Consider the branch at node 2: • 3 cases may hold • Statement 2 is executed and the true branch to statement 4 is taken • Statement 2 is executed and the false branch to statement 3 is taken • Statement 2 is never executed because the false branch is taken at statement 1 • Corresponds to the conditions for the doit variable to be set: • A control dependence exists from S0 to S. • S0 has its doit flag set • Value of the conditional expression is the label on the branch
Loop Distribution • Use three corresponding values: True, False, Undefined • procedure DistributeCDG implements these ideas. It inserts execution variables at appropriate places in the code and selectively converts control dependences to data dependences
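The three-valued execution variable can be illustrated for a nested branch of the form `IF (A(I).NE.0) THEN / IF (B(I)/A(I).GT.1) GOTO 4 / ENDIF`, followed by statement 3 on the fall-through path. The Python sketch below is a model for that one branch only, not DistributeCDG: `None` stands for Undefined, and the helper names are hypothetical.

```python
UNDEFINED = None


def exec_var_2(a, b):
    """Value recorded for the inner branch (statement 2) in one iteration."""
    if a == 0:
        return UNDEFINED         # statement 2 never executed
    return b / a > 1             # True: branch to 4; False: fall to 3


def next_statement(a, b):
    """Reference control flow: which of statements 3 and 4 runs next."""
    if a != 0 and b / a > 1:
        return 4
    return 3


def next_from_exec_var(flag):
    """Same decision taken from the recorded value alone."""
    # Statement 4 runs only when statement 2 executed and branched.
    # In this fragment both False and Undefined lead to statement 3.
    return 4 if flag is True else 3


def agree():
    values = [-3, -1, 0, 1, 2, 6]
    return all(next_from_exec_var(exec_var_2(a, b)) == next_statement(a, b)
               for a in values for b in values)
```

Keeping Undefined distinct from False matters because the division in statement 2 must not be evaluated when A(I) is zero, so there is no Boolean result to record in that case.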
Code Generation
• Problem: Mapping the arbitrary control flow represented in the control dependence graph to real machines
        DO I = 1, N
    S1    IF (p1) GOTO 3
    S2    ...
          GOTO 4
    3     IF (p3) GOTO 5
    4     S4
    5     S5
        ENDDO
• Loop distribution
Code Generation
• Code generated for the first partition:
        DO I = 1, N
          E1(I) = p1
          IF (E1(I).EQ..FALSE.) THEN
    S2      ...
          ENDIF
    S5    ...
        ENDDO
• For the second partition:
        DO I = 1, N
          IF (((E1(I).EQ..TRUE.).AND..NOT.p3).OR.(E1(I).EQ..FALSE.)) THEN
    S4      ...
          ENDIF
        ENDDO
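The two partitions can be compared with the original control flow by recording which statement instances run. The Python sketch below is an illustrative model with assumed names: `p1` and `p3` are lists giving the predicate values per iteration, statement bodies are represented only by `(name, iteration)` pairs, and the comparison is on the set of executed instances because distribution changes their order.

```python
from itertools import product


def original(p1, p3):
    """S1 IF (p1) GOTO 3 / S2 / GOTO 4 / 3 IF (p3) GOTO 5 / 4 S4 / 5 S5."""
    ran = set()
    for i in range(len(p1)):
        if not p1[i]:
            ran.add(("S2", i))
            ran.add(("S4", i))        # GOTO 4
        elif not p3[i]:
            ran.add(("S4", i))        # statement 3 falls through to 4
        ran.add(("S5", i))
    return ran


def distributed(p1, p3):
    ran = set()
    n = len(p1)
    e1 = [None] * n                   # execution variable, None = Undefined
    for i in range(n):                # first partition
        e1[i] = p1[i]
        if e1[i] is False:
            ran.add(("S2", i))
        ran.add(("S5", i))
    for i in range(n):                # second partition
        if (e1[i] is True and not p3[i]) or e1[i] is False:
            ran.add(("S4", i))
    return ran


def agree(n=3):
    """Check every combination of predicate values for n iterations."""
    for p1 in product((False, True), repeat=n):
        for p3 in product((False, True), repeat=n):
            if original(p1, p3) != distributed(p1, p3):
                return False
    return True
```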