CS 201 Compiler Construction

CS 201 Compiler Construction, Lecture 2: Control Flow Analysis

What is a loop?
A subgraph of the CFG with the following properties:
• Strongly Connected: there is a path from any node in the loop to any other node in the loop; and
• Single Entry: there is a single entry into the loop from outside the loop. The entry node of the loop is called the loop header.
Example: Loop nodes: 2, 3, 5. Header node: 2. Loop back edge: 5→2 (Tail→Head).

Property
Given two loops, they are either disjoint or one is completely nested within the other.
• Loops {1,2,4} and {5,6} are disjoint.
• Loop {5,6} is nested within loop {2,4,5,6}.
• Loop {5,6} is nested within loop {1,2,3,4,5,6}.

Identifying Loops
Definitions:
Dominates: node n dominates node m iff all paths from the start node to node m pass through node n, i.e., to visit node m we must first visit node n.
A loop has
• A single entry ⇒ the entry node dominates all nodes in the loop; and
• A back edge, i.e., an edge A→B such that B dominates A. B is the head and A is the tail.

Identifying Loops Algorithm for finding loops: • Compute Dominator Information. • Identify Back Edges. • Construct Loops corresponding to Back Edges.

Dominators: Characteristics • Every node dominates itself. • Start node dominates every node in the flow graph. • If N DOM M and M DOM R then N DOM R. • If N DOM M and O DOM M then either N DOM O or O DOM N • Set of dominators of a given node can be linearly ordered according to dominator relationships.

Dominators: Characteristics
• Dominator information can be represented by a Dominator Tree. Edges in the dominator tree represent immediate dominator relationships.
Figure: a CFG and its dominator tree. In the example, 1 is the immediate dominator of 2, 3 & 4.

Computing Dominator Sets
Let D(n) = set of dominators of n. Then
$D(n) = \{n\} \cup \bigcap_{p \in Pred(n)} D(p)$
where Pred(n) is the set of immediate predecessors of n in the CFG.
Observation: a node m other than n dominates node n iff m dominates all predecessors of n.

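Computing Dominator Sets
Algorithm:
Initial Approximation:
$D(n_0) = \{n_0\}$, where $n_0$ is the start node.
$D(n) = N$ for all $n \neq n_0$, where $N$ is the set of all nodes.
Iteratively refine the D(n)’s until no set changes:
$D(n) = \{n\} \cup \bigcap_{p \in Pred(n)} D(p)$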

Example: Computing Dom. Sets
$D(1) = \{1\}$
$D(2) = \{2\} \cup D(1) = \{1,2\}$
$D(3) = \{3\} \cup D(1) = \{1,3\}$
$D(4) = \{4\} \cup (D(2) \cap D(3) \cap D(9)) = \{1,4\}$
$D(5) = \{5\} \cup (D(4) \cap D(10)) = \{1,4,5\}$
$D(6) = \{6\} \cup (D(5) \cap D(7)) = \{1,4,5,6\}$
$D(7) = \{7\} \cup D(5) = \{1,4,5,7\}$
$D(8) = \{8\} \cup (D(6) \cap D(10)) = \{1,4,5,6,8\}$
$D(9) = \{9\} \cup D(8) = \{1,4,5,6,8,9\}$
$D(10) = \{10\} \cup D(8) = \{1,4,5,6,8,10\}$
Back Edges: 9→4, 10→8, 10→5
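The iterative refinement can be sketched in a few lines of Python. This is only an illustration, not the course's reference code: the predecessor sets below are read off the worked example (the CFG figure itself is not part of the text), the function names `dominator_sets` and `back_edges` are made up for this sketch, and every node other than the start node is assumed to have at least one predecessor.

```python
# Predecessor sets inferred from the worked example above (an assumption:
# the slide's CFG figure is not reproduced in the text).
PRED = {
    1: [], 2: [1], 3: [1], 4: [2, 3, 9], 5: [4, 10],
    6: [5, 7], 7: [5], 8: [6, 10], 9: [8], 10: [8],
}

def dominator_sets(pred, start):
    """Refine D(n) = {n} U (intersection of D(p), p in Pred(n)) to a fixed point."""
    nodes = set(pred)
    dom = {n: set(nodes) for n in nodes}   # initial approximation: D(n) = N
    dom[start] = {start}                   # D(n0) = {n0}
    changed = True
    while changed:
        changed = False
        for n in sorted(nodes - {start}):
            new = {n} | set.intersection(*(dom[p] for p in pred[n]))
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom

def back_edges(pred, dom):
    """An edge tail -> head is a back edge when head dominates tail."""
    return sorted((t, h) for h in pred for t in pred[h] if h in dom[t])

DOM = dominator_sets(PRED, 1)
BACK = back_edges(PRED, DOM)
print(sorted(DOM[8]))   # [1, 4, 5, 6, 8]
print(BACK)             # [(9, 4), (10, 5), (10, 8)]
```

The computed sets agree with the example, and the three back edges are exactly those edges whose head appears in the dominator set of the tail.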

Loop
Given a back edge N→D, the loop corresponding to edge N→D = {D} + {X such that X can reach N without going through D}.
Example:
• 1 dominates 6
• 6→1 is a back edge
• Loop of 6→1 = {1} + {3,4,5,6} = {1,3,4,5,6}

Algorithm for Loop Construction
Given a Back Edge N→D:
    Stack = empty
    Loop = {D}
    Insert(N)
    While stack not empty do
        pop m (top element of stack)
        for each p in pred(m) do
            Insert(p)
        endfor
    Endwhile

    Insert(m)
        if m not in Loop then
            Loop = Loop U {m}
            push m onto Stack
        endif
    End Insert

Example
Back Edge 7→2 (N = 7, D = 2)
Loop = {2} + {7} + {6} + {4} + {5} + {3}
Stack = 7 6 4 5 3
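The stack-based construction translates almost line for line into Python. The sketch below is illustrative only: the graph for the 7→2 example is not available in the text, so it reuses predecessor sets inferred from the dominator example (nodes 1 to 10) and builds the loops of that graph's three back edges. The name `natural_loop` is chosen for this sketch.

```python
def natural_loop(pred, tail, head):
    """Nodes of the loop for back edge tail -> head (N -> D in the slides)."""
    loop = {head}        # Loop = {D}
    stack = []

    def insert(m):
        if m not in loop:
            loop.add(m)
            stack.append(m)

    insert(tail)         # Insert(N)
    while stack:
        m = stack.pop()
        for p in pred[m]:
            insert(p)
    return loop

# Predecessor sets inferred from the dominator example (an assumption).
PRED = {
    1: [], 2: [1], 3: [1], 4: [2, 3, 9], 5: [4, 10],
    6: [5, 7], 7: [5], 8: [6, 10], 9: [8], 10: [8],
}

LOOP_9_4 = natural_loop(PRED, 9, 4)
LOOP_10_8 = natural_loop(PRED, 10, 8)
LOOP_10_5 = natural_loop(PRED, 10, 5)
print(sorted(LOOP_9_4))    # [4, 5, 6, 7, 8, 9, 10]
print(sorted(LOOP_10_8))   # [8, 10]
print(sorted(LOOP_10_5))   # [5, 6, 7, 8, 10]
```

Because the header is placed in the loop before the walk starts, the backward traversal never goes past it. The three results also illustrate the nesting property: {8,10} is nested within {5,6,7,8,10}, which is nested within {4,...,10}.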

Examples
Example 1:
    While A do
        S1
        While B do
            S2
        Endwhile
    Endwhile
L2: B, S2
L1: A, S1, B, S2
L2 is nested in L1.
Example 2 (figure):
L1: S1, S2, S3, S4
L2: S2, S3, S4
L2 nested in L1?

Reducible Flow Graph
The edges of a reducible flow graph can be partitioned into two disjoint sets:
• Forward: edges that form an acyclic graph in which every node can be reached from the initial node.
• Back: edges whose heads (sinks) dominate their tails (sources).
Any flow graph whose edges cannot be partitioned as above is non-reducible, or irreducible.

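Reducible Flow Graph
How to check reducibility?
• Remove all back edges and see if the resulting graph is acyclic.
Figure: an irreducible graph and the reducible graph obtained from it by node splitting. In the irreducible graph, 2→3 is not a back edge and 3→2 is not a back edge, so the graph is not acyclic.
Node Splitting: converts an irreducible graph to a reducible one.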
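The reducibility check can be sketched as follows: compute dominators, drop every edge whose head dominates its tail, and look for a cycle in what remains. This is a minimal sketch under stated assumptions: the two small graphs are hypothetical (the irreducible one is the usual two-entry cycle between nodes 2 and 3, chosen to match the remark about 2→3 and 3→2), all nodes are assumed reachable from the start node, and the function names are made up.

```python
def dominators(succ, start):
    """Iterative dominator sets for a CFG given as successor lists."""
    nodes = set(succ)
    pred = {n: [] for n in nodes}
    for m in succ:
        for s in succ[m]:
            pred[s].append(m)
    dom = {n: set(nodes) for n in nodes}
    dom[start] = {start}
    changed = True
    while changed:
        changed = False
        for n in nodes - {start}:
            new = {n} | set.intersection(*(dom[p] for p in pred[n]))
            if new != dom[n]:
                dom[n], changed = new, True
    return dom

def is_reducible(succ, start):
    """Remove back edges (head dominates tail); reducible iff the rest is acyclic."""
    dom = dominators(succ, start)
    forward = {m: [s for s in succ[m] if s not in dom[m]] for m in succ}
    state = {}   # node -> "active" while on the DFS path, "done" afterwards

    def has_cycle(m):
        state[m] = "active"
        for s in forward[m]:
            if state.get(s) == "active":
                return True
            if s not in state and has_cycle(s):
                return True
        state[m] = "done"
        return False

    return not any(has_cycle(n) for n in succ if n not in state)

# Hypothetical graphs: node 1 is the start node in both.
IRREDUCIBLE = {1: [2, 3], 2: [3], 3: [2]}   # cycle 2 <-> 3 entered at 2 and at 3
REDUCIBLE = {1: [2], 2: [3], 3: [2]}        # cycle 2 <-> 3 entered only at 2
print(is_reducible(IRREDUCIBLE, 1))   # False
print(is_reducible(REDUCIBLE, 1))     # True
```

In the first graph neither 2 nor 3 dominates the other, so no edge is removed and the cycle survives. In the second, 2 dominates 3, so 3→2 is a back edge and removing it leaves an acyclic graph.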

Loop Detection in Reducible Graphs
Depth-first Ordering: numbering of nodes in the reverse of the order in which they were last visited during depth-first search.
• Forward edge M→N: N is a descendant of M in the DFST.
• Back edge M→N: N is an ancestor of M in the DFST.
M→N is a back edge iff DFN(M) >= DFN(N).

Example
Figure: a CFG and its DFST, with forward edges and back edges marked.
Depth First Ordering (reverse of post-order traversal): 1 2 3 5 4 6 7 8

Algorithm for DFN Computation
    Mark all nodes as “unvisited”
    DFST = {}    // set of edges of DFST
    I = # of nodes in the graph;
    DFS(n0);

    DFS(X) {
        mark X as “visited”
        for each successor S of X do
            if S is “unvisited” then
                add edge X→S to DFST
                call DFS(S)
            endif
        endfor
        DFN[X] = I;
        I = I – 1;
    }
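A direct Python rendering of the DFN computation may help make the numbering concrete. The four-node graph is a hypothetical example, not one from the slides, and the sketch assumes every node is reachable from the start node. Back edges are then picked out with the DFN(M) >= DFN(N) test, which is valid for reducible graphs.

```python
def depth_first_numbering(succ, start):
    """Return (DFN, DFST edges), numbering nodes in reverse order of last visit."""
    dfn = {}
    tree_edges = []
    visited = set()
    counter = [len(succ)]            # I = number of nodes in the graph

    def dfs(x):
        visited.add(x)
        for s in succ[x]:
            if s not in visited:
                tree_edges.append((x, s))   # add edge X -> S to the DFST
                dfs(s)
        dfn[x] = counter[0]          # assigned once all successors are finished
        counter[0] -= 1

    dfs(start)
    return dfn, tree_edges

# Hypothetical reducible CFG with a single loop between nodes 2 and 3.
SUCC = {1: [2], 2: [3], 3: [2, 4], 4: []}
DFN, DFST = depth_first_numbering(SUCC, 1)
BACK = [(m, n) for m in SUCC for n in SUCC[m] if DFN[m] >= DFN[n]]
print(DFN)    # {4: 4, 3: 3, 2: 2, 1: 1}
print(DFST)   # [(1, 2), (2, 3), (3, 4)]
print(BACK)   # [(3, 2)]
```

No dominator information is needed here: one depth-first traversal yields both the spanning tree and the numbering used to classify edges.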

Sample Problems Control Flow Analysis

1. For the given control flow graph:
(a) Compute the dominator sets and construct the dominator tree;
(b) Identify the loops using the dominator information; and
(c) Is this control flow graph reducible? If it is not, convert it into a reducible graph.
Figure: control flow graph with nodes 1 to 8.

2. For the given reducible control flow graph:
(a) Compute the depth first numbering; and
(b) Identify the loops using the computed information.
Figure: control flow graph with nodes 1 to 9.

CS 201 Compiler Construction, Lecture 3: Data Flow Analysis

Data Flow Analysis
Data flow analysis is used to collect information about the flow of data values across basic blocks.
• Dominator analysis collected global information regarding the program’s structure.
• For performing global code optimizations, global information must be collected regarding values of program variables.
• Local optimizations involve statements from the same basic block.
• Global optimizations involve statements from different basic blocks ⇒ data flow analysis is performed to collect global information that drives global optimizations.

Local and Global Optimization

Applications of Data Flow Analysis
• Applicability of code optimizations
• Symbolic debugging of code
• Static error checking
• Type inference
• …….

Applications of Data Flow Analysis
Analyses covered: Reaching Definitions, Available Expressions, Live Variables, Very Busy Expressions.
For each analysis:
• Definition
• How to compute
• Application

1. Reaching Definitions Definition d of variable v: a statement d that assigns a value to v. (d: v = 1;) Use of variable v: reference to value of v in an expression evaluation. (u: … = v+2;) Definition d of variable v reaches a point p if there exists a path from immediately after d to p such that definition d is not killed along the path. Definition d is killed along a path between two points if there exists an assignment to variable v along the path.

Example
d reaches u along path2, and d does not reach u along path1. Since there exists a path from d to u along which d is not killed (i.e., path2), d reaches u.

Reaching Definitions Contd.
Unambiguous Definition: X = ….;
Ambiguous Definition: *p = ….; (p may point to X)
Example: X=.. is followed by *p=.. Does the definition of X reach the point after *p=..? Yes.
For computing reaching definitions, typically we only consider kills by unambiguous definitions.

Computing Reaching Definitions
At each program point p, we compute the set of definitions that reach point p. Reaching definitions are computed by solving a system of equations (data flow equations).
Example: definitions d2: X=… and d3: X=… occur outside block B, and B contains d1: X=…. Then GEN[B] = {d1} and KILL[B] = {d2,d3}; IN[B] and OUT[B] are the sets at B’s entry and exit.

Data Flow Equations IN[B]: Definitions that reach B’s entry. OUT[B]: Definitions that reach B’s exit. GEN[B]: Definitions within B that reach the end of B. KILL[B]: Definitions that never reach the end of B due to redefinitions of variables in B.

Reaching Definitions Contd. • Forward problem – information flows forward in the direction of edges. • May problem – there is a path along which definition reaches a point but it does not always reach the point. Therefore in a May problem the meet operator is the Union operator.
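As an illustration, the sketch below solves reaching definitions iteratively using the standard textbook formulation, IN[B] = union of OUT[P] over predecessors P and OUT[B] = GEN[B] ∪ (IN[B] − KILL[B]). The equations are stated here as an assumption, since the slide's own equations are not part of the extracted text, and the three-block CFG with its block and definition names is hypothetical.

```python
def reaching_definitions(pred, gen, kill):
    """Forward may-analysis: meet is union, every OUT[B] starts empty."""
    in_ = {b: set() for b in pred}
    out = {b: set() for b in pred}
    changed = True
    while changed:
        changed = False
        for b in pred:
            in_[b] = set().union(*(out[p] for p in pred[b]))
            new_out = gen[b] | (in_[b] - kill[b])
            if new_out != out[b]:
                out[b] = new_out
                changed = True
    return in_, out

# Hypothetical CFG: B1 -> B2 -> B3 and B1 -> B3.
# B1 holds d1: X = ..., B2 holds d2: X = ..., B3 uses X.
PRED = {"B1": [], "B2": ["B1"], "B3": ["B1", "B2"]}
GEN = {"B1": {"d1"}, "B2": {"d2"}, "B3": set()}
KILL = {"B1": {"d2"}, "B2": {"d1"}, "B3": set()}

RD_IN, RD_OUT = reaching_definitions(PRED, GEN, KILL)
print(sorted(RD_IN["B3"]))    # ['d1', 'd2']
print(sorted(RD_OUT["B2"]))   # ['d2']
```

d1 is killed on the path through B2 but survives on the direct edge B1→B3, so it still reaches B3. That is the "may" behaviour the union operator captures.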

Applications of Reaching Definitions Constant Propagation/folding Copy Propagation

2. Available Expressions An expression is generated at a point if it is computed at that point. An expression is killed by redefinitions of operands of the expression. An expression A+B is available at a point if every path from the start node to the point evaluates A+B and after the last evaluation of A+B on each path there is no redefinition of either A or B (i.e., A+B is not killed).

Available Expressions Available expressions problem computes: at each program point the set of expressions available at that point.

Data Flow Equations IN[B]: Expressions available at B’s entry. OUT[B]: Expressions available at B’s exit. GEN[B]: Expressions computed within B that are available at the end of B. KILL[B]: Expressions whose operands are redefined in B.

Available Expressions Contd. • Forward problem – information flows forward in the direction of edges. • Must problem – expression is definitely available at a point along all paths. Therefore in a Must problem the meet operator is the Intersection operator. • Application: A
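A sketch of an iterative solver for available expressions follows, assuming the standard formulation IN[B] = intersection of OUT[P] over predecessors P and OUT[B] = GEN[B] ∪ (IN[B] − KILL[B]), with nothing available at the entry block. Because the meet is intersection, the other OUT sets start at the set of all expressions. The diamond-shaped CFG and its block names are hypothetical.

```python
def available_expressions(pred, gen, kill, entry):
    """Forward must-analysis: meet is intersection, OUT starts at the universe."""
    universe = set().union(*gen.values())
    in_ = {b: set() for b in pred}
    out = {b: set(universe) for b in pred}
    out[entry] = set(gen[entry])     # IN[entry] is empty
    changed = True
    while changed:
        changed = False
        for b in pred:
            if b == entry:
                continue
            in_[b] = set.intersection(*(out[p] for p in pred[b]))
            new_out = gen[b] | (in_[b] - kill[b])
            if new_out != out[b]:
                out[b], changed = new_out, True
    return in_, out

# Hypothetical diamond: B1 computes a+b, B2 redefines a, B3 does neither.
PRED = {"B1": [], "B2": ["B1"], "B3": ["B1"], "B4": ["B2", "B3"]}
GEN = {"B1": {"a+b"}, "B2": set(), "B3": set(), "B4": set()}
KILL = {"B1": set(), "B2": {"a+b"}, "B3": set(), "B4": set()}

AE_IN, AE_OUT = available_expressions(PRED, GEN, KILL, "B1")
print(AE_IN["B3"])   # {'a+b'}
print(AE_IN["B4"])   # set()
```

a+b is available on the path through B3 but not on the path through B2, so it is not available at the join B4: one bad path is enough in a must problem.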

3. Live Variable Analysis
Live Variable Analysis computes: at each program point p, the set of variables that are live at p.
A path is X-clear if it contains no definition of X.
A variable X is live at point p if there exists an X-clear path from p to a use of X; otherwise X is dead at p.

Data Flow Equations IN[B]: Variables live at B’s entry. OUT[B]: Variables live at B’s exit. GEN[B]: Variables that are used in B prior to their definition in B. KILL[B]: Variables definitely assigned value in B before any use of that variable in B.

Live Variables Contd. • Backward problem – information flows backward in reverse of the direction of edges. • May problem – there exists a path along which a use is encountered. Therefore in a May problem the meet operator is the Union operator.
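For liveness the same fixed-point iteration runs against the direction of the edges. The sketch assumes the standard formulation OUT[B] = union of IN[S] over successors S and IN[B] = GEN[B] ∪ (OUT[B] − KILL[B]); the straight-line three-block program and its names are hypothetical.

```python
def live_variables(succ, gen, kill):
    """Backward may-analysis: meet is union over successors."""
    in_ = {b: set() for b in succ}
    out = {b: set() for b in succ}
    changed = True
    while changed:
        changed = False
        for b in succ:
            out[b] = set().union(*(in_[s] for s in succ[b]))
            new_in = gen[b] | (out[b] - kill[b])
            if new_in != in_[b]:
                in_[b], changed = new_in, True
    return in_, out

# Hypothetical program:
#   B1: x = 1; y = 2      B2: z = x + 1      B3: return z
SUCC = {"B1": ["B2"], "B2": ["B3"], "B3": []}
GEN = {"B1": set(), "B2": {"x"}, "B3": {"z"}}
KILL = {"B1": {"x", "y"}, "B2": {"z"}, "B3": set()}

LV_IN, LV_OUT = live_variables(SUCC, GEN, KILL)
print(LV_OUT["B1"])   # {'x'}
print(LV_IN["B2"])    # {'x'}
```

y never appears in any live set, which is the fact dead code elimination would use to remove `y = 2`, and x and z are never live at the same point, which is the kind of fact register allocation exploits.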

Applications of Live Variables Register Allocation Dead Code Elimination Code Motion Out of Loops

4. Very Busy Expressions
Application: Code Size Reduction
Compute for each program point the set of very busy expressions at the point.
An expression A+B is very busy at point p if, for all paths starting at p and ending at the end of the program, an evaluation of A+B appears before any definition of A or B.

Data Flow Equations IN[B]: Expressions very busy at B’s entry. OUT[B]: Expressions very busy at B’s exit. GEN[B]: Expression computed in B and variables used in the expression are not redefined in B prior to expression’s evaluation in B. KILL[B]: Expressions that use variables that are redefined in B.

Very Busy Expressions Contd. • Backward problem – information flows backward in reverse of the direction of edges. • Must problem – expressions must be computed along all paths. Therefore in a Must problem the meet operator is the Intersection operator.
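A sketch of the backward must-analysis follows, assuming the standard formulation OUT[B] = intersection of IN[S] over successors S and IN[B] = GEN[B] ∪ (OUT[B] − KILL[B]), with nothing very busy at the exit of the program. The CFG, in which both branches evaluate a+b, is a hypothetical example.

```python
def very_busy_expressions(succ, gen, kill):
    """Backward must-analysis: meet is intersection, IN starts at the universe."""
    universe = set().union(*gen.values())
    in_ = {b: set(universe) for b in succ}
    out = {b: set() for b in succ}
    changed = True
    while changed:
        changed = False
        for b in succ:
            if succ[b]:
                out[b] = set.intersection(*(in_[s] for s in succ[b]))
            # A block without successors is an exit block: OUT stays empty.
            new_in = gen[b] | (out[b] - kill[b])
            if new_in != in_[b]:
                in_[b], changed = new_in, True
    return in_, out

# Hypothetical CFG: B1 branches to B2 (x = a+b) and B3 (y = a+b); both reach B4.
SUCC = {"B1": ["B2", "B3"], "B2": ["B4"], "B3": ["B4"], "B4": []}
GEN = {"B1": set(), "B2": {"a+b"}, "B3": {"a+b"}, "B4": set()}
KILL = {"B1": set(), "B2": set(), "B3": set(), "B4": set()}

VB_IN, VB_OUT = very_busy_expressions(SUCC, GEN, KILL)
print(VB_OUT["B1"])   # {'a+b'}
print(VB_IN["B4"])    # set()
```

Since a+b is very busy at the end of B1, a single evaluation could be placed there instead of one copy per branch, which is the code size reduction mentioned as the application.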

Summary

Conservative Analysis Optimizations that we apply must be Safe => the data flow facts we compute should definitely be true (not simply possibly true). Two main reasons that cause results of analysis to be conservative: 1. Control Flow 2. Pointers & Aliasing

Conservative Analysis
1. Control Flow: we assume that all paths are executable; however, some may be infeasible.
Example (figure): X+Y is always available if we exclude infeasible paths.

Conservative Analysis
2. Pointers & Aliasing: we may not know what a pointer points to.
    1. X = 5
    2. *p = …    // p may or may not point to X
    3. … = X
Constant propagation: assume p does point to X (i.e., in statement 3, X cannot be replaced by 5).
Dead Code Elimination: assume p does not point to X (i.e., statement 1 cannot be deleted).

Representation of Data Flow Sets
• Bit vectors: used to represent sets because we are computing binary information.
  • Does a definition reach a point? T or F
  • Is an expression available/very busy? T or F
  • Is a variable live? T or F
• For each expression, variable, or definition we have one bit; intersection and union operations can be implemented using bitwise AND and OR operations.
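A small sketch of the bit-vector encoding, using Python integers as bit vectors. The definition names reuse the GEN[B] = {d1}, KILL[B] = {d2,d3} example from the reaching definitions slides; the bit assignment, the chosen IN value, and the helper names `to_bits` and `to_names` are assumptions made for illustration.

```python
DEFS = ["d1", "d2", "d3"]                      # bit i stands for DEFS[i]
BIT = {d: 1 << i for i, d in enumerate(DEFS)}

def to_bits(names):
    """Encode a set of definition names as an integer bit vector."""
    v = 0
    for n in names:
        v |= BIT[n]
    return v

def to_names(v):
    """Decode a bit vector back into a set of definition names."""
    return {d for d in DEFS if v & BIT[d]}

GEN = to_bits({"d1"})
KILL = to_bits({"d2", "d3"})
IN = to_bits({"d2", "d3"})                     # assumed value of IN[B]

OUT = GEN | (IN & ~KILL)                       # OUT = GEN U (IN - KILL)
UNION = to_bits({"d1"}) | to_bits({"d2"})                  # set union is OR
INTERSECTION = to_bits({"d1", "d2"}) & to_bits({"d2", "d3"})   # intersection is AND

print(to_names(OUT))            # {'d1'}
print(sorted(to_names(UNION)))  # ['d1', 'd2']
print(to_names(INTERSECTION))   # {'d2'}
```

Set difference becomes AND with the complement, so a whole transfer function costs a handful of word-level operations regardless of how many facts fit in a machine word.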
