Languages and Compiler Design II: Basic Blocks
Material provided by Prof. Jingke Li
Stolen with pride and modified by Herb Mayer
PSU Spring 2010
rev.: 5/18/2010
CS322
Agenda
• Definition
• Sample: Basic Block
• Identifying Basic Blocks (BB)
• Control Flow Graph (CFG)
• Sample: Quicksort
• Quicksort CFG
• Loops
• CFG Synthesis
Definition: Basic Block
A Basic Block (BB) is a sequence of 1 or more consecutive instructions that starts with a unique entry instruction (the header, aka leader) and ends with an exit instruction that transfers control to another BB or ends the program (e.g. a Halt instruction). Single-instruction BBs are possible.
• Leaders are explicitly created by being the destination of a branch or a call.
• Leaders are implicitly created when the previous instruction branches away (via jump or call), or by fall-through, e.g. the instruction following a conditional branch.
Sample: Basic Block
A basic block is a sequence of 1 or more consecutive operations whose first operation is the sole entry point and whose last operation is the sole exit point.
• Only the first statement can be a label or the target of a jump. A statement can also become the first operation of a BB via fall-through.
• Only the last statement can be a jump statement. But non-control-flow operations can also be exit points, for example when the next operation happens to be the target of a branch.
Samples:
• (1)-(4) form a Basic Block: L1: i := m-1; j := n; t1 := 4*n; v := a[t1]
• (0) is a Basic Block by itself: L3: goto foo
• Multiple Basic Blocks: i := m-1; j := n | L4: t2 := 4*i; goto bar | L2: ... t3 := t3 - j
Identifying Basic Blocks
1.) Identify "leaders", i.e. the first statements of basic blocks. Leaders are:
• The first statement of the program, e.g. the first instruction of the main() function
• The target of a call, conditional, or unconditional branch
• The operation following a control-transfer instruction; this operation is an implicit target by fall-through. Note that the successor of an unconditional branch is a candidate for unreachable code
2.) For each leader, its basic block consists of the leader itself plus all subsequent operations (0 or more) up to, but excluding, the next leader, or up to the halt instruction
Example:
L0: (1) a := 0
L1: (2) b := b+1
    (3) c := c+b
    (4) a := b*2
    (5) if a<N goto L1
    (6) return c
Leaders: (1), (2), (6)
Basic Blocks: (1); (2)-(5); (6)
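The two steps above can be sketched in C. The instruction encoding (`Instr`, `OpKind`) and the names `find_leaders` and `form_blocks` are hypothetical, chosen for this sketch; indices are 0-based, and call instructions are not modeled (the sketch is intraprocedural), so only branches and returns end a block here.

```c
#include <stdbool.h>

/* Hypothetical IR encoding for this sketch (calls are not modeled). */
typedef enum { OP_PLAIN, OP_COND_BRANCH, OP_JUMP, OP_RETURN } OpKind;

typedef struct {
    OpKind kind;
    int target;   /* 0-based index of the branch target, or -1 if none */
} Instr;

/* Step 1: leader[i] becomes true when instruction i starts a basic block. */
void find_leaders(const Instr *code, int n, bool *leader) {
    for (int i = 0; i < n; i++)
        leader[i] = false;
    if (n > 0)
        leader[0] = true;                        /* first statement of the program */
    for (int i = 0; i < n; i++) {
        if (code[i].target >= 0)
            leader[code[i].target] = true;       /* explicit target of a branch */
        if (code[i].kind != OP_PLAIN && i + 1 < n)
            leader[i + 1] = true;                /* follows a control transfer */
    }
}

/* Step 2: each block runs from a leader up to (excluding) the next leader.
   block_of[i] receives the 0-based block number of instruction i;
   the return value is the number of blocks. */
int form_blocks(const bool *leader, int n, int *block_of) {
    int b = -1;
    for (int i = 0; i < n; i++) {
        if (leader[i])
            b++;
        block_of[i] = b;
    }
    return b + 1;
}
```

For the six-statement example, encoding (5) as `{OP_COND_BRANCH, 1}` and (6) as `{OP_RETURN, -1}` (all others `{OP_PLAIN, -1}`) marks indices 0, 1 and 5 as leaders, i.e. statements (1), (2) and (6), and yields three blocks.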
Control Flow Graph (CFG)
A program's Control Flow Graph is a directed graph whose nodes are Basic Blocks and whose edges are the program-defined flows of control from one Basic Block to another.
Example:
BB1:     (1) a := 0
BB2: L1: (2) b := b+1
         (3) c := c+b
         (4) a := b*2
         (5) if a<N goto L1
BB3:     (6) return c
Edges: BB1 -> BB2, BB2 -> BB2, BB2 -> BB3
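A minimal C sketch of deriving CFG edges once blocks are known: only the last instruction of each block decides its successors (the taken branch and/or the fall-through block). The encoding (`Instr`, `OpKind`), the name `build_edges`, the bound `MAX_BB`, and the `block_of` input (the block number of each instruction, e.g. from a leader pass) are all hypothetical and defined here so the block stands alone.

```c
#include <stdbool.h>

#define MAX_BB 16   /* hypothetical upper bound on blocks */

typedef enum { OP_PLAIN, OP_COND_BRANCH, OP_JUMP, OP_RETURN } OpKind;
typedef struct {
    OpKind kind;
    int target;   /* 0-based index of the branch target, or -1 if none */
} Instr;

/* succ[a][b] becomes true when the CFG has an edge from block a to block b. */
void build_edges(const Instr *code, int n, const int *block_of,
                 bool succ[MAX_BB][MAX_BB]) {
    for (int a = 0; a < MAX_BB; a++)
        for (int b = 0; b < MAX_BB; b++)
            succ[a][b] = false;
    for (int i = 0; i < n; i++) {
        bool last = (i + 1 == n) || block_of[i + 1] != block_of[i];
        if (!last)
            continue;   /* only a block's final instruction decides its exits */
        int b = block_of[i];
        if (code[i].kind == OP_JUMP || code[i].kind == OP_COND_BRANCH)
            succ[b][block_of[code[i].target]] = true;   /* taken branch */
        if ((code[i].kind == OP_PLAIN || code[i].kind == OP_COND_BRANCH) && i + 1 < n)
            succ[b][block_of[i + 1]] = true;            /* fall-through */
    }
}
```

With BB1..BB3 stored as blocks 0..2, the example produces exactly the edges BB1 -> BB2, BB2 -> BB2 and BB2 -> BB3; the return in BB3 has no successor.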
Sample: Quicksort
    // assume an external input-output array: int a[]
    // note: the do-j loop can run below index m unless some a[k] <= v
    // exists at k < m (e.g. a sentinel at a[m-1] for the top-level call)
    void quicksort( int m, int n ) {
        int i, j, v, x; // temps
        if ( n <= m ) return;
        i = m-1; j = n; v = a[n];
        while(1) {
            do i=i+1; while( a[i] < v );
            do j=j-1; while( a[j] > v );
            if ( i >= j ) break;
            x = a[i]; a[i] = a[j]; a[j] = x;
        } //end while
        x = a[i]; a[i] = a[n]; a[n] = x;
        quicksort( m, j );
        quicksort( i+1, n );
    } //end quicksort
Quicksort IR Code
        (1)  i := m-1
        (2)  j := n
        (3)  t1 := 4*n
        (4)  v := a[t1]
L0:
L1:     (5)  i := i+1
        (6)  t2 := 4*i
        (7)  t3 := a[t2]
        (8)  if t3<v goto L1
L2:     (9)  j := j-1
        (10) t4 := 4*j
        (11) t5 := a[t4]
        (12) if t5>v goto L2
        (13) if i>=j goto L3
        (14) t6 := 4*i
        (15) x := a[t6]
        (16) t7 := 4*i
        (17) t8 := 4*j
        (18) t9 := a[t8]
        (19) a[t7] := t9
        (20) t10 := 4*j
        (21) a[t10] := x
        (22) goto L0
L3:     (23) t11 := 4*i
        (24) x := a[t11]
        (25) t12 := 4*i
        (26) t13 := 4*n
        (27) t14 := a[t13]
        (28) a[t12] := t14
        (29) t15 := 4*n
        (30) a[t15] := x
        (31) 2 calls ...
Quicksort CFG
Control Flow Graph: BB1: (1)-(4), BB2: (5)-(8), BB3: (9)-(12), BB4: (13), BB5: (14)-(22), BB6: (23)-(30)
BB1: i := m-1; j := n; t1 := 4*n; v := a[t1]
BB2: i := i+1; t2 := 4*i; t3 := a[t2]; if t3<v goto BB2
BB3: j := j-1; t4 := 4*j; t5 := a[t4]; if t5 > v goto BB3
BB4: if i >= j goto BB6
BB5: t6 := 4*i; x := a[t6]; t7 := 4*i; t8 := 4*j; t9 := a[t8]; a[t7] := t9; t10 := 4*j; a[t10] := x; goto BB2
BB6: t11 := 4*i; x := a[t11]; t12 := 4*i; t13 := 4*n; t14 := a[t13]; a[t12] := t14; t15 := 4*n; a[t15] := x
Edges: BB1 -> BB2, BB2 -> BB2, BB2 -> BB3, BB3 -> BB3, BB3 -> BB4, BB4 -> BB5, BB4 -> BB6, BB5 -> BB2
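Later analyses, dominance in particular, need each block's immediate predecessors. A minimal C sketch that encodes the quicksort CFG as an edge list (BB1..BB6 stored at indices 0..5) and derives predecessor sets; the names `EDGES`, `build_preds` and `count_preds` are hypothetical.

```c
#include <stdbool.h>

#define NBB 6   /* BB1..BB6 are stored at indices 0..5 */

/* Quicksort CFG edges {from, to}, read off the IR. BB6 has no CFG successor. */
static const int EDGES[][2] = {
    {0, 1},          /* BB1 -> BB2 (fall-through)             */
    {1, 1}, {1, 2},  /* BB2 -> BB2 (loop), BB2 -> BB3         */
    {2, 2}, {2, 3},  /* BB3 -> BB3 (loop), BB3 -> BB4         */
    {3, 4}, {3, 5},  /* BB4 -> BB5 (fall-through), BB4 -> BB6 */
    {4, 1},          /* BB5 -> BB2 (goto L0)                  */
};
#define NEDGES ((int)(sizeof EDGES / sizeof EDGES[0]))

/* pred[b][p] becomes true when BB p is an immediate predecessor of BB b. */
void build_preds(bool pred[NBB][NBB]) {
    for (int b = 0; b < NBB; b++)
        for (int p = 0; p < NBB; p++)
            pred[b][p] = false;
    for (int e = 0; e < NEDGES; e++)
        pred[EDGES[e][1]][EDGES[e][0]] = true;
}

/* Number of immediate predecessors of block b. */
int count_preds(bool pred[NBB][NBB], int b) {
    int count = 0;
    for (int p = 0; p < NBB; p++)
        if (pred[b][p])
            count++;
    return count;
}
```

BB2 ends up with three predecessors (BB1, itself, and BB5 via goto L0), which is why it is the natural header of the outer loop.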
Loops
• Since a CFG is a graph, it may contain loops, aka strongly connected components (SCCs)
• Generally, a loop is a set of nodes in which every node can reach every other node along some path
• This includes "unstructured" loops, with multiple entry and multiple exit points
• A structured loop (proper loop) has just 1 entry point and (generally) a single exit point
• Loops created by mapping high-level source programs to IR or assembly code are proper, unless disturbed by Goto (or Break) statements
• Goto can create any loop; Break creates additional exits
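Loops in this sense can be located by computing SCCs. A minimal C sketch using Tarjan's algorithm (a standard SCC algorithm swapped in here; the slides do not prescribe one), applied to the quicksort CFG with BB1..BB6 at indices 0..5. The `Graph` type and `find_sccs` are hypothetical names. An SCC with more than one node, or a single node with a self-edge, contains a loop.

```c
#include <stdbool.h>

#define MAXN 16   /* hypothetical bound on CFG nodes */

typedef struct {
    int n;                          /* number of nodes */
    bool adj[MAXN][MAXN];           /* adj[v][w]: edge v -> w */
    int index[MAXN], low[MAXN];     /* DFS discovery index, low-link */
    int comp[MAXN];                 /* output: SCC id per node */
    bool on_stack[MAXN];
    int stack[MAXN], sp, next_index, ncomp;
} Graph;

static void strongconnect(Graph *g, int v) {
    g->index[v] = g->low[v] = g->next_index++;
    g->stack[g->sp++] = v;
    g->on_stack[v] = true;
    for (int w = 0; w < g->n; w++) {
        if (!g->adj[v][w])
            continue;
        if (g->index[w] < 0) {                  /* unvisited: recurse */
            strongconnect(g, w);
            if (g->low[w] < g->low[v])
                g->low[v] = g->low[w];
        } else if (g->on_stack[w] && g->index[w] < g->low[v]) {
            g->low[v] = g->index[w];            /* edge into the current SCC */
        }
    }
    if (g->low[v] == g->index[v]) {             /* v is the root of an SCC */
        int w;
        do {
            w = g->stack[--g->sp];
            g->on_stack[w] = false;
            g->comp[w] = g->ncomp;
        } while (w != v);
        g->ncomp++;
    }
}

/* Tarjan's algorithm: fills comp[] and returns the number of SCCs. */
int find_sccs(Graph *g) {
    g->sp = g->next_index = g->ncomp = 0;
    for (int v = 0; v < g->n; v++) {
        g->index[v] = -1;
        g->on_stack[v] = false;
    }
    for (int v = 0; v < g->n; v++)
        if (g->index[v] < 0)
            strongconnect(g, v);
    return g->ncomp;
}
```

On the quicksort CFG the sketch yields three SCCs: {BB1}, {BB2, BB3, BB4, BB5} and {BB6}. The middle SCC also contains the self-loops on BB2 and BB3, so SCCs alone do not separate nested loops; back edges and natural loops do.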
Loops, Cont'd
Two example CFGs with nodes numbered from 1 (figures not reproduced). How many loops?
• Unstructured: Loop: 2, 3, 4, 5
• 2 proper loops, one unstructured loop: Loop1: 2, 3; Loop2: 2, 4; Loop3: 2, 3, 4
Natural Loops
• Given a "back edge" t -> h, the natural loop of t -> h is the subgraph whose node set contains h and all nodes that (1) are dominated by h and (2) from which t can be reached without passing through h, and whose edge set contains all edges connecting nodes in this set
• Node h is the loop header, the unique entry node of the loop
• Dominance relation: a node d dominates node i if every execution path from the CFG entry to i includes d, i.e. one can't execute i without executing d first
• Recursive dominance definition: a dom b (node a dominates node b) if and only if
  • a = b, or
  • a is the unique immediate predecessor of b, or
  • b is not the CFG entry, and a dominates all the immediate predecessors of b
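The recursive definition can be solved by iteration: give the entry the set {entry}, start every other node at "dominated by everything", and repeatedly intersect over predecessors until nothing changes. A minimal C sketch, assuming node 0 is the entry, every node is reachable, and there are at most 32 nodes (one bit per node in a `uint32_t`); `dominators` and `natural_loop` are hypothetical names. `natural_loop` collects h plus every node that reaches t by walking predecessors backward and stopping at h; for a back edge those are exactly the nodes dominated by h from which t is reachable without passing through h.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAXN 32   /* one bit per node in a uint32_t */

/* dom[b] receives the bit set of nodes dominating b (node 0 is the entry). */
void dominators(int n, bool adj[][MAXN], uint32_t *dom) {
    uint32_t all = (n >= 32) ? UINT32_MAX : ((1u << n) - 1u);
    dom[0] = 1u;                               /* entry dominates only itself */
    for (int b = 1; b < n; b++)
        dom[b] = all;                          /* start from "everything" */
    bool changed = true;
    while (changed) {
        changed = false;
        for (int b = 1; b < n; b++) {
            uint32_t d = all;
            for (int p = 0; p < n; p++)
                if (adj[p][b])
                    d &= dom[p];               /* intersect over predecessors */
            d |= 1u << b;                      /* b dominates itself */
            if (d != dom[b]) {
                dom[b] = d;
                changed = true;
            }
        }
    }
}

/* Natural loop of back edge t -> h, returned as a bit set of nodes. */
uint32_t natural_loop(int n, bool adj[][MAXN], int t, int h) {
    uint32_t loop = 1u << h;
    int work[MAXN], top = 0;
    if (!(loop & (1u << t))) {                 /* t == h: self-loop, done */
        loop |= 1u << t;
        work[top++] = t;
    }
    while (top > 0) {
        int m = work[--top];
        for (int p = 0; p < n; p++)
            if (adj[p][m] && !(loop & (1u << p))) {
                loop |= 1u << p;               /* p reaches t without h */
                work[top++] = p;
            }
    }
    return loop;
}
```

On the quicksort CFG (BB1..BB6 at indices 0..5), BB2 is dominated by {BB1, BB2}, and the back edge BB5 -> BB2 has the natural loop {BB2, BB3, BB4, BB5}.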
Back Edges
We call an edge t -> h a back edge if h dominates t.
Finding back edges:
• Find a spanning tree of the CFG, e.g. using a depth-first search algorithm
• Edges that are not included in the spanning tree are candidates for back edges; check each against the dominance relation
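A minimal C sketch of the two steps: a depth-first search marks spanning-tree edges, then every non-tree edge t -> h is kept only if h dominates t. It assumes node 0 is the entry, all nodes are reachable, and at most 32 nodes (dominator sets as `uint32_t` bit masks); `find_back_edges` and its helpers are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAXN 32

/* Iterative dominator sets as bit masks (node 0 = entry). */
static void dominators(int n, bool adj[][MAXN], uint32_t *dom) {
    uint32_t all = (n >= 32) ? UINT32_MAX : ((1u << n) - 1u);
    dom[0] = 1u;
    for (int b = 1; b < n; b++)
        dom[b] = all;
    for (bool changed = true; changed;) {
        changed = false;
        for (int b = 1; b < n; b++) {
            uint32_t d = all;
            for (int p = 0; p < n; p++)
                if (adj[p][b])
                    d &= dom[p];
            d |= 1u << b;
            if (d != dom[b]) { dom[b] = d; changed = true; }
        }
    }
}

/* Depth-first search that records spanning-tree edges in tree[][]. */
static void dfs(int n, bool adj[][MAXN], int v, bool *seen, bool tree[][MAXN]) {
    seen[v] = true;
    for (int w = 0; w < n; w++)
        if (adj[v][w] && !seen[w]) {
            tree[v][w] = true;
            dfs(n, adj, w, seen, tree);
        }
}

/* back[t][h] becomes true for every non-tree edge t -> h whose head h
   dominates its tail t. Returns the number of back edges. */
int find_back_edges(int n, bool adj[][MAXN], bool back[][MAXN]) {
    uint32_t dom[MAXN];
    bool seen[MAXN] = {false};
    bool tree[MAXN][MAXN] = {{false}};
    dominators(n, adj, dom);
    dfs(n, adj, 0, seen, tree);
    int count = 0;
    for (int t = 0; t < n; t++)
        for (int h = 0; h < n; h++) {
            back[t][h] = adj[t][h] && !tree[t][h] && (dom[t] & (1u << h));
            if (back[t][h])
                count++;
        }
    return count;
}
```

A back edge can never be a tree edge: if h dominates t, the search reaches h before t. On the quicksort CFG the sketch finds three back edges (BB2 -> BB2, BB3 -> BB3, BB5 -> BB2); in a diamond-shaped CFG the one non-tree edge is a cross edge, so none are found.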
BB Analysis
See separate .doc presentation
Well-Structured CFG
A CFG is well-structured (aka reducible) iff all its loops are natural loops characterized by their back edges.
Important properties:
• In a well-structured CFG there are no jumps into the middle of loops, i.e. each loop is entered only through its header
• A CFG derived from a program that exclusively uses structured flow-of-control statements, such as if-then-else, while-do, continue, and break, is always well-structured
Many dataflow analysis algorithms work only on well-structured CFGs.
Example: the simplest irreducible flow graph has three nodes, 1, 2, 3 (figure not reproduced).
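One standard reducibility test (swapped in here; the slides do not give one): run a DFS from the entry and call every edge whose target is still on the DFS stack "retreating"; the CFG is reducible exactly when every retreating edge is a back edge, i.e. its head dominates its tail. A minimal C sketch, assuming node 0 is the entry, all nodes are reachable, and at most 32 nodes; `is_reducible` is a hypothetical name. For the irreducible example the test assumes the classic shape 1 -> 2, 1 -> 3, 2 -> 3, 3 -> 2 (stored at indices 0..2), since the original figure is lost.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAXN 32

/* Iterative dominator sets as bit masks (node 0 = entry). */
static void dominators(int n, bool adj[][MAXN], uint32_t *dom) {
    uint32_t all = (n >= 32) ? UINT32_MAX : ((1u << n) - 1u);
    dom[0] = 1u;
    for (int b = 1; b < n; b++)
        dom[b] = all;
    for (bool changed = true; changed;) {
        changed = false;
        for (int b = 1; b < n; b++) {
            uint32_t d = all;
            for (int p = 0; p < n; p++)
                if (adj[p][b])
                    d &= dom[p];
            d |= 1u << b;
            if (d != dom[b]) { dom[b] = d; changed = true; }
        }
    }
}

/* DFS from v. An edge v -> w with w still active (on the recursion stack)
   is retreating; it must also be a back edge (w dominates v). */
static bool dfs_ok(int n, bool adj[][MAXN], const uint32_t *dom,
                   int v, bool *seen, bool *active) {
    seen[v] = active[v] = true;
    for (int w = 0; w < n; w++) {
        if (!adj[v][w])
            continue;
        if (active[w] && !(dom[v] & (1u << w)))
            return false;                  /* retreating but not a back edge */
        if (!seen[w] && !dfs_ok(n, adj, dom, w, seen, active))
            return false;
    }
    active[v] = false;
    return true;
}

bool is_reducible(int n, bool adj[][MAXN]) {
    uint32_t dom[MAXN];
    bool seen[MAXN] = {false}, active[MAXN] = {false};
    dominators(n, adj, dom);
    return dfs_ok(n, adj, dom, 0, seen, active);
}
```

In the three-node graph, neither 2 nor 3 dominates the other (each can be entered directly from 1), so whichever edge closes the 2-3 cycle is retreating without being a back edge. The quicksort CFG passes the test.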
CFG Synthesis
Definition: A Control Flow Graph (CFG) of some program p, named cfg(p), is a static abstraction of p in which each node represents a Basic Block (BB). Edges connecting the nodes represent the control flow from a basic block to its successors. A CFG represents only the static control flow; hence it is not necessary to store which of the 2 successors of an If Expression (the Then Clause or the Else Clause) is reached when the condition is true. Only the fact that there are 2 successors matters.
CFG Synthesis
The cfg Algorithm cfg_build(pc):
• Aside from its parameter pc, the input to cfg_build() is a list of instructions I, broken into Basic Blocks. One of these BBs holds the selected entry instruction at address pc
• The algorithm creates a new CFG node for each BB
• For each successor s of a Basic Block BB(n), cfg_build() installs a directed edge from BB(n) to s. During this process each visited BB is labeled as reached. At completion, all BBs are inspected; those not reached are filtered out as unreachable BBs, and all of their instructions are unreachable code
• The output of cfg_build() is a pointer to the CFG node associated with address pc
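A minimal C sketch of the reachability part of cfg_build(), under stated assumptions: Basic Blocks are already formed, each hypothetical `Block` records at most two successor block indices (-1 for none), and the block holding pc is passed as `entry`. `Block`, `MAX_BB` and this signature are hypothetical; unlike the slide's version, the sketch returns the number of reached blocks instead of a pointer to the entry node, so the filtering step is easy to check. Edges are installed only from blocks the walk actually reaches.

```c
#include <stdbool.h>

#define MAX_BB 16   /* hypothetical bound on blocks */

typedef struct {
    int succ[2];    /* successor block indices, -1 = none */
    bool reached;   /* set by cfg_build */
} Block;

/* Walks the blocks from `entry` with a worklist, installing edge[b][s] for
   each successor s of a reached block b. Blocks left with reached == false
   are unreachable, and so is every instruction in them.
   Returns the number of reached blocks. */
int cfg_build(Block *bb, int nblocks, int entry, bool edge[][MAX_BB]) {
    int work[MAX_BB], top = 0, count = 0;
    for (int b = 0; b < nblocks; b++) {
        bb[b].reached = false;
        for (int s = 0; s < nblocks; s++)
            edge[b][s] = false;
    }
    bb[entry].reached = true;
    work[top++] = entry;
    while (top > 0) {
        int b = work[--top];
        count++;
        for (int k = 0; k < 2; k++) {
            int s = bb[b].succ[k];
            if (s < 0)
                continue;
            edge[b][s] = true;              /* install directed edge b -> s */
            if (!bb[s].reached) {           /* each block is queued at most once */
                bb[s].reached = true;
                work[top++] = s;
            }
        }
    }
    return count;
}
```

For example, with blocks 0 -> 2, 1 -> 2, 2 -> 3 and entry 0, block 1 (say, code after an unconditional goto that nothing targets) is never reached, so it would be filtered out as unreachable and its edge to block 2 is never installed.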
CFG Synthesis
See separate .doc presentation