Code optimization by partial redundancy elimination using Eliminatability paths (E-paths)

Code optimization bypartial redundancy eliminationusing Eliminatability paths (E-paths) Prof. Dhananjay M Dhamdhere

These slides are based on • D. M. Dhamdhere: “E-path_PRE---Partial redundancy elimination made easy”, SIGPLAN Notices, v 37, n 8 (2002), 53-65. • D. M. Dhamdhere: “Eliminatability path---A versatile basis for partial redundancy elimination, 2002 • Dheeraj Kumar: “Syntactic and Semantic Partial Redundancy elimination”, M. Tech. dissertation, I.I.T. Bombay, 2006.

Partial redundancy elimination • Partial redundancy An expression e in statement s is partially redundant if its value is identical with value of e in some path from start of program to s • Partial redundancy elimination -- A partially redundant occurrence of e is made totally redundant by inserting evaluations of e in some path(s) from start of the program to s -- The totally redundant occurrence of e is now eliminated

An example of PRE t=a*b a*b t=a*b 2 1 2 1 a*b t 3 3 -- Insert a*b in node 2 -- Delete a*b from node 3

Partial redundancy elimination PRE subsumes 3 important classical optimizations: • Common subexpression elimination (CSE) - Expression e is computed along all paths reaching its occurrence • Loop invariant movement - A loop-invariant expression is available along the looping edge. Hence it is partially redundant. • Classical code motion - A less known optimization. It is in fact partial redundancy elimination in specific situations.

PRE subsumes 3 optimizations 1. CSE 1 - a*b of node 5 is a CSE. 2. Loop invariant movement a=.. a*b 2 4 • a*b of node 4 is partially • redundant a*b 3 5 3. Code movement • a*b of node 6 can be • moved to node 3. 6 a*b

Benefits and costs of PRE • Benefits: Execution efficiency through a reduction in the number of expression occurrences along a graph path • Costs: - Use of compiler generated temporaries to hold values of expressions - Lifetimes of compiler generated temporaries increase register pressure - Insertion of new blocks due to edge placement • Desirable goals: Computational optimality and lifetime optimality

Data flow concepts used in partial redundancy elimination • Availability : An expression e is available at a program point p if its value is computed along ALL paths from start of the program to p • Partial availability : An expression e is partially available at a program point p if its value is computed along SOME path from start of the program to p • Availability = Total redundancy • Partial availability = Partial redundancy

Data flow concepts used in partial redundancy elimination • Anticipatability: An expression e is anticipatable (that is, “very busy”) at a program point p if it is computed along ALL paths from p to an exit of the program

Data flow concepts used in partial redundancy elimination • Anticipatability: An expression e is anticipatable (that is, “very busy”) at a program point p if it is computed along ALL paths from p to an exit of the program • Safety of a computation (Kennedy 1972): An expression e is safe at a program point p if it is either available or anticipatable at p - Insertion of e at p is a “new” computation if e is not safe at p. - It increases the execution time of the program. It may also raise “new” exceptions

“Safe” insertion of computations a*b 21 a*b 22 11 12 a*b a*b 13 23 -- a*b is anticipatable in node 12, but not anticipatable in node 22 -- Insertion of a*b in node 12 is safe, however in 22 it is unsafe -- Insertion in edge (22,23) is safe!

Some partial redundancies cannot be eliminated through safe code insertion a*b i -- Insertion in the in-edge of node n is unsafe because a*b is not anticipatable m t=a*b a*b available a*b ¬ available, ¬ anticipatable n a*b anticipatable k a*b

Performing Partial Redundancy Elimination • Identify partially redundant occurrences of an expression e in a program • Insert occurrences of e at some program points where e is safe • Delete partially redundant occurrences of e which have become totally redundant • Classical PRE: Elimination of partial redundancies in a program through safe insertion of computations. - Can be looked upon as `code movement’ from the point of original occurrence to the point of insertion - It cannot eliminate all partial redundancies in a program!

A brief history of PRE • Morel, Renvoise (1979): Bidirectional data flows for code placement in nodes (MRA). Lacks both computational and lifetime optimality. • Dhamdhere (1988): Computational optimality and reduced lifetimes of temporaries than Morel-Renvoise through placement in nodes and edges (EPA). • Knoop, Ruthing, Steffen (1992): Lazy code motion (LCM) offering computational optimality and lifetime optimality through a priori edge splitting and placement in nodes. Drechsler and Stadel (1993) reformulated LCM to handle basic blocks. • Bodik, Gupta, Soffa (1998) : Complete elimination of partial redundancies through selective code expansion (ComPRE). Based on the work by Steffen (1996). • Kennedy et al (1999): PRE in SSA representation of programs (SSAPRE). • Dhamdhere (2002): Eliminatability path --- A versatile basis for PRE (E-path_PRE). Develops a concept originating in Dhaneshwar, Dhamdhere (1995) and uses it for evaluation of PRE algorithms and development of new ones. • Xue, Knoop (2006) and Dheeraj kumar, Dhamdhere (2006)

Morel-Renvoise Algorithm (MRA) • Performs insertions strictly in nodes of the program graph • Placement possibility (PP) of e at entry/exit of basic blocks: whether it is feasible and safe to place expression e at entry/exit of a block • Insert e at the exit of a basic block b if it can be placed at the exit of b but not at its entry • Delete an existing occurrence of e in a basic block if it can be placed at the entry of that block

Morel-Renvoise Algorithm (MRA)(simplified equations)

Morel-Renvoise Algorithm (MRA) 1 1 a=.. a=.. a*b t=a*b 2 4 2 4 t=a*b a*b t 3 5 3 5 6 a*b t 6 1. a*b is inserted in node 2. Insertion in node 3 would have been lifetime optimal. 2. a*b of node 4 cannot be optimized because it cannot be inserted in node 1. 3. a*b is saved in t in nodes 2 and 4. a*b of node 6 is replaced by use of t.

Edge placement algorithm (Dhamdhere 1988) • Performs insertions both in nodes and along edges in the program graph • An expression is hoisted as far up as possible to obtain computational optimality • It is then subjected to sinking (without affecting computational optimality) to obtain lifetime optimality • It is placed along an edge only if it cannot be placed in a node • It is performed only along a critical edge, i.e., an edge from a “branch” node to a “join” node

Edge placement algorithm (Dhamdhere 1988)

Edge placement algorithm (Dhamdhere 1988) A. Computational optimality: • The ∏ term of PPIN is dropped. Hence PPIN can be true even if PPOUT of a predecessor is false. • If PP is true for entry of a basic block i but PP is false for exit of a predecessor j, e is placed along the edge (j,i). -- It is called edge placement. A basic block is inserted in the edge if e is to be placed along it. -- Edge placement performed only along a “critical edge”, i.e. along an edge from a “branch” node to a “join” node. • Placement into nodes is done as in MRA.

Edge placement algorithm(Dhamdhere 1988) B. Reducing lifetimes of expression variables: • Move insertion points as far down as possible without sacrificing computational optimality (it is achieved by the ∑ term)

Edge placement algorithm (Dhamdhere 1988) • EPA solution technique: (“hoisting-followed-by-sinking” approach) • Solve the unidirectional data flow problem obtained by omitting the • ∑ term from the PPIN equation. It hoists e as far up as possible. Provides computational optimality. 2. Now a second data flow is solved to incorporate the ∑ term: We examine all predecessors of a block i and change PPIN of block i from true to false if the ∑ term is false for its predecessors. It sinks the hoisted expression as far down as possible without compromising computational optimality.

Edge placement algorithm (EPA) 1 1 t=a*b a=.. a*b 2 4 2 4 t a=.. a*b t=a*b t 3 5 3 5 a*b t 6 6 1. a*b is inserted in node 3. However, EPA does not provide lifetime optimality in some cases. 2. a*b is inserted in edge (1,4). This is computationally optimal.

Lazy code motion (KRS 92) • All “join” edges are split a priori by inserting blocks along them • D-Safe-earliest points: An expression is placed at the earliest points where it is anticipatable. • Evaluation of an expression is delayed to the latest point where it can be placed without losing computational optimality. • Thus, it conceptually performs “hoisting-followed-by-sinking”, as in the edge placement algorithm. • Insertion and saving is performed uniformly. • Data flow equations are not given here. (Drechsler and Stadel reformulated them.)

Lazy code motion (KRS) 1 1 (1,4) t=a*b t=a*b a=.. a*b t 2 4 2 4 a=.. a*b t 3 5 3 5 (3,6) t=a*b a*b t 6 6 1. Edges (1,4), (3,6), (5,6) and (5,4) are split a priori 2. a*b is inserted in edge (3,6). LCM provides lifetime optimality 3. a*b is inserted in edge (1,4). As in EPA, this is computationally optimal 4. Empty blocks: removed

Eliminatability paths offer .. • A conceptual basis for PRE: - Identifies partial redundancies which can be eliminated through insertion of code in safe places * We call them eliminatable partial redundancies - A simple method for identifying safe insertion points which offer lifetime optimality - Thus, no “hoisting-followed-by-sinking”

Eliminatability paths offer .. • Computationally optimal PRE: - Elimination of all eliminatable partial redundancies identified by E-paths through appropriate insertions provides computational optimality

Eliminatability paths offer .. • PRE with lifetime optimality: - Insertions performed using the notion of E-paths provides lifetime optimality

Eliminatability paths offer .. • A versatile basis for PRE: - Classical PRE: PRE performed by insertion, deletion and saving of expressions over a program graph - PRE over SSA representations of programs

Eliminatability paths offer .. • Simplicity: - Insertion, deletion and save points are identified using simple and well-known data flow concepts of availability and anticipatability

Eliminatability paths offer .. • A basis for evaluating effectiveness of an approach to PRE: - Does the approach provide computational optimality? (i.e. does it eliminate all partial redundancies which can be eliminated?) - Does the approach provide lifetime optimality?

Eliminatability Paths (E-paths) • A path i .. k in a program control flow graph is an E-path for an expression e if - Node i contains a locally available occurrence of e and node k contains a locally anticipatable occurrence of e - Nodes in the path (i .. k) are empty wrt e, i.e. they do not contain an occurrence of e or a definition of any of its operands - e is safe at the exit of each node in [i .. k), i.e., it is either available or anticipatable at the exit of each node in [i .. k). Path [i .. k) includes node i, but excludes node k. Path (i .. k) excludes nodes i and k.

Eliminatability Path* a*b i - a*b available at exit of [i .. m] - a*b anticipatable at exit of [n .. k) m • Occurrence of a*b in node k n is said to be “eliminatable” k a*b * Dhaneshwar, Dhamdhere (1995) used eliminatability of exps, but did not define or use E-paths explicitly.

Properties of E-paths: 1 • PRE using E-paths provides computational optimality • Use of this property: - Use it to evaluate computational optimality of a PRE algorithm. - A PRE algorithm possesses computational optimality if it can eliminate partial redundancy of e in EACH node k such that an E-path i .. k exists in G.

Properties of E-paths: 2 • If i .. k is an E-path and j is a node in (i .. k] - For each in-edge (g, j) such that node g is not in an E-path: if node g has a successor s which is not in an E-path then insert e in edge (g, j) else insert e in node g - Such insertion provides lifetime optimality of the temporary variable used to hold value of e • Use of the property: - Check whether a PRE algorithm provides lifetime optimality by comparing program points where insertions are made

Lifetime optimality using E-paths a*b i m g1 t=a*b t=a*b g2 j - i .. k is an E-path - Insertion in edge (g1, j) and node g2 is lifetime optimal a*b k

Evaluating MRA using E-paths 1 1 a=.. a=.. a*b t=a*b 2 4 2 4 t=a*b a*b t 3 5 3 5 a*b 6 t 6 0. Three E-paths exist: 4 .. 5, 5 .. 4 and 5 .. 6. 1. 5 .. 6 is an E-path. Insertion node 3 would have been lifetime optimal. 2. 5 .. 4 is an E-path. Hence a*b of node 4 is eliminatable, but not eliminated!

PRE using E-paths • For an E-path i .. k a) Insertions: For a node j in (i .. k] - Insert e in edge (g, j) if g is not in an E-path and has a successor which is not in an E-path - Insert e in predecessor g if g is not in an E-path and all its successors are in E-paths b) Save: Save the computation of e in node i, unless i is the end-node of some E-path h .. i (in which case it would be deleted). c) Deletion: Delete the occurrence of e in node k.

PRE using E-paths • E-path i.. k may contain 3 kinds of segments - Avail . ¬Ant segment - Avail . Ant segment - ¬Avail . Ant segment : This is called the “E-path suffix”. • Find a node m : ¬Avail(m) . Anticipatable(m). ∑ Avail(p), p=pred This is the start node of the E-path suffix. - Trace Avail . ¬Ant segment backwards from m to find node i, the start of the E-path and perform a save in it - Trace ¬Avail . Ant segment forward from m a) to perform appropriate insertion for in-edges b) to find k and perform a deletion

Segments in an E-Path a*b 1 a) 1 .. 2 : Avail · ¬Ant. 2 b) 3 .. 4 : Avail · Ant. c) 5 .. 10 : ¬Avail ·ּAnt (E-path suffix). 3 4 Start node Of E-path suffix 5 E-path suffix: insertions may be needed in paths joining it ׃ a*b 10

Simple data flows for E-path_PRE@ Comp : e is locally available (i.e. downwards exposed) in node Antloc : e is locally anticipatable (i.e. upwards exposed) in node Transp : node does not contain definitions of e’s operands @ : Terminology is from Morel-Renvoise algorithm

Simple data flows for E-path_PRE • Availability and Anticipatability (i.e. very busy exps.) • Eps-in/Eps-out (Node is in E-path suffix)

Simple data flows for E-path_PRE • Availability and Anticipatability • Eps-in/Eps-out (Node is in E-path suffix) • SA_in/SA_out (A save should be “performed above”)

Efficiency of E-path_PRE data flows • The generalized theory of bit-vector data flow analysis by Khedker, Dhamdhere (1994) defines two concepts for determining the cost of data flow analysis - Information flow path (ifp): A graph path along which data flow information may “flow” during data flow analysis. (Information “flow” : Values of data flow properties change from`lattice top’ to `lattice bot’ during iterative data flow analysis) - “Width” of a graph (reduces to depth of a graph for unidirectional data flows)

Efficiency of E-path_PRE data flows • The generalized theory of bit-vector data flow analysis by Khedker, Dhamdhere (1994) defines two concepts for determining the cost of data flow analysis - Information flow path (ifp): A graph path along which data flow information may “flow” during data flow analysis. (Information “flow” : Values of data flow properties change from`lattice top’ to `lattice bot’ during iterative data flow analysis) - “Width” of a graph (reduces to depth of a graph for unidirectional data flows) • Number of bit-vector operations during work-list iterative df analysis depend on length of an ifp, and the number of iterations during round-robin iterative df analysis depend on width of an ifp

Efficiency of E-path_PRE data flows • The Eps_in/out data flow of E-path_PRE has been designed to have “short” information flow paths. This fact may also lead to small width of a program graph. • Short information flow paths and small width leads to smaller solution times of data flows. This fact is borne out by experimentation --- comparison with the “later” data flow of Drechsler, Stadel (1993) (Dhamdhere 2002): - In worklist solution: No. of bit vector operations is 80% smaller - In round-robin iterative solution: No. of iterations is 37% smaller

Code placement models in PRE • Node model - Simple node model Each node contains a single statement - Basic block model Each node is a basic block • Insertion and Saving model - Saving in situ Value of an expression is saved in the place where it is located - Saving in entry/exit of node An expression is moved to node entry/exit if its value is to be saved - Insertion at entry/exit of node - Unified insertion and saving This is possible only when saving is done at node entry/exit

Code placement models in PRE • Morel-Renvoise Algorithm (MRA): - Basic blocks, saving in situ, insertion at exit • Edge placement algorithm (EPA): - Basic blocks, saving in situ, insertions at node exit and in critical edges (edge splitting performed on a needs basis) • Lazy Code Motion (LCM): - Simple nodes, unified saving and insertion, insertion at node entries and in blocks inserted in join edges in a priori edge splitting • E_path-PRE - Basic blocks, saving in situ, insertions at node exit and in critical edges • SIM-PRE - Basic blocks, saving in situ, insertion strictly along edges

Evaluation of code placement models using E-paths • Morel-Renvoise algorithm (MRA) Missed opportunities of optimization (seen before) • Lazy code motion (LCM) Performs insertion in a join edge (p,j) even if it could have been performed in node p a*b 2 1 a*b inserted 3 a*b

Evaluation of code placement models using E-paths • Optimal code motion (OCM) Knoop et al 1994 - Basic blocks, Hybrid model, Insertions at node entry and exit - Hybrid: Uniform insertion and saving model but saving is performed in situ No insertions and savings will be performed at entry to a node (Lemmas 19 and 23). Hence this feature is redundant.

Code optimization by partial redundancy elimination using Eliminatability paths (E-paths)