Data-Flow Analysis Framework • Domain • What kind of solution is the analysis looking for? • Ex. Variables that have not yet been defined • The algorithm assigns a set of assertions to each node/edge • Approximation • Useful data-flow properties cannot be computed with 100% accuracy • Rice’s Theorem (1953): every non-trivial property of a program's behavior is undecidable • A lower approximation is called a MUST analysis • The set of solutions found may be smaller than the set of actual solutions • An upper approximation is called a MAY analysis • The set of solutions found may be larger than the set of actual solutions
Data-Flow Analysis Framework • Direction • Forwards: For each node/edge, computes information about past behavior • Backwards: For each node/edge, computes information about future behavior • Transfer Functions • JOIN: Specifies how information from adjacent nodes/edges is propagated • MAY: Union of adjacent edges • MUST: Intersection of adjacent edges • GEN: Specifies which possible solutions are generated at the node/edge • KILL: Specifies which possible solutions are removed at the node/edge
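The JOIN, GEN, and KILL roles above can be sketched with Python sets. This is a minimal illustration, not a definitive implementation: the function names `join_may`, `join_must`, and `transfer` are hypothetical, and `transfer` applies the deck's node formula (JOIN - KILL) ∪ GEN.

```python
def join_may(edge_sets):
    """MAY analysis: keep a fact if it holds on at least one adjacent edge."""
    return set().union(*edge_sets)


def join_must(edge_sets, universe):
    """MUST analysis: keep a fact only if it holds on every adjacent edge."""
    result = set(universe)
    for facts in edge_sets:
        result &= facts
    return result


def transfer(joined, gen, kill):
    """Remove the killed facts first, then add the generated ones."""
    return (joined - kill) | gen
```

Applying KILL before GEN matters when a node both removes and generates the same fact: the generated fact survives.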
Data-Flow Algorithm • Start at the top (bottom) of the CFG • Forwards: top • Backwards: bottom • At each node, compute: (JOIN(node) - KILL(node)) ∪ GEN(node) • At each branch: • Follow all paths, in any order, up to the node where the paths merge • Once all paths up to the merge are complete, continue at the merge node • If not all JOIN edges have been computed yet, use: • the empty set (MAY) • the universal set (MUST) • For loops: • Repeat until the solution for every node in the loop no longer changes • This solution is called the “fixed point”
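The iteration to a fixed point can be sketched as follows. Note that this is a simple round-robin variant (it revisits every node until nothing changes) rather than the path-following order described above. The name `solve`, its parameters, and the choice of an empty set at nodes with no incoming edges are assumptions made for illustration.

```python
def solve(nodes, preds, gen, kill, may=True, universe=frozenset()):
    """Apply (JOIN - KILL) | GEN to every node until no result changes.

    preds[n] lists the nodes whose results flow into n: CFG predecessors
    for a forwards analysis, CFG successors for a backwards analysis.
    """
    # Results not computed yet: empty set for MAY, universal set for MUST.
    out = {n: set() if may else set(universe) for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            incoming = [out[p] for p in preds[n]]
            if not incoming:
                joined = set()  # assumption: nothing is known at the entry
            elif may:
                joined = set().union(*incoming)
            else:
                joined = set.intersection(*incoming)
            new = (joined - kill[n]) | gen[n]
            if new != out[n]:
                out[n] = new
                changed = True
    return out
```

The loop stops because the sets can only take finitely many values; the final state is the fixed point for the given GEN and KILL sets.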
Liveness • A variable is live at a node if its current value can be read during the remaining execution of the program • Domain: program variables • Backwards MAY analysis
Liveness Transfer Functions • Exit • GEN(exit) = { } • KILL(exit) = { } • Conditions and Output • GEN(stmt) = Set of all variables appearing in the statement • KILL(stmt) = { } • Assignment • GEN(assignment) = Set of all variables appearing on the right-hand side • KILL(assignment) = Set with variable being assigned to • Declaration • GEN(declaration) = { } • KILL(declaration) = Set of variables being declared • Other • GEN(other) = { } • KILL(other) = { }
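The transfer functions above can be tried out on a small program. The sketch below is illustrative only: the statements are the ones that appear in the deck's examples, but the loop and branch structure encoded in `PROGRAM`, the node numbering, and the function names are assumptions.

```python
# node: (kind, variables written, variables read, successor nodes)
PROGRAM = {
    1:  ("decl",   {"x", "y", "z"}, set(),      [2]),
    2:  ("assign", {"x"},           set(),      [3]),      # x = input
    3:  ("cond",   set(),           {"x"},      [4, 11]),  # x > 1
    4:  ("assign", {"y"},           {"x"},      [5]),      # y = x/2
    5:  ("cond",   set(),           {"y"},      [6, 7]),   # y > 3
    6:  ("assign", {"x"},           {"x", "y"}, [7]),      # x = x - y
    7:  ("assign", {"z"},           {"x"},      [8]),      # z = x - 4
    8:  ("cond",   set(),           {"z"},      [9, 10]),  # z > 0
    9:  ("assign", {"x"},           {"x"},      [10]),     # x = x/2
    10: ("assign", {"z"},           {"z"},      [3]),      # z = z - 1
    11: ("output", set(),           {"x"},      [12]),     # output x
    12: ("exit",   set(),           set(),      []),
}


def gen_kill(node):
    """GEN and KILL sets for liveness, following the rules on the slide."""
    kind, written, read, _ = PROGRAM[node]
    if kind in ("cond", "output"):
        return set(read), set()
    if kind == "assign":
        return set(read), set(written)
    if kind == "decl":
        return set(), set(written)
    return set(), set()  # exit and any other statement


def liveness():
    """Backwards MAY analysis: variables live just before each node."""
    live = {n: set() for n in PROGRAM}
    changed = True
    while changed:
        changed = False
        for n in sorted(PROGRAM, reverse=True):  # start at the bottom
            joined = set().union(*(live[s] for s in PROGRAM[n][3]))
            gen, kill = gen_kill(n)
            new = (joined - kill) | gen
            if new != live[n]:
                live[n], changed = new, True
    return live
```

For this assumed program, `liveness()` yields only the sets { }, {x}, {x, y}, and {x, z}.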
Liveness Example • [Figure: control-flow graph from START to END with true/false branches; each edge is annotated with its set of live variables: { }, {x}, {x, y}, or {x, z}]
Liveness Application • Memory Allocation • Since y and z are never live at the same time, they can share the same memory location • Performance Optimization • The value assigned by z = z - 1 is never used, so the assignment can be removed
Liveness Application • Bug Checking • The variable z is dead immediately after the assignment z = z - 1 • FindBugs says: “This instruction assigns a value to a local variable, but the value is not read or used in any subsequent instruction. Often, this indicates an error, because the value computed is never used.”
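Both applications reduce to simple checks on the live sets. The sketch below assumes the live-after sets are already known; the values in `ASSIGNMENTS` were worked out by hand for a loop program built from the deck's statements, so treat them as illustrative input, and the function names are hypothetical.

```python
# (statement, variable assigned, variables live immediately after it)
ASSIGNMENTS = [
    ("x = input", "x", {"x"}),
    ("y = x/2",   "y", {"x", "y"}),
    ("x = x - y", "x", {"x"}),
    ("z = x - 4", "z", {"x", "z"}),
    ("x = x/2",   "x", {"x", "z"}),
    ("z = z - 1", "z", {"x"}),
]


def dead_assignments(assignments):
    """Assignments whose target is not live right after the statement."""
    return [stmt for stmt, var, live_after in assignments
            if var not in live_after]


def can_share_storage(a, b, live_sets):
    """True if a and b are never live together in any of the given sets."""
    return not any(a in live and b in live for live in live_sets)
```

A real checker would look at the live sets of every program point, not only the points after assignments; the restriction here just keeps the example short.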
Data-Flow Framework Summary • Generic framework for different analyses • Each analysis defines • Domain • Approximation • Direction • Transfer Functions • Used for optimization, verification, and testing
Reaching Definitions • A reaching definition at a node is an assignment statement that may have defined the current value of a variable at that node • Domain: assignment statements • Forwards MAY analysis
Reaching Definitions Transfer Functions • Assignment • GEN(assignment) = Set containing the statement itself • KILL(assignment) = Set of all other statements that assign to the same variable • Declaration • GEN(decl) = Set containing the statement itself • KILL(decl) = { } • Other • GEN(other) = { } • KILL(other) = { }
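A small forwards MAY analysis shows these GEN/KILL rules at work. This is a sketch under assumptions: the labels a1 to a6 name the six assignments used in the deck's example, while the control-flow edges in `PREDS`, the condition labels c1 to c3, and the omission of any declaration node are choices made here for illustration.

```python
# Assignment label -> variable it defines.
# a1: x = input, a2: y = x/2, a3: x = x - y,
# a4: z = x - 4, a5: x = x/2, a6: z = z - 1
DEFS = {"a1": "x", "a2": "y", "a3": "x", "a4": "z", "a5": "x", "a6": "z"}

# node -> CFG predecessors (assumed loop structure)
PREDS = {
    "a1": [],
    "c1": ["a1", "a6"],  # x > 1, loop head
    "a2": ["c1"],
    "c2": ["a2"],        # y > 3
    "a3": ["c2"],
    "a4": ["c2", "a3"],
    "c3": ["a4"],        # z > 0
    "a5": ["c3"],
    "a6": ["c3", "a5"],
    "out": ["c1"],       # output x
}


def gen_kill(node):
    """An assignment generates itself and kills other definitions of its variable."""
    if node not in DEFS:
        return set(), set()
    var = DEFS[node]
    others = {d for d, v in DEFS.items() if v == var and d != node}
    return {node}, others


def reaching_definitions(preds):
    """Definitions that may reach the point just after each node."""
    out = {n: set() for n in preds}
    changed = True
    while changed:
        changed = False
        for n in preds:
            joined = set().union(*(out[p] for p in preds[n]))
            gen, kill = gen_kill(n)
            new = (joined - kill) | gen
            if new != out[n]:
                out[n], changed = new, True
    return out
```

With these edges, the set after the loop head settles at {a1, a2, a3, a5, a6}: a4 never reaches it because a6 redefines z on every path back to the loop head.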
Reaching Definitions Example • [Figure: control-flow graph from START to END; each edge is annotated with its set of reaching definitions] • Labeled assignments: a1: x = input • a2: y = x/2 • a3: x = x - y • a4: z = x - 4 • a5: x = x/2 • a6: z = z - 1 • Conditions: x > 1 • y > 3 • z > 0 • Output: output x • Sets shown on the edges: { } • {a1} • {a1, a2} • {a2, a3} • {a2, a3, a6} • {a1, a2, a3, a4} • {a1, a2, a3, a4, a5} • {a2, a4, a5} • {a1, a2, a3, a5, a6} • {…}
Reaching Definitions Applications • FindBugs: “NP: Possible null pointer dereference” • Debugging • “Slicing” tools • Following chains of reaching definitions backwards to track down bugs • Basis for information-flow security • Discussed in the lectures on security
Exercise • Compute the reaching definitions for each node, using the iterative data-flow algorithm. • Show the solution after each loop iteration.
    function test(r1, r2, r3, r4, r5) {
        while (r1 < 10) {
            r1 = r1 + 1;
            r5 = r1 * 2;
            if ((r1 % 2) == 0)
                r2 = 0;
            else
                r2 = r2 + 1;
            r4 = r2 + r1;
        }
        return r4 + r5;
    }