Data-Flow Analysis


Presentation Transcript


  1. Data-Flow Analysis

  2. Approaches • Static Analysis: inspections; dependence analysis; symbolic execution; software verification; data flow analysis; concurrency analysis; interprocedural analysis • Dynamic Analysis: assertions; error seeding, mutation testing; coverage criteria; fault-based testing; object-oriented testing; regression testing

  3. Data Flow Analysis (DFA) • Efficient technique for proving properties about programs • Not as powerful as automated theorem provers, but requires less human expertise • Uses an annotated control flow graph model of the program • Compute facts for each node • Use the flow in the graph to compute facts about the whole program • We’ll focus on single units

  4. Some examples of DFA techniques • DFA used extensively in program optimization, e.g., to determine if a definition is dead (and can be removed), if a variable always has a constant value, or if an assertion is always true and can be removed • DFA can also be used to find anomalies in the code • Find def/ref anomalies [Osterweil and Fosdick] • Cecil/Cesar system demonstrated the ability to prove general user-specified properties [Olender and Osterweil] • FLAVERS demonstrated applicability to concurrent systems [Dwyer and Clarke] • Why “anomalies” and not faults? • An anomaly may not correspond to an actual executable failure

  5. Data flow analysis • First, determine local information that is true at each node in the CFG, e.g., what variables are defined and what variables are referenced • Usually stored in sets, e.g., ref(n) is the set of variables referenced at node n • Second, use this local information and control flow information to compute global information about the whole program • Done incrementally by looking at each node’s successors or predecessors • Use a fixed point algorithm: continue to update global information until a fixed point is reached

  6. Reaching Definitions • A definition reaches a node if there is a def-clear path from the definition to that node • In the example graph, the definition of x at node 1 reaches nodes 2, 3, 4, and 5 but not 6 • Could be used to determine data dependencies, useful for debugging, data flow testing, etc. • Start from a definition, and move forward on the graph to see how far it reaches • [Figure: control flow graph with nodes 1 to 6 and statement labels x:=, x:=, and :=x]

  7. Computing global information: reaching definitions • Start from a definition, and move forward on the graph to see how far it reaches • Definitions that might reach a node: predecessors 1 and 2 carry reaching_def={xi} and reaching_def={yj}, so node 3 gets reaching_def={xi,yj} • Definitions that must reach a node: predecessors 1 and 2 carry reaching_def={xi,yj} and reaching_def={yj}, so node 3 gets reaching_def={yj}

  8. Reaching Definitions • Xi means that the definition of variable x at node i {might|must} reach the current node • Also stated as {possible|definite} {some|all} {any|all} {may|must}

  9. Computing values for a node: forward flow • Keep track of the definitions into a node that have not been redefined • Flow out of a node depends on flow in and on what happens in the node • [Figure: node y := …x with In(n) entering, def(n)={y}, ref(n)={x…}, and Out(n) leaving]

  10. Example: Possible Reaching Definitions • Forward flow, any path problem; def/ref sets are the initial facts associated with each node • Code: int x,y; ... x := foo(); y := x + 2; if x > 0 then x := x + y; end if; ... • Definitions: X1, Y2, X4 • x = foo(): def={x}, ref={ }, In={ }, Out={X1} • y = x + 2: def={y}, ref={x}, In={X1}, Out={X1,Y2} • if(x > 0): def={}, ref={x}, In={X1,Y2}, Out={X1,Y2} • x = x + y: def={x}, ref={x,y}, In={X1,Y2}, Out={X4,Y2} • Join after end if: def={}, ref={ }, In={X1,Y2,X4}, Out={X1,Y2,X4}
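The any-path computation in this example can be sketched in Python. This is a minimal illustration rather than the presentation's own implementation: the node numbering (1 to 5, with node 3 as the if test and node 5 as the join after end if), the names SUCC, DEF, and possible_reaching_defs, and the (variable, node) pairs standing in for labels such as X1 are all assumptions made for the sketch.

```python
# Possible (any-path) reaching definitions: an illustrative sketch.
# A definition is a (variable, node) pair, e.g. ("x", 1) stands for X1.

# Assumed control flow graph: node -> successors.
SUCC = {1: [2], 2: [3], 3: [4, 5], 4: [5], 5: []}
# Variables defined at each node.
DEF = {1: {"x"}, 2: {"y"}, 3: set(), 4: {"x"}, 5: set()}


def possible_reaching_defs(succ, defs):
    # Build predecessor lists from the successor map.
    pred = {n: [] for n in succ}
    for n, targets in succ.items():
        for t in targets:
            pred[t].append(n)

    in_sets = {n: set() for n in succ}
    out_sets = {n: set() for n in succ}
    changed = True
    while changed:  # keep updating until a fixed point is reached
        changed = False
        for n in succ:
            # Any-path problem: union over all predecessors.
            new_in = set().union(*(out_sets[p] for p in pred[n]))
            # Drop incoming definitions of variables redefined at n,
            # then add the definitions made at n itself.
            survivors = {(v, k) for (v, k) in new_in if v not in defs[n]}
            new_out = survivors | {(v, n) for v in defs[n]}
            if new_in != in_sets[n] or new_out != out_sets[n]:
                in_sets[n], out_sets[n] = new_in, new_out
                changed = True
    return in_sets, out_sets


in_sets, out_sets = possible_reaching_defs(SUCC, DEF)
```

Under these assumptions, in_sets[5] corresponds to {X1,Y2,X4} and out_sets[4] to {X4,Y2}, the sets shown for the join and for x = x + y.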

  11. Example: Definite Reaching Definitions • Forward flow, all path problem; def/ref sets are the initial facts associated with each node • Code: int x,y; ... x := foo(); y := x + 2; if x > 0 then x := x + y; end if; ... • Definitions: X1, Y2, X4 • x = foo(): In={ }, Out={X1} • y = x + 2: In={X1}, Out={X1,Y2} • if(x > 0): In={X1,Y2}, Out={X1,Y2} • x = x + y: In={X1,Y2}, Out={X4,Y2} • Join after end if: In={Y2}, Out={Y2}

  12. Example: Definite Reaching Definitions • What happens when we add a loop? • The code, graph, and sets shown are unchanged from slide 11

  13. Example: Definite Reaching Definitions • What happens when we add a loop? • Same code, graph, and sets as slide 11, with four X marks overlaid on the sets around if(x > 0) and x = x + y • If the loop adds a back edge from x = x + y to the test, the set into if(x > 0) becomes {X1,Y2} ∩ {X4,Y2} = {Y2}, so X1 no longer definitely reaches the test • Out of x = x + y is still {X4,Y2}; exit: In={Y2}, Out={Y2}

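  14. Computing values for a node: forward flow • Keep track of the definitions into a node that have not been redefined • Flow out of a node depends on flow in and on what happens in the node • [Figure: node y := …x with In(n) entering and Out(n) leaving] • For definite reaching defs: $\mathrm{In}(n) = \bigcap_{k \in \mathrm{pred}(n)} \mathrm{Out}(k)$ and $\mathrm{Out}(n) = (\mathrm{In}(n) - \mathrm{def}(n)) \cup \mathrm{def}(n_n)$, where $\mathrm{def}(n_n)$ is read here as the definitions made at $n$, labeled with node $n$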
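The equations for definite reaching definitions can be tried out with a short Python sketch. Several things here are assumptions, not taken from the slides: the function and variable names, the (variable, node) encoding of labels such as X1, the choice to start every Out set full (a common initial value for all-paths problems), and the shape of the loop, which is modeled as a back edge from x = x + y (node 4) to the test (node 3).

```python
# Definite (all-path) reaching definitions: an illustrative sketch.
# A definition is a (variable, node) pair, e.g. ("x", 1) stands for X1.

def definite_reaching_defs(succ, defs, entry):
    # Build predecessor lists from the successor map.
    pred = {n: [] for n in succ}
    for n, targets in succ.items():
        for t in targets:
            pred[t].append(n)

    all_defs = {(v, n) for n, variables in defs.items() for v in variables}
    # Assumed initial values: every Out set starts full, so a back edge
    # that has not been analyzed yet does not erase facts too early.
    out_sets = {n: set(all_defs) for n in succ}
    in_sets = {n: set() for n in succ}

    changed = True
    while changed:
        changed = False
        for n in succ:
            incoming = [out_sets[p] for p in pred[n]]
            if n == entry or not incoming:
                new_in = set()  # nothing reaches the entry node
            else:
                new_in = set.intersection(*incoming)  # all paths: intersect
            # Out(n) = (In(n) - def(n)) U def(n_n)
            survivors = {(v, k) for (v, k) in new_in if v not in defs[n]}
            new_out = survivors | {(v, n) for v in defs[n]}
            if new_in != in_sets[n] or new_out != out_sets[n]:
                in_sets[n], out_sets[n] = new_in, new_out
                changed = True
    return in_sets, out_sets


DEF = {1: {"x"}, 2: {"y"}, 3: set(), 4: {"x"}, 5: set()}
# Branch version: node 3 is the test, node 5 the join after end if.
BRANCH = {1: [2], 2: [3], 3: [4, 5], 4: [5], 5: []}
# Hypothetical loop version: node 4 jumps back to the test at node 3.
LOOP = {1: [2], 2: [3], 3: [4, 5], 4: [3], 5: []}

branch_in, branch_out = definite_reaching_defs(BRANCH, DEF, entry=1)
loop_in, loop_out = definite_reaching_defs(LOOP, DEF, entry=1)
```

With the branch graph, only Y2 survives at the join. With the assumed back edge, only Y2 survives at the test as well, because X1 arrives from above while X4 arrives around the loop.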

  15. Live Variables • A variable x is live at node p if there exists a def-clear path with respect to x from node p to a use of x • In the example graph, x is live at nodes 2, 3, and 4, but not at node 5 • Used to determine what variables need to be kept in a register after executing a node • Start from a use, and move backward on the graph until a def is encountered • [Figure: control flow graph with nodes 1 to 6 and statement labels x:=, x:=, and :=x]

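  16. Computing global information: live variables • Possible live variables: successors carrying live={x} and live={y} give live={x,y} at their common predecessor • Definite live variables: successors carrying live={x} and live={x, y} give live={x} at their common predecessor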

  17. Live Variables: Computing values for a node, backward flow • Keep track of the (forward) references that have not found their (backward) definition site • Flow out of a node depends on flow in and on what happens in the node • [Figure: node y := …x with Out(n) above the node and In(n) below it]

  18. Example: Possible Live Variables • Backward flow, any path problem; def/ref sets are the initial facts associated with each node • Code: int x,y; ... x := foo(); y := x + 2; if x > 0 then x := x + y; end if; ... • Sets are listed as shown above the node (Out) and below the node (In) • x = foo(): Out={ }, In={x} • y = x + 2: Out={x}, In={x,y} • if(x > 0): Out={x,y}, In={x,y} • x = x + y: Out={x,y}, In={} • Exit: Out={}, In={}

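  19. Example: Definite Live Variables • Backward flow, all path problem; def/ref sets are the initial facts associated with each node • Code: int x,y; ... x := foo(); y := x + 2; if x > 0 then x := x + y; end if; ... • Sets are listed as shown above the node (Out) and below the node (In) • x = foo(): Out={ }, In={x} • y = x + 2: Out={x}, In={x} • if(x > 0): Out={x}, In={} • x = x + y: Out={x,y}, In={} • Exit: Out={}, In={}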

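  20. Example: Definite Live Variables • Backward flow, all path problem • def/ref sets are the initial facts associated with each node • $\mathrm{In}(n) = \bigcap_{k \in \mathrm{succ}(n)} \mathrm{Out}(k)$ and $\mathrm{Out}(n) = (\mathrm{In}(n) - \mathrm{def}(n)) \cup \mathrm{ref}(n)$ • Code: int x,y; ... x := foo(); y := x + 2; if x > 0 then x := x + y; end if; ... • Sets are listed as shown above the node (Out) and below the node (In) • x = foo(): Out={ }, In={x} • y = x + 2: Out={x}, In={x} • if(x > 0): Out={x}, In={} • x = x + y: Out={x,y}, In={} • Exit: Out={}, In={}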
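The backward equations can be exercised with a small Python sketch covering both the possible and the definite variant. The node numbering (1 to 5, node 3 the test, node 5 the exit), the names SUCC, DEF, REF, and live_variables, the all_paths flag, and the choice of starting values (full sets for the all-paths variant, empty sets otherwise) are assumptions made for the sketch.

```python
# Live variables as a backward problem, using the slides' convention:
# In(n) is combined from the successors, Out(n) holds above node n.

SUCC = {1: [2], 2: [3], 3: [4, 5], 4: [5], 5: []}
DEF = {1: {"x"}, 2: {"y"}, 3: set(), 4: {"x"}, 5: set()}
REF = {1: set(), 2: {"x"}, 3: {"x"}, 4: {"x", "y"}, 5: set()}


def live_variables(succ, defs, refs, all_paths):
    universe = set().union(*defs.values(), *refs.values())
    # Assumed initial values: full sets for all-paths, empty otherwise.
    start = universe if all_paths else set()
    out_sets = {n: set(start) for n in succ}
    in_sets = {n: set() for n in succ}

    changed = True
    while changed:
        changed = False
        for n in reversed(list(succ)):  # visit later nodes first
            incoming = [out_sets[k] for k in succ[n]]
            if not incoming:
                new_in = set()  # nothing is live after the exit node
            elif all_paths:
                new_in = set.intersection(*incoming)
            else:
                new_in = set.union(*incoming)
            # Out(n) = (In(n) - def(n)) U ref(n)
            new_out = (new_in - defs[n]) | refs[n]
            if new_in != in_sets[n] or new_out != out_sets[n]:
                in_sets[n], out_sets[n] = new_in, new_out
                changed = True
    return in_sets, out_sets


definite_in, definite_out = live_variables(SUCC, DEF, REF, all_paths=True)
possible_in, possible_out = live_variables(SUCC, DEF, REF, all_paths=False)
```

The only difference between the two variants is the combining operator and the starting value, which is why y is possibly live but not definitely live at the test.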

  21. Data Flow Analysis decisions • Backward/Forward • Any/All (Union or Intersection) • Facts • Equations • Initial values

  22. Constant Propagation • Some variables at a point in a program may only take on one value • If we know this, can optimize the code when it is compiled

  23. Constant Propagation • Code: int x,y; ... x := 3; y := x + 2; if x > z then x := x + y; end if; ... • Control flow graph nodes: x = 3, y = x + 2, if(x > z), x = x + y

  24. Constant Propagation • U = unknown, N = not a constant; facts are pairs (x,y) • Forward flow, all paths problem • Facts are the computations and assignments made at each node • Code: int x,y; ... x := 3; y := x + 2; if x > z then x := x + y; end if; ... • x = 3: In=(U,U), Out=(3,U) • y = x + 2: In=(3,U), Out=(3,5) • if(x > z): In=(3,5), Out=(3,5) • x = x + y: In=(3,5), Out=(8,5) • Join after end if: In=(N,5), Out=(N,5)

  25. Constant Propagation with a loop • U = unknown, N = not a constant; facts are pairs (x,y) • Forward flow, all paths problem • Facts are the computations and assignments made at each node • x = 3: In=(U,U), Out=(3,U) • y = x + 2: In=(3,U), Out=(3,5) • if(x > z): In=(3,5), then (N,5); Out=(3,5), then (N,5) • x = x + y: In=(3,5), then (N,5); Out=(8,5), then (N,5) • Exit: (N,5)
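The way x drops from 3 to N once the loop is taken into account can be reproduced with a short Python sketch. The encoding is an assumption made for illustration: statements are (target, operands) pairs whose operands are summed, the test node carries no assignment, the loop is a back edge from x = x + y (node 4) to the test (node 3), and the names meet, evaluate, and constant_propagation are hypothetical.

```python
# Constant propagation over the values U (unknown), constants, and N
# (not a constant): an illustrative sketch.

U, N = "U", "N"


def meet(a, b):
    # Combine the facts arriving along two paths.
    if a == U:
        return b
    if b == U:
        return a
    return a if a == b else N


def evaluate(operands, env):
    # Each operand is an integer literal or a variable name; they are summed.
    values = [op if isinstance(op, int) else env[op] for op in operands]
    if N in values:
        return N
    if U in values:
        return U
    return sum(values)


def constant_propagation(succ, stmts, entry, variables):
    pred = {n: [] for n in succ}
    for n, targets in succ.items():
        for t in targets:
            pred[t].append(n)

    top = {v: U for v in variables}
    in_env = {n: dict(top) for n in succ}
    out_env = {n: dict(top) for n in succ}
    changed = True
    while changed:
        changed = False
        for n in succ:
            new_in = dict(top)
            if n != entry:
                for p in pred[n]:
                    new_in = {v: meet(new_in[v], out_env[p][v])
                              for v in variables}
            new_out = dict(new_in)
            if stmts[n] is not None:
                target, operands = stmts[n]
                new_out[target] = evaluate(operands, new_in)
            if new_in != in_env[n] or new_out != out_env[n]:
                in_env[n], out_env[n] = new_in, new_out
                changed = True
    return in_env, out_env


# x = 3; y = x + 2; test; x = x + y; exit
STMTS = {1: ("x", (3,)), 2: ("y", ("x", 2)), 3: None,
         4: ("x", ("x", "y")), 5: None}
BRANCH = {1: [2], 2: [3], 3: [4, 5], 4: [5], 5: []}
LOOP = {1: [2], 2: [3], 3: [4, 5], 4: [3], 5: []}  # assumed back edge 4 -> 3

loop_in, loop_out = constant_propagation(LOOP, STMTS, 1, ["x", "y"])
```

On the first pass the test sees (3,5); once the value (8,5) flows around the assumed back edge, the meet of 3 and 8 gives N, and the next pass settles at (N,5).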

  26. Fixed point • The data flow analysis algorithm will eventually terminate if both of the following hold: • There are only a finite number of possible sets that can be associated with a node • The function that determines the sets that can be associated with a node is monotonic

  27. Constant Propagation with a loop • U = unknown, N = not a constant; facts are pairs (x,y) • Forward flow, all paths problem • Facts are the computations and assignments made at each node • x = 3: In=(U,U), Out=(3,U) • y = x + 2: In=(3,U), Out=(3,5) • if(x > 0): (3,5), then (N,5) • x = x + y: In=(3,5), then (N,5); Out=(8,5), then (N,5) • Exit: (N,5)

  28. DAVE: detects anomalous def/ref behavior • L. D. Fosdick and Leon J. Osterweil, "Data Flow Analysis in Software Reliability", ACM Computing Surveys, September 1976, 8 (3), pp. 306-330. • Application independent specification of erroneous behavior • use of an uninitialized variable • redefinition of a variable that is not referenced

  29. Anomalous pairs of ref/def information • d - defined, r - referenced, u - undefined • d..r: defined variable reaches a reference • u..d: undefined variable reaches a definition • d..d: definition is redefined before being used (unreferenced definition) • d..u: definition is undefined before being used (unreferenced definition) • u..r: undefined variable reaches a reference (undefined reference)

  30. Consider an unreferenced definition • For a definition of a, want to know if that definition is not going to be referenced • Is there some path where a is redefined or undefined before being used? • May be indicative of a problem • Usually just a programming convenience and not a problem • On all paths, is a redefined or undefined before being used? • May be indicative of a problem • Or could just be wasteful

  31. Some versus All • For some path, a is redefined without a subsequent reference • For all paths, a is redefined without a subsequent reference • [Figure: two control flow graphs built from def a and ref a nodes, one for each case]

  32. Unreferenced definition: Computing values for a node, forward flow • Keep track of the definitions into a node that have not been referenced • Flow out of a node depends on flow in and on what happens in the node • [Figure: node y := …x with In(n) entering and Out(n) leaving]

  33. Unreferenced definition: Computing values for a node, forward flow • Keep track of the definitions into a node that have not been referenced • Flow out of a node depends on flow in and on what happens in the node • [Figure: node y := …x with In(n) entering and Out(n) leaving] • Forward flow, all paths problem • $\mathrm{In}(n) = \bigcap_{k \in \mathrm{pred}(n)} \mathrm{Out}(k)$ and $\mathrm{Out}(n) = (\mathrm{In}(n) - \mathrm{ref}(n)) \cup \mathrm{def}(n)$

  34. Unreferenced definitions (unreferenced defs) • Forward flow, all paths problem • Need to look at each node where there is a def • Code: int x,y; ... x := foo(); y := x + 2; if x > 0 then x := x + 1; end if; y:= … • x = foo(): In={ }, Out={x} • y = x + 2: In={x}, Out={y} • if(x > 0): In={y}, Out={y} • x = x + 1: In={y}, Out={x,y} • y := z: In={y}, Out={y}

  35. Continuing with the unreferenced def example • A definition is redefined without being used if a variable defined at n is still in In(n)-ref(n), i.e. $\mathrm{def}(n) \cap (\mathrm{In}(n) - \mathrm{ref}(n)) \neq \emptyset$ • Must compute this for each node • For this example, the last node would report a redefinition of y on all paths • Finds the location where the redef occurs • x = foo(): def={x}, ref={ }, In={ }, Out={x} • y = x + 2: def={y}, ref={x}, In={x}, Out={y} • if(x > 0): def={}, ref={x}, In={y}, Out={y} • x = x + 1: def={x}, ref={x}, In={y}, Out={x,y} • y := …: def={y}, ref={ }, In={y}, Out={y}
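The forward, all-paths computation and the per-node check can be sketched together in Python. The sketch reads the check as "some variable defined at n is still in In(n) - ref(n)". The node numbering (1 to 5, with node 5 as the final y := … assignment), the names SUCC, DEF, REF, and unreferenced_defs, and the full-set starting value for Out are assumptions made for illustration.

```python
# Unreferenced definitions (forward flow, all paths): an illustrative sketch.
# Facts are variables whose latest definition has not been referenced yet.

SUCC = {1: [2], 2: [3], 3: [4, 5], 4: [5], 5: []}
DEF = {1: {"x"}, 2: {"y"}, 3: set(), 4: {"x"}, 5: {"y"}}
REF = {1: set(), 2: {"x"}, 3: {"x"}, 4: {"x"}, 5: set()}


def unreferenced_defs(succ, defs, refs, entry):
    pred = {n: [] for n in succ}
    for n, targets in succ.items():
        for t in targets:
            pred[t].append(n)

    universe = set().union(*defs.values())
    # Assumed initial values: every Out set starts full (all-paths problem).
    out_sets = {n: set(universe) for n in succ}
    in_sets = {n: set() for n in succ}

    changed = True
    while changed:
        changed = False
        for n in succ:
            incoming = [out_sets[p] for p in pred[n]]
            if n == entry or not incoming:
                new_in = set()
            else:
                new_in = set.intersection(*incoming)
            # Out(n) = (In(n) - ref(n)) U def(n)
            new_out = (new_in - refs[n]) | defs[n]
            if new_in != in_sets[n] or new_out != out_sets[n]:
                in_sets[n], out_sets[n] = new_in, new_out
                changed = True

    # Report nodes that redefine a variable whose previous definition
    # is unreferenced on all paths into the node.
    anomalies = {}
    for n in succ:
        doubled = defs[n] & (in_sets[n] - refs[n])
        if doubled:
            anomalies[n] = doubled
    return in_sets, out_sets, anomalies


in_sets, out_sets, anomalies = unreferenced_defs(SUCC, DEF, REF, entry=1)
```

Under these assumptions the only report is at the last node, for y. The statement x = x + 1 is not flagged because it references x before redefining it.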

  36. Unreferenced definitions (finds the location of the first def of the pair) • Backward flow, all paths problem • Code: int x,y; ... x := foo(); y := x + 2; if x > 0 then x := x + 1; end if; y := ... • Sets are listed as shown above and below each node • x = foo(): {x,y} above, {y} below • y = x + 2: {y} above, {y} below; the double def is reported here • if(x > 0): {y} above, {y} below • x = x + 1: {y} above, {y} below • y := …: {y} above, {y} below

  37. In values depend on direction • Forward flow: In(n) is above the node y := … and Out(n) is below it • Backward flow: Out(n) is above the node y := … and In(n) is below it

  38. Data Flow Analysis Problem • Need to determine the information that should be computed at a node • Need to determine how that information should flow from node to node • Backward or Forward • Union or Intersection • Often there is more than one way to solve a problem • Can often be solved forward or backward, but usually one way is easier than the other
