1 / 43

Static Analysis: Data-Flow Analysis I

Static Analysis: Data-Flow Analysis I. ( AMP’09: Advanced Models & Programs, 2009 ). Claus Brabrand IT University of Copenhagen ( brabrand@itu.dk ). Agenda (23/3, ’09). Introduction: Undecidability, Reduction, and Approximation Data-flow Analysis: Quick tour & running example

Download Presentation

Static Analysis: Data-Flow Analysis I

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Static Analysis:Data-Flow Analysis I ( AMP’09: Advanced Models & Programs, 2009 ) Claus Brabrand IT University of Copenhagen ( brabrand@itu.dk )

  2. Agenda (23/3, ’09) • Introduction: • Undecidability, Reduction, and Approximation • Data-flow Analysis: • Quick tour & running example • Control-Flow Graphs: • Control-flow, data-flow, and confluence • ”Science-Fiction Math”: • Lattice theory, monotonicity, and fixed-points • Putting it all together…: • Example revisited

  3. Breaks? • #Breaks, |Breaks|, ...? I’ll aim for: more (but shorter) breaks

  4. Funny Point… • [PS] told you that arrays are just ptr’s (in ’C’): • ”E1[E2]” gets turned into ”*(E1+E2)” • e.g., • But we also know that ”+” is commutative: • i.e., • Thus: • i.e., you may as well write ”7[a]” instead of ”a[7]” which is the exact same thing; …try it out!:-)  a[7] *(a+7) x,y: x + y = y + x  a[7] *(a+7)   *(7+a) 7[a]

  5. ”C” is a really weird lang… #include <stdio.h> #include <stdlib.h> int main(char* args[]) {   int* a = (int*) malloc(3*sizeof(int)); a[2] = 7; int weird = 2[a];    printf(“you’d think there’d be a type error, but this program compiles w/o warning and prints out seven (%d) just fine :-)\n", weird);    return 1; }

  6. ”Lecture Notes on Static Analysis” by Michael I. Schwartzbach (BRICS, Uni. Aarhus) Chapter 1, 2, 4, 5, 6 (until p. 19) (Excl. ”pointers”) Notes on Static Analysis Claims to be "not overly formal", but the math involved can be quite challenging (at times)…

  7. Static Analysis • Purpose (of static analysis): • Gather information (on running behavior of program) • ”program points” • Usage (of static analysis): • Basis for subsequent…: Error detection information program points Static Analysis Optimization

  8. Analyses for Optimization • Example Analyses: • ”Constant Propagation Analysis”: • Precompute constants (e.g., replace ’5*x+z’ by ’42’) • ”Live Variables Analysis”: • Dead-code elimination (e.g., get rid of unused variable ’z’) • ”Available Expressions Analysis”: • Avoid recomputing already computed exprs (cache results) • … • … • … • …

  9. Analyses for Finding Errors • Example Analyses: • ”Symbol Checking”: • Catch (dynamic) symbol errors • ”Type Checking”: • Catch (dynamic) type errors • ”Initialized Variable Analysis”: • Catch unintialized variables • … • … • … • …

  10. Quizzzzz: Optimization? • If you want a fastC-program, should you use: • LOOP 1: • LOOP 2 (optimized by programmer): • i.e., ”array-version” or ”optimized pointer-version” ? for (i = 0; i < N; i++) { a[i] = a[i] * 2000; a[i] = a[i] / 10000; } ? …or… b = a; for (i = 0; i < N; i++) { *b = *b * 2000; *b = *b / 10000; b++; } ?

  11. Answer: You’ll learn how to make similar analyses [next two weeks] • Results (of running the programs): • Compilers use highly sophisticated static analyses for optimization! • Recommendation: focus on writing clear code(for people and for compilers to understand)

  12. Conceptual Motivation Undecidability Reduction principle Approximation

  13. Rice’s Theorem (1953) • Examples: • does program ’P’ always halt? • is the value of integer variable ’x’ always positive? • does variable ’x’ always have the same value? • which variables can pointer ’p’ point to? • does expression ’E’ always evaluate to true? • what are the possible outputs of program ’P’? • … “Any interesting problem about the runtime behavior of a program* is undecidable” -- Rice’s Theorem [paraphrased] (1953) *) written in a turing-complete language

  14. Undecidability (self-referentiality) • Consider "The Book-of-all-Books": • This book contains the titles of all books that do not have a self-reference(i.e. don't contain their title inside) • Finitely many books; i.e.: • We can sit down & figure out whether to include or not... • Q: What about "The Book-of-all-Books"; • Should it be included or not? • "Self-referential paradox"(many guises): • e.g. "Thissentence is false" "The Bible" "War and Peace" "ProgrammingLanguages, An Interp.-Based Approach" ... The Book-of-all-Books  

  15. Termination Undecidable! • Assume termination is decidable (in Java); • i.e.  some program, halts: Program  bool • Q: Does P0loop or terminate...? :) • Hence: haltscannot exist! • i.e., "Termination is undecidable" bool halts(Program p) { ... } -- P0.java -- bool halts(Program p) { ... } Programp0 = read_program("P0.java"); if (halts(p0)) loop(); elsehalt(); *) for turing-complete languages

  16. Rice’s Theorem (1953) • Examples: • does program ’P’ always halt? • is the value of integer variable ’x’ always positive? • does variable ’x’ always have the same value? • which variables can pointer ’p’ point to? • does expression ’E’ always evaluate to true? • what are the possible outputs of program ’P’? • … “Any interesting problem about the runtime behavior program* is undecidable” -- Rice’s Theorem [paraphrased] (1953) *) written in a turing-complete language reduction

  17. solve always-pos solve halts • Assume ’x-is-always-pos(P)’ is decidable • Given P(here’s how we could solve ’halts(P)’): • Construct (clever) reduction program R: • Run ”supposedly decidable” analysis: res = • Deduce from result: if (res) then P loops!; else P halts :-) • THUS: ’x-is-always-pos(P)’ must be undecidable! -- R.java -- int x = 1; P/* insert program P here :-) */ x = -1; x-is-always-positive(R)

  18. Reduction Principle • Reduction principle (in short): • Example: • Exercise: • Carry out reduction + whole explanation for: • ”which variables can pointer ’p’ point to?” (P)undecidable [solve (P)solve (P)] (P) undecidable reduction ’halts(P)’undecidable [solve’x-is-always-pos(P)’solve ’halts(P)’] ’x-is-always-pos(P)’ undecidable

  19. Answer • Assume ’which-var-q-points-to(P)’ is decidable: • Given P (here’s how to (cleverly) decide halts(P)): • Construct (clever) reduction program R: • Run ’which-var-p-points-to(R)’ = res • If (null  res) P halts! else; P loops! :-) • THUS: ’which-var-q-points-to(P)’ must be undecidable! -- R.java -- • ptr q = 0xffff; • P/* insert program P */ • q = null;

  20. Undecidability • Undecidability means that…: • …no-one can decide this line(for all programs)! • However(!)… ? loops halts

  21. “Side-Stepping Undecidability” • Compilers use safe approximations(computed via ”static analyses”) such that: However, just because it’s undecidable, doesn’t mean there aren’t (good) approximations! Indeed, the whole area of static analysis works on “side-steppingundecidability”: error error ok ok Okay! Dunno? Error! Dunno?

  22. “Side-Stepping Undecidability” • Unsafeapproximation: • For testing it may be okay to ”abandon” safety and use unsafe approximations: However, just because it’s undecidable, doesn’t mean there aren’t (good) approximations! Indeed, the whole area of static analysis works on “side-steppingundecidability”: loops halts unsafe approximation error ok Here are some programs for you to (manually) consider !

  23. “Slack” • Undecidability means: “there’ll always be a slack”: • However, still useful:(possible interpretations of “Dunno?”): • Treat as error(i.e., reject program): • “Sorry, program not accepted!” • Treat as warning (i.e., warn programmer): • “Here are some potential problems: …” loops . . . halts Okay! Dunno?

  24. Soundness: Analysis reports no errors Really are no errors Completeness: Analysis reports an error Really is an error Soundness & Completeness error error ok ok Complete analysis Sound analysis …or alternative (equivalent) formulation, via ”contra-position”:  P  Q Q P • Really are error(s) Analysis reports error(s) • Really no error(s) Analysis reports no error(s)

  25. Example: Type Checking • Will this program have type error (when run)? • Undecidable (because of reduction): • Type error  <EXP> evaluates to true void f() { var b; if (<EXP>) { b = 42; } else { b = true; } /* some code */ if (b) ...; // error is b is '42' } Undecidable (in all cases) i.e., what <EXP> evaluates to (when run)

  26. Example: Type Checking • Hence, languages use static requirements: • All variables must be declared • And have only one type (throughout the program) • This is (very) easy to check (i.e., "type-checking") void f() { bool b; // instead of ”var b;” /* some code */ if (<EXP>) { b = 42; } else { b = true; } /* some more code */ } Static compiler error: Regardless of what <EXP>evaluates to when run

  27. Agenda (23/3, ’09) • Introduction: • Undecidability, Reduction, and Approximation • Data-flow Analysis: • Quick tour & running example • Control-Flow Graphs: • Control-flow, data-flow, and confluence • ”Science-Fiction Math”: • Lattice theory, monotonicity, and fixed-points • Putting it all together…: • Example revisited

  28. 5’ Crash Course on Data-Flow Analysis ( AMP’09: Advanced Models & Programs, 2009 ) Claus Brabrand IT University of Copenhagen ( brabrand@itu.dk )

  29. Data-Flow Analysis • IDEA: • We (only) need 3 things: • A control-flow graph • A lattice • Transfer functions • Example: “(integer) constant propagation” “Simulate runtime execution at compile-timeusing abstract values”

  30. We (only) need 3 things: • A control-flow graph • A lattice • Transfer functions Control-flow graph int x = 1; Given program: int x = 1; int y = 3; if (...) { x = x+2; } else { x <-> y; } print(x,y); int y = 3;  ... true false x = x+2; x <-> y; print(x,y);

  31. We (only) need 3 things: • A control-flow graph • A lattice • Transfer functions A Lattice • Lattice L of abstract values of interestand their relationships (i.e. ordering “”): • Induces least-upper-bound operator: • for combining information “top” ~ “could be anything” ·· -3 -2 -1 0 1 2 3 ·· “bottom” ~ “we haven’t analyzed yet”

  32. [ , ]  ENVL [ , ] [ , ] [ , ] [ , ] [ , ] [ , ] [ , ] [ , ] [ , ] [ , ] • We (only) need 3 things: • A control-flow graph • A lattice • Transfer functions Data-Flow Analysis x y E . E[x  1] int x = 1; 1 1 int y = 3; E . E[y  3] 1 3 1 3 ... 1 3 1 3 1 3 E . E[x  E(y), y E(x)] E . E[x  E(x) 2] x = x+2; x <-> y; 3 3 1 3 3 print(x,y);

  33. Agenda (23/3, ’09) • Introduction: • Undecidability, Reduction, and Approximation • Data-flow Analysis: • Quick tour & running example • Control-Flow Graphs: • Control-flow, data-flow, and confluence • ”Science-Fiction Math”: • Lattice theory, monotonicity, and fixed-points • Putting it all together…: • Example revisited

  34. Exp true false Stm2 Stm1 confluence Exp true false Stm confluence Control Structures • Control Structures: • Statements (or Expr’s) that affect ”flow of control”: • if-else: • if: if ( Exp) Stm1elseStm2 [syntax] [semantics] The expression must be of type boolean; if it evaluates to true, Statement-1 is executed, otherwise Statement-2 is executed. if ( Exp) Stm [syntax] [semantics] The expression must be of type boolean; if it evaluates to true, the given statement is executed, otherwise not.

  35. confluence Exp true false Stm Exp1; confluence Exp2 true false Stm Exp3; Control Structures(cont’d) • while: • for: while ( Exp) Stm [syntax] [semantics] The expression must be of type boolean; if it evaluates to false, the given statement is skipped, otherwise it is executed and afterwards the expression is evaluated again. If it is still true, the statement is executed again. This is continued until the expression evaluates to false. for (Exp1 ; Exp2 ; Exp3) Stm [syntax] [semantics] Equivalent to: { Exp1; while ( Exp2 ) { StmExp3; } }

  36. Control-flow graph int x = 1; Given program: int x = 1; int y = 3; if (a>b) { x = x+2; } else { x <-> y; } print(x,y); int y = 3;  a>b true false x = x+2; x <-> y; print(x,y);

  37. Exercise: Draw a Control-Flow Graph for: public static void main ( String[] args ) { int mi, ma; if (args.length == 0) System.out.println("No numbers"); else { mi = ma = Integer.parseInt(args[0]); for (int i=1; i < args.length; i++) { int obs = Integer.parseInt(args[i]); if (obs > ma) ma = obs; else if (mi < obs) mi = obs; } System.out.println(”min=" + mi + "," + "max=" + ma); }} if else for if else if

  38. Control-FlowGraph int mi, ma; • CFG: args.length == 0 true false System.out.println("No numbers"); mi = ma = Integer.parseInt(args[0]); int i=1; i < args.length true false int obs = Integer.parseInt(args[i]); obs > ma true false ma = obs; mi < obs true false mi = obs; i++; System.out.println(”min=" + mi + "," + "max=" + ma);

  39. Control Structures(cont’d2) • do-while: • ”?:”; ”conditional expression”: • ”||”; ”lazy disjunction” (aka., ”short-cut ”): • ”&&”; ”lazy conjunction”(aka., ”short-cut ”): • switch: Exercise: doStmwhile ( Exp ); Exp1 ? Exp2 : Exp3 Exp1 || Exp2 Exp1 && Exp2 Swb: caseExp : Stm*break; switch ( Exp) { Swb*} default : Stm*break;

  40. Control Structures(cont’d3) • try-catch-finally(exceptions): • return / break / continue: • ”method invocation”: • e.g.; • ”recursive method invocation”: • e.g.; • ”virtual dispatching”: • e.g.; tryStm1catch ( Exp ) Stm2finallyStm3 return ; returnExp; break ; continue ; f(x) f(x) f(x)

  41. Control Structures(cont’d4) • ”function pointers”: • e.g.; • ”higher-order functions”: • e.g.; • ”dynamic evaluation”: • e.g.; • Some constructions (and thus languages) require a separate control-flow analysisfor determining control-flow in order to dodata-flow analysis (*f)(x) f.x.(f x) eval(some-string-which-has-been-dynamically-computed)

  42. Agenda (23/3, ’09) • Introduction: • Undecidability, Reduction, and Approximation • Data-flow Analysis: • Quick tour & running example • Control-Flow Graphs: • Control-flow, data-flow, and confluence • ”Science-Fiction Math”: • Lattice theory, monotonicity, and fixed-points • Putting it all together…: • Example revisited

More Related