Getting Started in Program Analysis Research: Outline • Background and useful skills • Ana • Using and developing analysis • Mary Lou • Identifying and building infrastructure • Lori • Evaluating your analysis • Ana
Ana Milanova • I am from Bulgaria • National High School for Math and Science • American University in Bulgaria, 1997 • I have a degree in Business Administration • Rutgers University, PhD in CS, 2003 • Now Assistant Professor at RPI • Research: program analysis for software tools • Family • Husband Tony • Katarina, 5 and Petar, 2
Program Analysis • Static program analysis • Analyzes the source code of the program • Infers run-time behavior properties without running the program • E.g., “The object values that flow to reference variable x are only of classes A and B, but not C.” • Static analyses are conservative: consider all possible run-time behaviors of the program
Program Analysis • Dynamic program analysis • Analyzes a set of program executions • Reasons about run-time behavior properties over observed executions • E.g., “The object values that flowed to reference variable x during observed executions were only of classes A and B, but not C.” • Dynamic analyses are incomplete: consider only behaviors over particular executions • Goal: combine with static analysis
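To make the dynamic example concrete, here is a minimal Python sketch. The classes `A`, `B`, `C`, the function `make`, and the test inputs are all hypothetical and chosen only for illustration. A probe records the class of every object that flows to `x` during the executions we happen to run.

```python
class A: pass
class B: pass
class C: pass

observed_classes = set()  # dynamic facts gathered for variable x

def make(kind):
    # Hypothetical program under analysis: the class of x depends on the input.
    x = {"a": A, "b": B, "c": C}[kind]()
    observed_classes.add(type(x).__name__)  # instrumentation probe
    return x

# The observed executions: no test input ever selects class C.
for test_input in ["a", "b", "a"]:
    make(test_input)

print(sorted(observed_classes))  # prints ['A', 'B']
```

The result says nothing about class C: the input "c" would reach it, but no observed execution used that input. This is the sense in which a dynamic analysis is incomplete, while a static analysis of `make` would conservatively report all three classes.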
Uses of Static Program Analysis • Compilers – traditional application domain • Enables optimizing transformations • Software engineering tools • Static debugging, verification, security • Uncover difficult errors and security flaws • Testing • Evaluate and improve test suites • Software understanding • Calling structure • Complex dependences • Change impacts
Uses of Program Analysis • Analysis for compiler optimization is different from analysis for software tools • Different requirements, different success criteria (more later…)
Static Analysis Methodologies • Data-flow analysis • Constraint-based program analysis • Abstract interpretation • Type and effect systems • Model checking
Example: Data-flow Analysis • Flow facts • Information that we are propagating • E.g., set of definitions {(i,1), (i,4), (i,6), …} • Transfer functions • The effect of a statement on the incoming flow facts • E.g., statement i=11 at 6 “kills” the incoming definition (i,4), and “generates” definition (i,6)
[Figure: control-flow graph over the statements 1. i=11 read x,y  2. if x<y  3. p(i)  4. i=j+5  5. p(i)  6. i=11  7. i=i*i, with edges annotated by sets of reaching definitions such as {(i,1)}, {(i,4)}, and {(i,6)}]
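The flow facts and transfer functions above can be sketched as a small reaching-definitions analysis in Python. The statement numbers follow the slide, but the control-flow edges in `SUCC` are an assumption, because the edges of the original figure are not recoverable here. The function and variable names are illustrative.

```python
# Statement number -> variable it defines (None if it defines nothing).
DEFS = {1: "i", 2: None, 3: None, 4: "i", 5: None, 6: "i", 7: "i"}
# Assumed control-flow edges (hypothetical; not taken from the slide).
SUCC = {1: [2], 2: [3, 4], 3: [6], 4: [5], 5: [7], 6: [7], 7: []}

def reaching_definitions(defs, succ):
    nodes = sorted(succ)
    pred = {n: [] for n in nodes}
    for n in nodes:
        for s in succ[n]:
            pred[s].append(n)
    in_sets = {n: set() for n in nodes}
    out_sets = {n: set() for n in nodes}
    changed = True
    while changed:  # fixed-point iteration
        changed = False
        for n in nodes:
            # A definition reaches n if it leaves any predecessor (union).
            new_in = set().union(*(out_sets[p] for p in pred[n]))
            var = defs[n]
            if var is None:
                new_out = set(new_in)
            else:
                # Transfer function: kill older definitions of var, generate (var, n).
                new_out = {(v, s) for (v, s) in new_in if v != var} | {(var, n)}
            if new_in != in_sets[n] or new_out != out_sets[n]:
                in_sets[n], out_sets[n] = new_in, new_out
                changed = True
    return in_sets, out_sets

in_sets, out_sets = reaching_definitions(DEFS, SUCC)
print(sorted(in_sets[7]))  # prints [('i', 4), ('i', 6)]
```

Under the assumed edges, statement 6 receives {(i,1)} and produces {(i,6)}, and both (i,4) and (i,6) reach statement 7.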
Theory • Data-flow frameworks • Control-flow graph CFG • Space of flow facts L • Space of transfer functions F • Certain properties of L and F allow a general solution procedure • Fixed-point iteration • Termination: the iterative computation terminates • Safety (correctness, soundness): the solution is conservative • For most problems the analysis produces “noise”
Theory and Practice • Analysis cost – how much time, memory • Analysis precision – how much noise • E.g., for a call a.m(): a more precise analysis reports a: {B}, while a less precise analysis reports a: {A,B,C} • Typically, there is a tradeoff between cost and precision! • In practice, we need to analyze very large programs, 100K LOC, even 1M LOC
Theory and Practice • Approximations - introduce noise • make the CFG “smaller” • make the set of flow facts “smaller” • make the transfer functions converge faster • Approximations are necessary • But be careful: different approximations for different analyses
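Standard Approximations • Flow-sensitive vs. flow-insensitive • Code: x = true; y = false; x = y; • Flow-sensitive (one solution per program point): after x = true: x: {true} • after y = false: x: {true}, y: {false} • after x = y: x: {false}, y: {false} • Flow-insensitive (one solution for the whole program): x: {true,false}, y: {false}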
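A minimal Python sketch of the two approximations on the three-statement program from this slide. The encoding of statements as `(target, source)` pairs and the function names are assumptions made for illustration, not part of the original material.

```python
# x = true; y = false; x = y   (a source is a constant or a variable name)
program = [("x", True), ("y", False), ("x", "y")]

def flow_sensitive(stmts):
    """One solution per program point; an assignment overwrites the old value set."""
    state, history = {}, []
    for target, source in stmts:
        values = set(state[source]) if isinstance(source, str) else {source}
        state = {**state, target: values}
        history.append(state)
    return history

def flow_insensitive(stmts):
    """One solution for the whole program; statement order is ignored."""
    values = {}
    changed = True
    while changed:
        changed = False
        for target, source in stmts:
            new = values.get(source, set()) if isinstance(source, str) else {source}
            old = values.setdefault(target, set())
            if not new <= old:
                old |= new  # values only accumulate, never get overwritten
                changed = True
    return values

sensitive = flow_sensitive(program)
insensitive = flow_insensitive(program)
print(sensitive[-1])  # x and y both map to {False}
print(insensitive)    # x maps to {True, False}, y to {False}
```

The flow-insensitive result is cheaper to compute and store (one map instead of one per program point) but carries the extra value `true` for `x` as noise.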
Standard Approximations • Context-sensitive vs. context-insensitive • Code: A(bool X) { this.f = X; } a = new A(true); b = new A(false); • Context-insensitive (merged flow): a.f = true/false, b.f = true/false, so a.f: {true,false}, b.f: {true,false} • Context-sensitive: a.f = true, b.f = false, so a.f: {true}, b.f: {false}
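The constructor example can be modeled with a toy Python sketch. This is a deliberately simplified model, not a real points-to analysis: each call is a hypothetical `(variable, argument)` pair, and the only thing modeled is whether the parameter X is shared across call sites.

```python
# a = new A(true); b = new A(false)
calls = [("a", True), ("b", False)]

def context_insensitive(calls):
    # One abstract parameter X for the constructor: arguments of all calls merge.
    merged_x = {arg for _, arg in calls}
    return {f"{var}.f": set(merged_x) for var, _ in calls}

def context_sensitive(calls):
    # A separate copy of X per call site: each object sees only its own argument.
    return {f"{var}.f": {arg} for var, arg in calls}

insensitive_result = context_insensitive(calls)
sensitive_result = context_sensitive(calls)
print(insensitive_result)  # both a.f and b.f map to {True, False}
print(sensitive_result)    # a.f maps to {True}, b.f maps to {False}
```

The context-sensitive version keeps one fact set per call site, so its cost grows with the number of calling contexts, which is the cost and precision tradeoff discussed earlier.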
Useful Background and Skills • Higher-level undergraduate or graduate courses on: • Programming Languages, Compilers, Algorithms, Logic, Software Engineering, Architecture • Analytical and programming skills • Step 1: Design a program analysis algorithm • Understand your target language (e.g., Java, C++, C) • Step 2: Implement the analysis algorithm • Understand the language(s) of the infrastructure • Step 3: Evaluate the analysis algorithm
Useful Resources • Books (my personal list) • “Compilers: Principles, Techniques and Tools” by Aho, Sethi, Ullman, Ch. 10 • An introduction to data-flow analysis • “Principles of Program Analysis” by Nielson, Nielson, Hankin • An excellent reference for advanced students • “Model Checking” by Clarke, Grumberg, Peled • Course material on the web • Classes taught by professors • My class (there are better ones, of course): www.cs.rpi.edu/~milanova/csci6961/lectures/
Using and Developing Program Analysis Mary Lou Soffa University of Virginia
About Mary Lou Soffa Confused about what I wanted to be • Ph.D. programs: • Mathematics, Sociology; Philosophy; Environmental Acoustics: disenchanted • Found what I really loved – computer science • After 25+ years at Pitt, moved to UVA • Small farm – grow “crops”; love my tractor • Passion – increasing the participation of women and minorities in computer science • Professional achievement – 24 Ph.D. students; ½ are women.
Program analysis • How to apply program analysis in your research • What are the questions, and what do you have to do?
Solve a problem • Program behavior: static or dynamic • Determine information needed • What parts of program are involved • Develop appropriate representation • Develop analysis • Develop algorithm
Have a goal – program code • Problem • Improve performance • Understand program • Find errors • Locate cause of errors • Need to collect information about the program that helps you infer properties of program • Static or dynamic code
Determine information needed • What questions are you asking • What do you need to gather to answer questions • Examples: • Statements needed to compute an expression • Values that are always constant at a particular program point • Locations of dead statements • Branches that are correlated
Example: redundancy • Remove redundancies with the goal of improving performance • Redundant expressions • Redundant loads • Redundant stores • Dead code • Static: remove redundant expressions from the program representation
Redundant expressions • Does the value need to be computed for correct semantics?
    X := A * B
    F := C + E
    C := C + 1
    If (cond) then
        R := A * B
        S := C + E
    Else
        X := A * B
        A := 6
    End if
    G := A * B
What parts of program involved • Given information you need, what parts of program are involved • Examples: • branches and statements that change values in conditional • all possible execution paths • Array definitions and uses • Types • Loops
Example: Redundant expressions • Expressions • Definitions • Control flow among definitions and expressions • Program paths
Program representation • Program representation that enables collection of information • Granularity • Source, intermediate, binary • Issues: how to get representation from another representation
Example: redundant expressions • Want to know how expressions flow • Is the value of an expression the same when the expression is used again? • Need control flow graph with statements in nodes – intermediate level • X := A + B
Available Expressions • Control flow graph • Entry block: X := A * B; F := C + E; C := C + 1 • Then branch: R := A * B; S := C + E • Else branch: X := A * B; A := 6 • Join block: G := A * B
Formulate analysis over representation • How to gather information from representation • How many analyses • Direction of flow of analysis • Along all paths or any path • Local solution • Global solution
Example: Redundant expressions • Local – basic block – single entry/exit • What expressions are generated • What expressions are “killed” by a definition • Global – flow over the flow graph • Forward flow • Must be true on all paths
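The local step can be sketched in Python as follows. The encoding is an assumption made for illustration: an expression is a `(left, operator, right)` tuple, a statement is a `(target, expression)` pair, and a constant assignment such as `A := 6` carries `None` as its expression. The blocks are taken from the running example.

```python
def local_gen_kill(block, universe):
    """Return (gen, kill) for one basic block, scanning statements in order."""
    gen, kill = set(), set()
    for target, expr in block:
        if expr is not None:
            gen.add(expr)  # the right-hand side is evaluated first
        # The definition of target then kills every expression that uses target.
        killed = {e for e in universe if target in (e[0], e[2])}
        gen -= killed
        kill |= killed
    return gen, kill

A_TIMES_B = ("A", "*", "B")
C_PLUS_E = ("C", "+", "E")
C_PLUS_1 = ("C", "+", "1")
universe = {A_TIMES_B, C_PLUS_E, C_PLUS_1}

first_block = [("X", A_TIMES_B), ("F", C_PLUS_E), ("C", C_PLUS_1)]
else_block = [("X", A_TIMES_B), ("A", None)]

gen1, kill1 = local_gen_kill(first_block, universe)
gen3, kill3 = local_gen_kill(else_block, universe)
print(gen1)  # only A * B survives to the end of the first block
print(gen3)  # empty: A := 6 kills A * B
```

Note that `C := C + 1` computes `C + 1` and immediately kills it, so the expression never becomes available after the block.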
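Redundant Expressions • Control flow graph annotated with available expressions • Entry block: X := A * B; F := C + E; C := C + 1 • Available after the entry block and at the start of both branches: {A * B} • Then branch: R := A * B; S := C + E • Available after the then branch: {A * B, C + E} • Else branch: X := A * B; A := 6 • Join block: G := A * B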
Develop analyses • Data flow equations – use data flow framework • Algorithm • Preciseness • Expense
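Data flow equations • Gen(B) = expressions computed in B whose operands are not redefined later in B • Kill(B) = expressions with an operand defined in B (a definition kills every incoming available expression that uses the defined variable) • Out(B) = Gen(B) ∪ (In(B) − Kill(B)) • In(B) = ∩ Out(j), over all predecessors j of B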
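A minimal Python sketch of solving these equations by fixed-point iteration on the running example. The block names B1 to B4 and the hand-written Gen and Kill sets are assumptions for illustration: B1 is the entry block, B2 the then branch, B3 the else branch, B4 the join block. Because an expression must be available on all paths, the meet is intersection and non-entry blocks start from the full universe.

```python
UNIVERSE = frozenset({"A*B", "C+E"})
GEN = {"B1": {"A*B"}, "B2": {"A*B", "C+E"}, "B3": set(), "B4": {"A*B"}}
KILL = {"B1": {"C+E"}, "B2": set(), "B3": {"A*B"}, "B4": set()}
PRED = {"B1": [], "B2": ["B1"], "B3": ["B1"], "B4": ["B2", "B3"]}

def available_expressions(gen, kill, pred, universe):
    in_sets = {b: set() for b in pred}
    out_sets = {b: set(universe) for b in pred}  # optimistic start for a must-analysis
    changed = True
    while changed:
        changed = False
        for b in pred:
            if pred[b]:
                new_in = set(universe)
                for p in pred[b]:
                    new_in &= out_sets[p]  # In(B) = intersection of Out(j)
            else:
                new_in = set()  # nothing is available at program entry
            new_out = gen[b] | (new_in - kill[b])  # Out(B) = Gen(B) U (In(B) - Kill(B))
            if new_in != in_sets[b] or new_out != out_sets[b]:
                in_sets[b], out_sets[b] = new_in, new_out
                changed = True
    return in_sets, out_sets

in_sets, out_sets = available_expressions(GEN, KILL, PRED, UNIVERSE)
print(sorted(in_sets["B2"]))  # prints ['A*B']
print(sorted(in_sets["B4"]))  # prints []
```

A * B is available on entry to both branches, so `R := A * B` and `X := A * B` there are redundant. Nothing is available on entry to the join block, because the else branch redefines A, so `G := A * B` must be computed.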
Dynamic Optimization • Static optimizations • Apply before execution • Dynamic optimizations • Apply during execution – e.g., redundant expressions • Binary code • Program traces
Code partitioned into basic blocks B1–B6:
B1: 1. A = 4   2. T1 = A*B
B2: 3. L1: T2 = T1/C   4. If T2 < W go to L2
B3: 5. M = T1 * K   6. T3 = M + 1
B4: 7. L2: H = I   8. M = T3 - H   9. If T3 > 0 go to L3
B5: 10. Go to L1
B6: 11. L3: halt
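The partition into basic blocks follows from the standard leader rule: the first instruction, every branch target, and every instruction that follows a branch each start a new block. A minimal Python sketch, assuming each instruction is encoded as a hypothetical `(label, text, branch_target)` triple:

```python
code = [
    (None, "A = 4", None),
    (None, "T1 = A*B", None),
    ("L1", "T2 = T1/C", None),
    (None, "if T2 < W go to L2", "L2"),
    (None, "M = T1*K", None),
    (None, "T3 = M + 1", None),
    ("L2", "H = I", None),
    (None, "M = T3 - H", None),
    (None, "if T3 > 0 go to L3", "L3"),
    (None, "go to L1", "L1"),
    ("L3", "halt", None),
]

def basic_blocks(code):
    """Return the blocks as lists of 1-based statement numbers."""
    label_index = {label: i for i, (label, _, _) in enumerate(code) if label}
    leaders = {0}  # the first instruction is always a leader
    for i, (_, _, target) in enumerate(code):
        if target is not None:
            leaders.add(label_index[target])  # the branch target starts a block
            if i + 1 < len(code):
                leaders.add(i + 1)  # so does the instruction after a branch
    starts = sorted(leaders)
    ends = starts[1:] + [len(code)]
    return [list(range(s + 1, e + 1)) for s, e in zip(starts, ends)]

blocks = basic_blocks(code)
print(blocks)  # prints [[1, 2], [3, 4], [5, 6], [7, 8, 9], [10], [11]]
```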
Program Trace • Binary code
    A = 4
    T1 = A*B
    T2 = T1/C
    If T2 !< W jump out
    H = I
    M = T3 - H
    If T3 > 0 go to L3
    T2 = T1/C
    If T2 !< W jump out
    M = T1 * K
    T3 = M + 1
    H = I
    M = T3 - H
    halt
Dynamic optimization • Note: single entry; multiple exits • No loops • Need a representation – bring the code up a level from binary code
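Because a trace has a single entry and no loops, redundancy detection reduces to one forward scan. The Python sketch below is an illustrative assumption, not the algorithm from the talk: statements are hypothetical `(target, expression)` pairs transcribed from the trace, guards and `halt` have target `None`, and only binary expressions are tracked.

```python
trace = [
    ("A", ("4",)),
    ("T1", ("A", "*", "B")),
    ("T2", ("T1", "/", "C")),
    (None, ("guard",)),
    ("H", ("I",)),
    ("M", ("T3", "-", "H")),
    (None, ("guard",)),
    ("T2", ("T1", "/", "C")),
    (None, ("guard",)),
    ("M", ("T1", "*", "K")),
    ("T3", ("M", "+", "1")),
    ("H", ("I",)),
    ("M", ("T3", "-", "H")),
    (None, ("halt",)),
]

def redundant_in_trace(trace):
    """Return (index, holder) pairs for recomputations of a still-available value."""
    available = {}  # binary expression -> variable currently holding its value
    redundant = []
    for index, (target, expr) in enumerate(trace):
        if target is None:
            continue  # guards and halt define nothing
        is_binary = len(expr) == 3
        if is_binary and expr in available:
            redundant.append((index, available[expr]))
        # Defining target kills expressions that use it or whose value it held.
        available = {e: v for e, v in available.items()
                     if target not in e and v != target}
        if is_binary and target not in expr:
            available[expr] = target
    return redundant

found = redundant_in_trace(trace)
print(found)  # prints [(7, 'T2')]
```

The second `T2 = T1/C` (index 7) is flagged because neither T1, C, nor T2 changed since the first computation. The final `M = T3 - H` is not flagged, because T3, H, and M were all redefined in between.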
Applying optimizations • Not as complicated • But, cannot tolerate much overhead • Phases in static • Developed algorithm that can apply multiple optimizations • Demand driven • Limit study of dynamic optimizations
Conclusion • Need analysis in many different applications • Virtual execution environments • Multicore • Wireless sensor networks • Testing • Testing for wireless sensor networks • Testing for security
Lori’s Journey • Science/Math love: Started in chemistry at liberal arts college. • Field trip and first CS course -> CS major. • Advisor’s strong push for grad school -> U Pitt. • Took compilers course from Mary Lou -> PhD in compiler optimization. • Big year: 10/85-married Mark. 1/86-started at Rice. 4/86-PhD • Family: The yankees returned north 3 years later! • University of Delaware: 15+ yrs. Visiting, Assistant, Associate, Full • Family: Lauren (HS senior), Lindsay (16 and driving), Matt (11) • Support: Mark, Mark, Mark,… Mary Lou, Errol, Sandee, CRA-W • Currently: software tools, testing, compiler optimization
Identifying and Building Infrastructure for Analysis Research • What kinds of infrastructure do you need? • How to identify and build infrastructure • Examples
What kinds of infrastructure do you need? • Analysis Research and Evaluation: • People • Analysis Framework Software • Labspace • Hardware • Workloads
Identifying Analysis Framework Software • Determine Goals: short term, long term • Specify Requirements: needed, desired (prioritized) • Search for Possibilities: peers/experts, technical papers, Internet search • Try Them Out: install + run tests, read docs, examine code, try small task • Weigh Choices: Meet requirements? Ease of use/change? …
Example: Identifying Analysis Framework Software • Determine Goals: evaluate new analysis on Java, on its own and in a client tool • Specify Requirements: needed: call graph, CFG, CHG; realistic environment/apps; easy to extend/build client tools • Search for Possibilities: common environment is an IDE, Java; Eclipse platform • Try Them Out: install + explore; write a small plugin; use call graph, CHG, CFG for a small task • Weigh Choices: learning curve vs. available analyses, realism
Implementing Your Analysis • Once you have decided on an infrastructure: • Think Reuse!! Think modularity!! • Think prototype, but extensible and scalable • Test, test, test - try to be systematic • Debug – not easy
Example: Implementing My NL Analysis • Build small modular components -> reuse • Analyzing method signatures to extract NL • Building program representation for NL • Traversing program rep • Building program rep for IR • Design reps to avoid loss of info -> reuse • Id’s and their roles and locations in code • Verb, Direct object rep -> extensible
Managing the Evolving Software Infrastructure • Managing change over time and people • CVS, subversion • Tracking tasks, bugs, deadlines/goals • TRAC, bugzilla, gforge • Maintaining documentation • JavaDocs, Doxygen • Testing, testing, testing • Unit, system, regression -- test suites Sounds like software engineering…
Selecting Appropriate Hardware • Determine Goals: short term, long term • Specify Requirements: needed, desired (prioritized) • Search for Possibilities: peers/experts, system staff • Weigh Choices: Meet requirements? Costs within budget? Need to ask for money?