End-User Shape Analysis. Bor-Yuh Evan Chang 張博聿 U of Colorado, Boulder. Xavier Rival INRIA/ENS Paris. George C. Necula U of California, Berkeley. National Taiwan University – August 11, 2009.
End-User Shape Analysis
• Bor-Yuh Evan Chang 張博聿, U of Colorado, Boulder
• Xavier Rival, INRIA/ENS Paris
• George C. Necula, U of California, Berkeley
• National Taiwan University – August 11, 2009
• If some of the symbols are garbled, try either installing TexPoint (http://texpoint.necula.org) or the TeX fonts (http://www.cs.colorado.edu/~bec/texpoint-fonts.zip).
Programming Languages Research at the University of Colorado, Boulder
Software errors cost a lot
• ~$60 billion annually (~0.5% of US GDP)
  • 2002 National Institute of Standards and Technology report
• > total annual revenue of [logo]
• > 10x annual budget of [logo]
Bor-Yuh Evan Chang 張博聿, University of Colorado at Boulder - End-User Shape Analysis
But there’s hope in program analysis
• Microsoft uses and distributes the Static Driver Verifier
• Airbus applies the Astrée Static Analyzer
• Companies, such as Coverity and Fortify, market static source code analysis tools
Because program analysis can eliminate entire classes of bugs
• For example,
  • Reading from a closed file: read( );
  • Reacquiring a locked lock: acquire( );
• How?
  • Systematically examine the program
  • Simulate running program on “all inputs”
  • “Automated code review”
Program analysis by example: Checking for double acquires
• Simulate running program on “all inputs”
    … code …
    // x now points to an unlocked lock
    acquire(x);
    … code …
    acquire(x);
    … code …
  [Figure: analysis state for x]
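The double-acquire check above can be sketched as a tiny simulation over abstract lock states. This is only an illustration under stated assumptions, not the analysis from the talk: the `LockState` values, the `Op` names, and the `check` method are all hypothetical, and the "program" is reduced to a straight-line sequence of operations on a single lock x.

```java
// Hypothetical abstract states for one lock x (not from the talk).
enum LockState { UNLOCKED, LOCKED, UNKNOWN }

// Hypothetical operations the simulated program performs on x.
enum Op { ACQUIRE, RELEASE }

final class DoubleAcquireChecker {
    // Returns the index of the first operation that may acquire an
    // already-held lock, or -1 if no such operation is found.
    static int check(LockState initial, Op[] ops) {
        LockState state = initial;
        for (int i = 0; i < ops.length; i++) {
            if (ops[i] == Op.ACQUIRE) {
                // LOCKED is a definite double acquire. UNKNOWN is only a
                // possible one, but a sound checker still has to report it.
                if (state != LockState.UNLOCKED) {
                    return i;
                }
                state = LockState.LOCKED;
            } else {
                state = LockState.UNLOCKED;
            }
        }
        return -1;
    }
}
```

Starting from `UNKNOWN` makes the very first `ACQUIRE` a reported error, which is the kind of false alarm a too-coarse abstraction produces.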
Program analysis by example: Checking for double acquires
• Simulate running program on “all inputs”
    … code …
    // x now points to an unlocked lock in a linked list
    acquire(x);
    … code …
  [Figure: ideal analysis state, a disjunction of heap configurations for x (… or … or … or …); leads to undecidability]
Must abstract
• For decidability, must abstract: “model all inputs” (e.g., merge objects)
• Abstraction too coarse or not precise enough (e.g., lost x is always unlocked) mislabels good code as buggy
    … code …
    // x now points to an unlocked lock in a linked list
    acquire(x);
    … code …
  [Figure: ideal analysis state (x … or … or … or …) next to the abstracted analysis state for x]
To address the precision challenge
• Traditional program analysis mentality:
  • “Why can’t developers write more specifications for our analysis? Then, we could verify so much more.”
  • “Since developers won’t write specifications, we will use default abstractions (perhaps coarse) that hopefully work most of the time.”
• End-user approach:
  • “Can we design program analyses around the user? Developers write testing code. Can we adapt the analysis to use that code as a specification?”
Summary of overview
• Challenge in analysis: Finding a good abstraction, precise enough but not more than necessary
  • Powerful, generic abstractions: expensive, hard to use and understand
  • Built-in, default abstractions: often not precise enough (e.g., data structures)
• End-user approach: Must involve the user in abstraction without expecting the user to be a program analysis expert
Overview of contributions
Extensible Inductive Shape Analysis (Xisa)
• Precise inference of data structure properties
  • Able to check, for instance, the locking example
• Targeted to software developers
  • Uses data structure checking code for guidance
  • Turns testing code into a specification for static analysis
• Efficient
  • ~10-100x speed-up over generic approaches
  • Builds abstraction out of developer-supplied checking code
Extensible Inductive Shape Analysis
• End-user approach
• Precise inference of data structure properties
• …
Shape analysis is a fundamental analysis
• Data structures are at the core of
  • Traditional languages (C, C++, Java)
  • Emerging web scripting languages
• Improves verifiers that try to
  • Eliminate resource usage bugs (locks, file handles)
  • Eliminate memory errors (leaks, dangling pointers)
  • Eliminate concurrency errors (data races)
  • Validate developer assertions
• Enables program transformations
  • Compile-time garbage collection
  • Data structure refactorings
• …
Shape analysis by example: Removing duplicates
    // l is a sorted doubly-linked list
    for each node cur in list l {
      remove cur if duplicate;
    }
    assert l is sorted, doubly-linked with no duplicates;
  [Figure: two panels, “Example/Testing” and “Code Review/Static Analysis”. The testing panel shows concrete lists with values 2 and 4 before, during, and after the loop. The static analysis panel shows l as a “sorted dl list” before the loop, a program-specific and more complicated intermediate state (a “segment with no duplicates” from l to cur followed by a “sorted dl list”), and a “sorted dl list” with “no duplicates” after the loop.]
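The pseudocode on this slide can be made concrete as follows. This is a hedged sketch rather than the code analyzed in the talk: the `DlNode` and `SortedDll` classes, their method names, and the choice of `int` values are all assumptions made for illustration. The final method plays the role of the closing assertion, written as ordinary run-time checking code.

```java
// Hypothetical node type for a doubly-linked list of ints.
final class DlNode {
    int value;
    DlNode prev, next;
    DlNode(int value) { this.value = value; }
}

final class SortedDll {
    // Builds a doubly-linked list from the given values and returns its head.
    static DlNode fromValues(int... values) {
        DlNode head = null, tail = null;
        for (int v : values) {
            DlNode n = new DlNode(v);
            if (tail == null) { head = n; } else { tail.next = n; n.prev = tail; }
            tail = n;
        }
        return head;
    }

    // Removes every node whose value equals its predecessor's value.
    // Assumes the list is sorted, so duplicates are adjacent.
    static void removeDuplicates(DlNode l) {
        DlNode cur = (l == null) ? null : l.next;
        while (cur != null) {
            DlNode next = cur.next;
            if (cur.value == cur.prev.value) {
                cur.prev.next = next;                       // unlink cur going forward
                if (next != null) { next.prev = cur.prev; } // and going backward
            }
            cur = next;
        }
    }

    // Run-time check: doubly-linked (prev of h is p) and strictly increasing.
    static boolean sortedDllNoDup(DlNode h, DlNode p) {
        if (h == null) return true;
        if (h.prev != p) return false;
        if (p != null && p.value >= h.value) return false;
        return sortedDllNoDup(h.next, h);
    }
}
```

For instance, building a list with `SortedDll.fromValues(2, 2, 4, 4, 4)` and calling `removeDuplicates` on it leaves a two-node list for which `sortedDllNoDup(l, null)` holds.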
Shape analysis is not yet practical
• Choosing the heap abstraction difficult for precision
• Some representative approaches:
  • TVLA [Sagiv et al.]: Parametric in low-level, analyzer-oriented predicates
    + Very general and expressive
    - Harder for non-expert
  • Space Invader [Distefano et al.]: Built-in high-level predicates
    - Harder to extend
    + No additional user effort (if precise enough)
• End-user approach, Xisa: Parametric in high-level, developer-oriented predicates
    + Extensible
    + Targeted at developers
Our approach: Executable specifications
• Utilize “run-time checking code” as specification for static analysis.
    assert(sorted_dll(l,…));
    for each node cur in list l {
      remove cur if duplicate;
    }
    assert(sorted_dll_nodup(l,…));
• Contribution: Build the abstraction for analysis out of developer-specified checking code
    h.dll(p) =
      if (h = null) then
        true
      else
        h->prev = p and h->next.dll(h)
  • p specifies where prev should point
• Contribution: Automatically generalize checkers for complicated intermediate states
  [Figure: heap states for l before and after the loop, each summarized by a checker, and an intermediate state with l and cur]
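As run-time checking code, the `dll` checker on this slide might look like the following in Java. It is a direct transcription of the definition, offered as a sketch: the `Node` class and the `DllChecker` wrapper are hypothetical names, and the talk's checkers are written in its own notation, not Java.

```java
// Hypothetical node type with only the two link fields the checker reads.
final class Node {
    Node prev, next;
}

final class DllChecker {
    // h.dll(p): true iff h heads a well-formed doubly-linked list whose
    // first node has prev == p. The parameter p says where prev should point.
    static boolean dll(Node h, Node p) {
        if (h == null) {
            return true;
        }
        return h.prev == p && dll(h.next, h);
    }
}
```

A whole list headed by `l` is checked with `DllChecker.dll(l, null)`, since the first node has no predecessor.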
Xisa is …
An automated shape analysis with a precise memory abstraction based around invariant checkers.
• Extensible and targeted for developers
  • Parametric in developer-supplied checkers, viewed as inductive definitions in separation logic
• Precise yet compact abstraction for efficiency
  • Data structure-specific, based on properties of interest to the developer
  [Figure: checkers are the input to Xisa, for example:]
    h.dll(p) =
      if (h = null) then
        true
      else
        h->prev = p and
        h->next.dll(h)
Shape analysis is an abstract interpretation on abstract memory descriptions with …
• Splitting of summaries
  • To reflect updates precisely
• And summarizing for termination
  [Figure: sequence of abstract heap states for l and cur as cur advances through the list]
Roadmap: Components of Xisa
• Xisa shape analyzer (abstract interpretation), taking checkers as input:
    h.dll(p) =
      if (h = null) then
        true
      else
        h->prev = p and
        h->next.dll(h)
  • 1. Splitting and interpreting update
  • 2. Level-type inference on checker definitions
  • 3. Summarizing
• Learn information about the checker to use it as an abstraction
• Compare and contrast manual code review and our automated shape analysis
Overview: Split summaries to interpret updates precisely
• Want abstract update to be “exact”, that is, to update one “concrete memory cell”.
• The example at a high level: iterate using cur, changing the doubly-linked list from purple to red.
  • Split at cur
  • Update cur from purple to red
• Challenge: How does the analysis “split” summaries and know where to “split”?
  [Figure: abstract states for l and cur before the split, after the split, and after the update]
“Split forward” by unfolding inductive definition
• get: cur->next
• Unfolding dll(cur, p) gives a disjunction (∨) of two cases:
  • cur is null
  • cur is a node with successor n, followed by dll(n, cur)
• Analysis doesn’t forget the empty case
    h.dll(p) =
      if (h = null) then
        true
      else
        h->prev = p and h->next.dll(h)
  [Figure: l, p, and cur before the unfolding; after it, either cur is null or cur points to n with dll(n, cur)]
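One step of forward unfolding can be illustrated symbolically. The sketch below is a deliberate simplification and not how Xisa represents memory: it manipulates formulas as plain strings, and `Unfolding`, `unfoldDll`, and the `fresh` parameter (the name given to the successor node) are hypothetical. It only shows that unfolding the summary follows the two branches of the checker definition and keeps both cases.

```java
final class Unfolding {
    // Unfolds the summary "cur.dll(p)" one step by following the two
    // branches of the checker. 'fresh' names the successor node.
    // Returns the empty case first and the non-empty case second.
    static String[] unfoldDll(String cur, String p, String fresh) {
        String emptyCase = cur + " = null";
        String nonEmptyCase = cur + "->prev = " + p
                + " and " + cur + "->next = " + fresh
                + " and " + fresh + ".dll(" + cur + ")";
        return new String[] { emptyCase, nonEmptyCase };
    }
}
```

Calling `Unfolding.unfoldDll("cur", "p", "n")` yields the case `cur = null` and the case in which `cur` is a materialized node whose tail is summarized by `n.dll(cur)`.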
“Split backward” also possible and necessary
    for each node cur in list l {
      remove cur if duplicate;
    }
    assert l is sorted, doubly-linked with no duplicates;
• The update: cur->prev->next = cur->next;
• get: cur->prev->next
• How does the analysis know to do this unfolding?
• Technical details [POPL’08]:
  • How does the analysis do this unfolding?
  • Why is this unfolding allowed? (Key: Segments are also inductively defined)
    h.dll(p) =
      if (h = null) then
        true
      else
        h->prev = p and h->next.dll(h)
  [Figure: a “dll segment” from l to cur followed by dll(n, cur), shown before and after unfolding the segment backward from cur; the result is again a disjunction (∨) that includes a null case]
Roadmap: Components of Xisa
• Xisa shape analyzer (abstract interpretation), taking checkers as input:
    h.dll(p) =
      if (h = null) then
        true
      else
        h->prev = p and
        h->next.dll(h)
  • 1. Splitting and interpreting update
  • 2. Level-type inference on checker definitions
  • 3. Summarizing
• How do we decide where to unfold? Derives additional information to guide unfolding
• Contribution: Turns testing code into specification for static analysis
• … to be discussed this afternoon
Summary of interpreting updates
• Splitting of summaries needed for precision
  • Unfolding checkers is a natural way to do splitting
  • When checker traversal matches code traversal
• Checker parameter type analysis
  • Useful for guiding unfolding in difficult cases, for example, “back pointer” traversals
Results: Performance
• Times negligible for data structure operations (often in sec or 1/10 sec)
• Expressiveness: Different data structures
• Verified shape invariant as given by the checker is preserved across the operation.
  [Benchmark table not preserved. Surviving annotations: “TVLA: 290 ms”, “TVLA: 850 ms”, “Space Invader only analyzes lists (built-in)”.]
Demo: Doubly-linked list reversal
• Body of loop over the elements: Swaps the next and prev fields of curr.
  [Figure: three regions of the list: already reversed segment, node whose next and prev fields were swapped, not yet reversed list]
• http://www.cs.colorado.edu/~bec/
Experience with the tool
• Checkers are easy to write and try out
  • Enlightening (e.g., red-black tree checker in 6 lines)
  • Harder to “reverse engineer” for someone else’s code
  • Default checkers based on types useful
• Future expressiveness and usability improvements
  • Pointer arithmetic and arrays (in progress)
  • More generic checkers:
    • polymorphic: “element kind unspecified”
    • higher-order: parameterized by other predicates
• Future evaluation: user study
Near-term future work: Exploiting common specification framework
• Scenario: Code instrumented with lots of checker calls (perhaps automatically with object invariants)
    assert( mychecker(x) );
    // … operation on x …
    assert( mychecker(x) );
  • Very slow to execute
  • Hard to prove statically (in general)
  • Can we prove parts statically?
• Static Analysis View: Hybrid checking
• Testing View: Incrementalize invariant checking
• Example: Insert in a sorted list
  • Preservation of sortedness shown statically
  • Emit run-time check for new element: u ≤ v ≤ w
  [Figure: list l with the new node v inserted between u and w]
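The sorted-insert example can be sketched in Java to show what the residual run-time check might look like. This is an assumption-laden illustration, not output of the proposed hybrid checker: `SNode`, `SortedInsert`, and both method names are hypothetical. The point is only the contrast between the full checker, which walks the whole list, and the local check u ≤ v ≤ w that remains once preservation of sortedness elsewhere is established.

```java
// Hypothetical singly-linked node; one link is enough to show the idea.
final class SNode {
    final int value;
    SNode next;
    SNode(int value) { this.value = value; }
}

final class SortedInsert {
    // Full run-time checker: walks the entire list on every call.
    static boolean sorted(SNode h) {
        while (h != null && h.next != null) {
            if (h.value > h.next.value) return false;
            h = h.next;
        }
        return true;
    }

    // Inserts v directly after u. Assuming the list was sorted before the
    // call, the only check left to do at run time is u <= v <= w.
    static void insertAfter(SNode u, SNode v) {
        SNode w = u.next;
        if (u.value > v.value || (w != null && v.value > w.value)) {
            throw new IllegalStateException("residual check u <= v <= w failed");
        }
        v.next = w;
        u.next = v;
    }
}
```

The residual check costs a constant number of comparisons per insert, whereas asserting `sorted` before and after each operation costs time linear in the list length.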
Conclusion
• Extensible Inductive Shape Analysis: precision-demanding program analysis improved by novel user interaction
  • Developer: Gets results corresponding to intuition
  • Analysis: Focused on what’s important to the developer
• Practical precise tools for better software with an end-user approach!
Programming Languages Research at the University of Colorado, Boulder
Who we are
• Faculty: Amer Diwan, Jeremy Siek, Bor-Yuh Evan Chang, Sriram Sankaranarayanan
• Ph.D. Students
Bor-Yuh Evan Chang 張博聿, University of Colorado at Boulder
Outline
• Gradual Programming
  • A new collaborative project involving Amer Diwan, Jeremy Siek, and myself
• Brief Sketches of Other Activities
Have you noticed a time where your program is not optimized where you expect?
• Observation: A disconnect between programmer intent and program meaning
  • “I need a map data structure”
  • Load class file; Run class initialization; Create hashtable
  • Between the two: a semantic gap
• Problem: Tools (IDEs, checkers, optimizers) have no knowledge of what the programmer cares about, hampering programmer productivity, software reliability, and execution efficiency
Example: Iteration Order
    class OpenArray extends Object {
      private Double data[];
      public boolean contains(Object lookFor) {
        for (int i = 0; i < data.length; i++) {
          if (data[i].equals(lookFor)) return true;
        }
        return false;
      }
    }
• Must specify an iteration order even when it should not matter
• Compiler cannot choose a different iteration order (e.g., parallel)
Wild and Crazy Idea: Use Non-Determinism
• Programmer starts with a potentially non-deterministic program
• Analysis identifies instances of “under-determinedness”
• Programmer eliminates “under-determinedness”
• Goal: neither “over-determined” nor “under-determined”, but just right
• “Over-determined”:
    class OpenArray extends Object {
      private Double data[];
      public boolean contains(Object lookFor) {
        for (int i = 0; i < data.length; i++) {
          if (data[i].equals(lookFor)) return true;
        }
        return false;
      }
    }
• Starting point, “under-determined”:
    class OpenArray extends Object {
      private Double data[];
      public boolean contains(Object lookFor) {
        i [symbol lost] 0 .. data.length-1 {
          if (data[i].equals(lookFor)) return true;
        }
        return false;
      }
    }
  • Question: What does this mean? Is it “under-determined”?
  • Response: Depends, is the iteration order important?
Let’s try a few program variants
    public boolean contains(Object lookFor) {
      for (int i = 0; i < data.length; i++) {
        if (data[i].equals(lookFor)) return true; }
      return false;
    }

    public boolean contains(Object lookFor) {
      for (int i = data.length-1; i >= 0; i--) {
        if (data[i].equals(lookFor)) return true; }
      return false;
    }
• Do they compute the same result?
  • Yes: Pick any one
  • No: Ask user
• Approach: Try to verify equivalence of program variants up to a specification
    public boolean contains(Object lookFor) {
      parallel_for(0, data.length-1) i => {
        if (data[i].equals(lookFor)) return true; }
      return false;
    }
• What about here?
Surprisingly, analysis says no. Why? Exceptions!
  [Figure: contents of a.data, of which only a “null” entry survives, and a call a.contains( )]
• left-to-right iteration returns true
• right-to-left iteration throws NullPointerException
• Need user interaction to refine specification that captures programmer intent
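The difference between the two iteration orders can be reproduced with a small Java sketch. The array contents here are an assumption, since the slide's figure is not preserved: the example uses a match in the first slot and `null` in the last slot, which is one input that produces the behavior described. `OpenArrayVariants` and its static methods are hypothetical names, with the array passed as a parameter to keep the example short.

```java
final class OpenArrayVariants {
    // Left-to-right variant of contains.
    static boolean containsForward(Double[] data, Object lookFor) {
        for (int i = 0; i < data.length; i++) {
            if (data[i].equals(lookFor)) return true;
        }
        return false;
    }

    // Right-to-left variant of contains.
    static boolean containsBackward(Double[] data, Object lookFor) {
        for (int i = data.length - 1; i >= 0; i--) {
            // Dereferences data[i] before any match further left is seen,
            // so a null entry here throws NullPointerException.
            if (data[i].equals(lookFor)) return true;
        }
        return false;
    }
}
```

With `Double[] data = { 1.0, null }`, `containsForward(data, 1.0)` returns true before reaching the null entry, while `containsBackward(data, 1.0)` dereferences the null entry first and throws `NullPointerException`. Whether that difference matters is exactly the question put to the user.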
Proposal Summary
• “Fix semantics per program”: Abstract constructs with many possible concrete implementations
• Apply program analysis to find inconsistent implementations
• Interact with the user to refine the specification
• The language designer’s role: enumerate the possible implementations
Bridging the Semantic Gap
• “I need a map data structure”
• “Looks like iterator order matters for your program”
• “Yes, I need iteration in sorted order”
• “Let’s use a balanced binary tree (TreeMap)”
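The outcome of this dialogue can be illustrated with the standard `java.util.TreeMap`, whose key set iterates in ascending key order. The `MapChoice` class and `sortedKeys` method are hypothetical helpers written for this sketch, and storing each key's length as the value is an arbitrary choice.

```java
final class MapChoice {
    // Puts the keys into a TreeMap and returns them joined in the map's
    // iteration order, which for TreeMap is ascending key order.
    static String sortedKeys(String... keys) {
        java.util.Map<String, Integer> map = new java.util.TreeMap<>();
        for (String k : keys) {
            map.put(k, k.length());
        }
        return String.join(",", map.keySet());
    }
}
```

A `java.util.HashMap` would accept the same calls but makes no guarantee about iteration order, which is why the requirement "iteration in sorted order" is what selects the tree-based implementation.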
Formal Methods
• Prof. Sriram Sankaranarayanan (CS)
  • Cyber-physical systems verification
    • hybrid automata theory, control systems verification, analysis of Simulink and Stateflow diagrams
    • advanced mathematical techniques:
      • convex optimization: linear and semi-definite
      • differential equations: set-valued analysis
      • SMT solvers over non-linear theories
    • applications to automotive software (with NEC labs and GM labs)
• Prof. Aaron Bradley (ECEE)
  • Decision procedures, Model checking
• Prof. Fabio Somenzi (ECEE)
Programming Languages and Analysis
• Prof. Amer Diwan (CS)
  • Performance analysis of computer systems
    • How do we know that we have not perturbed our data?
    • Using machine learning and statistical techniques to reason about data
  • Tool-assisted program transformations
    • Algorithmic optimizations for performance
    • Program metamorphosis for improving code quality
• Prof. Jeremy Siek (ECEE/CS)
  • Gradual type checking: static (Java), dynamic (Python)
  • Meta-programming: programs that write programs
  • Compilers for optimizing scientific codes
• Prof. Bor-Yuh Evan Chang (CS)
  • End-user program analysis
  • Precise analysis (shape, collections)
  • Interactive analysis refinement (type checking + symbolic evaluation)
Applying to Colorado
• Computer Science Department information: http://www.cs.colorado.edu/grad/admission/
• Deadlines: Dec 1 for Fall (Sep 1 for Spring)
• Graduate Advisor: Nicholas Vocatura, nicholas.vocatura@colorado.edu
• Talk to me about application fee waiver: http://www.cs.colorado.edu/~bec/