CHET is a tool for automatically checking component specifications in Java systems, with the goal of making programs more reliable, secure, and robust. It uses a component specification language and flow analysis to find each instance of component usage in a program. For each instance it builds a small model program and applies model checking techniques to verify that the instance meets its specification. The approach stays practical for everyday programming because most component usage is through calls and is single threaded, so the checker needs to track fewer variables and can often ignore thread interactions.
Efficient Checking of Component Specifications in Java Systems Steven P. Reiss Brown University CHET
Our Goal • To Improve Programming • More reliable • More secure • More robust • More understandable • Easier • To Deal With Real Systems • Not yesterday’s • Some today’s • Worrying about tomorrow’s
Model Checking • Is the next great thing for programmers • Will find all our bugs automatically • Will fix all our problems • But with minor exceptions it is not used • Not on an everyday basis • Not for everyday programs • Not by most programmers • What is needed here • Must be “automatic” -- no effort required • Must be fast -- “compilation speed” • Must be helpful -- accurate, precise
The Problem of Components • Java programs are built on class libraries • Standard java libraries • Open source libraries • Libraries created for an application • Creators know how they should be used • Each has its own pattern of usage • Typically fail if not used correctly • Make sure they are used correctly • Throughout the program • Each instance • Statically • With “real” Java programs
The Solution • Create a Component Specification Language • Find Instances of Component Usage • Check Each Instance for Validity
Specification Language • Define how components should be used • In a way that matches their use • Once for all potential instances • So that it can be done by programmers • And the specification can be understood • Solution • Use finite automata • Over parameterized program events • Matches call sequences, variable usage, etc.
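To make the idea of a finite automaton over program events concrete, here is a minimal sketch in Java. It is not CHET's specification syntax: the class name `IteratorSpecSketch`, the event names `HAS_NEXT` and `NEXT`, the state names, and the rule that every `next` must follow a `hasNext` are all assumptions chosen for illustration. Binding events to a particular instance through parameters is left out.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: an iterator-usage rule written as a finite automaton.
// Event and state names are illustrative, not CHET's specification syntax.
final class IteratorSpecSketch {
    enum Event { HAS_NEXT, NEXT }
    enum State { UNCHECKED, CHECKED, ERROR }

    // Transition table: DELTA.get(state).get(event) is the next state.
    private static final Map<State, Map<Event, State>> DELTA =
            new EnumMap<State, Map<Event, State>>(State.class);
    static {
        for (State s : State.values()) {
            DELTA.put(s, new EnumMap<Event, State>(Event.class));
        }
        DELTA.get(State.UNCHECKED).put(Event.HAS_NEXT, State.CHECKED);
        DELTA.get(State.UNCHECKED).put(Event.NEXT, State.ERROR); // next() without hasNext()
        DELTA.get(State.CHECKED).put(Event.HAS_NEXT, State.CHECKED);
        DELTA.get(State.CHECKED).put(Event.NEXT, State.UNCHECKED);
        DELTA.get(State.ERROR).put(Event.HAS_NEXT, State.ERROR); // the error state is a sink
        DELTA.get(State.ERROR).put(Event.NEXT, State.ERROR);
    }

    // Runs the automaton over one event sequence; valid means ERROR was never entered.
    static boolean isValid(List<Event> events) {
        State state = State.UNCHECKED;
        for (Event e : events) {
            state = DELTA.get(state).get(e);
        }
        return state != State.ERROR;
    }

    public static void main(String[] args) {
        System.out.println(isValid(Arrays.asList(Event.HAS_NEXT, Event.NEXT))); // true
        System.out.println(isValid(Arrays.asList(Event.NEXT)));                 // false
    }
}
```

A table-driven automaton like this is written once and can then be applied to every instance, which is the property the slide asks of the specification language.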
Specification Instances • Components are used multiple times • List, Iterator, XmlWriter, … • Need to handle each use separately • Uses must be found automatically • As specific as possible (statically) • Solution • Using flow analysis over the class files • Trigger events define instances • Other events used in particular instances
Checking Specifications • Each instance must be checked • Independently • To ensure the specification is met • Solution • Create a simple model program per instance • Check if model program meets specification • Using model checking techniques • Do all this efficiently
Keeping this Practical • Most components are used through calls • Control flow determines call sequences • Data flow determines which calls • Most component usage is single threaded • Can often ignore thread interactions • This is simpler than the general problem • Need to track fewer variables • Need to worry less about variable values • Need to worry less about interweaving
CHET Overview • Specifications + Application → Flow Analysis → Instances → Abstract Program Builder → Program Checker → Report
Iterator Usage
Xml Writer Usage
File Open-Close
Catching Errors
Web Crawler Library
Nested Locks
Events & Parameters • CALL (caller this, argi, calling this) • RETURN (this, return value) • ENTRY (this, argi) • FIELD (this) [set to int, null, nonnull] • ALLOC (new object) • CATCH (catch object) • THROW (throw object) • LOCK (lock object) • UNLOCK (lock object)
Why Event-Based Specification • ESP and others use code patterns • These are closer to programs • And hence easier to understand • However they are hard to generalize • Iterator can use nextElement or next • Nested opens and alternatives • Xml writer alternatives • Events and automata generalize • Easy to define abstract patterns • Still understandable by programmers
Finding All Instances • Done using flow analysis • Of the program and its libraries • Handling all specifications at once • Each trigger event yields a source • We determine where this source can flow • This determines which events are relevant • To this particular instance • But it’s not that easy • Multiple-parameter events • Accurate flow and type analysis required
Example
Vector<A> v = …
Iterator<A> it = v.iterator();               // trigger source
while (it.hasNext()) {
    A x = it.next();
    …
}
…
for (it = v.iterator(); it.hasNext(); ) {    // trigger source
    A y = it.next();
    …
}
Flow Analysis • Identify sources • From trigger events • Tracking sources and where they flow • Through symbolic execution • Result: Determine at each location • What sources are used • This lets us check event parameters • Trigger source used on call => event
Flow Analysis Goals • Complete analysis • Ensure we track all possible uses of a source • Must include libraries as well as user code • Accurate analysis • Must know types for virtual calls • Must understand full Java semantics • Must handle all methods (including native, etc.)
Flow Analysis Techniques • Done at the byte code level • Tracking types and values • Through symbolic execution • Full interprocedural flow analysis • Using a work queue approach • Of user code and libraries • Handling all the complexities of Java • Selectively context sensitive • Flow sensitive, not path sensitive • Trade off accuracy and speed • Accuracy where important, speed otherwise
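The work queue approach can be sketched as a standard worklist propagation. The sketch below is a heavy simplification, not CHET's analysis: it treats program locations as plain strings, ignores types, contexts, and byte code, and only pushes sets of sources along flow edges until nothing changes. The class name `SourceFlowSketch`, the method `propagate`, and the location names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a work-queue (worklist) propagation of sources.
final class SourceFlowSketch {
    // successors: flow edges between program locations (illustrative names).
    // generated: sources created at a location, for example by a trigger event.
    static Map<String, Set<String>> propagate(Map<String, List<String>> successors,
                                              Map<String, Set<String>> generated) {
        Map<String, Set<String>> reaching = new HashMap<String, Set<String>>();
        Deque<String> work = new ArrayDeque<String>();
        for (Map.Entry<String, Set<String>> g : generated.entrySet()) {
            reaching.computeIfAbsent(g.getKey(), k -> new HashSet<String>()).addAll(g.getValue());
            work.add(g.getKey());
        }
        while (!work.isEmpty()) {
            String loc = work.remove();
            Set<String> here = reaching.get(loc);
            List<String> nexts = successors.getOrDefault(loc, Collections.<String>emptyList());
            for (String next : nexts) {
                Set<String> there = reaching.computeIfAbsent(next, k -> new HashSet<String>());
                // Re-queue a location only when its set of sources actually grew.
                if (there.addAll(here)) {
                    work.add(next);
                }
            }
        }
        return reaching;
    }

    public static void main(String[] args) {
        Map<String, List<String>> succ = new HashMap<String, List<String>>();
        succ.put("alloc", Arrays.asList("loopHead"));
        succ.put("loopHead", Arrays.asList("body", "exit"));
        succ.put("body", Arrays.asList("loopHead"));
        Map<String, Set<String>> gen = new HashMap<String, Set<String>>();
        gen.put("alloc", new HashSet<String>(Arrays.asList("M")));
        System.out.println(propagate(succ, gen).get("exit"));
    }
}
```

Because a location is re-queued only when its set grows, and the sets can only grow up to the finite number of sources, the loop terminates even when the flow graph has cycles.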
Flow Analysis Issues • Speed versus accuracy • Start with the minimum possible • Add more information to get needed accuracy • What to track • Trigger sources; all other sources • Java Issues that arose • Static initializers • Constructors • Native methods • Reflection • Callbacks • Data structures • Exceptions
What to Track: Sources • Local Sources • Anything generated via a new operator • Track values stored in fields of the source • Array Sources • Created by new array operators • Track values stored in the array • Fixed Sources • Results from native methods, built-in values • Can be mutable (changed on a cast)
Sources • Model Sources • Generated by trigger events • One-to-one association with instances • Field Sources • Track the values of fields • Only for fields used in specifications • Determine where the fields are used • Others • Privacy, …
Values • Flow analysis deals with values • These are sets of sources • Associated with each field, local, stack, … • Value contains additional information • Data type (for type analysis) • CanBe or MustBe NULL flags • Integer value range (or indefinite) • Operations applied symbolically
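One way to picture such a value is as an immutable record that is joined with another value wherever control-flow paths merge. The sketch below is an assumption about how the listed pieces could fit together, not CHET's data structure: the class `ValueSketch` and its field names are hypothetical, the data type is omitted because joining types needs a class hierarchy, and an indefinite integer range is represented by the full `long` range.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of an abstract value: a set of sources plus null and range information.
final class ValueSketch {
    final Set<String> sources;   // sources that may flow into this value
    final boolean canBeNull;     // true if some path yields null
    final boolean mustBeNull;    // true only if every path yields null
    final long min;              // integer range lower bound
    final long max;              // integer range upper bound

    ValueSketch(Set<String> sources, boolean canBeNull, boolean mustBeNull, long min, long max) {
        this.sources = Collections.unmodifiableSet(new HashSet<String>(sources));
        this.canBeNull = canBeNull;
        this.mustBeNull = mustBeNull;
        this.min = min;
        this.max = max;
    }

    // Join of two values at a control-flow merge point.
    ValueSketch merge(ValueSketch other) {
        Set<String> all = new HashSet<String>(sources);
        all.addAll(other.sources);
        return new ValueSketch(all,
                canBeNull || other.canBeNull,     // null on either path means "can be"
                mustBeNull && other.mustBeNull,   // "must be" needs null on both paths
                Math.min(min, other.min),
                Math.max(max, other.max));
    }

    public static void main(String[] args) {
        ValueSketch a = new ValueSketch(new HashSet<String>(Arrays.asList("M1")), false, false, 0, 0);
        ValueSketch b = new ValueSketch(new HashSet<String>(Arrays.asList("M2")), true, true, 5, 5);
        ValueSketch m = a.merge(b);
        System.out.println(m.sources.size() + " " + m.canBeNull + " " + m.mustBeNull);
    }
}
```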
Static Initializers • Problem • Called implicitly at first use • Must return before class can be used • Accurate field analysis requires this • But it can call methods of the class • Some classes initialized by JVM • Solution • Track whether initializer has been started • Add some system classes by default • Don’t process methods before started
Constructors • Problem • Most methods assume constructor done • Accurate field analysis requires this • But constructors can be quite complex • Solution • Track current set of constructors we are in • Only process method if • We have constructed an object of this class OR • We are called from within the constructor
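The constructor rule amounts to a small gate that the analysis consults before processing a method. The sketch below is one possible reading of the slide, with hypothetical names (`ConstructorGateSketch`, `enterConstructor`, `exitConstructor`, `shouldProcessMethod`) and classes identified by plain strings. It assumes that leaving a constructor is what marks a class as constructed, and it does not handle nested constructors of the same class.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the "only process a method once its class is constructed" rule.
final class ConstructorGateSketch {
    private final Set<String> constructedClasses = new HashSet<String>();
    private final Set<String> activeConstructors = new HashSet<String>();

    // Called when the analysis starts processing a constructor of cls.
    void enterConstructor(String cls) {
        activeConstructors.add(cls);
    }

    // Called when the analysis finishes a constructor of cls; the class now counts as constructed.
    void exitConstructor(String cls) {
        activeConstructors.remove(cls);
        constructedClasses.add(cls);
    }

    // A method of cls is processed if an object exists or we are inside its constructor.
    boolean shouldProcessMethod(String cls) {
        return constructedClasses.contains(cls) || activeConstructors.contains(cls);
    }

    public static void main(String[] args) {
        ConstructorGateSketch gate = new ConstructorGateSketch();
        System.out.println(gate.shouldProcessMethod("Foo")); // false
        gate.enterConstructor("Foo");
        System.out.println(gate.shouldProcessMethod("Foo")); // true
    }
}
```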
Native & Reflexive Methods • Problem • These are hidden from static analysis • Solution 1: Default handling • Use a fixed source of return type • Use mutable sources where appropriate • Solution 2: Internal special handling • arraycopy : copy array values
Native & Reflexive Methods • Solution 3: Resource-based return • User specifies return type in resource file • Can be specified as mutable • On a function basis • On a call-site basis • Solution 4: Method substitution • Resource file can specify alternative method • Thread.start => Thread.run • AccessController.doPrivileged => run
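Method substitution can be pictured as a lookup applied to each call target before the analysis follows it. The slides do not show the resource file format, so the sketch below keeps the table in memory; the class `MethodSubstitutionSketch`, its methods, and the use of fully qualified method names as strings are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: redirect hidden call targets to methods the analysis can follow.
final class MethodSubstitutionSketch {
    private final Map<String, String> substitutions = new HashMap<String, String>();

    // Registers that calls to 'from' should be analyzed as calls to 'to'.
    void substitute(String from, String to) {
        substitutions.put(from, to);
    }

    // Returns the method the analysis should follow for a call target.
    String resolve(String target) {
        return substitutions.getOrDefault(target, target);
    }

    public static void main(String[] args) {
        MethodSubstitutionSketch table = new MethodSubstitutionSketch();
        table.substitute("java.lang.Thread.start", "java.lang.Thread.run");
        System.out.println(table.resolve("java.lang.Thread.start")); // java.lang.Thread.run
    }
}
```

The `Thread.start` entry mirrors the slide's example: the native `start` method is opaque to the analysis, but its effect is to run the thread's `run` method.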
Native and Reflexive Methods • Solution 5: Ignore • Resource file can specify calls to ignore • Most calls to swing, awt, … are black boxes • Can be done by method, class or package • With exceptions • Solution 6: User Substitution • User can provide alternative dummy method • Use it as the replacement method • Complex uses of reflection
Callbacks • Problem • Some callbacks are hidden in native code • Callbacks need to have proper arguments • For accurate analysis • Lots of user code is through callbacks • Solution • Note callbacks in resource file • Associate callback method with registration • Provide calling sequence as well • Simulate callbacks with proper arguments • Automatically during analysis
Data Structures • Problem • Maps, collections are hard to analyze • Expensive and inaccurate to look at code • Solution • Introduce prototype sources • With procedural models of methods • Simulate what the methods do in the source • Don’t use the method code per se • Extend to iterators, etc. based on prototypes
Prototype Map • Tracks the contents of the map • Can track selective key-value pairs • Tracks empty, non-empty, either • Handles all the map operations • Updating internal contents • Returning appropriate values • Returns prototype iterators • That are aware of prototype contents
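A procedural model of a map can be very small compared with the real implementation. The sketch below is a guess at the flavor of such a prototype, not CHET's code: the class `PrototypeMapSketch`, the `Emptiness` enum, and the method names are hypothetical, keys are not tracked at all, and only three operations are modeled.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of a prototype source that models a map instead of analyzing its code.
final class PrototypeMapSketch {
    enum Emptiness { EMPTY, NON_EMPTY, EITHER }

    private Emptiness emptiness = Emptiness.EMPTY;
    private final Set<String> valueSources = new HashSet<String>();

    // Model of Map.put: remember the sources of the stored value; the map is now non-empty.
    void put(Set<String> sourcesOfValue) {
        valueSources.addAll(sourcesOfValue);
        emptiness = Emptiness.NON_EMPTY;
    }

    // Model of Map.remove: a non-empty map may or may not have become empty.
    void remove() {
        if (emptiness == Emptiness.NON_EMPTY) {
            emptiness = Emptiness.EITHER;
        }
    }

    // Model of Map.get: any source stored so far may be returned.
    Set<String> get() {
        return Collections.unmodifiableSet(new HashSet<String>(valueSources));
    }

    // A prototype iterator can use this to know hasNext() is always false on an empty map.
    boolean iteratorMayHaveNext() {
        return emptiness != Emptiness.EMPTY;
    }

    Emptiness emptiness() {
        return emptiness;
    }

    public static void main(String[] args) {
        PrototypeMapSketch map = new PrototypeMapSketch();
        map.put(Collections.singleton("A1"));
        map.remove();
        System.out.println(map.emptiness()); // EITHER
    }
}
```

Because each map gets its own prototype, the values stored in one map are not merged with those of another, which is one of the accuracy gains the slides attribute to prototypes.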
Prototypes • Provide more accurate analysis • Know the type of items stored in table • Avoid merging of multiple tables • Know when tables are null and not • Provide more efficient analysis • Speedup of 30% • Are relatively easy to implement • Collections: < 900 lines of source • Maps: < 500 lines of source
Exceptions • Problem • Normal exceptions are easy to handle • What to do with hidden exceptions • catch (Throwable …) • Synchronized regions • Solution • Restrict analysis to explicit exceptions • Unless explicitly told not to
Finding One Instance • Trigger event => Model Source • This determines the basic instance • Where model source flows • Determines event locations • Based on event type • Based on event parameters
Example • E1 is the trigger • Provides a model source M • E2 occurs whenever M flows to a call to Iterator.hasNext • E3, E4, E5 similarly
Multi-Parameter Specifications • Find all possible instances (statically) • Start with model source for trigger • Find all locations for next NEW event • Based on flow of the model source • Build a new instance for the source pair • Continue to handle additional NEW sources • Note that we have to consider all sets • And not just complete sets
Example • E1 is the trigger • Model source M • Writer constructor call • With M as arg1 • Yields new source M1 • Instance <M,M1> • If additional call • With M1 as arg1 • Build new instance
Where Are We • We have • Specified how components should be used • Using parameterized automata • Found all instances of each specification • Using detailed flow analysis • Next we need to • Check each instance • By creating a model program • And looking at all its possible executions
Checking an Instance • Build an abstract program for each instance • Using flow-sensitive analysis • Abstract program organized into routines • Abstract program generates event sequences • Some nodes output events • Determine all event sequences that can be generated • Ensure that they are all valid wrt specification
Abstract Programs • Methods represented by automata • Each defined as a directed graph • Nodes of the graph represent actions • Arcs represent nondeterministic traversal • Control flow embedded in nodes • Calls, asynchronous calls • Actions can do tests (on variables, returns) • Actions can dead-end • If is represented as two test nodes
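Checking that every event sequence of an abstract program is valid can be done by exploring pairs of (program node, specification state). The sketch below illustrates that product exploration under strong assumptions: a single routine with no calls, variables, or threads; nodes numbered from 0; and the same hypothetical iterator rule as a hard-coded `step` function. The names `InstanceCheckSketch` and `allSequencesValid` are illustrative, and validity here means only that the error state is unreachable.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: explore every (node, spec state) pair of a one-routine abstract program.
final class InstanceCheckSketch {
    // Spec states for the illustrative iterator rule: 0 = unchecked, 1 = checked, 2 = error.
    static int step(int state, String event) {
        if (event == null || state == 2) return state;       // no event, or already in error
        if (event.equals("HAS_NEXT")) return 1;
        if (event.equals("NEXT")) return state == 1 ? 0 : 2; // next() needs a prior hasNext()
        return state;                                         // events the spec does not mention
    }

    // events[n]: event emitted by node n, or null; arcs[n]: nondeterministic successors of n.
    static boolean allSequencesValid(String[] events, int[][] arcs, int entry) {
        Set<List<Integer>> seen = new HashSet<List<Integer>>();
        Deque<List<Integer>> work = new ArrayDeque<List<Integer>>();
        List<Integer> start = Arrays.asList(entry, step(0, events[entry]));
        seen.add(start);
        work.add(start);
        while (!work.isEmpty()) {
            List<Integer> cur = work.remove();
            int node = cur.get(0);
            int state = cur.get(1);
            if (state == 2) return false;                     // some execution violates the spec
            for (int next : arcs[node]) {
                List<Integer> succ = Arrays.asList(next, step(state, events[next]));
                if (seen.add(succ)) work.add(succ);           // visit each pair at most once
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // entry -> hasNext -> (next -> hasNext | exit): the usual while loop.
        String[] events = {null, "HAS_NEXT", "NEXT", null};
        int[][] arcs = {{1}, {2, 3}, {1}, {}};
        System.out.println(allSequencesValid(events, arcs, 0)); // true
    }
}
```

The number of pairs is bounded by nodes times specification states, so the search stays small when the abstract program is small. That is the reason simplifying the abstract program matters for checking speed.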
Sample Program
Sample Conditional
Abstract Program Actions • Enter a routine • Exit a routine • Call a routine • Generate a particular event • Set a variable to a given value • Correspond to program variables • Set the return value of the routine • Test a variable or return value for a value • Exit (call to System.exit) • Asynchronous call of a routine • Begin synchronized region • End synchronized region • (Wait, Notify) events
Abstract Program Variables • Which variables are used in the program • Can be given as part of the specification • Otherwise determined automatically • Using a separate cursory flow analysis • Determine which fields directly affect event generation in the abstract program • Conditional using field branches around event • This is done before building the program
Simplifying Abstract Programs • Simplification essential for fast checking • Eliminate routines obviously not used • Through a quick transitive closure check • Then apply FSA minimization techniques • Throw away nodes with no effects • Combine nodes where possible • No effects • If no thread starts, then all thread operations • If no conditionals for a variable (return), no sets • Conditional without internal nodes • Enter-exit only for a routine • Call of empty routine
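One of the simplifications, throwing away nodes with no effects, can be sketched as redirecting arcs around such nodes. The sketch below covers only that one step and is not CHET's minimization: it treats a node as effect-free when it emits no event and has exactly one successor, and it uses the same hypothetical array encoding (`events`, `arcs`) with a hypothetical class name `SimplifySketch`. Bypassed nodes are left in place but become unreachable.

```java
import java.util.Arrays;

// Hypothetical sketch: bypass nodes that emit no event and have exactly one successor.
final class SimplifySketch {
    static int[][] bypassNoEffectNodes(String[] events, int[][] arcs, int entry) {
        int n = arcs.length;
        int[] target = new int[n];
        for (int i = 0; i < n; i++) {
            boolean noEffect = i != entry && events[i] == null
                    && arcs[i].length == 1 && arcs[i][0] != i;
            target[i] = noEffect ? arcs[i][0] : i;   // where arcs into node i should really go
        }
        int[][] result = new int[n][];
        for (int i = 0; i < n; i++) {
            result[i] = new int[arcs[i].length];
            for (int j = 0; j < arcs[i].length; j++) {
                int t = arcs[i][j];
                int hops = 0;
                // Follow chains of effect-free nodes; the bound guards against cycles of them.
                while (target[t] != t && hops++ < n) {
                    t = target[t];
                }
                result[i][j] = t;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // entry -> (empty node) -> NEXT event -> exit
        String[] events = {null, null, "NEXT", null};
        int[][] arcs = {{1}, {2}, {3}, {}};
        System.out.println(Arrays.deepToString(bypassNoEffectNodes(events, arcs, 0)));
        // prints [[2], [2], [3], []]
    }
}
```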