Shape Analysis With Reference Sets Mark Marron IMDEA-Software (Madrid, Spain) mark.marron@imdea.org
Motivation • We want to provide basic information about the program heap to support a range of client applications • IDE tools (query, refactoring, etc.) • Optimization • Error Detection • Focus on scalable, manageable models and tools, even at the cost of overall expressivity and analytic power
Demo • Fix sharing info extraction • Add disjoint/overlaps for set information • Point out that relations beyond those between variables are desirable, since variables are transient
Goal • Track basic set relations • Membership, Overlapping, Non-Overlapping • Subset, Set Equality • Keep computational cost small • High precision is not required, but common cases must be handled accurately • Iterative subset construction/mutation • Set-style library operations • Union (AddAll) • Intersection • IsSubset • Contains
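To make the target cases concrete, here is a small hypothetical client program of the kind listed above: a subset built iteratively from a collection, then combined with set-style operations. It is only an illustration in Python (the names `Item`, `build_subset` and `ids` are invented); the asserts spell out the relations an analysis would ideally establish, checked here at runtime rather than inferred statically.

```python
from __future__ import annotations


class Item:
    """Hypothetical heap object; compared by identity (no __eq__ override)."""

    def __init__(self, key: int) -> None:
        self.key = key


def build_subset(items: list[Item]) -> list[Item]:
    """Iterative subset construction: keep the items whose key is even."""
    evens: list[Item] = []
    for it in items:
        if it.key % 2 == 0:
            evens.append(it)  # every element added here is also in items
    return evens


def ids(xs: list[Item]) -> set[int]:
    """Object identities, so the relations below are about objects, not values."""
    return {id(x) for x in xs}


items = [Item(k) for k in range(6)]
evens = build_subset(items)
odds = [it for it in items if it not in evens]  # `in` compares by identity here

assert ids(evens) <= ids(items)               # Subset
assert ids(evens).isdisjoint(ids(odds))       # Non-overlapping
assert ids(evens) | ids(odds) == ids(items)   # Union (AddAll) restores equality
assert ids(evens) & ids(items) == ids(evens)  # Intersection
assert items[0] in evens                      # Contains (membership)
```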
Approach Overview • Start with an existing model that decomposes the heap into related regions • Reduces the complexity of the set formulas that are needed • A storage shape graph works well • Nodes represent sets of objects (or data structures); edges represent sets of pointers • Fine-grained partitioning is possible • Disjointness properties are natural (and mostly free) • Annotate edges with additional properties to track reference set relations
Logical Structure Identification • The key issue for the shape graph approach is how to group concrete objects into abstract nodes • Too many nodes are confusing and computationally expensive • Too few nodes lead to imprecision (a single node must represent multiple logical structures) • Often done by allocation site or by type • Solution: each node is a set of similar objects, grouped using • Recursive type information (recursive vs. non-recursive types) • Membership in the same collection, array, or structure
Target Set Definition • Given a set of heap references R, the corresponding target set is: • Target(R) = {Object o | ∃ r ∈ R such that r points to o} • Two sets of heap references can be related by ⊆ on their target sets • Since the heap is partitioned into regions of objects, we also define a notion of coverage • A reference set covers a region if every object in the region is in the corresponding target set
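The definitions above can be checked on a tiny concrete heap. This is a minimal sketch under one stated assumption: a heap reference is modeled as a (source object, field name) pair, which is a choice made for illustration, not necessarily the representation used in the talk. `Obj`, `Ref`, `target` and `covers` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # eq=False keeps identity-based == and hashing
class Obj:
    name: str
    fields: dict[str, Optional[Obj]] = field(default_factory=dict)


# Assumption for illustration: a heap reference is a (source object, field) pair.
Ref = tuple[Obj, str]


def target(refs: set[Ref]) -> set[Obj]:
    """Target(R) = { o | some r in R points to o }; null fields contribute nothing."""
    return {src.fields[f] for (src, f) in refs if src.fields.get(f) is not None}


def covers(refs: set[Ref], region: set[Obj]) -> bool:
    """R covers a region if every object in the region is in Target(R)."""
    return region <= target(refs)


a, b = Obj("a"), Obj("b")
holder = Obj("holder", {"x": a, "y": b, "z": a, "w": None})

R1 = {(holder, "x")}
R2 = {(holder, "x"), (holder, "z")}
R3 = {(holder, "x"), (holder, "y")}

assert target(R1) == target(R2) == {a}   # different reference sets, equal targets
assert target(R1) <= target(R3)          # relating reference sets via subset
assert covers(R3, {a, b}) and not covers(R1, {a, b})
assert target({(holder, "w")}) == set()  # a null reference targets nothing
```

Note that R1 and R2 are different reference sets with the same target set; this is exactly the kind of equality the abstraction tracks, rather than equality of the references themselves.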
Considerations in Abstraction • Several possible choices for representing these relations • Theory of sets over all objects/references • Full binary relations on power sets of edges • Reduced set of relations • For efficiency we use a reduced set of relations • Equality of the target sets of the references abstracted by pairs of edges (E × E) • Relation from sets of edges to the nodes that are covered by the abstracted references (℘(E) × N)
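One way the two reduced relations might be stored is sketched below. This is a hypothetical encoding, not taken from the talk: edge equivalence (E × E) is kept as a union-find over edge identifiers, so reflexivity, symmetry and transitivity come for free without materializing the quadratic relation, and coverage (℘(E) × N) is a set of (frozenset of edges, node) pairs. Edge and node identifiers are plain strings chosen for illustration.

```python
from __future__ import annotations


class RefSetInfo:
    """Hypothetical container for the two reduced relations on one shape graph."""

    def __init__(self) -> None:
        self._parent: dict[str, str] = {}  # union-find forest over edge ids
        self.coverage: set[tuple[frozenset[str], str]] = set()

    def _find(self, e: str) -> str:
        self._parent.setdefault(e, e)
        while self._parent[e] != e:
            self._parent[e] = self._parent[self._parent[e]]  # path halving
            e = self._parent[e]
        return e

    def set_equivalent(self, e1: str, e2: str) -> None:
        """Record that e1 and e2 abstract references with equal target sets."""
        self._parent[self._find(e1)] = self._find(e2)

    def equivalent(self, e1: str, e2: str) -> bool:
        return self._find(e1) == self._find(e2)

    def add_coverage(self, edges: set[str], node: str) -> None:
        """Record that the references abstracted by `edges` cover `node`."""
        self.coverage.add((frozenset(edges), node))

    def covered_by(self, edges: set[str], node: str) -> bool:
        return (frozenset(edges), node) in self.coverage


info = RefSetInfo()
info.set_equivalent("x->n1", "y->n1")
info.set_equivalent("y->n1", "z->n1")
assert info.equivalent("x->n1", "z->n1")  # transitivity via the shared root
info.add_coverage({"arr->n2"}, "n2")
assert info.covered_by({"arr->n2"}, "n2")
```

Union-find only supports adding equalities; a real analysis would also need to intersect these relations at control-flow joins, which this sketch does not attempt.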
Abstract Edge Equivalence • Track target set equality of the pointers abstracted by pairs of edges
Abstract Node Coverage • Track whether every object in a node's region is contained in the target sets of a given set of edges
Useful Inferences • A number of useful inferences follow from these two properties • If e, eʹ are edge equivalent and e has an empty concretization, then eʹ must have an empty concretization as well • If an edge e covers node n, then any other incoming edge of n represents a target set that is ⊆ the target set of e (its targets all lie in n, and every object in n is in the target set of e)
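Both inferences are simple enough to state as small functions. The sketch below uses a deliberately minimal, hypothetical encoding (equivalences as a set of edge pairs, coverage as a map from a single covering edge to the node it covers, incoming edges as a map from node to edge list); it is an illustration of the two rules, not the analysis implementation.

```python
from __future__ import annotations


def propagate_empty(empty: set[str], equiv_pairs: set[tuple[str, str]]) -> set[str]:
    """Inference 1: if e ~ e' and e has an empty concretization, so does e'.
    Iterates to a fixpoint so that chains e1 ~ e2 ~ e3 are handled."""
    result = set(empty)
    changed = True
    while changed:
        changed = False
        for a, b in equiv_pairs:
            for x, y in ((a, b), (b, a)):  # equivalence is symmetric
                if x in result and y not in result:
                    result.add(y)
                    changed = True
    return result


def subset_facts(covering: dict[str, str],
                 in_edges: dict[str, list[str]]) -> set[tuple[str, str]]:
    """Inference 2: if edge e covers node n, every other edge e' into n has
    Target(e') ⊆ Target(e). Returns pairs (e', e) meaning Target(e') ⊆ Target(e)."""
    facts: set[tuple[str, str]] = set()
    for e, n in covering.items():
        for other in in_edges.get(n, []):
            if other != e:
                facts.add((other, e))
    return facts


assert propagate_empty({"e1"}, {("e1", "e2"), ("e2", "e3")}) == {"e1", "e2", "e3"}
assert subset_facts({"all": "n"}, {"n": ["all", "sub", "tmp"]}) == {
    ("sub", "all"),
    ("tmp", "all"),
}
```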
Subsumes Aliasing • Note that the proposed reference set relations subsume classic must-alias information • In the concrete model, for non-null x and y, x == y iff Target(x) = Target(y) • In the abstract model, variables x and y must-alias iff the corresponding edges ex and ey are edge equivalent
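A minimal sketch of both sides of this correspondence follows. On the concrete side, a variable's target set is taken to be the identity of the object it holds; on the abstract side, a hypothetical encoding maps each variable to its edge and each edge to an equivalence-class id, so must-alias becomes a class comparison. The names `target_of`, `var_edge`, `edge_class` and `must_alias` are invented for illustration, and the abstract facts are written by hand to match the concrete heap.

```python
from __future__ import annotations


class Node:
    """Hypothetical heap object."""


def target_of(v: object) -> set[int]:
    """Concrete target set of one variable, as object identities (None adds nothing)."""
    return set() if v is None else {id(v)}


# Concrete side: for non-null variables, aliasing coincides with equal target sets.
p = Node()
x, y, z = p, p, Node()
assert (x is y) and target_of(x) == target_of(y)
assert (x is not z) and target_of(x) != target_of(z)

# Abstract side (hypothetical encoding): each variable maps to its edge, and
# edge equivalence is given as a map from edge to equivalence-class id.
var_edge = {"x": "e_x", "y": "e_y", "z": "e_z"}
edge_class = {"e_x": 0, "e_y": 0, "e_z": 1}


def must_alias(v1: str, v2: str) -> bool:
    """v1 and v2 must-alias iff their edges are edge equivalent."""
    return edge_class[var_edge[v1]] == edge_class[var_edge[v2]]


assert must_alias("x", "y") and not must_alias("x", "z")
```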
Loop Invariant With Exit Test ... for(int i = 0; i < V.Length; ++i) V[i].f = 0;
Result ... for(int i = 0; i < V.Length; ++i) V[i].f = 0;
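The figures for these two slides are not reproduced here. One plausible reading is that coverage, combined with the loop exit test, lets the analysis conclude that every element of V was written. The sketch below transliterates the C# loop into Python with the counter made explicit and checks at runtime the invariant such an analysis would need: the references V[0..i) have exactly the already-written objects as their target set, and once the exit test fails that target set covers the whole region. `Elem`, `visited` and `region` are hypothetical names, and this is a dynamic check, not the static analysis itself.

```python
from __future__ import annotations


class Elem:
    """Hypothetical element type with an int field f."""

    def __init__(self, f: int) -> None:
        self.f = f


V = [Elem(k + 1) for k in range(4)]
region = {id(o) for o in V}  # the objects that V's entries point to

i = 0
visited: set[int] = set()  # target set of the references V[0..i)
while i < len(V):
    V[i].f = 0
    visited.add(id(V[i]))
    i += 1
    # loop invariant: V[0..i) targets exactly the objects already written
    assert visited == {id(o) for o in V[:i]}

# exit test: i >= len(V), so V[0..i) is all of V and covers the whole region,
# hence every object in the region has f == 0
assert i == len(V) and visited == region
assert all(o.f == 0 for o in V)
```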
Summary • Tracking reference set information is computationally inexpensive • Results are precise enough to model many interesting and important relations • In fact, surprisingly so • Why? Most conditions end up being simple • Is this a general property? Are most programs made of simple relations/concepts that are composed into complex concepts? (We hope so) • Could we use rich set decision procedures? If all conditions are simple, most proofs should be easy and fast given the right decomposition
Future Work • Build a strong foundation for other tools to utilize • Transform core concepts from prototype into robust tools • Finish implementation of static analysis for CLI bytecode and core libraries (also runtime support) • Export results to Visual Studio for inspection, specification generation, or other tools • Apply results in optimization, refactoring, and error detection applications