Modeling The Heap: Optimization and Software Engineering Mark Marron IMDEA-Software (Madrid, Spain) mark.marron@imdea.org
Motivation • Many optimization and software engineering applications utilize heap information • Optimization • Parallelization • Memory management • Software Engineering and Debugging • Interactive debugging of heap data structures • Tainted or Information Flow analysis through heap objects • Existing heap analysis techniques often inapplicable due to imprecision (points-to style) or computational cost (shape analysis)
Objective • General purpose model capable of supporting these applications: • Must model the range of fundamental properties needed by our target application domains • Cannot place significant restrictions on the program being analyzed • Must be computationally efficient and compact • Willing to sacrifice some precision • Our focus is on providing general classes of information for others to build on
Desired Information (Structural) • Connectivity • Reachability • Interference • Paths • Logical data structures (Regions) • Group related sections of the heap • Keep unrelated sections of the heap separate • Shape of a region • Cycle, Dag, Tree, List, Singleton
Desired Information (Intrinsic) • Identity • Given an object at point p, track the flow of this object at all later program points q • Heap Based Use-Mod • Find all program points a given memory location may be read/written at • Escape • Objects that are freshly allocated • Objects that escape the local call context
Model Extraction (Static) • The theory of Abstract Interpretation provides a framework for static program analysis • Takes a lattice (set) of abstract models, each of which represents a set of concrete program states • Computes, for each program point, an abstract model that represents all possible heap states that may occur at that program point
Model Extraction (Dynamic) • A surprising benefit of building a model suitable for abstract interpretation is that the model also works for dynamic analysis: • Debugging • Specification mining/checking • Given a snapshot of the single current program heap, compute the corresponding abstract model
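As a rough illustration of computing an abstract model from one heap snapshot, the sketch below partitions concrete objects into nodes and collapses pointers into edges. It assumes the crudest possible partition (one node per type name), which is not the grouping the slides propose. `HeapObject` and `SnapshotAbstraction` are hypothetical names introduced only for this example.

```java
import java.util.*;

// Hypothetical concrete heap object: a type name plus named pointer fields.
final class HeapObject {
    final String type;
    final Map<String, HeapObject> fields = new LinkedHashMap<>();
    HeapObject(String type) { this.type = type; }
}

final class SnapshotAbstraction {
    // Collect every object reachable from the given (non-null) roots.
    static Set<HeapObject> reachable(Collection<HeapObject> roots) {
        Set<HeapObject> seen = new HashSet<>();
        Deque<HeapObject> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            HeapObject o = work.pop();
            if (!seen.add(o)) continue;              // already visited
            for (HeapObject t : o.fields.values())
                if (t != null) work.push(t);         // skip null pointers
        }
        return seen;
    }

    // Abstract nodes: type name -> number of concrete objects it stands for.
    // ASSUMPTION: partitioning by type only, purely for illustration.
    static Map<String, Integer> nodes(Collection<HeapObject> roots) {
        Map<String, Integer> counts = new TreeMap<>();
        for (HeapObject o : reachable(roots)) counts.merge(o.type, 1, Integer::sum);
        return counts;
    }

    // Abstract edges, written "SrcType.field->TgtType"; each one stands for
    // all concrete pointers with that source type, field, and target type.
    static Set<String> edges(Collection<HeapObject> roots) {
        Set<String> result = new TreeSet<>();
        for (HeapObject o : reachable(roots))
            for (Map.Entry<String, HeapObject> f : o.fields.entrySet())
                if (f.getValue() != null)
                    result.add(o.type + "." + f.getKey() + "->" + f.getValue().type);
        return result;
    }
}
```

Even this naive partition shows the compression effect: a list of any length collapses to one `Node` entry and a handful of edges.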
Current Status • Handle a large fragment of Java 1.5 and commonly used libraries (lang, util, io) • Precisely model (in static and dynamic analyses) the properties of interest • Can efficiently (on the order of seconds) statically analyze moderate sized programs (~15KLOC to date) • Have simple implementation of debugger and specification miner (a few seconds to compute models of Multi-MB heaps)
Model Overview • Based on the storage shape graph • Nodes represent sets of objects (or recursive data structures), edges represent sets of pointers • Has a natural representation for many of the properties we are interested in • Easy to visualize • Efficient to compute with • Annotate nodes and edges with additional instrumentation properties
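A storage shape graph of this kind might be represented as below. This is a minimal sketch, not the authors' data structure: `RegionNode`, `PointerEdge`, `ShapeGraph`, and `mayReach` are hypothetical names, and the instrumentation annotations are omitted. The query shows why reachability has a natural representation here: it is ordinary graph search over abstract edges.

```java
import java.util.*;

// A node stands for a set of concrete objects (a region).
final class RegionNode {
    final String label;
    RegionNode(String label) { this.label = label; }
}

// An edge stands for a set of concrete pointers stored in one field.
final class PointerEdge {
    final RegionNode source;
    final String field;
    final RegionNode target;
    PointerEdge(RegionNode source, String field, RegionNode target) {
        this.source = source; this.field = field; this.target = target;
    }
}

final class ShapeGraph {
    private final Map<RegionNode, List<PointerEdge>> out = new HashMap<>();

    void addEdge(RegionNode s, String field, RegionNode t) {
        out.computeIfAbsent(s, k -> new ArrayList<>()).add(new PointerEdge(s, field, t));
    }

    // May-reachability: true if some path of abstract edges leads from
    // 'from' to 'to' (a node trivially reaches itself). If the graph
    // over-approximates the heap, a false answer means no concrete object
    // in 'from' can reach any concrete object in 'to'.
    boolean mayReach(RegionNode from, RegionNode to) {
        Set<RegionNode> seen = new HashSet<>();
        Deque<RegionNode> work = new ArrayDeque<>();
        work.push(from);
        while (!work.isEmpty()) {
            RegionNode n = work.pop();
            if (n == to) return true;
            if (!seen.add(n)) continue;
            for (PointerEdge e : out.getOrDefault(n, Collections.emptyList()))
                work.push(e.target);
        }
        return false;
    }
}
```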
Logical Structure Identification • Key issue in shape graph is how to pick nodes that abstract concrete objects • Too many nodes is confusing and computationally expensive • Too few nodes leads to imprecision (as a single node must represent multiple logical structures) • Often done via allocation site or types • Solution: nodes are related sets of objects • Recursive type information (recursive vs. non-recursive types) • Objects stored in the same collection, array or structure
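One way to picture the "recursive type" grouping rule is a union-find pass over a concrete heap. The sketch below covers only that first criterion and approximates "recursive" as "a pointer between two objects of the same type"; the collection/array criterion is not modeled. `Cell` and `RegionGrouping` are hypothetical names for illustration.

```java
import java.util.*;

// Hypothetical concrete object: a type name and its outgoing pointers.
final class Cell {
    final String type;
    final List<Cell> pointers = new ArrayList<>();
    Cell(String type) { this.type = type; }
}

final class RegionGrouping {
    // Union-find parent map; Cell uses identity equality by default.
    private final Map<Cell, Cell> parent = new HashMap<>();

    private Cell find(Cell c) {
        parent.putIfAbsent(c, c);
        Cell p = parent.get(c);
        if (p != c) { p = find(p); parent.put(c, p); }   // path compression
        return p;
    }

    // Merge objects joined by a pointer between two objects of the same type.
    // ASSUMPTION: a crude stand-in for "same recursive data structure".
    void group(Collection<Cell> heap) {
        for (Cell c : heap) {
            find(c);
            for (Cell t : c.pointers)
                if (t.type.equals(c.type)) parent.put(find(c), find(t));
        }
    }

    boolean sameRegion(Cell a, Cell b) { return find(a) == find(b); }

    int regionCount() {
        Set<Cell> roots = new HashSet<>();
        for (Cell c : new ArrayList<>(parent.keySet())) roots.add(find(c));
        return roots.size();
    }
}
```

With a three-element linked list whose elements each point to a separate data object, the list cells collapse into one region while the data objects stay apart under this rule alone.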
Layout • Most general way objects in a region are connected • (S)ingleton: no pointers between any objects • (L)ist: may contain a linear List or simpler structures • (T)ree: may contain a Tree or simpler structures • (D)ag: may contain a Dag or simpler structures • (C)ycle: may contain a cyclic or simpler structure • E.g. a region with a (T)ree layout may contain tree, list or singleton structures, but no dag or cyclic structures.
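Because each layout admits every simpler one, the five values form a total order, which can be sketched as a small enum. The names `Layout`, `join`, and `admits` are illustrative, and using the join to combine two regions is an assumption about how such a lattice would typically be used, not something the slides state.

```java
// Layouts ordered from most to least restrictive; each admits all earlier ones.
enum Layout {
    SINGLETON, LIST, TREE, DAG, CYCLE;

    // Least upper bound in the total order: the more general of the two layouts.
    Layout join(Layout other) {
        return ordinal() >= other.ordinal() ? this : other;
    }

    // True if a region with this layout may contain a structure of the given shape.
    boolean admits(Layout shape) {
        return shape.ordinal() <= ordinal();
    }
}
```

For example, `Layout.TREE.admits(Layout.LIST)` holds while `Layout.TREE.admits(Layout.DAG)` does not, matching the (T)ree example on the slide.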
Sharing • Edges abstract sets of references (variable references or pointers) • The heap graph can track some sharing properties but is insufficiently precise to model many important ones • E.g. given an array of objects, does any object appear multiple times? • Sharing may occur between references abstracted by the same edge or by two different edges • Interference: abstracted by the same edge • Connectivity: abstracted by different edges
Interference • Does a single edge abstract only references with disjoint targets, or may some of these references alias or be related? • Edge e is: • non-interfering: all pairs of references r1, r2 in γ(e) must be unrelated (refer to disjoint data structures). • interfering: there may be a pair of references r1, r2 in γ(e) that are related (refer to the same data structure).
Connectivity • Do two edges abstract sets of references with disjoint targets, or may some of these references alias or be related? • Edges e1, e2 are: • disjoint: all pairs of references r1 in γ(e1), r2 in γ(e2) are unrelated (refer to disjoint data structures). • connected: there may be a pair of references r1 in γ(e1), r2 in γ(e2) that are related (refer to the same data structure).
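The interference and connectivity definitions can be checked directly on a concrete heap, which is roughly what a dynamic analysis would do before recording the result on an edge. The sketch below assumes that "related" means the two references reach overlapping sets of objects, and represents each reference by its target object; `Obj` and `Sharing` are hypothetical names.

```java
import java.util.*;

// Hypothetical concrete object with outgoing pointers.
final class Obj {
    final List<Obj> fields = new ArrayList<>();
}

final class Sharing {
    // All objects reachable from the target of one reference (target included).
    static Set<Obj> reach(Obj target) {
        Set<Obj> seen = new HashSet<>();
        Deque<Obj> work = new ArrayDeque<>();
        work.push(target);
        while (!work.isEmpty()) {
            Obj o = work.pop();
            if (seen.add(o)) for (Obj t : o.fields) work.push(t);
        }
        return seen;
    }

    // ASSUMPTION: two references are related if the structures they reach overlap.
    static boolean related(Obj r1, Obj r2) {
        return !Collections.disjoint(reach(r1), reach(r2));
    }

    // Interference: some pair of distinct references within one edge is related.
    static boolean interfering(List<Obj> edge) {
        for (int i = 0; i < edge.size(); i++)
            for (int j = i + 1; j < edge.size(); j++)
                if (related(edge.get(i), edge.get(j))) return true;
        return false;
    }

    // Connectivity: some reference in e1 is related to some reference in e2.
    static boolean connected(List<Obj> e1, List<Obj> e2) {
        for (Obj r1 : e1)
            for (Obj r2 : e2)
                if (related(r1, r2)) return true;
        return false;
    }
}
```

An array that stores the same object twice yields an interfering edge, which is the case the array example on the Sharing slide asks about.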
Intrinsic Properties • Object Identity • Across each method call track how data structures are split, merged, reconnected • Field Sensitive Use/Mod • For each method track the fields for the objects in each region (node) and if the field is used/modified in the method • At each line track which regions (nodes) and fields may be used/modified • Object Allocation • Track which objects are allocated in this scope and which may escape
Identity and Read/Write
void swap(Pair p) {
    Data temp = p.first;
    p.first = p.second;
    p.second = temp;
}
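To make the use/mod and identity information for `swap` concrete, the sketch below logs each field access by hand. The `UseMod` log and the extra `log` parameter are hypothetical additions for illustration; a static analysis would derive the same sets without running the code. `Pair` and `Data` are given minimal assumed definitions.

```java
import java.util.*;

// Minimal assumed definitions of the types used by swap.
final class Data { }

final class Pair {
    Data first, second;
    Pair(Data first, Data second) { this.first = first; this.second = second; }
}

// Records which fields a method reads (use) and writes (mod).
final class UseMod {
    final Set<String> use = new TreeSet<>();
    final Set<String> mod = new TreeSet<>();
}

final class SwapDemo {
    // The swap method, with each field access logged by hand.
    static void swap(Pair p, UseMod log) {
        Data temp = p.first;                 // read  Pair.first
        log.use.add("Pair.first");
        p.first = p.second;                  // read  Pair.second, write Pair.first
        log.use.add("Pair.second");
        log.mod.add("Pair.first");
        p.second = temp;                     // write Pair.second
        log.mod.add("Pair.second");
    }
}
```

Both fields end up in the use set and in the mod set, while the two `Data` objects keep their identity and are only exchanged between fields: nothing is allocated and the `Data` objects themselves are neither read nor written.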
Case Study: BH (Barnes-Hut) • N-Body simulation in 3 dimensions • Uses Fast Multipole method with space decomposition tree • For nearby bodies use naive n² algorithm • For distant bodies compute center of mass of many bodies and treat as single point mass • Updates space decomposition tree to account for body motion • Has not been analyzed with other existing (precise) heap analysis methods
BH Optimizations: Memory • Inline Double[] into MathVector objects: 23% serial speedup, 37% memory use reduction
Body Force Calculation Loop
Iterator b = this.bodyTabRev.iterator();
while (b.hasNext())
    ((Body) b.next()).hackGravity(rsize, root);
BH Optimizations: TLP • TLP update loop over bodyTabRev: factor 3.09 speedup on a quad-core machine
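The slides do not say how the loop over `bodyTabRev` was parallelized. One plausible form in Java, shown purely as a sketch, is a parallel stream. It is safe only when each `hackGravity` call writes to its own `Body` and merely reads shared data, which is the kind of non-interference fact the heap model is meant to establish. The `Body` class, its fields, and the `double rootMass` parameter standing in for the tree root are placeholders, not the benchmark's code.

```java
import java.util.*;

// Hypothetical stand-in for the benchmark's Body class.
final class Body {
    final double mass;
    double force;   // written only by this body's own hackGravity call

    Body(double mass) { this.mass = mass; }

    // Placeholder computation: reads its arguments, writes only this.force.
    void hackGravity(double rsize, double rootMass) {
        force = mass * rootMass / (rsize * rsize);
    }
}

final class ForceLoop {
    // Sequential form of the loop (generics replace the cast).
    static void sequential(List<Body> bodyTabRev, double rsize, double rootMass) {
        Iterator<Body> b = bodyTabRev.iterator();
        while (b.hasNext()) b.next().hackGravity(rsize, rootMass);
    }

    // Parallel form: valid only if no two list entries are the same or
    // related objects, and hackGravity does not mutate shared state.
    static void parallel(List<Body> bodyTabRev, double rsize, double rootMass) {
        bodyTabRev.parallelStream().forEach(body -> body.hackGravity(rsize, rootMass));
    }
}
```

If the edge from the list to its bodies were interfering, two iterations could touch the same `Body` and the parallel version would race.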
Summary • Have the core of a practical analysis system • Performance: • Analyze moderate size non-trivial Java programs • 15KLoc programs in 114 seconds using ~120MB of memory (average 2 contexts per method) • Debugging abstraction efficiently compresses large heaps to compact abstract representation • Accuracy: • Precisely represent connectivity, sharing, shape properties + region, frame, and dependence information • Qualitatively Useful • Used results in multiple optimization domains and in debugging applications
Current and Future Work • Currently working on transforming core concepts from prototype to robust tools • Implementing static analysis for MSIL bytecode + core libraries • Implementing full featured debugger support and specification mining (for both MSIL and Java) • Enrich the model • Wider range of properties (what is useful in general) • Allow user to easily extend with new properties • Apply information in more client applications • Additional optimization domains • Support for programmer assisted refactorings
Interpreter Benchmark • Simple interpreter and debug environment for large subset of Java language • 14,000+ Loc (in normalized form), 90 Classes • Additional 1500 Loc for specialized standard library handling stubs • Large recursive call structures, large inheritance trees with numerous virtual method implementations • Wide range of data structure types, extensive use of java.util collections, uses both shared and unshared structures