Modeling The Heap Optimization and Software Engineering

Modeling The Heap Optimization and Software Engineering • Mark Marron, IMDEA-Software (Madrid, Spain), mark.marron@imdea.org

Motivation • Many optimization and software engineering applications utilize heap information • Optimization • Parallelization • Memory management • Software Engineering and Debugging • Interactive debugging of heap data structures • Tainted or Information Flow analysis through heap objects • Existing heap analysis techniques are often inapplicable due to imprecision (points-to style) or computational cost (shape analysis)

Objective • General purpose model capable of supporting these applications: • Must model the range of fundamental properties needed by our target application domains • Cannot place significant restrictions on the program being analyzed • Must be computationally efficient and compact • Willing to sacrifice some precision • Our focus is on providing general classes of information for others to build on

Desired Information (Structural) • Connectivity • Reachability • Interference • Paths • Logical data structures (Regions) • Group related sections of the heap • Keep unrelated sections of the heap separate • Shape of a region • Cycle, Dag, Tree, List, Singleton

Desired Information (Intrinsic) • Identity • Given an object at point p, track the flow of this object at all later program points q • Heap Based Use-Mod • Find all program points a given memory location may be read/written at • Escape • Objects that are freshly allocated • Objects that escape the local call context

Model Extraction (Static) • The theory of Abstract Interpretation provides a framework for static program analysis • Takes a lattice (set) of abstract models, each of which represents a set of concrete program states • Computes, for each program point, an abstract model that represents all possible heap states that may occur at the program point

Model Extraction (Dynamic) • A surprising benefit of building a model suitable for abstract interpretation is that the model also works for dynamic analysis: • Debugging • Specification mining/checking • Given a snapshot of the single current program heap, compute the corresponding abstract model

Current Status • Handle a large fragment of Java 1.5 and commonly used libraries (lang, util, io) • Precisely model (in static and dynamic analyses) the properties of interest • Can efficiently (on the order of seconds) statically analyze moderate sized programs (~15KLOC to date) • Have a simple implementation of a debugger and specification miner (a few seconds to compute models of multi-MB heaps)

Model Overview • Based on the storage shape graph • Nodes represent sets of objects (or recursive data structures), edges represent sets of pointers • Has a natural representation for many of the properties we are interested in • Easy to visualize • Efficient to compute with • Annotate nodes and edges with additional instrumentation properties

Logical Structure Identification • Key issue in shape graph is how to pick nodes that abstract concrete objects • Too many nodes is confusing and computationally expensive • Too few nodes leads to imprecision (as a single node must represent multiple logical structures) • Often done via allocation site or types • Solution: nodes are related sets of objects • Recursive type information (recursive vs. non-recursive types) • Objects stored in the same collection, array or structure

Concrete Expression Heap

Abstract Expression Heap

Layout • Most general way objects in a region are connected • (S)ingleton: no pointers between any objects • (L)ist: may contain a linear List or simpler structures • (T)ree: may contain a Tree or simpler structures • (D)ag: may contain a Dag or simpler structures • (C)ycle: may contain a cyclic or simpler structure • E.g. a region with a (T)ree layout may contain tree, list or singleton structures, but no dag or cyclic structures.
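The ordering of layouts above can be illustrated with a small Java sketch. It is not the author's implementation: the names `Layout`, `LayoutClassifier`, `join` and `classify` are hypothetical, and the region is assumed to be given as adjacency lists over object indices. The sketch shows two things: the join of two layouts is the more general of the two, and a concrete region can be classified by checking for pointers, cycles, shared targets and branching, in that order.

```java
import java.util.*;

// Hypothetical names; a sketch of the layout ordering, not the author's code.
enum Layout {
    SINGLETON, LIST, TREE, DAG, CYCLE; // ordered from most to least restrictive

    // Each layout admits all simpler ones, so the join is the more general layout.
    Layout join(Layout other) {
        return this.ordinal() >= other.ordinal() ? this : other;
    }
}

class LayoutClassifier {
    // succ.get(i) lists the indices of the objects that object i points to
    // (only pointers that stay inside the region are included).
    static Layout classify(List<List<Integer>> succ) {
        int n = succ.size();
        int[] inDeg = new int[n];
        int maxOut = 0, maxIn = 0, edges = 0;
        for (List<Integer> out : succ) {
            maxOut = Math.max(maxOut, out.size());
            for (int t : out) { inDeg[t]++; edges++; }
        }
        for (int d : inDeg) maxIn = Math.max(maxIn, d);
        if (edges == 0) return Layout.SINGLETON;            // no internal pointers
        if (hasCycle(succ, inDeg.clone())) return Layout.CYCLE;
        if (maxIn > 1) return Layout.DAG;                    // some object has two incoming pointers
        return maxOut > 1 ? Layout.TREE : Layout.LIST;       // branching or purely linear
    }

    // Kahn's algorithm: the graph is acyclic iff every node can be removed
    // in topological order.
    private static boolean hasCycle(List<List<Integer>> succ, int[] inDeg) {
        Deque<Integer> ready = new ArrayDeque<>();
        for (int i = 0; i < inDeg.length; i++) if (inDeg[i] == 0) ready.add(i);
        int removed = 0;
        while (!ready.isEmpty()) {
            int v = ready.poll();
            removed++;
            for (int t : succ.get(v)) if (--inDeg[t] == 0) ready.add(t);
        }
        return removed != inDeg.length;
    }
}
```

Using a total order keeps the join trivial; a static analysis would apply `join` when merging abstract states, while `classify` corresponds to the dynamic case of abstracting a single heap snapshot.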

Layout Example

Sharing • Edges abstract sets of references (variable references or pointers) • The heap graph can track some sharing properties, but on its own it is insufficiently precise to model many important properties • E.g. given an array of objects, does any object appear multiple times? • Sharing may occur between references abstracted by the same edge or by two different edges • Interference: abstracted by the same edge • Connectivity: abstracted by different edges

Interference • Does a single edge abstract only references with disjoint targets, or may some of these references be aliased/related? • Edge e is: • non-interfering: all pairs of references r1, r2 in γ(e) must be unrelated (refer to disjoint data structures). • interfering: there may be a pair of references r1, r2 in γ(e) that are related (refer to the same data structure).
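For the array question raised under Sharing, a concrete check might look like the following Java sketch. The class and method names are hypothetical, and the check is deliberately narrow: it detects only direct aliasing (two slots holding the very same object), not the broader case where two distinct objects reach a common data structure.

```java
import java.util.*;

// Hypothetical helper; illustrates the simplest form of interference only.
class InterferenceCheck {
    // Returns true when two slots of the array hold the same object.
    // Uses reference identity, not equals(), because aliasing is about
    // the same heap location rather than equal contents.
    static boolean mayInterfere(Object[] refs) {
        Set<Object> seen =
            Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());
        for (Object r : refs) {
            if (r != null && !seen.add(r)) return true; // second reference to one object
        }
        return false;
    }
}
```

An edge whose concrete references pass this check (and whose targets also reach no common structure) would be a candidate for the non-interfering annotation.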

Interference Example

Connectivity • Do two edges abstract sets of references with disjoint targets, or may some of these references be aliased/related? • Edges e1, e2 are: • disjoint: all pairs of references r1 in γ(e1), r2 in γ(e2) are unrelated (refer to disjoint data structures). • connected: there may be a pair of references r1 in γ(e1), r2 in γ(e2) that are related (refer to the same data structure).
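On a concrete heap, the disjoint/connected distinction can be sketched as a reachability intersection. The following Java example rests on assumptions: `HeapObj` is a hypothetical stand-in for a heap object whose pointer fields are collected in one list, and "related" is read as "the reachable object sets overlap".

```java
import java.util.*;

// Hypothetical stand-in for a heap object with pointer fields.
class HeapObj {
    final String name;
    final List<HeapObj> fields = new ArrayList<>();
    HeapObj(String name) { this.name = name; }
}

class ConnectivityCheck {
    // Collect every object reachable from the given roots (identity-based).
    static Set<HeapObj> reachable(Collection<HeapObj> roots) {
        Set<HeapObj> seen =
            Collections.newSetFromMap(new IdentityHashMap<HeapObj, Boolean>());
        Deque<HeapObj> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            HeapObj o = work.pop();
            if (seen.add(o)) work.addAll(o.fields); // visit each object once
        }
        return seen;
    }

    // Two reference sets are connected when some object is reachable from both.
    static boolean connected(Collection<HeapObj> refs1, Collection<HeapObj> refs2) {
        Set<HeapObj> r1 = reachable(refs1);
        for (HeapObj o : reachable(refs2)) {
            if (r1.contains(o)) return true;
        }
        return false;
    }
}
```

Here `refs1` and `refs2` play the role of γ(e1) and γ(e2); if `connected` returns false for a snapshot, that snapshot is consistent with the two edges being disjoint.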

Sharing Example

Intrinsic Properties • Object Identity • Across each method call track how data structures are split, merged, reconnected • Field Sensitive Use/Mod • For each method track the fields for the objects in each region (node) and whether each field is used/modified in the method • At each line track which regions (nodes) and fields may be used/modified • Object Allocation • Track which objects are allocated in this scope and which may escape

Identity and Read/Write

    void swap(Pair p) {
        Data temp = p.first;   // reads p.first
        p.first = p.second;    // reads p.second, writes p.first
        p.second = temp;       // writes p.second
    }

Case Study and Evaluation

Case Study: BH (Barnes-Hut) • N-Body simulation in 3 dimensions • Uses Fast Multi-Pole method with space decomposition tree • For nearby bodies use the naive O(n²) algorithm • For distant bodies compute the center of mass of many bodies and treat it as a single point mass • Updates space decomposition tree to account for body motion • Has not been analyzed with other existing (precise) heap analysis methods

Regions and Basic Structure

BH Optimizations: Memory • Inline Double[] into MathVector objects: 23% serial speedup and 37% reduction in memory use
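The shape of this inlining transformation can be sketched in Java as follows. This is an illustration under assumptions, not the benchmark's actual code: the class names, the field names `x`, `y`, `z`, and the `dot` method are hypothetical, and a primitive three-component vector is assumed. Presumably the rewrite is only safe when no array is shared between vectors or referenced from elsewhere, which is the kind of fact the sharing and escape information is meant to establish.

```java
// Before: each vector owns a separate array object, so every vector costs
// two heap objects and every component access goes through an extra pointer.
class MathVectorBoxed {
    final double[] data = new double[3];

    double dot(MathVectorBoxed o) {
        double s = 0.0;
        for (int i = 0; i < 3; i++) s += data[i] * o.data[i];
        return s;
    }
}

// After: the three components are inlined as fields, removing the array
// object and the indirection. Field names are hypothetical.
class MathVectorInlined {
    double x, y, z;

    double dot(MathVectorInlined o) {
        return x * o.x + y * o.y + z * o.z;
    }
}
```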

Body Force Calculation Loop

    Iterator b = this.bodyTabRev.iterator();
    while (b.hasNext())
        ((Body) b.next()).hackGravity(rsize, root);

BH Optimizations: TLP • Thread-level parallelization of the update loop over bodyTabRev: 3.09x speedup on a quad-core machine
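A parallel version of the force loop might be sketched as below. Several things here are assumptions: `SketchBody` and its one-dimensional "force" formula are stand-ins for the benchmark's `Body` class and physics, `root` is simplified to a read-only array instead of the space decomposition tree, and Java 8 parallel streams are used for brevity even though the analysis targets Java 1.5. The point illustrated is the precondition: each iteration only reads the shared structure and writes only its own body, so iterations may run in any order.

```java
import java.util.*;

// Hypothetical stand-in for Body; the formula is illustrative, not real physics.
class SketchBody {
    final double pos;
    double acc; // written only by this body's own update

    SketchBody(double pos) { this.pos = pos; }

    // Reads the shared, read-only root data and writes only this.acc.
    void hackGravity(double rsize, double[] root) {
        double sum = 0.0;
        for (double p : root) {
            double d = p - pos;
            if (d != 0.0) sum += d / (Math.abs(d) * d * d + rsize);
        }
        acc = sum;
    }
}

class ParallelForceLoop {
    // Serial loop in the same shape as the original iterator loop.
    static void serial(List<SketchBody> bodyTabRev, double rsize, double[] root) {
        Iterator<SketchBody> b = bodyTabRev.iterator();
        while (b.hasNext()) b.next().hackGravity(rsize, root);
    }

    // Parallel loop: valid because the bodies in the list are distinct objects
    // and no iteration writes state that another iteration reads.
    static void parallel(List<SketchBody> bodyTabRev, double rsize, double[] root) {
        bodyTabRev.parallelStream().forEach(body -> body.hackGravity(rsize, root));
    }
}
```

If the list could contain the same body twice, or if `hackGravity` wrote into the shared structure, the parallel version would race; ruling that out is what the interference and use/mod information is for.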

Static Analysis Statistics

Dynamic Analysis Statistics

Summary • Have the core of a practical analysis system • Performance: • Analyze moderate size non-trivial Java programs • 15KLoc programs in 114 seconds using ~120MB of memory (average 2 contexts per method) • Debugging abstraction efficiently compresses large heaps to a compact abstract representation • Accuracy: • Precisely represent connectivity, sharing, shape properties + region, frame, and dependence information • Qualitatively Useful • Used results in multiple optimization domains and in debugging applications

Current and Future Work • Currently working on transforming core concepts from prototype to robust tools • Implementing static analysis for MSIL bytecode + core libraries • Implementing full featured debugger support and specification mining (for both MSIL and Java) • Enrich the model • Wider range of properties (what is useful in general) • Allow user to easily extend with new properties • Apply information in more client applications • Additional optimization domains • Support for programmer assisted refactorings

Interpreter Benchmark • Simple interpreter and debug environment for large subset of Java language • 14,000+ Loc (in normalized form), 90 Classes • Additional 1500 Loc for specialized standard library handling stubs • Large recursive call structures, large inheritance trees with numerous virtual method implementations • Wide range of data structure types, extensive use of java.util collections, uses both shared and unshared structures
