Quantifying Uncertainty in Points-To Relations
Constantino Ribeiro and Marcelo Cintra
University of Edinburgh
http://www.homepages.inf.ed.ac.uk/mc/Projects/VESPA
Contributions
• Scope
  • Measure the sizes of static points-to sets computed by a context- and flow-sensitive algorithm and compare them with the dynamic (run-time) points-to sets
• Goal
  • Quantify the may-alias behavior that is intrinsic to applications
  • Classify the reasons for differences between static prediction and run-time behavior
• Relevance
  • An important step toward future aggressive (speculative) optimizations
This work is not about a new pointer analysis algorithm.
LCPC 2006
Outline
• Motivation
• Pointer Analysis
• Evaluation Methodology
• Experimental Setup and Results
• Related Work
• Conclusions
Compiler Optimizations
• To optimize well, a compiler must have accurate knowledge of:
  • Data flow:
    • Redundant variable elimination
    • Constant propagation
    • Register allocation
  • Control flow:
    • Dead code elimination
    • Instruction scheduling
Data Flow Analysis
• Data flow analysis: 100% precision is difficult to achieve because of:
  • Use of pointer variables
    • The same pointer may refer to different memory objects at different times
    • The same pointer may refer to many memory objects at some program point
  • Use of procedures
    • Side effects caused by call by reference and access to global data
  • Presence of control flow structures
    • Multiple def-use chains
Real Points-to Behavior
• So we want to:
  • Understand the points-to behavior of real applications
  • Discover the causes of the ambiguities in static analysis
  • Facilitate more aggressive optimizations for ambiguous points-to relations
Points-to Analysis
• Data dependence analysis for pointer variables
• At each point of the program: the set of pointer variables and the locations that they point to
• A pointer variable may point to one address or to many addresses
• Pointer variables can even point to other pointers
• Many possible points-to targets restrict optimizations in conservative compilers
• Procedures and procedure calls increase the complexity and running time of the analysis
Types of Algorithms
• Sensitivity:
  • Flow-sensitive + context-sensitive → more precise analysis
• Granularity:
  • Fine: individual fields of complex data structures
  • Coarse: whole data structures and arrays
• Naming of dynamically created memory objects:
  • Single name “heap”
  • Per memory allocation site
  • Per context
Formal Representation
• Location sets (locsets): individual named memory locations
• Points-to relation ($R$): a set of tuples $(p, v)$, where $p$ is a pointer and $v$ is a location set
• $P$ and $V$: the sets of pointers and location sets, so that $R \subseteq P \times V$
• Every tuple $(p, v) \in R$ means that pointer $p$ may point to location set $v$, written $p \rightarrow v$
• Points-to graph: $G = (N, E)$, with nodes $N = P \cup V$ and edges $E = R$
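As a minimal sketch of the definitions above, the relation R can be held as a set of (pointer, locset) pairs and the graph derived from it. The function name `build_points_to_graph` and the variables `p`, `q`, `x`, `y` are illustrative assumptions and are not part of SPAN; P and V are restricted here to the names that actually appear in R.

```python
def build_points_to_graph(relation):
    """Return (nodes, edges) for G = (N, E) with N = P ∪ V and E = R."""
    pointers = {p for p, _ in relation}  # P: pointers that appear in R
    locsets = {v for _, v in relation}   # V: location sets that appear in R
    return pointers | locsets, set(relation)


# Hypothetical relation: p may point to x or y, and q points to p.
R = {("p", "x"), ("p", "y"), ("q", "p")}
nodes, edges = build_points_to_graph(R)
```

Note that `p` is both a pointer and a location set here, which is how the representation captures pointers to pointers.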
Formal Representation
• Analysis: compute the points-to graph by applying basic dataflow equations to the pointer manipulation operations:
  • p1 = &p2; (Address-of assignment)
  • p1 = p2; (Copy assignment)
  • p1 = *p2; (Load assignment)
  • *p1 = p2; (Store assignment)
• Result: a points-to graph covering all points-to relationships:
  • Definitely points-to
  • Possibly points-to
Formal Representation
Where:
• Definitely points-to: $R = \{(p, v)\}$, so the only possibility is p = &v
• Possibly points-to: $R = \{(p, v), (p, z)\}$, so either p = &v or p = &z
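The four assignment forms and the definitely/possibly distinction can be illustrated with a small sketch. This is not the context- and flow-sensitive algorithm used in the study: it is a much simpler flow-insensitive, inclusion-based fixed point, swapped in only to show how each form adds points-to tuples. The statement encoding and the names `solve` and `classify` are assumptions made for this example.

```python
from collections import defaultdict


def solve(statements):
    """Flow-insensitive fixed point over the four assignment forms.

    statements: list of (kind, lhs, rhs) with kind in
    {"addr", "copy", "load", "store"} (a hypothetical encoding).
    Returns a dict mapping each name to its set of possible targets.
    """
    pts = defaultdict(set)
    changed = True
    while changed:
        changed = False
        for kind, lhs, rhs in statements:
            if kind == "addr":      # lhs = &rhs
                updates = {lhs: {rhs}}
            elif kind == "copy":    # lhs = rhs
                updates = {lhs: set(pts[rhs])}
            elif kind == "load":    # lhs = *rhs
                loaded = set()
                for target in list(pts[rhs]):
                    loaded |= pts[target]
                updates = {lhs: loaded}
            elif kind == "store":   # *lhs = rhs
                updates = {target: set(pts[rhs]) for target in list(pts[lhs])}
            else:
                raise ValueError(f"unknown statement kind: {kind}")
            for name, new in updates.items():
                if not new <= pts[name]:
                    pts[name] |= new
                    changed = True
    return dict(pts)


def classify(targets):
    """Apply the slide's definition: one target is definite, several are possible."""
    if len(targets) == 1:
        return "definitely"
    if len(targets) > 1:
        return "possibly"
    return "unknown"


# Hypothetical program: p is assigned on two branches, the rest is unambiguous.
statements = [
    ("addr", "p", "v"),     # p = &v;   (one branch)
    ("addr", "p", "z"),     # p = &z;   (other branch)
    ("addr", "q", "w"),     # q = &w;
    ("copy", "r", "q"),     # r = q;
    ("addr", "pp", "q"),    # pp = &q;
    ("load", "s", "pp"),    # s = *pp;
    ("store", "pp", "r"),   # *pp = r;
]
pts = solve(statements)
```

Here `p` ends up with two targets (possibly points-to), while `q`, `r`, `s` and `pp` each have exactly one (definitely points-to). A flow-sensitive analysis could be more precise than this sketch, since it distinguishes program points.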
Causes of Uncertainty in Pointer Analysis
• Control flow
• Pointer arithmetic
• Unavailable procedure code
• Recursive data structures
• Aggregate data structures
• Dynamically allocated objects
Static Source Code Analysis
• An extension of Rugina and Rinard’s context- and flow-sensitive pointer analysis algorithm with the following new features:
  • Number of accesses made through pointer de-references
  • Number of used and modified locsets just before each of:
    • Indirect use of a variable: ... = *p;
    • Indirect modification of a variable: *p = ...;
    • Multi-level indirect use of a variable: ... = **p;
    • Multi-level indirect modification of a variable: **p = ...;
    • Procedure call: foo(..., *p, ...);
  • Loops: one instance of the cases above per pointer de-reference
  • Procedures: one instance of each pointer de-reference per calling context
Run-time Statistics Collection
• Our tool inserts additional profiling code that:
  • Records all distinct run-time memory addresses
  • Counts the number of accesses to each distinct address
• Each run-time access has a unique identifier (source code line number) that matches the run-time access to the static access
• Problem:
  • Possible mismatches between static and dynamic accesses:
    • Multiple static accesses may map to the same source code line, and therefore to the same run-time counter:
      • The pointer analysis algorithm separates static accesses according to their context
    • Not all static accesses may appear at run time:
      • A portion of the code may not be executed because of the input data
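The bookkeeping described above can be sketched as follows. The actual tool instruments C code generated from SUIF; this Python model only mirrors the data it collects, and the class name `AccessProfiler`, its methods, and the sample line numbers and addresses are all hypothetical.

```python
from collections import Counter, defaultdict


class AccessProfiler:
    """Per-access record of which addresses were touched and how often."""

    def __init__(self):
        # access_id (assumed to be the source line number) -> Counter of addresses
        self.addresses = defaultdict(Counter)

    def record(self, access_id, address):
        """Called once per executed de-reference."""
        self.addresses[access_id][address] += 1

    def distinct_targets(self, access_id):
        """Number of different addresses seen for this access (0 if never executed)."""
        return len(self.addresses.get(access_id, ()))


# Hypothetical run: the de-reference on line 12 touches two addresses,
# the one on line 40 touches one, and the one on line 77 never executes.
profiler = AccessProfiler()
for address in (0x1000, 0x1000, 0x2000, 0x1000):
    profiler.record(12, address)
profiler.record(40, 0x3000)
```

Because the identifier is a line number, two static accesses that the analysis separates by calling context share one entry in `addresses`, which is the first mismatch listed above. An access that is never executed simply has no entry, which is the second.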
Experimental Setup
• Applications:
  • SPEC2000 integer
    • Except gcc, gap, vortex and eon
  • MediaBench
  • SPEC2000 fp was tried but found not to be interesting as a pointer analysis problem
• The standard input set was used for the run-time experiments
Application Characteristics (slide content not reproduced)
Static Analysis Tool
• Extension of the SPAN package that:
  • Records every instance of a pointer de-reference, together with its number of possible targets and its source code line number
  • Counts uses and modifications via pointer de-references separately
  • Counts separately static de-references to potentially uninitialized pointers, which use a special location set (unk)
  • Counts separately static de-references to dynamically allocated memory, which use a special location set (heap.X, where X is the context id)
Static Analysis Results (slide content not reproduced)
Profiling Environment
• Monitor the actual run-time behaviour of static pointer de-references with multiple possible targets
• The SPAN extension inserts profiling code wherever a static de-reference has multiple targets, recording the actual address accessed and a counter per address
• Instrumented code is converted from SUIF format (.spd) to C code
• Compiled on an Intel x86 platform with gcc 3.4.4 at the -O2 optimization level
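One way to combine the static target counts with the profiled addresses is sketched below. The input shapes, the function name `compare`, the category labels and the sample data are assumptions for illustration. The sketch also counts raw addresses, whereas a faithful comparison would have to map each address back to its location set.

```python
def compare(static_targets, runtime_addresses):
    """Classify each statically ambiguous de-reference by its run-time behavior.

    static_targets: access_id -> number of static points-to targets
    runtime_addresses: access_id -> set of addresses observed at run time
    """
    summary = {"not_executed": 0, "single_at_runtime": 0, "multiple_at_runtime": 0}
    for access_id, n_static in static_targets.items():
        if n_static < 2:
            continue  # only de-references with multiple static targets are profiled
        observed = runtime_addresses.get(access_id, set())
        if not observed:
            summary["not_executed"] += 1
        elif len(observed) == 1:
            summary["single_at_runtime"] += 1
        else:
            summary["multiple_at_runtime"] += 1
    return summary


# Hypothetical data for four de-references, keyed by source line number.
static_targets = {12: 3, 40: 2, 55: 1, 77: 4}
runtime_addresses = {12: {0x1000, 0x2000}, 40: {0x3000}, 55: {0x4000}}
summary = compare(static_targets, runtime_addresses)
```

The `single_at_runtime` bucket corresponds to the cases where static uncertainty does not show up during execution.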
Run-time Uncertainty
59 + 1 + 24 = 84
59 + 1 + 1 + 23 = 84
Causes of Uncertainty (slide content not reproduced)
Related Work
• Algorithms:
  • The basic SUIF1 package used in our study (SPAN) was introduced by R. Rugina and M. Rinard (PLDI '99)
  • E. M. Nystrom et al. proposed a fast and efficient summary-based pointer analysis algorithm (SAS '04)
  • M. Hind surveyed the main pointer analysis research and discussed unsolved questions (PASTE '01)
• Quantification of run-time behavior:
  • A few works have investigated the impact of pointer analysis on overall compiler optimization, such as B. Cheng and W. M. Hwu, M. Das et al., and R. Ghiya et al. (SIGPLAN PLDI '00, SAS '04, SIGPLAN PLDI '01)
  • An attempt to quantify the run-time behavior of points-to sets was made by M. Mock et al. (PASTE '01)
  • D. Liang et al. did similar work using Java programs (ISSTA '02)
Related Work
• Speculative probabilistic analysis:
  • A quantitative computation of static points-to results against run-time behavior in a probabilistic framework was proposed by Y. S. Hwang et al. (LCPC '01)
  • Support for speculative analysis of points-to was proposed by J. Lin, T. Chen et al. (PLDI '03)
  • G. Ramalingam proposed extending static analysis with probabilistic information reflecting the actual run-time behavior (SIGPLAN PLDI '01)
Conclusions
• For most of the benchmarks static pointer analysis is very accurate
• For some benchmarks up to 25% of the de-references cannot be fully disambiguated statically
  • 27% of these de-references access a single memory location at run time, but many do access several different memory locations
• The results suggest further compiler optimizations that exploit cases where the uncertainty does not appear at run time
• We need to improve the handling of pointer arithmetic
• We need new probabilistic approaches that capture actual control flow behavior