Interprocedural Shape Analysis for Recursive Programs

Interprocedural Shape Analysis for Recursive Programs • Noam Rinetzky • Mooly Sagiv

Shape Analysis • Static program analysis • Determines information about dynamically allocated storage • A pointer variable is not NULL • Two data structures are disjoint • The algorithm is Conservative

Applications of Shape Analysis • Cleanness • Dor, Rodeh, Sagiv [SAS2000] • Parallelization • Assmann, Weinhardt [PMMPC93] • Hendren, Nicolau [TPDS90] • Larus, Hilfinger [PLDI88]

Current State • Good intraprocedural analyses • Sagiv, Reps, Wilhelm [TOPLAS 1998] • Analyze the bodies of list-manipulating procedures: • reverse, insert, delete • Expensive, imprecise interprocedural analyses of recursive procedures

Main Results • Interprocedural shape analysis algorithm for programs manipulating linked lists • Handles recursive procedures • Prototype implementation • Successfully analyzed several list-manipulating procedures • insert, delete, reverse, reverse_append • Properties verified • An acyclic list remains acyclic • No memory leaks • No NULL dereference

Running Example

    #include <stdlib.h>

    typedef struct List {
        int data;
        struct List *n;
    } *L;

    L create(int s) {
        L t = NULL;
        if (s <= 0) return NULL;
        t = (L) malloc(sizeof(*t));   /* size of one list cell */
        t->data = s;
    l2: t->n = create(s-1);
        return t;
    }

    void main() {
        L r = NULL;
        int k;
        …
    l1: r = create(k);
    }

Selected Memory States (figure, main before the call at l1) • Stack: exit [k=3, r = NULL]

Selected Memory States (figure, create at its deepest recursion) • Stack: exit [k=3, r = NULL], l1 [s=3, t], l2 [s=2, t], l2 [s=1, t], l2 [s=0, t = NULL] • Heap: cells 3, 2, 1, with three NULL values shown

Selected Memory States (figure, after the s=0 invocation returns) • Stack: exit [k=3, r = NULL], l1 [s=3, t], l2 [s=2, t], l2 [s=1, t] • Heap: cells 3, 2, 1, with three NULL values shown

Selected Memory States (figure, after the s=1 invocation returns) • Stack: exit [k=3, r = NULL], l1 [s=3, t], l2 [s=2, t] • Heap: cells 3, 2, 1, with two NULL values shown

Selected Memory States (figure, after the s=2 invocation returns) • Stack: exit [k=3, r = NULL], l1 [s=3, t] • Heap: cells 3, 2, 1, with one NULL value shown

Selected Memory States (figure, main after create returns) • Stack: exit [k=3, r] • Heap: cells 3, 2, 1, NULL

Where is the Challenge? • Dynamic allocation • Unbounded number of objects • Recursion • Unbounded number of activation records • Properties of: • Invisible instances of local variables • Dynamically allocated objects • (Figure: stack exit [k=3, r = NULL], l1 [s=3, t], l2 [s=2, t], l2 [s=1, t], l2 [s=0, t = NULL]; heap cells 3, 2, 1)

Our Approach • Reduce the interprocedural shape analysis problem to an intraprocedural problem: program with procedures → program without procedures • Explicit manipulation of the stack • Represent the activation record stack as a linked list: • Control information • Invisible instances of local variables
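The reduction above can be illustrated on the running example. The following is a minimal sketch, not the authors' transformation: it assumes a hypothetical `Frame` record that stores the call-site, the parameter `s`, the local `t`, and a `pr` link to the invoking frame, and it rewrites `create` so that recursion becomes pushes and pops on a linked list of frames. The names `Frame`, `push`, `create_explicit`, `CS_L1`, and `CS_L2` are illustrative assumptions.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct List { int data; struct List *n; } *L;

/* Hypothetical call-site labels, named after the slides' l1 and l2. */
enum CallSite { CS_L1, CS_L2 };

/* One activation record, stored as a cell of a linked list. */
typedef struct Frame {
    enum CallSite cs;   /* control information: the call-site of this invocation */
    int s;              /* parameter of create */
    L t;                /* local variable t (invisible unless this frame is on top) */
    struct Frame *pr;   /* the frame that invoked this one */
} Frame;

static Frame *push(Frame *top, enum CallSite cs, int s) {
    Frame *f = malloc(sizeof *f);
    assert(f != NULL);
    f->cs = cs; f->s = s; f->t = NULL; f->pr = top;
    return f;
}

/* create(k) without procedure calls: recursion replaced by push and pop. */
L create_explicit(int k) {
    Frame *top = push(NULL, CS_L1, k);   /* the call at l1 in main */
    L ret;

    /* Call phase: each iteration executes the body of create up to l2. */
    while (top->s > 0) {
        top->t = malloc(sizeof *top->t);
        assert(top->t != NULL);
        top->t->data = top->s;
        top->t->n = NULL;
        top = push(top, CS_L2, top->s - 1);   /* the call at l2 */
    }

    /* The top frame has s <= 0 and returns NULL. */
    ret = NULL;

    /* Return phase: the stored call-site selects the return-site. */
    for (;;) {
        enum CallSite cs = top->cs;
        Frame *caller = top->pr;
        free(top);
        top = caller;
        if (cs == CS_L1)
            return ret;          /* return-site in main: r = ret */
        top->t->n = ret;         /* return-site l2: t->n = ret */
        ret = top->t;            /* the caller then returns its own t */
    }
}
```

Calling `create_explicit(3)` should build the same list 3, 2, 1 as the recursive `create(3)`; while it runs, every frame below `top` holds an invisible instance of `t`.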

Our Algorithm • Abstract Interpretation • Concrete Semantics: • Concrete representation of memory states • Effect of program statements • Abstract Semantics: • Abstract representation of memory states • Transfer functions • Finds abstract representation of memory states at every program point

Concrete Memory Descriptors (figure) • Stack nodes csexit, csl1, csl2, csl2, and csl2 marked top, linked by pr edges, with t edges into the heap • Shown next to the memory state: exit [k=3, r = NULL], l1 [s=3, t], l2 [s=2, t], l2 [s=1, t], l2 [s=0, t = NULL]; heap cells 3, 2, 1

Concrete Memory Descriptors • Properties of memory elements: • “type”: stack, heap • “visibility”: top • “call-site”: exit, csl1, csl2 • Relationships between memory elements: • value of local variables: t, r • n-successor: n • invoked by: pr

Bounding the Representation • Concrete Memory Descriptors represent memory states • Every object is represented uniquely • Abstract Memory Descriptors • Conservatively represent Concrete Memory Descriptors • A bounded representation

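3-Valued Properties • True • False • Don’t Know (value 1/2, e.g. top=1/2)

Abstraction (figure) • Concrete descriptor: stack nodes csexit, csl1, csl2, csl2, and csl2,top, linked by pr edges, with t edges • Abstract descriptor: stack nodes csexit, csl1, csl2, and csl2,top, linked by pr edges, with t edges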
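The three truth values can be made concrete with a small sketch of Kleene's 3-valued logic, which is the logic used by the Sagiv, Reps, Wilhelm framework cited later in the talk. The encoding below (truth values doubled so that 1/2 becomes the integer 1) and the names `Kleene`, `k_and`, `k_or`, `k_not`, and `k_join` are illustrative assumptions, not TVLA's API.

```c
#include <assert.h>

/* Truth values doubled so that 1/2 is an integer:
   0 = False, 1 = Don't Know (1/2), 2 = True. */
typedef enum { K_FALSE = 0, K_HALF = 1, K_TRUE = 2 } Kleene;

/* Conjunction is the minimum, disjunction the maximum, in the order False < 1/2 < True. */
Kleene k_and(Kleene a, Kleene b) { return a < b ? a : b; }
Kleene k_or(Kleene a, Kleene b)  { return a > b ? a : b; }

/* Negation swaps True and False and leaves 1/2 unchanged. */
Kleene k_not(Kleene a) { return (Kleene)(2 - a); }

/* Join in the information order: equal values are kept,
   disagreeing values collapse to Don't Know. */
Kleene k_join(Kleene a, Kleene b) { return a == b ? a : K_HALF; }
```

With this encoding, a property such as top that holds for some of the merged nodes and not for others is recorded as `k_join(K_TRUE, K_FALSE)`, that is, 1/2.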

Bounding the Representation • Summarize nodes according to their unary properties • Join values of relationships • Convert a Concrete Memory Descriptor of arbitrary size into an Abstract Memory Descriptor of bounded size • Does the Abstract Memory Descriptor contain enough information?
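The two summarization steps listed above can be sketched for stack nodes only. This is a simplified illustration under stated assumptions, not the TVLA implementation: nodes carry just two unary properties (call-site and top), the only relationship is pr, and the types `Concrete` and `Abstract` and the function `abstract_descriptor` are hypothetical names. It also assumes at most `MAX_NODES` concrete nodes.

```c
#include <assert.h>

#define MAX_NODES 16

typedef enum { K_FALSE = 0, K_HALF = 1, K_TRUE = 2 } Kleene;

/* Concrete descriptor: every stack node is represented uniquely. */
typedef struct {
    int n;                          /* number of nodes (at most MAX_NODES) */
    int callsite[MAX_NODES];        /* unary property: call-site id */
    int top[MAX_NODES];             /* unary property: 1 if the node is the top frame */
    int pr[MAX_NODES][MAX_NODES];   /* pr[i][j] = 1 if node i was invoked by node j */
} Concrete;

/* Abstract descriptor: one node per distinct vector of unary properties. */
typedef struct {
    int n;
    int callsite[MAX_NODES];
    int top[MAX_NODES];
    int summary[MAX_NODES];         /* 1 if the node stands for several concrete nodes */
    Kleene pr[MAX_NODES][MAX_NODES];
} Abstract;

void abstract_descriptor(const Concrete *c, Abstract *a) {
    int rep[MAX_NODES];                        /* rep[i]: abstract node for concrete node i */
    int seen[MAX_NODES][MAX_NODES] = {{0}};
    int i, j, k;

    /* Step 1: summarize nodes that agree on all unary properties. */
    a->n = 0;
    for (i = 0; i < c->n; i++) {
        for (k = 0; k < a->n; k++)
            if (a->callsite[k] == c->callsite[i] && a->top[k] == c->top[i])
                break;
        if (k == a->n) {                       /* first node with these properties */
            a->callsite[k] = c->callsite[i];
            a->top[k] = c->top[i];
            a->summary[k] = 0;
            a->n++;
        } else {
            a->summary[k] = 1;                 /* merged with an earlier node */
        }
        rep[i] = k;
    }

    /* Step 2: join the values of pr over all merged pairs;
       disagreeing values become 1/2. */
    for (i = 0; i < c->n; i++) {
        for (j = 0; j < c->n; j++) {
            Kleene v = c->pr[i][j] ? K_TRUE : K_FALSE;
            int x = rep[i], y = rep[j];
            if (!seen[x][y]) { a->pr[x][y] = v; seen[x][y] = 1; }
            else if (a->pr[x][y] != v) a->pr[x][y] = K_HALF;
        }
    }
}
```

The number of abstract nodes is bounded by the number of distinct unary-property vectors, regardless of how deep the recursion is. For the five-frame stack of the running example, the two non-top csl2 frames collapse into one summary node whose pr edges take the value 1/2.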

Problem (figure) • Two memory descriptors over stack nodes exit, csl1, csl2, and csl2,top, linked by pr edges, with t edges into the heap

Observing Properties of Invisible Variables • Explicitly track universal properties of invisible variables • Different invisible instances of t cannot point to the same heap cell • Instrumentation properties • Track derived properties of memory elements

Some Instrumentation Properties • Pointed-to by an invisible instance of t • Pointed-to by more than one invisible instance of t • t is not NULL
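On a concrete memory state, the three properties above are straightforward to evaluate, which the sketch below illustrates. It assumes a hypothetical `Frame` record whose `pr` field links each activation record to its invoker and treats every frame below the top one as holding an invisible instance of `t`. The function names are illustrative and do not come from the prototype.

```c
#include <assert.h>
#include <stddef.h>

typedef struct List { int data; struct List *n; } *L;

/* Hypothetical activation record: local t plus a link to the invoking frame. */
typedef struct Frame { L t; struct Frame *pr; } Frame;

/* Number of invisible (non-top) instances of t that point to cell. */
static int invisible_t_count(const Frame *top, const struct List *cell) {
    int count = 0;
    const Frame *f;
    if (top == NULL) return 0;
    for (f = top->pr; f != NULL; f = f->pr)   /* skip the visible top frame */
        if (f->t == cell) count++;
    return count;
}

/* Heap-cell property: pointed-to by an invisible instance of t. */
int pointed_by_invisible_t(const Frame *top, const struct List *cell) {
    return invisible_t_count(top, cell) >= 1;
}

/* Heap-cell property: pointed-to by more than one invisible instance of t. */
int shared_by_invisible_t(const Frame *top, const struct List *cell) {
    return invisible_t_count(top, cell) >= 2;
}

/* Stack-node property: this frame's t is not NULL. */
int t_not_null(const Frame *f) {
    return f->t != NULL;
}
```

In the running example every cell is pointed to by at most one invisible instance of `t`, so the second property is false everywhere; recording that fact explicitly is what lets it survive summarization.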

Memory Descriptors with Instrumentation (figure) • Two memory descriptors over stack nodes exit, csl1, csl2, and csl2,top, linked by pr edges, with t edges into the heap

Problem - solved (figure) • Memory descriptors over stack nodes exit, csl1, csl2, and csl2,top, linked by pr edges, with t edges into the heap

Why Does It Work? • Shape analysis handles linked lists quite precisely (Sagiv, Reps, Wilhelm [TOPLAS98]) • Utilize the (intraprocedural) 3-valued logic framework of Sagiv, Reps and Wilhelm [POPL99] to analyze the resulting intraprocedural problem

Prototype Implementation • Implemented in TVLA [Lev-Ami, Sagiv SAS 2000] • Analyzed some recursive list manipulating programs • Verified cleanness properties: • No memory leaks • No NULL dereferences

Prototype Implementation

Procedure          Number of (3VL) Structures   Time (sec)
create             219                          7.31
delAll             139                          12.74
insert             344                          34.61
delete             423                          38.29
search             303                          8.07
append             326                          40.64
reverse            414                          47.56
reverse_append     797                          95.35
reverse_append_r   2285                         1204.13
Running example    208                          16.50

Conclusion • Need to know more than the potential values of invisible variables • Tracking properties of invisible variables helps to overcome the (necessary) imprecision in the summarization of their values • Instrumentation • Generic • Sharing by different instances of a local variable • List specific

Conclusion • Storing the call-site improves information propagation to return-sites • Shows how the intraprocedural framework of Sagiv, Reps and Wilhelm can be used for interprocedural analyses • Analysis of a complex data structure

Limitations • Small programs • No mutual recursion (Implementation) • Predefined instrumentation library • Easy to use, no need for user intervention • Might not be good for all programs

Further Work • Scaling the algorithm • Distinguishing between “relevant” and “irrelevant” context • Analysis of programs manipulating Abstract Data Types

The End • Interprocedural Shape Analysis for Recursive Programs • Noam Rinetzky and Mooly Sagiv • Compiler Construction 2001 • www.cs.tau.ac.il/~maon
