Synthesizing parametric specifications of dynamic memory utilization in object-oriented programs

Víctor Braberman: DC, FCEN, UBA, Argentina
Diego Garbervetsky: DC, FCEN, UBA, Argentina
Sergio Yovine: Verimag, France
Dependable Software Research Group DEPENDEX
Synthesis of parametric specifications of dynamic memory utilization. BGY. FTFJP'05

Motivation
How much dynamic memory is allocated when method m1 is invoked?
void m1(int k) {
  for (int i = 1; i <= k; i++) {
    A a = new A();
    m2(i);
  }
}
void m2(int n) {
  for (int j = 1; j <= n; j++) {
    B b = new B();
  }
}
• k times new A() and (1+2+…+k) times new B()
• memAlloc(m1) = size(A) * k + size(B) * (½k² + ½k)
Not a trivial task!

Context
• Problem undecidable in general
• Impossible to find an exact expression of dynamic memory allocation, even knowing program inputs
• Several techniques for functional languages
• Usually linear upper bounds
• Less explored for object-oriented programs

Our work
A general technique to find non-linear parametric upper bounds of dynamic memory utilization
Given a method m(p1,..,pn)
• memAlloc(m): symbolic expression (a polynomial) in terms of p1,…,pn over-approximating the amount of dynamic memory allocated by any run starting at m

Key idea: Counting visits to statements that allocate memory
for (i = 0; i < n; i++)
  for (j = 0; j < i; j++)
    new C()
{0 ≤ i < n, 0 ≤ j < i}: a set of constraints describing an iteration space
[Figure: the iteration space plotted in the (i, j) plane]
• Dynamic memory allocations ≈ number of visits to new statements
• ≈ number of possible variable assignments at the statement's control location
• number of integer solutions of a predicate constraining variable assignments at its control location (i.e. an invariant)
For linear invariants, # of integer solutions = # of integer points = Ehrhart polynomial (for the loop nest above: size(C) * (½n² − ½n))

Our approach
• Identify every allocation site (new statement) reachable from the method under analysis (MUA)
• Generate invariants describing the possible variable assignments at each allocation site (the “iteration space”)
• Count the number of solutions of the invariant in terms of MUA parameters (# of visits to the allocation site)
• Adapt those expressions to take into account the size of the objects allocated (their types)
• Sum up the resulting expressions for all allocation sites

Running Example
void m0(int mc) {
1:  m1(mc);
2:  B[] m2Arr = m2(2 * mc);
}
void m1(int k) {
3:  for (int i = 1; i <= k; i++) {
4:    A a = new A();
5:    B[] dummyArr = m2(i);
    }
}
B[] m2(int n) {
6:  B[] arrB = new B[n];
7:  for (int j = 1; j <= n; j++) {
8:    B b = new B();
    }
9:  return arrB;
}

Step 1: Identifying allocation sites
• Distinguish program locations not only by a “method-local” control location but also by a call chain
• Creation site (cs = π.l) = a path π from the MUA to a new statement at l
• Denotes a statement and a call stack
• Example: m0.2.m2.6 is the cs for statement new B[] with stack (m0.2)
Creation sites reachable from m0: CSm0 = {m0.1.m1.4, m0.1.m1.5.m2.6, m0.1.m1.5.m2.8, m0.2.m2.6, m0.2.m2.8}
m2 is called at least twice ⇒ 2 static traces each for 6: new B[] and 8: new B

Step 2: Finding invariants for creation sites
• We need invariants involving variables in a path through several methods (appearing in the creation site)
void m0(int mc) {
1:  m1(mc);
2:  B[] m2Arr = m2(2 * mc);
}
void m1(int k) {
3:  for (int i = 1; i <= k; i++) {
4:    A a = new A();
5:    B[] dummyArr = m2(i);
    }
}
B[] m2(int n) {
6:  B[] arrB = new B[n];
7:  for (int j = 1; j <= n; j++) {
8:    B b = new B();
    }
9:  return arrB;
}
Creation site invariants can be generated using local invariants and binding the calls:
• Im0(m0.1.m1.4) {k=mc ∧ 1≤i≤k}
• Im0(m0.1.m1.5.m2.6) {k=mc ∧ 1≤i≤k ∧ n=i}
• Im0(m0.1.m1.5.m2.8) {k=mc ∧ 1≤i≤k ∧ n=i ∧ 1≤j≤n}
• Im0(m0.2.m2.6) {n=2*mc}
• Im0(m0.2.m2.8) {n=2*mc ∧ 1≤j≤n}

Step 3: Counting the number of solutions (in terms of MUA parameters)
• Example: # of visits (in terms of m0 parameters) to m2.8 for the stack configuration [m0.1.m1.5]?
• Recall: Im0(m0.1.m1.5.m2.8) {k=mc ∧ 1≤i≤k ∧ n=i ∧ 1≤j≤n}
• Then # of visits in terms of mc (method m0 parameter)
• = #{(k,i,j,n) | (k=mc ∧ 1≤i≤k ∧ n=i ∧ 1≤j≤n)}
• = ½ mc² + ½ mc
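The count on this slide can be checked numerically. The sketch below is not the symbolic (Ehrhart-polynomial) computation the slides describe; it simply enumerates every integer assignment that satisfies the invariant of creation site m0.1.m1.5.m2.8 for a concrete mc and compares the total with the closed form ½ mc² + ½ mc. The class and method names (VisitCount, countSolutions, closedForm) are illustrative assumptions.

```java
// Brute-force check of the solution count for the invariant
//   k = mc, 1 <= i <= k, n = i, 1 <= j <= n
// Names are hypothetical; this is a numeric check, not the symbolic technique.
class VisitCount {
    // Enumerates every integer assignment (k, i, n, j) satisfying the invariant.
    static long countSolutions(int mc) {
        long solutions = 0;
        int k = mc;                          // binding k = mc
        for (int i = 1; i <= k; i++) {       // 1 <= i <= k
            int n = i;                       // binding n = i
            for (int j = 1; j <= n; j++) {   // 1 <= j <= n
                solutions++;
            }
        }
        return solutions;
    }

    // Closed form stated on the slide: 1/2 mc^2 + 1/2 mc (exact, since mc^2 + mc is even).
    static long closedForm(int mc) {
        return ((long) mc * mc + mc) / 2;
    }

    public static void main(String[] args) {
        for (int mc = 0; mc <= 5; mc++) {
            System.out.println("mc=" + mc + " enumerated=" + countSolutions(mc)
                    + " closedForm=" + closedForm(mc));
        }
    }
}
```

Because k and n are fully determined by mc and i, each (i, j) pair corresponds to exactly one solution, which is why the two nested loops suffice.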

Step 4: Transforming number of visits into memory consumption
• We know how to approximate the number of visits to a creation site, but not dynamic memory allocations
• Example: How much memory (in terms of m0 parameters) is allocated at m2.8 for the stack configuration [m0.1.m1.5]?
• Recall: # of visits in terms of mc (method m0 parameter) = ½ mc² + ½ mc
• Then the memory allocated is size(B) * (½ mc² + ½ mc)
• S(m,cs): computes an upper bound of the amount of memory allocated by one creation site, in terms of the parameters of m
• Transforms # of visits into estimations of memory consumption
• Special treatment for array allocations (new T[e1]..[en])
• Treated as n nested loops:
• for(t1=0;t1<e1;t1++)…for(tn=0;tn<en;tn++) new RefT

Step 5: Summing up expressions
• To predict the amount of memory allocated by a method m:
• memAlloc(m) = computeAlloc(m,CSm)
• where computeAlloc, for every creation site, gets an invariant, computes the S function, and sums up the results
• memAlloc(m0) = S(m0,m0.1.m1.4) + S(m0,m0.1.m1.5.m2.6) + S(m0,m0.1.m1.5.m2.8) + S(m0,m0.2.m2.6) + S(m0,m0.2.m2.8)
• = size(B) * (½ mc² + 5/2 mc) + size(B[]) * (½ mc² + 5/2 mc) + size(A) * mc
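As a sanity check on the sum above, the running example can be instrumented to count allocations directly. This is a sketch under assumptions: each new is replaced by a counter, new B[n] is counted as n array cells (mirroring the nested-loop treatment of arrays), and the sizes passed to memAlloc are made-up numbers, not measurements. The class name RunningExample and its helpers are hypothetical.

```java
// Instrumented version of the running example: counts allocations instead of
// performing them, then compares with the polynomials of memAlloc(m0).
class RunningExample {
    private long countA;      // visits to 4: new A()
    private long countB;      // visits to 8: new B()
    private long arrayCells;  // cells allocated by 6: new B[n]

    void m0(int mc) {
        m1(mc);
        m2(2 * mc);
    }

    void m1(int k) {
        for (int i = 1; i <= k; i++) {
            countA++;
            m2(i);
        }
    }

    void m2(int n) {
        arrayCells += n;      // new B[n] treated as a loop of n iterations
        for (int j = 1; j <= n; j++) {
            countB++;
        }
    }

    // Returns {#A, #B, #array cells} observed for a concrete mc >= 0.
    static long[] measure(int mc) {
        RunningExample run = new RunningExample();
        run.m0(mc);
        return new long[] { run.countA, run.countB, run.arrayCells };
    }

    // Polynomials from the slide: mc, 1/2 mc^2 + 5/2 mc, 1/2 mc^2 + 5/2 mc.
    static long[] predict(int mc) {
        long quadratic = ((long) mc * mc + 5L * mc) / 2;
        return new long[] { mc, quadratic, quadratic };
    }

    // memAlloc(m0) for caller-supplied (hypothetical) sizes.
    static long memAlloc(int mc, long sizeA, long sizeB, long sizeBArray) {
        long[] p = predict(mc);
        return sizeA * p[0] + sizeB * p[1] + sizeBArray * p[2];
    }
}
```

For this example the bound is exact for mc ≥ 0, because the invariants describe the loops precisely; in general the technique only guarantees an over-approximation.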

Experiments
• We tested our prototype with some JOlden and JavaGrande benchmarks. Obtained by hand
• In general, when the amount of memory allocated is polynomial, we obtained accurate upper bounds
• The main issue is finding good invariants…

Scoped-memory Management
• Leveraging escape analysis, we can compute upper bounds of the memory escaping from and captured by a method (assuming a region per method)
• memEscapes(m) = computeAlloc(m,escapes(m))
• memCaptured(m) = computeAlloc(m,capture(m))
• Useful for RTSJ
• Predicting region sizes
• Predicting how much memory allocated by the MUA will remain uncollected after its execution

Prototype Tool

Conclusions
• A technique that computes non-linear parametric upper bounds of dynamic memory allocation
• An application to scoped memory management
• Use for estimating region size in RTSJ
• Useful for embedded systems
• Benchmark results are promising…
• But many challenges remain…

Current and future work
• Find a symbolic upper bound of the memory required to run a method (assuming scoped-memory management)
• We need to solve an optimization problem (symbolically)
• Improving precision of upper bounds under weaker invariants
• [if (cond) then B1 else B2] statements, not capturing cond
• The same for polymorphism
• Dealing with recursion
• Automated code generation for RTSJ
• Using the memCaptured estimator to determine a region's size

Extra Material
• How we compute the path invariants
• Memory required to run a method
• Improving method precision
• Counting (more formally)
• Definition of function S()

On computing invariants
• We need linear invariants involving variables in a path through several methods
• Strategy: we compute or annotate local invariants and bind them
• Our technique can deal with some patterns of iteration beyond integer-counter based ones
• For iterations over collections we introduce a virtual counter i bounded by the collection size (i.e. i ranges from 0 up to c.size())
• We try to obtain invariants that only predicate over an inductive set of variables (roughly speaking, a subset of variables that is enough to count the number of visits to a given statement)
• Currently we approximate inductive variable sets by combining a field-sensitive live variables analysis with manual adjustments

Step 2: Finding invariants for creation sites
• We need linear invariants involving variables in a path through several methods
• We compute or annotate local invariants and bind them
Example for cs m0.1.m1.5.m2.8:
• Local invariants: I(m0.1) {}, I(m1.5) {1≤i≤k}, I(m2.8) {1≤j≤n}
• Bindings: I(m0.1.m1) {k=mc}, I(m1.5.m2) {n=i}
• Result: Im0(m0.1.m1.5.m2.8) {k=mc ∧ 1≤i≤k ∧ n=i ∧ 1≤j≤n}
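The binding step can be pictured as a walk along the creation site that conjoins, in order, the local invariant of each control location and the parameter binding of each call. The following is a minimal sketch, assuming constraints are kept as plain strings (written with ASCII operators) and that the local invariants and bindings of the running example are supplied by hand; InvariantBinder and its tables are hypothetical and do not reflect how the prototype stores invariants.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Builds a creation-site invariant by concatenating local invariants and call bindings.
class InvariantBinder {
    // Local invariants per control location (hand-written for the running example).
    private static final Map<String, List<String>> LOCAL = new HashMap<>();
    // Bindings of formal parameters per call edge "caller.line.callee".
    private static final Map<String, List<String>> BINDINGS = new HashMap<>();

    static {
        LOCAL.put("m1.4", Arrays.asList("1 <= i", "i <= k"));
        LOCAL.put("m1.5", Arrays.asList("1 <= i", "i <= k"));
        LOCAL.put("m2.8", Arrays.asList("1 <= j", "j <= n"));
        BINDINGS.put("m0.1.m1", Collections.singletonList("k == mc"));
        BINDINGS.put("m1.5.m2", Collections.singletonList("n == i"));
        BINDINGS.put("m0.2.m2", Collections.singletonList("n == 2 * mc"));
    }

    // creationSite has the shape method.line(.method.line)*, e.g. "m0.1.m1.5.m2.8".
    static String invariantFor(String creationSite) {
        String[] parts = creationSite.split("\\.");
        List<String> conjuncts = new ArrayList<>();
        for (int p = 0; p + 1 < parts.length; p += 2) {
            String location = parts[p] + "." + parts[p + 1];
            conjuncts.addAll(LOCAL.getOrDefault(location, Collections.<String>emptyList()));
            if (p + 2 < parts.length) {
                // The location is a call site: add the binding for the callee.
                String call = location + "." + parts[p + 2];
                conjuncts.addAll(BINDINGS.getOrDefault(call, Collections.<String>emptyList()));
            }
        }
        return String.join(" && ", conjuncts);
    }
}
```

Locations with no entry in the tables (such as m0.1 or m2.6) contribute the empty invariant, matching I(m0.1) {} in the example.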

Computing invariants using Daikon

Memory required to run a method
• Knowing the amount of memory captured by a method is not enough
• We must consider the regions of the methods it calls
• They are not in terms of MUA parameters
• A method could be called several times with different arguments

Two maximization problems
• In any run only one stack (path) configuration will be active (single-threading)
• required(m0)(mc) = max(rsize(m0.1.m1.5,mc) + rsize(m0.1.m1.5.m2,mc), rsize(m0.2.m2,mc))
• In one path a region can be created several times and have different sizes
• memCaptured(m2) may vary depending on i in the path m0.1.m1.5.m2
• For every path, we need an expression in terms of MUA parameters that maximizes the size of every region in the path

Maximizing a path
• rsize(π.m, pmr) = Maximize memCaptured(m) subject to Imr(π)[P/pmr]
• That is, find an expression in terms of method mr parameters that represents the maximum region for method m
• knowing that m will be called with stack π
• and that the variables in call stack π are constrained by the invariant Imr(π)
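To make the maximization concrete, here is a numeric sketch for the path m0.1.m1.5.m2 of the running example. The slides call for a symbolic solution; this code only brute-forces the maximum for a given mc. It also assumes a hypothetical memCaptured(m2)(n) = sizeB * n (only the B objects created in the loop are captured), which the slides do not state. RegionSize, rsizeM2ViaM1, and demo are illustrative names.

```java
import java.util.function.LongUnaryOperator;

// Numeric stand-in for rsize(m0.1.m1.5.m2, mc):
// maximize memCaptured(m2)(n) subject to k = mc, 1 <= i <= k, n = i.
class RegionSize {
    static long rsizeM2ViaM1(int mc, LongUnaryOperator memCapturedM2) {
        long best = 0;                       // no call to m2 happens when mc < 1
        int k = mc;                          // binding k = mc
        for (int i = 1; i <= k; i++) {       // 1 <= i <= k
            long n = i;                      // binding n = i
            best = Math.max(best, memCapturedM2.applyAsLong(n));
        }
        return best;
    }

    // Usage with the assumed (not source-given) capture function sizeB * n, sizeB = 8.
    static long demo(int mc) {
        final long sizeB = 8;
        return rsizeM2ViaM1(mc, n -> sizeB * n);
    }
}
```

With a capture function that grows with n, the maximum is reached at i = mc, so under this assumption the symbolic answer would be sizeB * mc.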

Improving technique precision
• computeAlloc relies on having good invariants capturing “control-flow” decisions
• Consider this example:
1: for (int i = 1; i <= n; i++)
2:   if (t(i))
3:     a[i] = new Integer[2*i];   {1≤i≤n ∧ t(i)}
4:   else
5:     a[i] = new Integer[10];    {1≤i≤n ∧ ¬t(i)}
• What happens if t(i) cannot be captured by the invariants?
The statements 3: and 5: will have the same invariant, and the technique will sum their upper bounds, ignoring the impossibility of visiting both statements 3: and 5: in the same iteration!

Improving precision (cont…)
• How do we cope with this problem?
• Find a condition that maximizes the amount of memory allocated by the statements, knowing that they cannot be executed together
• In the example we can add a new restriction over i
• 3: {1≤i≤n ∧ i>5}
• 5: {1≤i≤n ∧ i≤5}
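The effect of the added restriction can be seen by comparing three quantities for the if/else example, all measured in array cells: the naive bound that charges both branches in every iteration, the refined bound that charges statement 3 only when i > 5 and statement 5 only when i ≤ 5, and the cells actually allocated for some concrete predicate t. The sketch below is illustrative; BranchBound and its method names are assumptions, and t is passed in as an arbitrary predicate.

```java
import java.util.function.IntPredicate;

// Compares bounds for:  if (t(i)) new Integer[2*i]; else new Integer[10];  with 1 <= i <= n
class BranchBound {
    // Both statements share the invariant 1 <= i <= n, so both are charged every iteration.
    static long naiveCells(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += 2L * i + 10;
        }
        return total;
    }

    // Statement 3 restricted to i > 5, statement 5 restricted to i <= 5.
    static long refinedCells(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += (i > 5) ? 2L * i : 10;
        }
        return total;
    }

    // Cells really allocated for a concrete branch predicate t.
    static long actualCells(int n, IntPredicate t) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += t.test(i) ? 2L * i : 10;
        }
        return total;
    }
}
```

The threshold 5 is where the two array lengths cross (2*i > 10 exactly when i > 5), so the refined bound charges the larger branch in each iteration and still dominates any actual run.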

Counting the number of solutions (more formally)
• Given an invariant and a set of selected variables (parameters), we can get an expression in terms of those parameters
• It represents the number of solutions to the invariant, fixing the values of those parameters
• Example: Im0(m0.1.m1.5.m2.8) {k=mc ∧ 1≤i≤k ∧ n=i ∧ 1≤j≤n}
• C(Im0(m0.1.m1.5.m2.8),{k,i,j,n},{mc})(mc) =
• = #{(k,i,j,n) | (k=mc ∧ 1≤i≤k ∧ n=i ∧ 1≤j≤n)} =
• = ½ mc² + ½ mc
[Figure: the iteration space plotted in the (i, j) plane]
Counting the number of solutions of an invariant for a creation site cs = π.l over-approximates the number of visits to the new statement when the program stack is π
Theoretical framework:
• Given a set of constraints φ such that var(φ) = P ∪ W, the number of solutions of φ fixing the values of P, C(φ,W,P)(p) = #{w | φ[W/w, P/p]}, is a function in terms of P
• For polytopes, # of integer solutions = # of integer points = Ehrhart polynomial

Function S (more formally)
• C(Ics,W,P) approximates the number of visits to a creation site
• S(I,P,cs): computes an upper bound of the amount of memory allocated by a creation site, in terms of P, using C(Ics,W,P)
• Example for creation site m0.1.m1.5.m2.8 (new B):
• Im0(m0.1.m1.5.m2.8) {k=mc ∧ 1≤i≤k ∧ n=i ∧ 1≤j≤n}
• C(Im0(m0.1.m1.5.m2.8),{n,i,k,j},{mc}) = ½ mc² + ½ mc
• S(Im0(m0.1.m1.5.m2.8),{mc},m0.1.m1.5.m2.8) = size(B) * C(Im0(m0.1.m1.5.m2.8),{n,i,k,j},{mc}) = size(B) * (½ mc² + ½ mc)
Adaptations performed by S(I,P,cs)
• new T(): size(T) * C(I,W,P)
• new T[e1]..[en]: size(T[]) * C(I ∧ {0≤t1<e1} ∧ … ∧ {0≤tn<en},W,P)
• Simulating n nested loops: for(t1=0;t1<e1;t1++)…for(tn=0;tn<en;tn++) new T[]
