A Practical and Precise Inference and Specializer for Array Bound Checks Elimination Dana N. Xu Univ of Cambridge Corneliu Popeea Natl Univ of Singapore Wei-Ngan Chin Natl Univ of Singapore PEPM 2008 - 8 January
Array Bound Check Elimination • Problem: • without array bound checks (e.g. C), programs may be unsafe. • with array bound checks (e.g. Java), program execution is slowed down. • Solution: eliminate redundant checks.
[Figure: input program → Inference → method summaries → Specialization → optimized program]
Inference
Goal: derive preconditions that make checks redundant.

    float foo (float a[], int j, int n) {
      float v = 0;
      int i = j+1;
      if (0<i<=n) then v=a[i] else ();    // L1: checks i>=0, i<len(a)
      int m = abs(random());
      v + a[m];                           // L2: checks m>=0, m<len(a)
    }

• At L1, symbolic program state $i = j+1 \wedge 0 < i \le n$: the check $i \ge 0$ is SAFE; the check $i < len(a)$ is safe under a PRECONDITION.
• At L2, symbolic program state $i = j+1 \wedge m \ge 0$: the check $m \ge 0$ is SAFE; the check $m < len(a)$ is UNSAFE.
• Our contributions: modular inference of preconditions; handling of indirection arrays.
Specialization
Goal: eliminate runtime checks, guided by the inference results.
• If we assume all callers satisfy (j+1 < len(a)), the checks at L1 are removed and only the unsafe check at L2 remains at runtime:

    float foo (float a[], int j, int n) {
      float v = 0;
      int i = j+1;
      if (0<i<=n) then v=a[i] else ();              // L1: no runtime check
      int m = abs(random());
      v + (if (m<len(a)) then a[m] else error);     // L2: runtime check kept
    }

• Our contribution: integrate modular inference with a specializer.
Overview • Introduction • Our approach • Modular inference: postcondition + preconditions. • Flexi-variant specialization. • Experimental results. • Conclusion.
Setting
• First-order imperative language:
    meth ::= t mn(([ref] t v)*) { e }                                  (method)
    t    ::= int | float | t[int, .., int]                              (type)
    e    ::= k | v | if v then e1 else e2 | v=e | t v=e1; e2 | mn(v*)   (expression)
• Invariants expressed as linear formulae:
    $Q ::= \{\, q(v^*) = \phi \,\}$    (recursive formula)
    $\phi ::= \phi_1 \wedge \phi_2 \mid \phi_1 \vee \phi_2 \mid q(v^*) \mid s$    (formula)
    $s ::= a_1 v_1 + \cdots + a_n v_n \le a$    (linear inequality)
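To make the formula grammar concrete, here is a minimal sketch of it as a Python abstract syntax tree with a concrete evaluator. The class names `Lin`, `And`, `Or` and the function `holds` are hypothetical (the prototype is written in Haskell and its data types are not shown here). Recursive predicates $q(v^*)$ are omitted, and strict inequalities are encoded over the integers via $x < c \iff x \le c-1$.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union

@dataclass(frozen=True)
class Lin:
    """Linear inequality a1*v1 + ... + an*vn <= a (the `s` production)."""
    coeffs: Tuple[Tuple[str, int], ...]  # ((v1, a1), ..., (vn, an))
    bound: int                           # the constant a

@dataclass(frozen=True)
class And:
    left: "Formula"
    right: "Formula"

@dataclass(frozen=True)
class Or:
    left: "Formula"
    right: "Formula"

Formula = Union[Lin, And, Or]

def holds(phi: Formula, env: Dict[str, int]) -> bool:
    """Evaluate a non-recursive formula under a concrete integer assignment."""
    if isinstance(phi, Lin):
        return sum(a * env[v] for v, a in phi.coeffs) <= phi.bound
    if isinstance(phi, And):
        return holds(phi.left, env) and holds(phi.right, env)
    if isinstance(phi, Or):
        return holds(phi.left, env) or holds(phi.right, env)
    raise TypeError(f"unknown formula: {phi!r}")

# 0 < i <= n written with <= only:  -i <= -1  and  i - n <= 0
guard = And(Lin((("i", -1),), -1), Lin((("i", 1), ("n", -1)), 0))
```

For instance, `holds(guard, {"i": 1, "n": 3})` is `True`, while `i = 0` violates the first conjunct.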
Forward Derivation
• Compute the sps (symbolic program state) at each program point.
• To support modularity, symbolic transitions relate initial values (j, n) and latest values (j', n'):

    float foo (float a[], int j, int n) {
      float v = 0;                             // L0 (method entry)
      int i = j+1;
      if (0<i<=n) then v=a[i] else ();         // L1
      int m = abs(random());
      v + a[m];                                // L2
    }

• $sps(L_0) = len(a) > 0 \wedge j' = j \wedge n' = n$
• $sps(L_1) = sps(L_0) \wedge i' = j'+1 \wedge 0 < i' \le n'$
• $sps(L_2) = sps(L_0) \wedge i' = j'+1 \wedge m' \ge 0$
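One way to sanity-check the derived states is to run foo concretely and confirm that every state reaching L1 or L2 satisfies the corresponding sps formula, since sps must over-approximate the reachable states. This sketch assumes predicates over a dictionary that holds both initial values (`j`, `n`) and latest values (`j'`, `n'`, ...); `trace_foo` is a hypothetical helper that mimics foo's control flow and replaces `random()` with Python's `random` module. Passing such tests is evidence, not a proof.

```python
import random

# A state maps initial values ("j", "n", "len") and latest values ("j'", "n'", "i'", "m'").
def sps_L0(s):
    return s["len"] > 0 and s["j'"] == s["j"] and s["n'"] == s["n"]          # len(a)>0 /\ j'=j /\ n'=n

def sps_L1(s):
    return sps_L0(s) and s["i'"] == s["j'"] + 1 and 0 < s["i'"] <= s["n'"]   # ... /\ i'=j'+1 /\ 0<i'<=n'

def sps_L2(s):
    return sps_L0(s) and s["i'"] == s["j'"] + 1 and s["m'"] >= 0             # ... /\ i'=j'+1 /\ m'>=0

def trace_foo(length, j, n, rng):
    """Follow foo's control flow concretely and record the states at L1 and L2.
    Array contents do not matter here, so only len(a) is tracked."""
    base = {"len": length, "j": j, "n": n, "j'": j, "n'": n}
    states = {}
    i = j + 1
    if 0 < i <= n:
        states["L1"] = {**base, "i'": i}
    m = abs(rng.randint(-1000, 1000))      # stand-in for abs(random())
    states["L2"] = {**base, "i'": i, "m'": m}
    return states
```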
Forward Derivation for Recursion • Each method is first translated to a recursive constraint. • Compute an over-approximation of the least fixed point of this recursive constraint: • precise disjunctive polyhedron abstract domain. • with hulling and widening operators. • Details and examples in the paper.
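The paper's analysis uses a disjunctive polyhedral domain with hulling and widening. As a much simpler stand-in, the sketch below uses the interval domain to show how widening makes the least-fixed-point iteration terminate, followed by one narrowing step that recovers precision. The loop being analyzed and all function names are illustrative assumptions, not taken from the paper.

```python
import math
from typing import Optional, Tuple

# An interval (lo, hi) over the integers; None stands for bottom (unreachable).
Interval = Optional[Tuple[float, float]]

def join(a: Interval, b: Interval) -> Interval:
    """Least upper bound: the smallest interval containing both."""
    if a is None:
        return b
    if b is None:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))

def widen(a: Interval, b: Interval) -> Interval:
    """Standard interval widening: any bound that is still moving jumps to infinity."""
    if a is None:
        return b
    if b is None:
        return a
    lo = a[0] if b[0] >= a[0] else -math.inf
    hi = a[1] if b[1] <= a[1] else math.inf
    return (lo, hi)

def meet_lt(a: Interval, c: int) -> Interval:
    """Restrict to values x < c, i.e. x <= c - 1 over the integers."""
    if a is None or a[0] > c - 1:
        return None
    return (a[0], min(a[1], c - 1))

def meet_ge(a: Interval, c: int) -> Interval:
    """Restrict to values x >= c."""
    if a is None or a[1] < c:
        return None
    return (max(a[0], c), a[1])

def shift(a: Interval, k: int) -> Interval:
    return None if a is None else (a[0] + k, a[1] + k)

def analyze_loop(bound: int) -> Tuple[Interval, Interval]:
    """Invariant at the loop head, and state at exit, for:
         i = 0; while (i < bound) i = i + 1;"""
    def step(x: Interval) -> Interval:
        # Loop head = entry value joined with the effect of one more iteration.
        return join((0, 0), shift(meet_lt(x, bound), 1))

    x: Interval = None
    while True:                     # ascending iteration with widening
        nxt = widen(x, step(x))
        if nxt == x:                # post-fixpoint reached
            break
        x = nxt
    x = step(x)                     # one narrowing step: still a sound post-fixpoint
    return x, meet_ge(x, bound)
```

With `bound = 100`, widening first jumps to $[0, +\infty]$ and the narrowing step brings the loop-head invariant back to $[0, 100]$, with $i = 100$ at exit.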
Indirection Arrays • Hold indexes for accessing another array. • Used intensively for sparse matrix operations. • Need to capture universal properties about the elements inside an array: $\forall i \in indexes(a) \cdot 0 \le a[i] \le 10$, represented as: $0 \le a\_elem \le 10$
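A typical indirection array is the column-index array of a sparse matrix in compressed sparse row (CSR) form: `x[col_idx[k]]` is safe exactly when every element of `col_idx` lies in `[0, len(x))`, which is the kind of universal property summarized by a single `a_elem` variable. The sketch below is illustrative only; CSR is a standard format but is not taken from the slides, and the `assert` stands in for a fact the analysis would infer statically.

```python
from typing import List

def csr_matvec(values: List[float], col_idx: List[int],
               row_ptr: List[int], x: List[float]) -> List[float]:
    """y = A*x for a matrix A stored in compressed sparse row (CSR) form.
    col_idx is an indirection array: its elements index into x."""
    # Universal property over all elements of col_idx: 0 <= col_idx_elem < len(x).
    # Once established, every bound check on x[col_idx[k]] below is redundant.
    assert all(0 <= c < len(x) for c in col_idx)
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y
```

For the matrix with rows `[1, 0, 2]` and `[0, 3, 0]`, the CSR arrays are `values = [1.0, 2.0, 3.0]`, `col_idx = [0, 2, 1]`, `row_ptr = [0, 2, 3]`.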
Indirection Arrays
• Given method:
    void initArr(int a[], int i, int j, int n) {
      if (i>j) then ()
      else { a[i]=n; initArr(a,i+1,j,n+1) }
    }
• Compute postcondition:
    $(i > j \wedge a\_elem' = a\_elem) \vee (0 \le i \le j < len(a) \wedge (a\_elem' = a\_elem \vee n \le a\_elem' \le n+j-i))$
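The postcondition can be tested against a Python transcription of initArr. In this sketch (`init_arr` and `post_holds` are hypothetical names), the summary is read per array position: each element either keeps its old value or lies in $[n, n+j-i]$. That per-position reading is stronger than the summary itself, so whenever it holds the summary holds too. The explicit bound check is needed because Python lists silently accept negative indices.

```python
import random
from typing import List

def init_arr(a: List[int], i: int, j: int, n: int) -> None:
    """Python transcription of initArr: stores n, n+1, ..., n+j-i into a[i..j]."""
    if i > j:
        return
    if not (0 <= i < len(a)):       # explicit bound check (Python accepts a[-1])
        raise IndexError(i)
    a[i] = n
    init_arr(a, i + 1, j, n + 1)

def post_holds(old: List[int], new: List[int], i: int, j: int, n: int) -> bool:
    """Per-position reading of the slide's postcondition."""
    if i > j:
        return new == old                            # i>j /\ a_elem'=a_elem
    if not (0 <= i <= j < len(old)):                 # 0<=i<=j<len(a)
        return False
    # each element is unchanged (a_elem'=a_elem) or lies in [n, n+j-i]
    return all(e_new == e_old or n <= e_new <= n + j - i
               for e_old, e_new in zip(old, new))
```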
Inference of Preconditions • $pre = \forall L \cdot (sps \Rightarrow chk)$, where $L$ is the set of latest (primed) variables. • Classify checks with pre: • pre is valid: safe check. • pre is unsatisfiable: unsafe check. • otherwise: propagate pre as a check for the caller.
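The three-way classification can be sketched as follows. The prototype decides validity and satisfiability with a Presburger solver; this Python sketch instead enumerates a small finite box of integer values, so it only approximates those questions and is meant purely as an illustration. `classify`, `NAMES`, `RANGES` and `PRE` are hypothetical; the preconditions plugged in are the ones listed for foo in the example that follows.

```python
from itertools import product
from typing import Callable, Dict, Sequence

Env = Dict[str, int]

def classify(pre: Callable[[Env], bool], names: Sequence[str], ranges: Sequence[range]) -> str:
    """Classify a check by evaluating its precondition on every point of a finite box.
    Enumeration only approximates validity and unsatisfiability."""
    results = [pre(dict(zip(names, vals))) for vals in product(*ranges)]
    if all(results):
        return "safe"        # pre is valid: the check is redundant
    if not any(results):
        return "unsafe"      # pre is unsatisfiable: the runtime check must stay
    return "propagate"       # otherwise: pre becomes a check for the caller

NAMES = ("len", "j", "n")
RANGES = (range(1, 6), range(-3, 7), range(-3, 7))   # len(a) > 0, as in sps(L0)
PRE = {
    "L1.low": lambda e: True,
    "L1.high": lambda e: e["j"] < e["len"] - 1 or (e["n"] <= e["j"] and e["j"] >= e["len"] - 1),
    "L2.high": lambda e: False,
}
for check, pre in PRE.items():
    print(check, classify(pre, NAMES, RANGES))
# prints, one per line: L1.low safe, L1.high propagate, L2.high unsafe
```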
Example: Preconditions
• Derive the weakest precondition for each check:

    float foo (float a[], int j, int n) {
      float v = 0;
      int i = j+1;
      if (0<i<=n) then v=a[i] else ();    // L1
      int m = abs(random());
      v + a[m];                           // L2
    }

• $sps(L_1) = len(a) > 0 \wedge i' = j'+1 \wedge 0 < i' \le n' \wedge j' = j \wedge n' = n$
• $pre(L_1.high) = \forall \{i', j', n'\} \cdot (sps(L_1) \Rightarrow i' < len(a)) = (j < len(a)-1) \vee (n \le j \wedge j \ge len(a)-1)$
• $pre(L_1.low) = \forall \{i', j', n'\} \cdot (sps(L_1) \Rightarrow i' \ge 0) = \mathit{true}$
• $sps(L_2) = len(a) > 0 \wedge i' = j'+1 \wedge m' \ge 0 \wedge j' = j \wedge n' = n$
• $pre(L_2.high) = \forall \{i', j', n', m'\} \cdot (sps(L_2) \Rightarrow m' < len(a)) = \mathit{false}$
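The quantifier $\forall L$ can be illustrated by brute force: for each choice of initial values, try every assignment of the latest variables in a finite range and look for one where sps holds but the check fails. The sketch below (the helper `pre_by_enumeration` and all ranges are assumptions) reproduces the slide's three results for foo within those ranges; it is a bounded check, not quantifier elimination. In particular, $pre(L_2.high) = \mathit{false}$ is only reproduced because the range of $m'$ extends beyond every tested $len(a)$.

```python
from itertools import product

def pre_by_enumeration(sps, chk, latest_ranges):
    """pre(init) = forall latest . (sps(init, latest) => chk(init, latest)),
    with the forall checked only over a finite range per latest variable."""
    names = list(latest_ranges)
    def pre(init):
        for vals in product(*(latest_ranges[v] for v in names)):
            latest = dict(zip(names, vals))
            if sps(init, latest) and not chk(init, latest):
                return False   # counterexample: sps holds but the check fails
        return True
    return pre

# Keys of `lt` ("i", "j", "n", "m") stand for the latest values i', j', n', m'.
def sps_L1(init, lt):
    # len(a)>0 /\ i'=j'+1 /\ 0<i'<=n' /\ j'=j /\ n'=n
    return (init["len"] > 0 and lt["i"] == lt["j"] + 1 and 0 < lt["i"] <= lt["n"]
            and lt["j"] == init["j"] and lt["n"] == init["n"])

def sps_L2(init, lt):
    # len(a)>0 /\ i'=j'+1 /\ m'>=0 /\ j'=j /\ n'=n
    return (init["len"] > 0 and lt["i"] == lt["j"] + 1 and lt["m"] >= 0
            and lt["j"] == init["j"] and lt["n"] == init["n"])

J = N = range(-1, 4)
LATEST = {"i": range(-1, 5), "j": J, "n": N}
pre_L1_high = pre_by_enumeration(sps_L1, lambda init, lt: lt["i"] < init["len"], LATEST)
pre_L1_low = pre_by_enumeration(sps_L1, lambda init, lt: lt["i"] >= 0, LATEST)
pre_L2_high = pre_by_enumeration(sps_L2, lambda init, lt: lt["m"] < init["len"],
                                 {**LATEST, "m": range(0, 5)})
```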
Efficient Preconditions
• Problem: negation of sps results in large preconditions.
• naïve pre-derivation (too large): $(len(a) \le 0) \vee (j < len(a)-1 \wedge 1 \le len(a)) \vee (n \le j \wedge 1 \le len(a) \le j+1)$
• Simplify preconditions via strengthening:
  • weak pre-derivation (no loss in precision) drops disjuncts that violate type invariants: $(j < len(a)-1) \vee (n \le j \wedge len(a) \le j+1)$
  • strong pre-derivation (less precise, but more efficient) drops disjuncts that allow the check to be avoided: $(j < len(a)-1)$
  • selective pre-derivation lies between weak and strong.
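Both strengthening steps can be read as filters over the disjuncts of the naïve precondition. In this hypothetical sketch, weak pre-derivation keeps a disjunct only if it is consistent with the type invariant $len(a) > 0$, and strong pre-derivation additionally requires consistency with the guard $0 < j+1 \le n$ that leads to L1, which is one reading of "disjuncts that allow the check to be avoided". Consistency is tested by enumerating a small integer box (`BOX`, an assumption), not with a solver, and the further simplification of the surviving disjuncts shown on the slide is not attempted.

```python
from itertools import product

# Finite box of (len(a), j, n) values used to test consistency (an assumption).
BOX = [{"len": ln, "j": j, "n": n} for ln, j, n in product(range(-2, 6), repeat=3)]

def satisfiable(*preds) -> bool:
    return any(all(p(e) for p in preds) for e in BOX)

def type_inv(e) -> bool:
    return e["len"] > 0                    # arrays have positive length

def guard_L1(e) -> bool:
    return 0 < e["j"] + 1 <= e["n"]        # path condition reaching L1 (i = j+1)

NAIVE = [
    ("len(a)<=0", lambda e: e["len"] <= 0),
    ("j<len(a)-1 & 1<=len(a)", lambda e: e["j"] < e["len"] - 1 and 1 <= e["len"]),
    ("n<=j & 1<=len(a)<=j+1", lambda e: e["n"] <= e["j"] and 1 <= e["len"] <= e["j"] + 1),
]

def weak(disjuncts):
    """Drop disjuncts that contradict the type invariant (no precision lost)."""
    return [(s, d) for s, d in disjuncts if satisfiable(type_inv, d)]

def strong(disjuncts):
    """Also drop disjuncts under which the check at L1 is never reached."""
    return [(s, d) for s, d in disjuncts if satisfiable(type_inv, guard_L1, d)]
```

Here `weak(NAIVE)` keeps the second and third disjuncts, and `strong(NAIVE)` keeps only the second, matching $(j < len(a)-1)$ once $1 \le len(a)$ is dropped as implied by the invariant.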
Inference Result: Method Summary
• Postcondition: $(j < len(a)-1 \vee (j \ge len(a)-1 \wedge n \le j)) \wedge j' = j \wedge n' = n$
• Preconditions: { L1.high: $(j < len(a)-1)$ }
• Unsafe checks: { L2.high }

    float foo (float a[], int j, int n) {
      float v = 0;
      int i = j+1;
      if (0<i<=n) then v=a[i] else ();    // L1
      int m = abs(random());
      v + a[m];                           // L2
    }
Specialization
• If we assume all contexts satisfy (j+1 < len(a)), foo keeps only the runtime check at L2:

    float foo (float a[], int j, int n) {
      float v = 0;
      int i = j+1;
      if (0<i<=n) then v=a[i] else ();
      int m = abs(random());
      v + (if (m<len(a)) then a[m] else error);
    }

• If we assume no context satisfies (j+1 < len(a)), specialize foo with 2 runtime checks.
• Otherwise … ?
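The two specialized variants of foo can be mimicked in Python. In this sketch, `check_high` plays the role of the slides' `if (k<len(a)) then a[k] else error`; only upper bounds are checked, since the lower-bound checks were proved safe. Python still checks every list access internally, so the point is only where an explicit check remains. The names `foo1` and `foo2` follow the flexi-variant slide; `BoundError`, `check_high` and the injected random generator are assumptions.

```python
import random
from typing import List

class BoundError(Exception):
    """Plays the role of `error` in the specialized code on the slides."""

def check_high(a: List[float], k: int) -> float:
    # Runtime upper-bound check; the lower bounds (i>0, m>=0) are already proved.
    if k >= len(a):
        raise BoundError(k)
    return a[k]

def foo1(a: List[float], j: int, n: int, rng: random.Random) -> float:
    """Copy for callers satisfying j+1 < len(a): only the L2 check remains."""
    v = 0.0
    i = j + 1
    if 0 < i <= n:
        v = a[i]                           # L1: no explicit check
    m = abs(rng.randint(-10, 10))          # stand-in for abs(random())
    return v + check_high(a, m)            # L2: runtime check kept

def foo2(a: List[float], j: int, n: int, rng: random.Random) -> float:
    """Generic copy with 2 runtime checks (L1.high and L2.high)."""
    v = 0.0
    i = j + 1
    if 0 < i <= n:
        v = check_high(a, i)               # L1: runtime check kept
    m = abs(rng.randint(-10, 10))
    return v + check_high(a, m)            # L2: runtime check kept
```

Whenever a caller satisfies the precondition, both copies compute the same result; `foo2` is the one that must be used when the precondition cannot be established.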
Specialization • Monovariant specializer • One specialized version of each method. • Applies only the optimizations valid at every call site (a lower bound on optimization). • Compact code size. • Polyvariant specializer • Multiple optimized versions per method. • Each call site is replaced by a call to a specialized version. • Highly optimized, but may cause code blow-up.
Flexivariant Specialization • Allows a trade-off between optimization and code size. • Decides how many copies to generate per method, based on frequency and a code-size constraint. • Less optimization: 1 copy, foo (2 runtime checks). • More optimization: 2 copies, foo1 (1 runtime check) + foo2 (2 runtime checks).
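The slides do not spell out how the number of copies is chosen. The greedy strategy below is a hypothetical illustration, not the paper's algorithm: each call site is described by the set of preconditions known to hold there, the most frequent contexts get dedicated copies, and all remaining call sites share one fallback copy that may only assume what holds at all of them. A budget of one copy degenerates to a monovariant specializer; a budget at least as large as the number of distinct contexts behaves polyvariantly.

```python
from collections import Counter
from typing import Dict, FrozenSet, List

# A calling context: the preconditions (e.g. "L1.high") known to hold at a call site.
Context = FrozenSet[str]

def allocate_copies(call_sites: List[Context], max_copies: int) -> Dict[Context, Context]:
    """Map each distinct context to the preconditions its specialized copy may assume.
    Checks whose preconditions are assumed are eliminated in that copy."""
    if max_copies < 1:
        raise ValueError("need at least one copy")
    # Most frequent contexts first (ties keep first-seen order).
    ranked = [ctx for ctx, _ in Counter(call_sites).most_common()]
    if len(ranked) <= max_copies:
        return {ctx: ctx for ctx in ranked}          # polyvariant: one copy per context
    dedicated, shared = ranked[: max_copies - 1], ranked[max_copies - 1:]
    common = frozenset.intersection(*shared)         # fallback copy assumes only common facts
    plan = {ctx: ctx for ctx in dedicated}
    plan.update({ctx: common for ctx in shared})
    return plan
```

With three call sites known to satisfy L1.high and one that is not, a budget of 1 yields a single foo that assumes nothing (both runtime checks), while a budget of 2 yields foo1 and foo2 as on the slide.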
Soundness • Inference + Specialization = well-typed program. Theorem: Given a program $P$ and an inference judgment $\vdash P \leadsto P_I$, let $B_{flex}\, P_I \leadsto P_T$ be the specialization of $P_I$ to $P_T$. Then, if $P_T$ is well-typed, its execution will never proceed to invalid array accesses.
Implementation • Prototype written in Haskell: • uses an efficient Presburger solver [W. Pugh et al]. • uses a disjunctive fixed-point analyzer [Popeea and Chin]. • Test programs: • small programs: binary search, merge sort, quick sort. • numerical benchmarks: Fast Fourier Transform, LU decomposition, Linpack.
Precondition Strengthening • Weak pre-derivation may generate preconditions that are too large to be manipulated (in the results table, * marks a timing over an hour). • Strong pre-derivation keeps preconditions small (81% simplification relative to weak pre-derivation). • Selective pre-derivation is both efficient and precise (63.4% simplification relative to weak pre-derivation).
Conclusion • Modular summary-based analysis: • Disjunctive postcondition inference • Derivation of efficient, scalable preconditions. • Integration with a flexi-variant specializer. • Implementation of a prototype system. • Correctness proof.
A Practical and Precise Inference and Specializer for Array Bound Checks Elimination Corneliu Popeea, Dana N. Xu, Wei-Ngan Chin We thank Siau-Cheng Khoo for sound and insightful suggestions. Thanks to anonymous referees for comments.
Related Work • Global analyses: • Techniques: Suzuki and Ishihata [POPL'77], Cousot and Halbwachs [POPL'78] • Tools: Astrée [PLDI'03], C Global Surveyor [PLDI'04] • Modular analyses: • Cousot and Cousot [IFIP'77, CC'02] • Chatterjee, Ryder and Landi [POPL'99] • Moy [VMCAI'08] • Dependent type checking: • Xi and Pfenning [PLDI'98]
Limitations: • Large formulae: currently under-approx. formulae are propagated. Over-approx. formulae are more compact, since sps appears in a positive position. • Future work: • Dual analysis to validate some alarms as true bugs. • Extend the analysis with sound treatment of reference types. • Handle more (existential) properties about array elements.
Two Kinds of Recursive Invariants • For loops: • compute a loop invariant. • For methods with general recursion: • compute a loop invariant. • the method postcondition cannot be determined directly from the loop invariant: a separate fixed-point is computed.
VCgen (Verification Condition Generator) • Backward VCgen: • given: {P} assert chk {Q} • derives: $P = (Q \wedge chk)$ • Our precondition derivation: • given: {pre} …; assert chk {sps} • derives: $pre = (sps \Rightarrow chk)$ • Differences: • sps is a transition relation: pre holds at the beginning of the current method. • sps is computed by a separate forward derivation.
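For contrast, the backward rules can be sketched with predicates represented as Python functions over a state: `wp_assert` implements $P = Q \wedge chk$, and `wp_assign` evaluates the postcondition in the updated state, the semantic counterpart of substituting the assigned expression. This is an illustration with hypothetical names; a real VCgen manipulates formulas syntactically so that they can be passed to a solver.

```python
from typing import Callable, Dict

Env = Dict[str, int]
Pred = Callable[[Env], bool]

def wp_assert(chk: Pred, post: Pred) -> Pred:
    """{P} assert chk {Q}  gives  P = Q and chk."""
    return lambda env: post(env) and chk(env)

def wp_assign(var: str, expr: Callable[[Env], int], post: Pred) -> Pred:
    """{P} var = expr {Q}: P evaluates Q in the state updated by the assignment."""
    return lambda env: post({**env, var: expr(env)})

# Backward over  i = j+1; assert i < len(a)  with final postcondition true,
# giving the precondition j+1 < len(a).
pre = wp_assign("i", lambda e: e["j"] + 1,
                wp_assert(lambda e: e["i"] < e["len"], lambda e: True))
```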