SPEED: Statically Estimating Symbolic Computational Complexity of Programs
Krishna Mehra (MSR Bangalore), Trishul Chilimbi (MSR Redmond), Sumit Gulwani (MSR Redmond)
Problem Definition
Compute symbolic complexity bounds of procedures in terms of their inputs (assuming unit cost for statements).
• Can use different cost metrics.
  • Only count memory instructions.
  • Only count memory allocation instructions and weight them with the memory allocated (space bounds).
  • Only count network instructions, weighted appropriately (network traffic bounds).
• Can also compute bounds for interesting code fragments.
  • Code executed between lock acquire/release.
Applications
• Provide immediate feedback during code development
  • Code editing.
  • Use of unfamiliar APIs.
• Performance analysis
  • Identify corner cases.
• Embedded systems
  • Establish space bounds.
• Soft real-time systems
  • Establish time bounds.
  • Feedback into a runtime power-management scheme.
Outline
• Challenges in Bounds Analysis
• Idea #1: Proof Structure (control flow)
• Idea #2: Quantitative Functions (data-structures)
Challenges in Computing Bounds
• Presence of control flow
  • Bounds for even simple programs are non-linear and disjunctive.
  • Sometimes even proving termination is hard.
• Presence of data structures
  • Expressing bounds requires numerical functions over data structures.
  • Computing these bounds requires sophisticated shape analysis.
Counter Instrumentation Based Solution
The main challenge is in computing loop bounds. A simple counter instrumentation scheme rewrites

    while (cond) do S

into

    c := 0; while (cond) do { S; c := c+1; }

Loop bounds can then be obtained by computing bounds on c using invariant generation tools [CAV ’08].
However, the required invariants are usually disjunctive, non-linear, and refer to the heap, and are hence hard to compute.
Our solution: a refinement of the above scheme that allows bounds generation using simple linear invariant generation tools.
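The instrumentation scheme can be illustrated with a minimal Python sketch. The loop, the function name `instrumented_loop`, and the asserted invariant are illustrative assumptions, not part of the SPEED tool; the point is only that a linear invariant relating the ghost counter `c` to program variables yields a symbolic bound.

```python
def instrumented_loop(n):
    """Run `i := 0; while (i < n) do i++` with a ghost counter c."""
    c = 0                      # c := 0 before the loop
    i = 0
    while i < n:
        i += 1                 # loop body S
        c += 1                 # c := c + 1 at the back-edge
        # A linear invariant of the kind an invariant generator could find.
        assert c == i and i <= n
    return c


# The invariant c == i <= n implies the bound Max(0, n) on loop iterations.
for n in (-3, 0, 1, 7):
    assert instrumented_loop(n) == max(0, n)
```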
Example: Non-linear bounds

    int size; // Assume(0 ≤ e1.len, e2.len ≤ size);
    Equals(StringBuffer s1, StringBuffer s2) {
      c1 := c2 := c3 := 0;
      e1 := s1.GetHead(); e2 := s2.GetHead();
      i1 := e1.len-1; i2 := e2.len-1;
      while (true) {
        while (i1 ≥ 0 ∧ i2 ≥ 0) {
          if (e1.arr[i1] ≠ e2.arr[i2]) return 0;
          i1--; i2--; c3++;
        }
        while (i1 < 0 ∧ e1 ≠ null) {
          e1 := s1.GetNext(e1); i1 := i1+e1.len; c1++; c3 := 0;
        }
        while (i2 < 0 ∧ e2 ≠ null) {
          e2 := s2.GetNext(e2); i2 := i2+e2.len; c2++; c3 := 0;
        }
        if (i1 < 0) return (i2 < 0);
        if (i2 < 0) return 0;
        c3++;
      }
      return 1;
    }

• Total iterations of the 2nd and 3rd inner loops: Len(s1) and Len(s2), respectively.
• For each iteration of the 2nd and 3rd inner loops, combined iterations of the 1st inner loop and the outer loop: size.
• Therefore total complexity is (1+size)×(1+Len(s1)+Len(s2)).
Example: Disjunctive Bounds

    Example2(int n, x0, z0) {
      c1 := 0; c2 := 0;
      x := x0; z := z0;
      while (x < n) {
        if (z > x) { x++; c1++; }
        else { z++; c2++; }
      }
    }

• The termination proof is based on a disjunctively well-founded relation.
• We can even compute bounds using the following proof structure:
  • Number of times the if-branch is executed (if at all): n-x0
  • Number of times the else-branch is executed (if at all): n-z0
• Therefore, total iterations: Max(0,n-x0) + Max(0,n-z0)
Proof Structure
A proof structure specifies where to increment and initialize multiple counter variables. It is a tuple (M,G) such that
• M maps each back-edge q to some counter variable c.
  • “c++” at q.
• G is some DAG over counter variables.
  • “c := 0” at entry and wherever any predecessor of c in G is incremented.
• An invariant generation tool can bound counters instrumented as above.
Proof structure for the StringBuffer example:
• M = {q ↦ c3, q1 ↦ c3, q2 ↦ c1, q3 ↦ c2}
  • q: back-edge of the outer loop, qi: back-edge of the i-th inner loop
• G: c1 → c3, c2 → c3
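The instrumentation implied by a proof structure can be sketched as a small Python function. The dictionary encoding of M, the edge-set encoding of G, and the function name `instrumentation_plan` are assumptions made for illustration; the edges c1 → c3 and c2 → c3 are read off the resets `c3 := 0` in the instrumented Equals code.

```python
def instrumentation_plan(M, G):
    """For each back-edge, list the counter to increment and those to reset.

    M: dict mapping back-edge name -> counter name.
    G: set of (predecessor, successor) edges of the DAG over counters.
    """
    plan = {}
    for q, c in M.items():
        # Incrementing c resets every counter that has c as a predecessor.
        resets = sorted(succ for (pred, succ) in G if pred == c)
        plan[q] = {"increment": c, "reset": resets}
    return plan


M = {"q": "c3", "q1": "c3", "q2": "c1", "q3": "c2"}
G = {("c1", "c3"), ("c2", "c3")}
plan = instrumentation_plan(M, G)
for q in sorted(plan):
    print(q, plan[q])
```

For the back-edge q2 this yields "increment c1, reset c3", matching the statement `c1++; c3 := 0` in the second inner loop.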
Computing bound from a proof structure
Given a proof structure (M,G), the bound U is computed as:

    U = Sum { TotalBound(c) | c }
    TotalBound(c) = Max { 0, B(q) | M(q) = c } × (1 + Sum { TotalBound(c’) | (c’,c) ∈ G })

where B(q) is the bound computed on M(q) at q.
Bound for the StringBuffer example:

    U = Len(s1) + Len(s2) + (1+size)×(Len(s1)+Len(s2))
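The recurrence for TotalBound can be evaluated directly, as in the following Python sketch. The encodings of M, G, and B and the concrete numbers (size = 4, Len(s1) = 3, Len(s2) = 5) are assumptions for illustration; in particular, taking B(q) = B(q1) = size, B(q2) = Len(s1), and B(q3) = Len(s2) is one reading of the example, and the closed forms quoted on the slides may differ from the literal evaluation in lower-order terms.

```python
def total_bound(c, M, G, B, memo):
    """TotalBound(c) = Max{0, B(q) | M(q) = c} * (1 + Sum of TotalBound over predecessors)."""
    if c not in memo:
        own = max([0] + [B[q] for q, ctr in M.items() if ctr == c])
        preds = [p for (p, s) in G if s == c]
        memo[c] = own * (1 + sum(total_bound(p, M, G, B, memo) for p in preds))
    return memo[c]


def overall_bound(M, G, B):
    """U = sum of TotalBound(c) over all counters c used by M."""
    memo = {}
    return sum(total_bound(c, M, G, B, memo) for c in set(M.values()))


# Hypothetical numeric instance of the StringBuffer example.
size, len_s1, len_s2 = 4, 3, 5
M = {"q": "c3", "q1": "c3", "q2": "c1", "q3": "c2"}
G = {("c1", "c3"), ("c2", "c3")}
B = {"q": size, "q1": size, "q2": len_s1, "q3": len_s2}
print(overall_bound(M, G, B))
```

Because G is a DAG, the recursion terminates; the memo table only avoids recomputing shared predecessors.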
Automatically Computing Proof Structure
• The total number of potential proof structures (M,G) is exponential in the number of back-edges.
  • Hence a naïve search is expensive.
• Key idea: increasing the number of counters and dependencies increases the ability of an invariant generation tool to discover bounds.
  • But we cannot simply make all counters depend on each other.
  • We need to find the right set of dependencies that form a DAG.
• There is a quadratic (in the number of back-edges) algorithm to compute a (counter-optimal) proof structure. [POPL ’09]
  • A counter-optimal proof structure uses minimal counters and minimal dependencies between counters.
  • Generally, this leads to more precise bounds.
Quantitative Functions
• Defined over a tuple of abstract data structures.
  • Similar to ghost fields.
  • Len(L): Length of list L.
  • Pos(e,L): Position of list-element e in list L.
• Semantics is defined by describing the effect of data-structure methods on quantitative functions.
  • Sequence of (conditional) assignments and assumes.
  • Can also refer to unscoped variables (universally quantified).
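The ghost-field analogy can be made concrete with a small Python sketch in which each method carries the assignment that describes its effect on Len. The class `GhostList` and its method names are hypothetical and stand in for the annotated data-structure methods; an analysis would keep only the integer updates and treat the list contents as abstract.

```python
class GhostList:
    """List wrapper carrying a ghost field for the quantitative function Len."""

    def __init__(self, items=()):
        self.items = list(items)
        self.ghost_len = len(self.items)   # Len(L) at construction

    def delete(self, e):
        self.items.remove(e)
        self.ghost_len -= 1                # effect: Len(L) := Len(L) - 1

    def move_to(self, e, other):
        self.items.remove(e)
        other.items.append(e)
        self.ghost_len -= 1                # effect: Len(L) := Len(L) - 1
        other.ghost_len += 1               # effect: Len(other) := Len(other) + 1


L = GhostList([1, 2, 3])
todo = GhostList()
L.move_to(2, todo)
L.delete(1)
# The ghost fields agree with the real lengths after every method call.
assert L.ghost_len == len(L.items) == 1
assert todo.ghost_len == len(todo.items) == 1
```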
Principles behind defining Quantitative Functions
• Precision
  • Defining more quantitative functions increases the ability of a linear invariant generation tool to find bounds.
  • In practice, a few quantitative functions are usually sufficient.
• Soundness
  • Method annotations are always sound from the tool’s perspective.
  • It is the user’s responsibility to ensure that the intended semantics matches the method annotations.
  • Verification is possible if the intended semantics can be described in an appropriate logic.
    • Gulwani, Sagiv, Lev-Ami: “A Combination Framework for Tracking Partition Sizes”, POPL 2009.
Computing Invariants over Quantitative Functions
• Instrument each data-structure method call with its effect, allowing quantitative functions to be treated as uninterpreted.
• Instantiate unscoped variables with all appropriate terms.
• Use a linear invariant generation tool with support for uninterpreted functions.
  • Abstract interpretation based technique: combine the Polyhedron abstract domain [Cousot, POPL ’79] with the uninterpreted functions domain [Gulwani, Necula, SAS ’04] using domain combinators [Gulwani, Tiwari, PLDI ’06].
  • Constraint-based invariant generation technique [Beyer et al., VMCAI ’07].
Example: Breadth First Traversal

    BFT(List L):
      ToDo.Init();
      L.MoveTo(L.Head(), ToDo);
      c := 0;
      while (! ToDo.IsEmpty())
        e := ToDo.Head();
        ToDo.Delete(e);
        foreach successor s in e.Successors()
          if (L.contains(s)) L.MoveTo(s, ToDo);
        c++;

Inductive invariant at the back-edge of the while-loop:

    c ≤ Old(Len(L)) - Len(L) - Len(ToDo) ∧ Len(L) ≥ 0 ∧ Len(ToDo) ≥ 0

This implies a bound of Old(Len(L)) for the while loop.
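The invariant can be checked dynamically on a concrete graph, as in the Python sketch below. Plain Python lists stand in for L and ToDo, the successor relation is passed as a dictionary, and the input list is assumed to be non-empty; these encodings and the function name `bft` are illustrative assumptions, not the analysis itself.

```python
def bft(nodes, successors):
    """Breadth-first traversal with the ghost counter c; returns (c, Old(Len(L)))."""
    L = list(nodes)                  # nodes not yet reached; assumed non-empty
    old_len = len(L)                 # Old(Len(L))
    todo = [L.pop(0)]                # L.MoveTo(L.Head(), ToDo)
    c = 0
    while todo:
        e = todo.pop(0)              # e := ToDo.Head(); ToDo.Delete(e)
        for s in successors.get(e, ()):
            if s in L:
                L.remove(s)          # L.MoveTo(s, ToDo)
                todo.append(s)
        c += 1
        # Invariant at the back-edge of the while-loop.
        assert c <= old_len - len(L) - len(todo)
    return c, old_len


graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
c, old_len = bft(["a", "b", "c", "d"], graph)   # "d" is unreachable
assert c <= old_len
```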
Quantitative Functions for Bit-vectors
Ones(b): Number of 1 bits in b
One(b): Position of least significant 1 bit in b
Bits(b): Number of bits in b
Example

    Iterate(BitVector a):
      b := a; c := 0;
      while (BitScanForward(&id1, b))
        b := b | ((1 << id1)-1);        // set all bits before id1
        if (!BitScanForward(&id2, ~b)) break;
        b := b & (~((1 << id2)-1));     // reset bits before id2
        c++;

Each loop iteration masks a chunk of consecutive 1s to 0.
• Our tool computes the invariant: c ≤ Ones(a) - Ones(b) ∧ 2c ≤ One(b) - One(a) ∧ One(b) ≤ Bits(a)
• This implies a bound of Min { Ones(a), Bits(a)/2 }
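The bound can be checked exhaustively for small bit-widths with a Python sketch. The helper `bit_scan_forward` is a hypothetical stand-in that returns the index of the least significant 1 bit (or `None` when there is none), and the sketch assumes the loop exits when `~b` has no 1 bit left; Python integers are masked to a fixed width to model Bits(a).

```python
def bit_scan_forward(x):
    """Index of the least significant 1 bit of x, or None if x == 0."""
    if x == 0:
        return None
    return (x & -x).bit_length() - 1


def iterate(a, bits):
    """Count loop iterations of Iterate on a bit-vector of the given width."""
    mask = (1 << bits) - 1
    b = a & mask
    c = 0
    while True:
        id1 = bit_scan_forward(b)
        if id1 is None:
            break
        b |= (1 << id1) - 1                  # set all bits before id1
        id2 = bit_scan_forward(~b & mask)
        if id2 is None:
            break                            # b is all ones: nothing left to clear
        b &= ~((1 << id2) - 1) & mask        # reset bits before id2
        c += 1
    return c


# Exhaustive check of c <= Min{Ones(a), Bits(a)/2} for 8-bit vectors.
for a in range(256):
    assert iterate(a, 8) <= min(bin(a).count("1"), 8 // 2)
```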
Quantitative Functions for List of Lists
TotalNodes(L) = Sum { Len(e’) | L.BelongsTo(e’) }
MaxNodes(L) = Max { Len(e’) | L.BelongsTo(e’) }
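A short Python sketch shows how these two functions would express the cost of a nested traversal. A list of Python lists stands in for the list-of-lists data structure, and the names `total_nodes`, `max_nodes`, and `visit_all` are illustrative assumptions.

```python
def total_nodes(L):
    """TotalNodes(L): sum of the lengths of all inner lists."""
    return sum(len(e) for e in L)


def max_nodes(L):
    """MaxNodes(L): length of the longest inner list (0 if L is empty)."""
    return max((len(e) for e in L), default=0)


def visit_all(L):
    """Nested traversal; returns (outer iterations, inner iterations)."""
    outer = inner = 0
    for e in L:
        outer += 1
        for _ in e:
            inner += 1
    return outer, inner


L = [[1, 2, 3], [], [4, 5]]
outer, inner = visit_all(L)
assert inner == total_nodes(L)               # precise bound on inner iterations
assert inner <= len(L) * max_nodes(L)        # coarser product bound
```

TotalNodes gives the tighter bound for the inner loop, while Len(L) × MaxNodes(L) is what a per-loop analysis without TotalNodes could at best report.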
Quantitative Functions for Trees
Nodes(T): Total number of nodes in tree T
Height(T): Height of tree T
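For a binary tree, the two functions could be realized as in the following Python sketch. The `Node` class, the convention that an empty tree has height 0, and the helper `descend_leftmost` are assumptions made for illustration; the slides do not fix these details.

```python
class Node:
    """Binary tree node; a missing child is represented by None."""

    def __init__(self, left=None, right=None):
        self.left = left
        self.right = right


def nodes(t):
    """Nodes(T): total number of nodes in the tree."""
    return 0 if t is None else 1 + nodes(t.left) + nodes(t.right)


def height(t):
    """Height(T), counting nodes on the longest root-to-leaf path."""
    return 0 if t is None else 1 + max(height(t.left), height(t.right))


def descend_leftmost(t):
    """Walk down left children; the iteration count is bounded by Height(T)."""
    steps = 0
    while t is not None:
        t = t.left
        steps += 1
    return steps


tree = Node(Node(Node()), Node())
assert descend_leftmost(tree) <= height(tree) <= nodes(tree)
```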
Conclusion
• Applications of Symbolic Bounds Analysis
  • Interactive code development, embedded/real-time systems.
• Challenges in Bounds Analysis
  • Control flow leads to non-linear and disjunctive bounds.
  • Data structures require numerical shape analysis.
• Idea #1: Proof Structure (control flow)
  • Addresses the issue of non-linear and disjunctive bounds.
  • Reduces bounds analysis to linear numerical shape analysis.
• Idea #2: Quantitative Functions (data-structures)
  • Further reduces bounds analysis to linear invariant generation over uninterpreted functions.