Partially Disjunctive Shape Analysis. Roman Manevich Mooly Sagiv Ganesan Ramalingam. advisor: consultant:. Non-blocking stack [Treiber, 1986]. unbounded number of threads. unbounded dynamic memory.
Partially Disjunctive Shape Analysis
Roman Manevich, Mooly Sagiv, Ganesan Ramalingam
advisor: consultant:
Non-blocking stack [Treiber, 1986]
• Unbounded number of threads
• Unbounded dynamic memory

    void push(Stack *S, data_type v) {
        Node *x = alloc(sizeof(Node));
        Node *t;                        /* declared outside the loop so the loop condition can use it */
        x->d = v;
        do {
            t = S->Top;                 /* snapshot the current top */
            x->n = t;                   /* link the new node in front of it */
        } while (!CAS(&S->Top, t, x));  /* retry if another thread changed Top */
    }

• Does x point to valid memory?
• Can it cause a memory leak?
• Does the list remain acyclic?

    data_type pop(Stack *S) {
        Node *t, *s;                    /* declared outside the loop for the same reason */
        data_type r;
        do {
            t = S->Top;
            if (t == NULL)
                return EMPTY;
            s = t->n;
            r = t->d;
        } while (!CAS(&S->Top, t, s));
        return r;
    }

• Is the stack linearizable?
• Lock-free: benign data races

Automatic proof of linearizability for an unbounded number of threads
Shape abstractions
[Figure: a concrete heap with variables x and y, and the shape graph obtained from it by a bounded abstraction; list segments are labeled 1 or >1]
• Canonical heap abstraction [Sagiv et al., TOPLAS 02]
• Abstraction for (possibly cyclic) lists [M. et al., VMCAI 05]
• Abstracts length of “list segments”
• Retains shape of heap
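The idea of abstracting the length of a list segment can be sketched in C. This is a deliberately simplified illustration, not the abstraction of the cited papers: the type names `Node` and `SegLen` and the function `abstract_length` are hypothetical, and the sketch only handles a single null-terminated list with no other variables pointing into it.

```c
#include <stddef.h>

/* Hypothetical node type; the field name n follows the slides. */
typedef struct Node { struct Node *n; } Node;

/* Abstract length of a list segment: empty, exactly one node, or more than one. */
typedef enum { SEG_EMPTY, SEG_ONE, SEG_MANY } SegLen;

/* Map a concrete list to its abstract length.
   Inspects at most two nodes, so it needs no loop. */
SegLen abstract_length(const Node *x) {
    if (x == NULL) return SEG_EMPTY;
    if (x->n == NULL) return SEG_ONE;
    return SEG_MANY;
}
```

Lists of length 2, 3, or 1000 all map to `SEG_MANY`, which is what makes the abstract state space bounded even though the concrete heap is not.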
Abstraction and concretization
[Figure: concrete heaps with variables x and y, and a shape graph with segments labeled 1 or >1]
Abstraction and concretization
[Figure: elements a, a’, and b placed in the concrete domain (partial order) and the abstract domain (partial order)]
• a’ overapproximates a
• a’ represents b
Sound abstract transformers: t=x->n
[Figure: shape graphs over x, y, and t before and after the statement, with segments labeled 1 or >1]
Most precise transformer [Cousot & Cousot, POPL 77]
[Figure: statement st applied in the concrete domain and in the abstract domain]
Sound abstract transformer
[Figure: statement st applied in the concrete domain and in the abstract domain]
Sound abstract transformers: t=x->n
[Figure: further shape graphs over x, y, and t for the same statement, with segments labeled 1 or >1]
Inferring safe invariants

    [1] Node * create(int size) {
    [2]   Node * x = NULL;
    [3]   while (size-- > 0) {
    [4]     Node * t = (Node *) malloc(sizeof(Node));
    [5]     t->n = x;
    [6]     x = t;
          }
    [7]   return x;
        }

[Figure: shape graphs inferred at program points [3] to [7]; x points to null or to a list whose segments are labeled 1 or >1, and t points to the newly allocated node]
Disjunctive school
• DNF-like invariants
• $\varphi_1(v) \lor \dots \lor \varphi_n(v)$
• Join is disjunction
• Model checking
Partially disjunctive school
• CNF-like invariants
• $\varphi_1(v) \land \dots \land \varphi_n(v)$
• $\forall v : \varphi(v)$
• Loss of information in join
• Abstract Interpretation
• Dataflow
Research problem
• Problem
  • Disjunctive shape abstractions
  • Join is disjunction
  • “Too” precise
  • Yields exponential blow-ups
• My solution
  • Partially disjunctive shape abstractions
  • Modular aspects
  • Precise enough
  • Reduce exponential factors
  • Orders of magnitude speed-ups
Main challenge
Can we develop useful partially disjunctive abstractions for the heap?
Challenging setting: objects & threads are
• Anonymous
• Unbounded
Main thesis results
• Framework for partially disjunctive heap abstractions based on heap decomposition
  • Correlations within subheap maintained (disjunctive maintains correlations in full heap)
  • Correlations with other parts of heap abstracted away
  • Smaller subheaps lead to reduced state space
  • Reuse subheaps
• System for automatically generating abstract interpreters based on user-specified heap decomposition [SAS 08]
  • Guaranteed soundness
  • Feasible transformers
• Applications
  • Sequential programs manipulating multiple data structures [TACAS 07]
  • First automatic proof of linearizability for fine-grained concurrent programs with an unbounded number of threads [CAV 08]
Outline • Disjunctive vs. partially disjunctive abstractions • Partially disjunctive abstraction via heap decomposition • “Thread-modular” analysis for fine-grained concurrency
Abstraction by partitioning concrete state space
Abstraction by partitioning abstract state space
Abstract state: $(4 \le x \le 5 \land 2 \le y \le 3) \lor (2 \le x \le 3 \land 4 \le y \le 5) \lor (5 \le x \le 6 \land 4 \le y \le 5) \lor (3 \le x \le 4 \land 5 \le y \le 6) \lor (7 \le x \le 8 \land 4 \le y \le 5) \lor (5 \le x \le 6 \land 7 \le y \le 8) \lor (7 \le x \le 8 \land 6 \le y \le 7)$
Disjunctive abstraction
The abstract state is kept as the same disjunction of seven boxes.
Partially disjunctive abstraction
$(2 \le x \le 6 \land 2 \le y \le 6) \lor (5 \le x \le 8 \land 4 \le y \le 8)$
Partially disjunctive abstraction
$2 \le x \le 8 \land 2 \le y \le 8$
Join coarser than disjunction
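The numeric example can be checked with a short C sketch. It assumes that each flattened constraint such as 4x52y3 stands for the box 4 ≤ x ≤ 5, 2 ≤ y ≤ 3, and that the join of two boxes is their bounding box. The names `Box`, `box_join`, and `box_join_all` are hypothetical, and the grouping of the seven boxes into two partitions is an assumption chosen to reproduce the bounds shown on the slides.

```c
/* An axis-aligned box: lo_x <= x <= hi_x and lo_y <= y <= hi_y. */
typedef struct { int lo_x, hi_x, lo_y, hi_y; } Box;

static int min_int(int a, int b) { return a < b ? a : b; }
static int max_int(int a, int b) { return a > b ? a : b; }

/* Join of two boxes: the smallest box containing both.
   It may contain points that are in neither input box. */
Box box_join(Box a, Box b) {
    Box r = { min_int(a.lo_x, b.lo_x), max_int(a.hi_x, b.hi_x),
              min_int(a.lo_y, b.lo_y), max_int(a.hi_y, b.hi_y) };
    return r;
}

/* Join a non-empty array of boxes into a single box. */
Box box_join_all(const Box *boxes, int count) {
    Box r = boxes[0];
    for (int i = 1; i < count; i++)
        r = box_join(r, boxes[i]);
    return r;
}
```

Joining the boxes within each partition gives the two-box partially disjunctive state; joining everything gives the single box 2 ≤ x ≤ 8, 2 ≤ y ≤ 8, which also admits points such as (8, 2) that no original box contains.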
Disjunctive abstractions
• No information loss from merging control flow paths
• Exponential space blowups
• Inferring abstractions: CEGAR is well-studied
Partially disjunctive abstractions
• Loses many correlations
• Drastically reduces state space
• What are the important correlations? Learning from multiple traces
Partially disjunctive abstraction and the heap • Scaling is major issue • Infinite state space • Existing abstractions are doubly-exponential • Concurrency drastically increases analysis cost • Cartesian abstraction for heap non-trivial • Storeless semantics • Address of objects and threads coincidental • Unbounded memory and number of threads • Which parts of the heap need to be correlated? • CEGAR-like techniques may help • The cost of analysis may sometimes increase when more correlations are ignored
Heap Decomposition idea
[Figure: a heap with variables p, q, x, y, and z, and its decomposition into subheaps]
Disjunctive shape analysis

    …
    x.n = y
    y.n = z
    assert x.n.n == z

[Figure: the shape graphs over x, y, and z tracked by the disjunctive analysis]
Independent attribute (Cartesian) shape analysis [Sagiv et al., TOPLAS 98]

    …
    x.n = y
    y.n = z
    assert x.n.n == z

[Figure: the shape graphs over x, y, and z tracked by the independent attribute analysis]
Space of heap abstractions (from less precise to more precise)
• Independent attribute shape analysis [Sagiv et al., TOPLAS 98]
• Connected component decomposition [M. et al., TACAS 07]
• Disjunctive shape abstractions
  • Canonical heap abstraction [Sagiv et al., TOPLAS 02]
  • (Cyclic) list abstraction [M. et al., VMCAI 05]
Connected components decomposition
[Figure: a heap over x, y, and z and the connected components it decomposes into]
• Abstraction splits heap into set of (disjoint) connected components
• Special case of heap decomposition
• What is maintained
  • All aliasing relations
  • All reachability relations
• What is lost
  • Correlations between data structures
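What a connected component denotes
[Figure: three connected components, one each for x, y, and z; the rest of the heap is unknown (?). Each component is annotated with ConnComp({x,y,z}) and noedges(), with the connecting symbols lost in extraction]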
What a set of components denotes
[Figure: sets of connected components over x, y, and z and the full heaps they denote; components are annotated with ConnComp({x,y,z}) together with noedges(), edges{(x,y)}, edges{(y,z)}, or edges{(x,y), (y,z)}]
• Meaning of a set of subheaps: full heaps composed from subheaps
• Full heap contains all variables
• Subheaps with common variables inconsistent
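The composition rule in the bullets can be sketched in C. The sketch summarizes each subheap only by the set of variables it contains, encoded as a bitmask; this is an assumption made for brevity, since real subheaps also carry a shape graph. All names (`VarSet`, `VAR_X`, `count_compositions`) are hypothetical.

```c
/* A set of program variables, one bit per variable. */
typedef unsigned VarSet;
enum { VAR_X = 1u << 0, VAR_Y = 1u << 1, VAR_Z = 1u << 2 };

/* Count the full heaps that can be composed from the given subheaps:
   selections whose variable sets are pairwise disjoint and whose union
   is exactly `all`. Call with covered = 0. */
unsigned count_compositions(const VarSet *subheaps, int count,
                            VarSet covered, VarSet all) {
    if (covered == all) return 1;
    /* Lowest variable not yet covered; exactly one chosen subheap must
       contain it, which avoids counting the same selection twice. */
    VarSet missing = all & ~covered;
    VarSet lowest = missing & (~missing + 1u);
    unsigned total = 0;
    for (int i = 0; i < count; i++) {
        VarSet s = subheaps[i];
        /* Subheaps sharing a variable with the selection are inconsistent. */
        if ((s & lowest) && !(s & covered) && !(s & ~all))
            total += count_compositions(subheaps, count, covered | s, all);
    }
    return total;
}
```

With subheaps over {x}, {y}, {z}, {x,y}, {y,z}, and {x,y,z}, four full heaps result; with only {x,y} and {y,z} none do, because the two subheaps share y.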
Shape example revisited

    …
    x.n = y
    y.n = z
    assert x.n.n == z

[Figure: the connected components over x, y, and z tracked before and after each statement]
Example: multiple data structures
[Figure: two lists, one from h1 to t1 and one from h2 to t2]

    // @assume h1!=NULL && h1==t1 && h1.n==NULL &&
    //         h2!=NULL && h2==t2 && h2.n==NULL
    //
    // @loop_invariant Reach(h1,t1) &&
    //                 Reach(h2,t2) &&
    //                 DisjointLists(h1,h2)
    EnqueueEvents() {
      L1: while (...) {
        List temp = new List(getEvent());
        if (nondet()) {
          t1.n = temp; t1 = temp;
        } else {
          t2.n = temp; t2 = temp;
        }
      }
    }

Correlations between properties of two lists irrelevant for proving loop invariant
Idea: track properties of each list independently
Full abstract heaps at loop head
[Figure: a 3×3 grid of abstract heaps, one for each combination of the size of the h1/t1 list (size=1, size=2, size>2) and the size of the h2/t2 list (size=1, size=2, size>2)]
Common subgraphs
[Figure: the same nine abstract heaps, with the subgraphs they share highlighted]
State space reduction
[Figure: the three subgraphs for the h1/t1 list and the three subgraphs for the h2/t2 list]
• Connected components abstraction precise enough to prove invariant
• Reusing subgraphs reduces exponential blow-ups
For k lists:
• Full heap abstraction generates $3^k$ abstract states
• Decomposed heap abstraction generates $3 \times k$ abstract states
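The two growth rates can be compared with a few lines of C. The sketch assumes the slide's count for full heaps is 3 raised to the power k, which matches the 3×3 grid of abstract heaps shown for two lists; the function names are hypothetical.

```c
/* Full heap abstraction: every combination of per-list shapes is a
   separate abstract state, so the count is shapes_per_list to the power k. */
unsigned long full_heap_states(unsigned shapes_per_list, unsigned k) {
    unsigned long total = 1;
    for (unsigned i = 0; i < k; i++)
        total *= shapes_per_list;
    return total;
}

/* Decomposed abstraction: each list contributes its shapes once. */
unsigned long decomposed_states(unsigned shapes_per_list, unsigned k) {
    return (unsigned long)shapes_per_list * k;
}
```

For two lists this gives 9 full heaps against 6 subgraphs; for ten lists it gives 59,049 against 30.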
Transformers for connected components decomposition
• Most precise transformer NP-complete
• Developed efficient transformers
  • Polynomial
  • Compose at most 2-3 subgraphs
• Useful
  • Applied to Windows device drivers
  • 200× speedup
Example with bug

    // @assume h1!=NULL && h1==t1 && h1.n==NULL &&
    //         h2!=NULL && h2==t2 && h2.n==NULL
    //
    // @loop_invariant Reach(h1,t1) &&
    //                 Reach(h2,t2) &&
    //                 DisjointLists(h1,h2)
    EnqueueEvents() {
      L1: while (...) {
        List temp = new List(getEvent());
        if (nondet()) {
          t1.n = temp; t1 = temp;
        } else {
          t2.n = temp; t1 = temp; // should be t2 = temp;
        }
      }
    }
Abstract error trace

    List temp = new List(getEvent());
    t2.n = temp;
    t1 = temp;

[Figure: the abstract heap over h1, t1, h2, t2, and temp after each statement; in the last heap t1 sits at the end of the h2 list, so Reach(h1,t1) no longer holds]
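The violation can be reproduced on a concrete heap with a small C sketch. This is a concrete run rather than the abstract trace the analysis produces, and the names `List`, `Queues`, `reach`, and `enqueue_second` are hypothetical stand-ins for the slide's pseudocode.

```c
#include <stddef.h>

typedef struct List { struct List *n; } List;

/* Reach(h, t): t can be reached from h by following n fields.
   Assumes the list starting at h is acyclic. */
int reach(const List *h, const List *t) {
    for (const List *p = h; p != NULL; p = p->n)
        if (p == t) return 1;
    return 0;
}

/* Heads and tails of the two queues. */
typedef struct { List *h1, *t1, *h2, *t2; } Queues;

/* One iteration of the else branch. If buggy is nonzero, the tail update
   goes to t1 instead of t2, as in the example with the bug. */
void enqueue_second(Queues *q, List *temp, int buggy) {
    temp->n = NULL;
    q->t2->n = temp;
    if (buggy)
        q->t1 = temp;
    else
        q->t2 = temp;
}
```

Starting from two single-node queues, the buggy variant leaves `t1` pointing into the second list, so `reach(q.h1, q.t1)` returns 0; the corrected variant keeps both reachability conjuncts true.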
State space reduction
[Figure: ratio of number of shape graphs to number of subgraphs; labeled point ×(89,430 / 7,733), roughly 11.6]
Time speedup
[Figure: ratio of full shape graph analysis time to graph decomposition analysis time; labeled point ×(552.6 / 2.6), roughly 212.5]
Beyond connected component decomposition
• Realistic programs contain complex connected components
• Connected component decomposition too coarse
• Multithreading introduces relations between threads and objects
• Different threads can access same object
• Sometimes need correlations across connected components
• Need more general decompositions
HeDec: system for Heap Decomposition • Parametric: allows experimenting with different decompositions • Analysis designer specifies decomposition • Subheaps not necessarily disjoint • Applicable for states with threads • Soundness automatically guaranteed for • Any decomposition specification • Any transformer specification