Heap Decomposition for Concurrent Shape Analysis
R. Manevich, T. Lev-Ami, M. Sagiv (Tel Aviv University)
G. Ramalingam (MSR India)
J. Berdine (MSR Cambridge)
Dagstuhl 08061, February 7, 2008
Thread-modular analysis for coarse-grained concurrency
• E.g., [Qadeer & Flanagan, SPIN'03], [Gotsman et al., PLDI'07], ...
• With each lock lk associate
  • a subheap h(lk)
  • a local invariant I(lk), inferred or specified
• The heap is partitioned: H = h(lk1) * ... * h(lkn)
• When a thread t
  • acquires lk, it assumes I(lk)
  • releases lk, it ensures I(lk)
• Can analyze each thread "separately"
• Avoids explicitly enumerating all thread interleavings
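As a concrete (and hedged) reading of the lock-per-subheap picture, the sketch below shows a single lock lk protecting exactly the subheap h(lk) reachable from a list head, with the local invariant I(lk) stated as a comment. The type and function names (LockedList, locked_push) and the choice of invariant are illustrative assumptions, not taken from the slides.

#include <pthread.h>
#include <stddef.h>

typedef struct lnode { int d; struct lnode *n; } LNode;

/* One lock, one subheap: lk protects exactly the nodes reachable from head.
   I(lk): the list starting at head is acyclic and NULL-terminated.          */
typedef struct {
    pthread_mutex_t lk;
    LNode *head;
} LockedList;

void locked_push(LockedList *L, LNode *x) {
    pthread_mutex_lock(&L->lk);    /* on acquire: assume I(lk) holds         */
    x->n = L->head;                /* mutate only the subheap h(lk)          */
    L->head = x;
    pthread_mutex_unlock(&L->lk);  /* on release: ensure I(lk) holds again   */
}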
Thread-modular analysis for fine-grained concurrency?
• Synchronization via CAS (Compare-And-Swap) instead of locks
• No locks means more interference between threads
• No nice heap partitioning
• Still, the idea of reasoning about threads separately is appealing
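For reference, here is a minimal sketch of the CAS primitive assumed by the algorithms on the following slides, written with C11 atomics. The wrapper name CAS and the use of stdatomic.h are illustrative assumptions chosen to match the CAS(&S->Top, t, x) calls in the code below.

#include <stdatomic.h>
#include <stdbool.h>

/* CAS(addr, expected, desired): atomically check whether *addr still equals
   expected and, only in that case, store desired; report whether the swap
   took place.  The whole check-and-store is a single atomic step.           */
static inline bool CAS(void *_Atomic *addr, void *expected, void *desired) {
    /* atomic_compare_exchange_strong overwrites its second argument on
       failure, so use a local copy to leave the caller's variable intact.   */
    void *old = expected;
    return atomic_compare_exchange_strong(addr, &old, desired);
}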
Overview
• The state space is too large for two reasons:
  • Unbounded number of objects => infinite state space
    • Apply finitary abstractions to data structures (e.g., abstract away the length of a list)
  • Exponential in the number of threads
• Observation:
  • Threads operate on a part of the state
  • Correlations between different substates are often irrelevant for proving safety properties
• Our approach: develop an abstraction for substates
  • Abstract away correlations between substates of different threads
  • Reduce the exponential state space
Non-blocking stack [Treiber 1986]

#define EMPTY -1
typedef int data_type;
typedef struct node_t {
  data_type d;
  struct node_t *n;
} Node;
typedef struct stack_t {
  struct node_t *Top;
} Stack;

[1]  void push(Stack *S, data_type v) {
[2]    Node *x = alloc(sizeof(Node));
[3]    x->d = v;
[4]    do {
[5]      Node *t = S->Top;
[6]      x->n = t;
[7]    } while (!CAS(&S->Top,t,x));
[8]  }

[9]  data_type pop(Stack *S) {
[10]   do {
[11]     Node *t = S->Top;
[12]     if (t == NULL)
[13]       return EMPTY;
[14]     Node *s = t->n;
[15]     data_type r = s->d;
[16]   } while (!CAS(&S->Top,t,s));
[17]   return r;
[18] }
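To make the interleavings the analysis must cover concrete, here is a hedged driver sketch that runs several producers and consumers against one shared stack. It assumes alloc stands for plain malloc, assumes the push/pop above compile as shown (in strict C the loop-local t, s, r would need to be hoisted out of the do-while), and uses arbitrary thread counts; none of this is part of the original algorithm.

#include <pthread.h>
#include <stdio.h>

/* Producer: pushes 100 values onto the shared stack. */
static void *producer(void *arg) {
    Stack *S = arg;
    for (int i = 0; i < 100; i++)
        push(S, i);
    return NULL;
}

/* Consumer: pops until it has obtained 100 non-EMPTY results. */
static void *consumer(void *arg) {
    Stack *S = arg;
    int taken = 0;
    while (taken < 100)
        if (pop(S) != EMPTY)
            taken++;
    return NULL;
}

int main(void) {
    Stack S = { NULL };
    pthread_t prod[2], cons[2];
    for (int i = 0; i < 2; i++) pthread_create(&prod[i], NULL, producer, &S);
    for (int i = 0; i < 2; i++) pthread_create(&cons[i], NULL, consumer, &S);
    for (int i = 0; i < 2; i++) pthread_join(prod[i], NULL);
    for (int i = 0; i < 2; i++) pthread_join(cons[i], NULL);
    puts("done");
    return 0;
}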
Example: successful push
(push code as above; the relevant lines are [5]-[7])
[figure: before the CAS, the thread's local t points to the node currently at Top, and the new node x has x->n set to t]

Example: successful push (continued)
(push code as above)
[figure: the CAS at line [7] succeeds, so Top now points to x and the new node is the head of the list]

Example: unsuccessful push
(push code as above)
[figure: Top no longer equals t when line [7] executes, so the CAS fails and the loop retries]
Concrete states with storable threads
• A thread object records the thread's name and its program location
[figure: a concrete state with thread objects prod1 (pc=7), prod2 (pc=6), cons1 (pc=14), cons2 (pc=16); their local variables t, x, s point into the shared list reachable from Top; n is the next field of the list]

Full state S1
[figure: the full state S1 contains all four thread objects, prod1 (pc=7), prod2 (pc=6), cons1 (pc=14), cons2 (pc=16), together with the entire list reachable from Top]
Decomposition(S1)
• A substate represents all full states that contain it
• Decomposition is state-sensitive (it depends on the values of pointers and on heap connectivity)
[figure: S1 is decomposed into four substates M1, M2, M3, M4, one per thread, each keeping that thread's object, its local variables, and the part of the list it sees from Top]
• Decomposition(S1) = M1, M2, M3, M4
• Note that S1 is among the full states represented by Decomposition(S1)
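A schematic way to read the decomposition in standard abstract-interpretation notation, where each pi_i projects a full state onto subdomain i (here, onto the part of the heap relevant to one thread); the exact form of the projections and the concretization is an illustrative assumption, not the paper's formal definition.

\[
\mathrm{Decomposition}(X) \;=\; \langle\, \pi_1(X), \ldots, \pi_k(X) \,\rangle,
\qquad
\pi_i(X) \;=\; \{\, \pi_i(S) \mid S \in X \,\}
\]
\[
\gamma(\langle M_1,\ldots,M_k\rangle) \;=\; \{\, S \mid \pi_i(S) \in M_i \ \text{for every } i \,\},
\qquad\text{so}\quad X \subseteq \gamma(\mathrm{Decomposition}(X)).
\]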
Full states S1 and S2
[figure: two distinct full states over the same four threads; S2 is like S1 but with the roles of prod1/prod2 and of cons1/cons2 exchanged]
Decomposition(S1 ∪ S2)
[figure: decomposing both full states yields, in each subdomain, the substate Mi coming from S1 and the substate Ki coming from S2]
• Decomposition(S1 ∪ S2) = (M1 ∪ K1), (M2 ∪ K2), (M3 ∪ K3), (M4 ∪ K4)
• (S1 ∪ S2) is strictly contained in the set of full states represented by Decomposition(S1 ∪ S2): the Cartesian abstraction ignores correlations between substates
• The state space is exponentially more compact
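A back-of-the-envelope count of the compaction, under the simplifying assumption that each of n threads contributes one subdomain with at most k distinct substates (the numbers below are purely illustrative):

\[
\underbrace{k^{\,n}}_{\text{full states tracked without decomposition}}
\qquad\text{vs.}\qquad
\underbrace{n \cdot k}_{\text{substates tracked under the Cartesian abstraction}}
\]

For instance, k = 50 and n = 20 gives 50^20 ≈ 10^34 full states versus 20 · 50 = 1000 substates, which is the kind of exponential-to-linear gap behind the scaling results reported later.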
Abstraction properties
• The substates in each subdomain correspond to a single thread
  • Abstract away correlations between threads
  • Exponential reduction of the state space
• Substates preserve information on the part of the heap relevant to one thread
  • Substates may overlap
• Useful for reasoning about programs with fine-grained concurrency
  • Better approximation of the interference between threads
Main results
• A new parametric abstraction for heaps
  • Heap decomposition + Cartesian abstraction
  • Parametric in the underlying abstraction and in the decomposition
• Parametric sound transformers
  • Allow balancing efficiency and precision
• Implementation in HeDec
  • Heap Decomposition + Canonical Abstraction
  • Used to prove interesting properties of heap-manipulating programs with fine-grained concurrency, including linearizability
  • The analysis scales linearly in the number of threads
Sound transformers
[figure: each subdomain i holds a set of substates {X_i^j}; a sound transformer must produce output sets {Y_i^j'} that over-approximate the effect of the concrete step on every represented full state]

Pointwise transformers
[figure: the abstract transformer # is applied to each subdomain separately, mapping {X_i^j} to {Y_i^j'} without consulting the other subdomains]
• Efficient
• Often too imprecise
Imprecision example
(push code as above)
• The transformer # schedules prod1 and executes its statement x->n = t (line [6])
[figure: substate M2 describes prod2 at pc=6 and its view of the list, but records nothing about prod1's locals]
• But where do x and t of prod1 point to? M2 cannot tell, so a pointwise transformer must approximate conservatively
Imprecision example (continued)
(push code as above)
[figure: updating prod2's substate without knowing where prod1's x and t point admits a resulting state in which the list is cyclic]
• False alarm: the analysis reports a possible cyclic list
Full composition transformers
[figure: first compose the substates of all subdomains into full states, apply # to the composed states, then decompose the results back into the subdomains]
• Precise
• Exponential space blow-up
Partial composition
[figure: compose only the subdomains the transformer needs, e.g. {X_1^j} with {X_2^j}, {X_1^j} with {X_3^j}, and {X_1^j} with {X_4^j}]

Partial composition (continued)
[figure: apply # to each partial composition, e.g. #({X_1^j} composed with {X_2^j}), and decompose the results back into the output sets {Y_i^j'}]
• Efficient and precise
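The three transformer styles above can be written schematically in standard alpha/gamma notation; the paper's actual definitions are more refined, so treat this as an illustrative sketch, with tau the concrete transformer and C_i a transformer-specific set of subdomains composed with subdomain i.

\[
\text{pointwise:}\qquad Y_i \;=\; \alpha_i\big(\tau(\gamma_i(X_i))\big) \quad\text{for each subdomain } i
\]
\[
\text{full composition:}\qquad \langle Y_1,\ldots,Y_k\rangle \;=\; \mathrm{Decomposition}\Big(\tau\big(\gamma(\langle X_1,\ldots,X_k\rangle)\big)\Big)
\]
\[
\text{partial composition:}\qquad Y_i \;=\; \alpha_i\Big(\tau\big(\gamma\big(X_i \text{ composed with } X_j \text{ for } j \in C_i\big)\big)\Big)
\]

Partial composition interpolates between the two extremes: it regains only the correlations the transformer actually needs, while avoiding the exponential blow-up of composing everything.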
Partial composition example
[figure: prod1's substates M1, K1 and prod2's substates M2, K2 (at pc=6 or pc=7), which the partial composition pairs up before applying the transformer]
Partial composition example (continued)
[figure: the composed substates K2 with M1 and K2 with K1 record where prod1's x and t point when prod2's statement is executed]
• The false alarm (possible cyclic list) is avoided
Experimental results
• List-based fine-grained algorithms
  • Non-blocking stack [Treiber 1986]
  • Non-blocking queue [Doherty and Groves FORTE'04]
  • Two-lock queue [Michael and Scott PODC'96]
  • Benign data races
• Verified absence of null dereferences and memory leaks
• Verified linearizability
• Analysis built on top of the existing full-heap analysis of [Amit et al. CAV'07]
• Scaled the analysis from 2-3 threads to 20 threads
• Extended to an unbounded number of threads (different work)
Experimental results (continued)
• Exponential time/space reduction
• Non-blocking stack + linearizability
Related work
• Disjoint regions decomposition [TACAS'07]
  • Fixed decomposition scheme
  • Most precise transformer is FNP-complete
• Partial join [Manevich et al. SAS'04]
  • Orthogonal to decomposition
  • In HeDec we combine decomposition + partial join
• [Yang et al.]: handling concurrency for an unbounded number of threads
• Thread-modular analysis [Gotsman et al. PLDI'07]
• Rely-guarantee [Vafeiadis et al. CAV'07]
• Thread quantification (submitted)
More related work
• Local transformers
  • Works by Reynolds, O'Hearn, Berdine, Yang, Gotsman, Calcagno
• Heap analysis by separation [Yahav & Ramalingam PLDI'04], [Hackett & Rugina POPL'05]
  • Decompose the verification problem itself and conservatively approximate contexts
• Heap decomposition for interprocedural analysis [Rinetzky et al. POPL'05], [Rinetzky et al. SAS'05], [Gotsman et al. SAS'06], [Gotsman et al. PLDI'07]
  • Decompose/compose at procedure boundaries
• Predicate/variable clustering [Clarke et al. CAV'00]
  • Statically determined decomposition
Conclusion
• A parametric framework for shape analysis
  • Scales analyses of programs with fine-grained concurrency
  • Generalizes thread-modular analysis
• Key idea: state decomposition
  • Also useful for sequential programs
  • Used to prove intricate properties like linearizability
• HeDec tool
  • http://www.cs.tau.ac.il/~tvla#HEDEC
Future/ongoing work
• Extended the analysis to an unbounded number of threads via thread quantification
  • An orthogonal technique
  • The two techniques compose very well
• Can we automatically infer good decompositions?
• Can we automatically tune transformers?
• Can we reuse these ideas in non-shape analyses?
Invited questions
• How do you choose a decomposition?
• How do you choose transformers?
• How does it compare to separation logic?
• What is a general principle and what is specific to shape analysis?
• Caveats / limitations?
How do you choose a decomposition?
• In general this is an open problem
  • Perhaps counterexample refinement can help
• It depends on the property you want to prove
• Aim at the causes of combinatorial explosion
  • Threads
  • Iterators
• For linearizability we used
  • For each thread t: the thread node, the objects referenced by its local variables, and the objects referenced by global variables
  • The objects referenced by global variables and the objects correlated with the sequential execution
  • A locks component: for each lock, the thread that acquires it
How do you choose transformers?
• In general a challenging problem
• One has to balance efficiency and precision
• We have some heuristics
  • Core subdomains
How does it compare to separation logic?
• Relevant separating conjunction *r
  • Like * but without the disjointness requirement
• Do you have an analog of the frame rule?
  • For the disjoint regions decomposition [TACAS'07], yes
  • In general no, but instead we can use transformers of different levels of precision:
    #(I1 *r I2) = #precise(I1) *r #less-precise(I2), where #less-precise is cheap to compute
  • Perhaps one can find conditions under which #(I1 *r I2) = #precise(I1) *r I2
• Relativized formulae
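One way to read the "relevant separating conjunction" above, stated schematically over heaps viewed as partial functions; this is an illustrative gloss of the slide's one-line description, not the paper's formal definition.

\[
h \models P * Q \;\iff\; \exists h_1, h_2.\;\; \mathrm{dom}(h_1) \cap \mathrm{dom}(h_2) = \emptyset,\;\; h = h_1 \uplus h_2,\;\; h_1 \models P,\;\; h_2 \models Q
\]
\[
h \models P *_r Q \;\iff\; \exists h_1, h_2.\;\; h = h_1 \cup h_2,\;\; h_1 \models P,\;\; h_2 \models Q \qquad\text{(the two subheaps may overlap)}
\]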
What is a general principle and what is specific to shape analysis?
• Decomposing abstract domains is general
  • Substate abstraction + Cartesian product
• Parametric transformers for Cartesian abstractions are general
• Chopping heaps into pieces via heterogeneous abstractions is shape-analysis specific
Caveats / limitations?
• The decomposition and the transformers are defined by the user
  • Not specialized per program/property
• Too much overlap between substates can lead to more expensive analyses
• Too fine a decomposition requires lots of composition
  • Partial composition is a bottleneck
  • We have the theory for finer-grained compositions + incremental transformers, but no implementation
• The framework is instantiated for just one abstraction (Canonical Abstraction)
• Can this be useful for separation logic-based analyzers?