Shape Analysis by Graph Decomposition

R. Manevich, M. Sagiv (Tel Aviv University); G. Ramalingam (MSR India); J. Berdine, B. Cook (MSR Cambridge)

Motivation
• Challenge: precise and efficient shape analyses
  • Prove properties of dynamically allocated linked data structures
• Observation: many correlations are often irrelevant for proving shape properties
• Our approach: develop a flexible abstraction that takes advantage of this

Example program – 2 lists
[Figure: two lists, h1 … t1 and h2 … t2]
Correlation between the two lists is irrelevant for proving the loop invariant.

```
// @assume h1!=null && h1==t1 && h1.n==null &&
//         h2!=null && h2==t2 && h2.n==null
//
// @loop_invariant Reach(h1,t1) &&
//                 Reach(h2,t2) &&
//                 DisjointLists(h1,h2)
EnqueueEvents() {
L1: while (...) {
      List temp = new List(getEvent());
      if (nondet()) {
        t1.n = temp;
        t1 = temp;
      } else {
        t2.n = temp;
        t2 = temp;
      }
    }
}
```

Abstract states – full heaps [VMCAI’05]
[Figure: nine shape graphs, one for each combination of list 1 size ∈ {1, 2, >2} and list 2 size ∈ {1, 2, >2}; segment lengths are abstracted to 1 or >1]

Graph decomposition
[Figure: the nine full-heap states, with the state in which both lists have size 2 (h1 to t1 and h2 to t2, each a segment of length 1) singled out for decomposition]

Graph decomposition
[Figure: the shape graph with segments h1 to t1 (length 1) and h2 to t2 (length 1) is decomposed into connected component 1 (h1, t1) and connected component 2 (h2, t2)]
Connected components are computed by undirected reachability.
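A minimal Java sketch of decomposition by undirected reachability, assuming a hypothetical representation: shape-graph nodes are the integers 0..n-1, each n-edge is a {src, dst} pair, and `vars` maps each pointer variable to its node. It also assumes null is not a node; otherwise every null-terminated list would land in one component. `Decompose` and `components` are illustrative names, not the analyzer's API.

```java
import java.util.*;

/** Sketch: split a shape graph into components by undirected reachability. */
class Decompose {
    /** One connected component: its nodes and the variables pointing into it. */
    static final class Component {
        final SortedSet<Integer> nodes = new TreeSet<>();
        final SortedSet<String> vars = new TreeSet<>();

        @Override
        public String toString() { return vars + " nodes=" + nodes; }
    }

    private final int[] parent; // union-find forest over node ids

    private Decompose(int n) {
        parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;
    }

    private int find(int v) {
        while (parent[v] != v) {
            parent[v] = parent[parent[v]]; // path halving
            v = parent[v];
        }
        return v;
    }

    private void union(int a, int b) { parent[find(a)] = find(b); }

    /** Groups nodes and variables; edge direction is ignored (undirected reachability). */
    static List<Component> components(int n, int[][] edges, Map<String, Integer> vars) {
        Decompose uf = new Decompose(n);
        for (int[] e : edges) uf.union(e[0], e[1]);
        Map<Integer, Component> byRoot = new TreeMap<>();
        for (int v = 0; v < n; v++)
            byRoot.computeIfAbsent(uf.find(v), r -> new Component()).nodes.add(v);
        for (Map.Entry<String, Integer> e : vars.entrySet())
            byRoot.get(uf.find(e.getValue())).vars.add(e.getKey());
        return new ArrayList<>(byRoot.values());
    }

    public static void main(String[] args) {
        // Both lists of size 2: h1 -> t1 (nodes 0, 1) and h2 -> t2 (nodes 2, 3).
        int[][] edges = {{0, 1}, {2, 3}};
        Map<String, Integer> vars = Map.of("h1", 0, "t1", 1, "h2", 2, "t2", 3);
        for (Component c : components(4, edges, vars)) System.out.println(c);
    }
}
```

For the size-2 state, `main` prints two components, one holding h1 and t1 and one holding h2 and t2. Union-find is used only for brevity; a breadth-first search over the undirected edges would work equally well.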

Abstract states – decomposed heaps
[Figure: three subgraphs per list: h1 = t1, h1 to t1 with segment length 1, h1 to t1 with segment length >1; likewise for h2 and t2]
• Coarser abstraction: precise enough to prove the invariant, but generates fewer states
• For k lists, the full heap abstraction generates $3^k$ abstract states, whereas the decomposed heap abstraction generates $3k$
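The gap between 3^k full-heap states and 3k decomposed subgraphs can be checked by brute force. This sketch assumes, as in the example, that each list is abstractly in one of three states (size 1, size 2, size >2) and that the lists are independent; `StateCount` and its methods are hypothetical names.

```java
import java.util.*;

/** Counts abstract states for k independent lists under both abstractions. */
class StateCount {
    // Abstract shapes of one list h_i ... t_i under the full heap abstraction.
    static final List<String> SHAPES = List.of("size=1", "size=2", "size>2");

    /** Full heaps: one shape graph per combination of per-list shapes. */
    static Set<List<String>> fullHeaps(int k) {
        Set<List<String>> states = new HashSet<>();
        states.add(new ArrayList<>()); // start from the empty combination
        for (int i = 0; i < k; i++) {
            Set<List<String>> next = new HashSet<>();
            for (List<String> prefix : states)
                for (String s : SHAPES) {
                    List<String> ext = new ArrayList<>(prefix);
                    ext.add(s);
                    next.add(ext);
                }
            states = next;
        }
        return states;
    }

    /** Decomposed heaps: one subgraph per (list, shape) pair, no cross-list correlation. */
    static Set<String> decomposedHeaps(int k) {
        Set<String> subgraphs = new HashSet<>();
        for (int i = 1; i <= k; i++)
            for (String s : SHAPES) subgraphs.add("list" + i + ":" + s);
        return subgraphs;
    }

    public static void main(String[] args) {
        for (int k = 1; k <= 4; k++)
            System.out.println(k + " lists: full=" + fullHeaps(k).size()
                    + " decomposed=" + decomposedHeaps(k).size());
    }
}
```

For k = 1..4 the full counts are 3, 9, 27, 81 and the decomposed counts 3, 6, 9, 12.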

Overall view
• Concrete domain: concrete heaps
• Full heaps domain: shape graphs
• Decomposed heaps domain: shape subgraphs
[Figure: concrete heaps are abstracted by FH into shape graphs, which are abstracted by GD into shape subgraphs]
• Shape subgraphs track SOME correlations
• Shape graphs track ALL correlations

Main results
• New abstraction for shape analysis that reduces exponential factors by:
  • Connected component decomposition
  • Abstracting away null-value correlations
• Sound and sufficiently precise transformers
  • Most precise transformers are FNP-complete
  • Efficient polynomial-time transformers
    • Sufficiently precise
• Implementation and empirical results
  • Sufficiently precise on a set of benchmarks, including Windows device driver models
  • State space and running time reduced by factors of 33 and 212, respectively

Outline
• Full heap abstraction [VMCAI’05]
  • Reference abstraction
• Further abstraction by decomposition
  • Connected component decomposition
  • Abstracting away null-value correlations (details in paper)
• Abstract transformers
  • Concretization by composition
• Experimental results

Full heap abstraction [VMCAI’05]

Full heap abstraction [VMCAI’05]
• Abstraction for singly-linked lists
• Basic concepts:
  • A bounded number of interruptions
  • A bounded number of uninterrupted list segments
• The abstraction keeps interruptions and abstracts segment lengths to {1, >1}
• The result is a shape graph
[Figure: a concrete heap with variables x and y is mapped by β_FH to a shape graph whose segments are labeled 1 or >1; α_FH is obtained from β_FH by point-wise extension]
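A sketch of β_FH for a single concrete heap, under stated assumptions. Following the usual definition in the [VMCAI’05] line of work, it *assumes* an interruption is a cell pointed to by a variable or having more than one predecessor; the slide only says there is a bounded number of them. The heap encoding is hypothetical: a successor array `next` with -1 for null, segment lengths count n-fields followed, a segment running into null is recorded with target `null`, and unreachable cells are ignored.

```java
import java.util.*;

/** Sketch of beta_FH for one concrete singly-linked heap (hypothetical encoding). */
class FullHeapAbstraction {
    /** Abstracts a segment length, counted in n-fields, to the domain {1, >1}. */
    static String abstractLength(int len) { return len == 1 ? "1" : ">1"; }

    /** next[v] is v's successor (-1 = null); vars maps each non-null variable to its cell. */
    static List<String> beta(int[] next, Map<String, Integer> vars) {
        int n = next.length;
        // 1. Cells reachable from some variable; garbage is ignored in this sketch.
        boolean[] reach = new boolean[n];
        Deque<Integer> work = new ArrayDeque<>(vars.values());
        while (!work.isEmpty()) {
            int v = work.pop();
            if (v < 0 || reach[v]) continue;
            reach[v] = true;
            work.push(next[v]);
        }
        // 2. Interruptions (assumed definition): pointed to by a variable, or shared.
        int[] preds = new int[n];
        for (int v = 0; v < n; v++) if (reach[v] && next[v] >= 0) preds[next[v]]++;
        boolean[] interruption = new boolean[n];
        for (int v : vars.values()) interruption[v] = true;
        for (int v = 0; v < n; v++) if (preds[v] > 1) interruption[v] = true;
        // 3. Follow each uninterrupted segment and keep only its abstract length.
        List<String> edges = new ArrayList<>();
        for (int u = 0; u < n; u++) {
            if (!interruption[u]) continue;
            int v = next[u], len = 1;
            while (v >= 0 && !interruption[v]) { v = next[v]; len++; }
            String dst = v < 0 ? "null" : name(v, vars);
            edges.add(name(u, vars) + " -(" + abstractLength(len) + ")-> " + dst);
        }
        Collections.sort(edges);
        return edges;
    }

    /** Names a cell by the variables pointing to it, or "#id" for an unnamed shared cell. */
    private static String name(int v, Map<String, Integer> vars) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Integer> e : vars.entrySet())
            if (e.getValue() == v) names.add(e.getKey());
        if (names.isEmpty()) return "#" + v;
        Collections.sort(names);
        return String.join("=", names);
    }

    public static void main(String[] args) {
        // Cells c0 -> c1 -> c2 -> null; x points to c0 and y points to c2.
        System.out.println(beta(new int[] {1, 2, -1}, Map.of("x", 0, "y", 2)));
    }
}
```

Here `main` yields `[x -(>1)-> y, y -(1)-> null]`: the unnamed, unshared cell c1 is summarized into the segment from x to y. α_FH would simply apply `beta` to every heap in a set.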

Graph decomposition abstraction

Graph decomposition abstraction
• Abstraction of shape graphs: a further abstraction over the full heap domain
• Decouples connected components
  • Intuitively, different components = different logical data structures
• Result = set of shape subgraphs

h1 t1 h2 t2 1 h1 t1 >1 GD h2 t2 Connected components decomposition h1 t1 h2 t2 1 h1 t1 >1 h2 t2

Concretization GD Concrete domain:concrete heaps Full heaps domain:shape graphs Decomposed heaps domain:shape subgraphs FH GD h1 t1 h1 t1 h1 t1 >1 >1 ... h2 t2 h2 t2 h2 t2 >1 ... >1 h1 t1 h1 t1 h1 t1 1 1 h2 t2 h2 t2 h2 t2 1 1 GD FH

Abstracting correlations
[Figure: shape graphs over h1, t1, h2, t2 (segment lengths 1 and >1) are decomposed by GD into per-list subgraphs, so the correlation between the two lists is no longer recorded]
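The loss of correlations can be reproduced with a small model. In this sketch (hypothetical representation), a subgraph is a record of the variables it mentions plus an opaque shape label; α_GD pools the components of all full graphs, and γ_GD enumerates every set of pairwise variable-disjoint subgraphs that covers all variables, i.e. composition of weakly-feasible subgraphs. Starting from two correlated graphs (both lists short, or both long), recomposition yields four graphs, including the two mixed ones.

```java
import java.util.*;

/** Sketch: alpha_GD pools components; gamma_GD recomposes them (hypothetical model). */
class Correlations {
    /** A shape subgraph: the variables it mentions plus an opaque shape label. */
    record Sub(Set<String> vars, String shape) {}

    /** alpha_GD on a set of full graphs: keep every component, forget its graph. */
    static Set<Sub> alpha(Set<Set<Sub>> fullGraphs) {
        Set<Sub> pooled = new HashSet<>();
        for (Set<Sub> g : fullGraphs) pooled.addAll(g);
        return pooled;
    }

    /** gamma_GD: every set of pairwise variable-disjoint subgraphs covering allVars. */
    static Set<Set<Sub>> gamma(Set<Sub> subs, SortedSet<String> allVars) {
        Set<Set<Sub>> out = new HashSet<>();
        compose(new ArrayList<>(subs), allVars, new HashSet<>(), new HashSet<>(), out);
        return out;
    }

    private static void compose(List<Sub> subs, SortedSet<String> allVars, Set<String> covered,
                                Set<Sub> chosen, Set<Set<Sub>> out) {
        String next = null; // smallest variable not yet covered
        for (String v : allVars) if (!covered.contains(v)) { next = v; break; }
        if (next == null) { out.add(new HashSet<>(chosen)); return; }
        for (Sub s : subs) {
            // Only subgraphs that cover 'next' and share no variable with the choice so far.
            if (!s.vars().contains(next) || !Collections.disjoint(s.vars(), covered)) continue;
            chosen.add(s);
            covered.addAll(s.vars());
            compose(subs, allVars, covered, chosen, out);
            chosen.remove(s);
            covered.removeAll(s.vars());
        }
    }

    public static void main(String[] args) {
        Sub short1 = new Sub(Set.of("h1", "t1"), "h1 -(1)-> t1");
        Sub long1 = new Sub(Set.of("h1", "t1"), "h1 -(>1)-> t1");
        Sub short2 = new Sub(Set.of("h2", "t2"), "h2 -(1)-> t2");
        Sub long2 = new Sub(Set.of("h2", "t2"), "h2 -(>1)-> t2");
        // Two correlated full graphs: both lists short, or both lists long.
        Set<Set<Sub>> graphs = Set.of(Set.of(short1, short2), Set.of(long1, long2));
        SortedSet<String> vars = new TreeSet<>(List.of("h1", "t1", "h2", "t2"));
        Set<Set<Sub>> back = gamma(alpha(graphs), vars);
        System.out.println(back.size()); // 4: the two mixed combinations appear as well
    }
}
```

Each of the two original graphs concretizes only to itself when decomposed alone, yet their pooled subgraphs concretize to four graphs. The search always branches on the smallest uncovered variable, so each composition is produced once, but its worst-case cost is still exponential.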

Abstract transformers
• Need transformers for program statements:
  • x=new List()
  • x=null
  • x=y
  • x=y.n
  • x.n=y
  • assume(x!=y)
  • assume(x==y)
  • …

Abstract transformers outline
• Induced transformers by concretization (from subgraphs and shape graphs)
  • Problem: concretization introduces an exponential space blow-up
• Most precise transformers by partial concretization
  • Avoid the exponential space blow-up
  • Require an oracle to test strong feasibility
  • The strong feasibility test is NP-complete
• Conservative transformers
  • Give up on the strong feasibility test
  • Avoid the exponential time blow-up

GD st GD Most precise transformer [CC’77] Concrete domain:concrete heaps Full heaps domain:shape graphs Decomposed heaps domain:shape subgraphs FH h1 t1 ... h2 t2 ... st h1 t1 h2 t2 FH Problem: concretization is exponential space in worst-case

Partial concretization
[Figure: composing h1/t1 and h2/t2 subgraphs with segment lengths 1 and >1]
• Compose weakly-feasible subgraphs
  • Subgraphs that do not share any variables
• Compose only the subgraphs in the footprint of the statement
• Compose at most 2 or 3 subgraphs at a time
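A sketch of partial concretization for one statement, again over a hypothetical subgraph record. Only subgraphs that mention a variable of the statement (its footprint) are composed, a candidate composition is kept only if it is weakly feasible (no shared variables), and every other subgraph is passed through unchanged. The sketch assumes each statement variable is non-null, so it appears in some subgraph; for `t1.n = temp` the footprint variables are `t1` and `temp`.

```java
import java.util.*;

/** Sketch of partial concretization for one statement (hypothetical subgraph model). */
class PartialConcretization {
    /** A shape subgraph: the variables it mentions plus an opaque shape label. */
    record Sub(Set<String> vars, String shape) {}

    /** Weakly feasible: no two subgraphs share a variable. */
    static boolean weaklyFeasible(Collection<Sub> subs) {
        Set<String> seen = new HashSet<>();
        for (Sub s : subs)
            for (String v : s.vars())
                if (!seen.add(v)) return false;
        return true;
    }

    /** All weakly-feasible compositions of footprint subgraphs covering the statement's variables. */
    static List<Set<Sub>> compose(Set<Sub> state, List<String> stmtVars) {
        List<Set<Sub>> out = new ArrayList<>();
        extend(state, stmtVars, 0, new LinkedHashSet<>(), out);
        return out;
    }

    private static void extend(Set<Sub> state, List<String> stmtVars, int i,
                               Set<Sub> chosen, List<Set<Sub>> out) {
        if (i == stmtVars.size()) { out.add(new LinkedHashSet<>(chosen)); return; }
        String v = stmtVars.get(i);
        // v may already be covered, e.g. for x.n = y when x and y share a component.
        if (chosen.stream().anyMatch(c -> c.vars().contains(v))) {
            extend(state, stmtVars, i + 1, chosen, out);
            return;
        }
        for (Sub s : state) {
            if (!s.vars().contains(v)) continue; // only subgraphs in the footprint of v
            chosen.add(s);
            if (weaklyFeasible(chosen)) extend(state, stmtVars, i + 1, chosen, out);
            chosen.remove(s);
        }
    }

    /** Subgraphs outside the footprint are passed on unchanged. */
    static Set<Sub> untouched(Set<Sub> state, List<String> stmtVars) {
        Set<Sub> rest = new HashSet<>();
        for (Sub s : state) if (Collections.disjoint(s.vars(), stmtVars)) rest.add(s);
        return rest;
    }

    public static void main(String[] args) {
        Set<Sub> state = Set.of(
                new Sub(Set.of("h1", "t1"), "h1 -(1)-> t1"),
                new Sub(Set.of("h1", "t1"), "h1 -(>1)-> t1"),
                new Sub(Set.of("h2", "t2"), "h2 -(1)-> t2"),
                new Sub(Set.of("temp"), "temp -> fresh cell"));
        List<String> footprint = List.of("t1", "temp"); // variables of t1.n = temp
        System.out.println(compose(state, footprint).size());   // 2
        System.out.println(untouched(state, footprint).size()); // 1
    }
}
```

With two h1/t1 subgraphs, one h2/t2 subgraph and one temp subgraph, `compose` returns two compositions and `untouched` keeps only the h2/t2 subgraph. At most one subgraph per statement variable is ever composed, which is where the bound of 2 or 3 subgraphs comes from.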

Transformer example: t1.n = temp
[Figure: t1.n = temp applied to compositions of the h1/t1 subgraphs with the temp subgraph]

Most precise transformer
• The most precise transformer requires a strong feasibility test
  • Check that subgraphs can be extended to include all variables
• Strong feasibility is NP-complete
  • Therefore the most precise transformer is FNP-complete
[Figure: subgraphs M1 to M5 over variables w, x, y, z]
• Can M1 and M4 be extended to include variable w?
  • Inconsistency: shared variable x
  • Inconsistency: shared variable y
  • Conclusion: they cannot be extended with w
• M1 and M4 are weakly-feasible but not strongly-feasible in {M1,…,M5}
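Strong feasibility asks whether a weakly-feasible set of subgraphs can be extended, using subgraphs from the whole set, to a full shape graph that covers every variable exactly once. This is an exact-cover style search, which is why the test is NP-complete. The brute-force Java sketch below uses made-up subgraphs A to E, chosen so that, as in the slide's conclusion, a weakly-feasible pair cannot be extended; they are not the sets drawn in the figure.

```java
import java.util.*;

/** Brute-force strong-feasibility test; worst-case exponential, as NP-completeness suggests. */
class StrongFeasibility {
    /** A shape subgraph, identified by name, with the variables it mentions. */
    record Sub(String name, Set<String> vars) {}

    /** Can 'chosen' be extended with subgraphs from 'all' to cover allVars exactly once? */
    static boolean stronglyFeasible(List<Sub> chosen, List<Sub> all, Set<String> allVars) {
        Set<String> covered = new HashSet<>();
        for (Sub s : chosen)
            for (String v : s.vars())
                if (!covered.add(v)) return false; // not even weakly feasible
        return extend(all, new TreeSet<>(allVars), covered);
    }

    private static boolean extend(List<Sub> all, SortedSet<String> allVars, Set<String> covered) {
        String next = null; // smallest variable not yet covered
        for (String v : allVars) if (!covered.contains(v)) { next = v; break; }
        if (next == null) return true; // a full shape graph has been assembled
        for (Sub s : all) {
            if (!s.vars().contains(next) || !Collections.disjoint(s.vars(), covered)) continue;
            covered.addAll(s.vars());
            boolean ok = extend(all, allVars, covered);
            covered.removeAll(s.vars());
            if (ok) return true;
        }
        return false; // no subgraph can supply 'next' without a clash
    }

    public static void main(String[] args) {
        // Made-up subgraphs over variables w, x, y, z.
        Sub a = new Sub("A", Set.of("x")), b = new Sub("B", Set.of("w", "x")),
            c = new Sub("C", Set.of("y")), d = new Sub("D", Set.of("z")),
            e = new Sub("E", Set.of("w", "z"));
        List<Sub> all = List.of(a, b, c, d, e);
        Set<String> vars = Set.of("w", "x", "y", "z");
        System.out.println(stronglyFeasible(List.of(a), all, vars));    // true: A, E, C
        System.out.println(stronglyFeasible(List.of(a, d), all, vars)); // false
    }
}
```

{A, D} is weakly feasible, but every subgraph that could supply w clashes on x (B) or z (E). The conservative transformer sidesteps this search by testing only consistency.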

Making the transformers efficient
• The vanilla transformer is inefficient in practice
• Incremental transformers
  • Reuse results of previous iterations
  • Details in paper
• Engineering optimizations
  • Avoid unnecessarily composing subgraphs
  • …
• Optimized transformers run in linear time in practice

Prototype implementation
• Implemented in Java
• Supports assertions
  • assertReach(x,y)
  • assertDisjointLists(x,y)
  • assertAcyclicList(x)
  • assertCyclicList(x)
  • assert(x==y), assert(x!=y)
• Checks cleanness properties
  • Absence of null dereferences
  • Absence of memory leaks
  • No misuse of dangling pointers

Experiments – precision
• Precision lost in only 2 of 21 benchmarks
  • getLast
    • Unable to prove that x points to the last cell
    • Due to an imprecise transformer
    • Can be avoided by simple and efficient heuristics
  • queue_2_stack
    • Intentionally constructed
    • Loses correlations that are important for proving the property
• Same precision as the full heap analysis on the other benchmarks

Experiments – “standard” suite
• Programs operating on 1-2 lists
  • insert, delete, reverse, merge…
• The new analysis is slightly less efficient
  • But running times are under 0.6 seconds

Experiments – multiple lists
[Chart: number of shape graphs (full heap analysis) vs. number of subgraphs (graph decomposition analysis); annotated 89,430 / 7,733]

Experiments – multiple lists
[Chart: full shape graph analysis time vs. graph decomposition analysis time; annotated 552.6 / 2.6]

Properties of the abstraction
• No loss of precision when connected components represent completely independent lists
  • Reduces the state space exponentially
• Loss of precision when mixing abstract states: $\gamma_{GD}(X_1 \cup X_2) \supseteq \gamma_{GD}(X_1) \cup \gamma_{GD}(X_2)$, and the inclusion may be strict
• So where is this technique useful?

Related work
• Partial isomorphism join [Manevich et al. SAS’04]
  • Applied in a more generic context, but does not reduce the exponential blow-ups addressed in this paper
• Heap analysis by separation [Yahav et al. PLDI’04] [Hackett et al. POPL’05]
  • Decompose the verification problem itself and conservatively approximate contexts
• Heap decomposition for interprocedural analysis [Rinetzky et al. POPL’05] [Rinetzky et al. SAS’05] [Gotsman et al. SAS’06] [Gotsman et al. PLDI’07]
  • Decompose/compose at procedure boundaries
• Predicate/variable clustering [Clarke et al. CAV’00]
  • Statically-determined decomposition

Conclusions
• New abstraction scheme to control the precision/cost trade-off for shape analyses
• Efficient algorithms for abstract domain operations
  • Abstraction
  • Partial concretization
  • Transformers
  • …
• Applicable beyond singly-linked lists
  • E.g., the class of graphs supported by Lev-Ami et al. [CAV’06]
    • Doubly-linked lists
    • Trees
    • …

Ongoing work
• Extension for concurrent program analysis
• Future work:
  • Tune the abstraction by counterexample-guided refinement

Questions?

Conservative transformer
• Computes a superset of the subgraphs computed by the most precise transformer
• Algorithm sketch:
  • Compose the components in the footprint of the statement
  • Apply the local st on the footprint and decompose the result
  • Test consistency instead of strong feasibility
  • Pass the other components on as is
• Time(st) is polynomial in the number of variables in st
  • x=null: linear
  • x.n=y: quadratic
  • assume(x==y): cubic

Concretization GD • Maps sets of shape subgraphs to sets of full shape graphs • Mathematically: GD(XG) = {G | β(G)  XG} • Algorithmically: by composing weakly-feasible subgraphs • Subgraphs that do not share any variables • Full shape graph includes all program variables
