In Defense of Probabilistic Static Analysis
Ben Livshits, Shuvendu Lahiri
Microsoft Research
From the people who brought you soundiness.org…
Static analysis: Uneasy Tradeoffs
• too imprecise: useless results
• does not scale
• overkill for some things; may not scale; possibly still too imprecise for others
What is missing is analysis elasticity
Our approach is probabilistic treatment of Points-to(p, v, h)
• Many interpretations are possible:
• Our certainty in the fact based on static evidence such as program structure
• Our certainty based on runtime observations
• Our certainty based on priors obtained from this or other programs

Object x = new Object();
try {
} catch (...) {
  x = null;
}
if (...) { // branch direction info
  x = new Object();
} else {
  x = null;
}

$('mydiv').css('color', 'red');
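As a rough illustration of the idea (the class, field names, and weights below are my own, not from the talk), one could attach a per-source certainty to each points-to fact and combine the three evidence sources into a single score:

```python
# A minimal sketch: certainty for a fact Points-to(p, v, h) combining
# static evidence, runtime observations, and priors (all values illustrative).
from dataclasses import dataclass

@dataclass
class PointsTo:
    var: str          # the pointer variable
    heap: str         # the allocation site it may point to
    p_static: float   # certainty from static evidence (program structure)
    p_runtime: float  # certainty from runtime observations
    p_prior: float    # prior from this or other programs

    def certainty(self, w=(0.5, 0.3, 0.2)):
        # One simple choice: a convex combination of the three sources.
        return w[0] * self.p_static + w[1] * self.p_runtime + w[2] * self.p_prior

fact = PointsTo("x", "new Object()@line1", p_static=0.9, p_runtime=1.0, p_prior=0.7)
print(round(fact.certainty(), 3))  # prints 0.89
```

Many other combination functions are possible; the point is only that a fact becomes a scored hypothesis rather than a boolean.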
Benefits
RESULT PRIORITIZATION
• Static analysis results can be naturally ranked or prioritized in terms of certainty, nearly a requirement when analysis users are frequently flooded with results
ANALYSIS DEBUGGING
• Program points, or even static analysis inference rules and facts, that lead to imprecision can be identified with the help of backward propagation
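Result prioritization then reduces to sorting reports by certainty; a trivial sketch (sites and scores are made up for illustration):

```python
# Illustrative only: rank analysis warnings by certainty so the most
# credible ones surface first for triage.
reports = [
    ("possible null deref at w = *z", 0.95),
    ("possible null deref at p.f",    0.40),
    ("possible null deref at q.g",    0.62),
]
ranked = sorted(reports, key=lambda r: r[1], reverse=True)
for site, certainty in ranked:
    print(f"{certainty:.2f}  {site}")
```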
More benefits
INFUSING WITH PRIORS
• End quality of analysis results can often be improved by domain knowledge such as information about variable naming, check-in information from source control repositories, bug-fix data from bug repositories, etc.
HARD AND SOFT RULES
• In an effort to make their analysis fully sound, analysis designers often combine certain inference rules with rules that cover generally unlikely cases, purely to maintain soundness
• Naturally blending such inference rules together, by giving high probabilities to the former and low probabilities to the latter, allows us to balance soundness and utility considerations
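A hypothetical sketch of infusing priors: deriving a nullability prior for a variable from domain signals such as its name and its bug-fix history (the function, thresholds, and signals are assumptions for illustration, not the authors' model):

```python
# Hypothetical prior: how likely is a variable to be null, given
# naming conventions and past null-related fixes from a bug repository?
def nullable_prior(var_name, past_null_fixes=0):
    p = 0.1                                   # base rate (assumed)
    name = var_name.lower()
    if "opt" in name or name.endswith("ornull"):
        p = max(p, 0.6)                       # naming hints at possible null
    p = min(0.95, p + 0.1 * past_null_fixes)  # evidence from bug-fix data
    return p

print(nullable_prior("resultOrNull"))                   # prints 0.6
print(round(nullable_prior("count", past_null_fixes=2), 2))  # prints 0.3
```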
SIMPLE analysis in Datalog

// transitive flow propagation
FLOW(x,z) :- FLOW(x,y), ASSIGN(y,z).
FLOW(a,c) :- FLOW(a,b), ASSIGNCOND(b,c).
FLOW(x,x).
// nullable variables
NULLABLE(x) :- FLOW(x,y), ISNULL(y).
// error detection
ERROR(a) :- ISNULL(a), DEREF(a).
ERROR(a) :- !ISNULL(a), NULLABLE(a), DEREF(a).

Example program:
x = 3;
y = null;
z = null;
z = x;
if (...) {
  z = null;
  y = 5;
}
w = *z
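The rules above can be run as a naive Datalog-style fixpoint. Below is a minimal Python sketch on the example program; the fact encoding (which assignments populate ASSIGN, ISNULL, DEREF) is my own reading of the slide, not the authors' implementation:

```python
# Naive bottom-up evaluation of the SIMPLE rules on the example program.
VARS = {"x", "y", "z", "w"}
ASSIGN = {("x", "z")}    # z = x  (value of x flows into z)
ASSIGNCOND = set()       # conditional copies (none in this example)
ISNULL = {"y", "z"}      # y = null; z = null  (assigned null somewhere)
DEREF = {"z"}            # w = *z

FLOW = {(v, v) for v in VARS}          # FLOW(x,x).
changed = True
while changed:                         # transitive flow propagation
    changed = False
    for (src, mid) in list(FLOW):
        for (a, b) in ASSIGN | ASSIGNCOND:
            if a == mid and (src, b) not in FLOW:
                FLOW.add((src, b))
                changed = True

# NULLABLE(x) :- FLOW(x,y), ISNULL(y).
NULLABLE = {src for (src, dst) in FLOW if dst in ISNULL}
# ERROR(a) :- ISNULL(a), DEREF(a).   ERROR(a) :- !ISNULL(a), NULLABLE(a), DEREF(a).
ERROR = {a for a in DEREF if a in ISNULL or a in NULLABLE}
print(sorted(ERROR))  # prints ['z']: z may be null at w = *z
```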
Relaxing the rules

Original Datalog rules:
// transitive flow propagation
FLOW(x,z) :- FLOW(x,y), ASSIGN(y,z).
FLOW(a,c) :- FLOW(a,b), ASSIGNCOND(b,c).
FLOW(x,x).
// nullable variables
NULLABLE(x) :- FLOW(x,y), ISNULL(y).
// error detection
ERROR(a) :- ISNULL(a), DEREF(a).
ERROR(a) :- !ISNULL(a), NULLABLE(a), DEREF(a).

Relaxed, weighted rules:
// transitive flow propagation
FLOW(x,y) ^ ASSIGN(y,z) => FLOW(x,z).
1    FLOW(a,b) ^ ASSIGNCOND(b,c) => FLOW(a,c)
FLOW(x,x).
// nullable variables
FLOW(x,y) ^ ISNULL(y) => NULLABLE(x).
// error detection
ISNULL(a) ^ DEREF(a) => ERROR(a).
0.5  !ISNULL(a) ^ NULLABLE(a) ^ DEREF(a) => ERROR(a).
// priors and shaping distributions
3    !FLOW(x,y).
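To build intuition for the weights: Markov-logic semantics (as in Alchemy) are subtler than this, since weights are log-odds over whole possible worlds rather than per-fact probabilities, but a crude sketch of my own is to map a rule weight through the logistic function and discount derived facts accordingly:

```python
# Crude intuition only: convert a soft rule's weight to a probability
# via the logistic function, then discount a derived fact by it.
# This is NOT Alchemy's actual inference semantics.
import math

def rule_prob(w):
    return 1.0 / (1.0 + math.exp(-w))  # weight (log-odds) -> probability

p_hard = 1.0               # ISNULL(a) ^ DEREF(a) => ERROR(a) is hard (certain)
p_soft = rule_prob(0.5)    # the 0.5-weighted ERROR rule is soft, ~0.622

p_nullable = 0.8           # assumed upstream certainty of NULLABLE(a)
# Certainty of ERROR(a) derived via the soft rule alone:
print(round(p_soft * p_nullable, 3))  # prints 0.498
```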
Probabilistic Inference with Alchemy
[Figure: ground network over variables X1, Y1, U1, Z1–Z3, W1–W11, with inferred marginal probabilities in the 0.54–0.62 range]
• Tuning the rules
• Tuning the weights
• Semantics are not as obvious
• Shaping priors is non-trivial, but fruitful
Challenges
• Learning the weights
  • Expert users
  • Learning (needs a labeled dataset)
• What class of static analysis can be made elastic?
  • Datalog
  • Abstract interpretation
  • Decision procedure (SMT)-based