Taming False Alarms from a Domain-Unaware C Analyzer by a Statistical Post Analysis. Yungbum Jung, Jaehwang Kim, Jaeho Shin, Kwangkeun Yi. Programming Research Lab., Seoul National University.
Motivation: an Industry’s Challenge
• In 2004, a company’s SQA dept. asked us for a C buffer-overrun static analyzer that
  • must be sound
  • must have a reasonable cost
  • must be domain-unaware
• Our path
  • Sound analyzer: drive the cost-accuracy balance to its limit
  • Statistical filter: sift out inevitable false alarms and rank alarms by their probability of being true
Outline
• Airac, Our Analyzer
  • Internals
  • Performance
• Statistical Analysis
  • Symptoms
  • Models
    • Bayesian Analysis
    • Linear Logistic Regression
  • Sifting out, Ranking
Airac
• Array Index Range Analyzer for C
• Our static analyzer
  • Is an abstract interpreter
  • Does numerical interval analysis
  • Is sound
    • in the sense of detecting all possible buffer overruns
  • Covers full ANSI C + some GNU extensions
Abstraction
• α: set of concrete machine transition traces → map from program points to abstract states (PgmPt → State)
• Usual abstraction for stateful programs
Abstract Domains
• Machine = State × PgmPt
• State = Stk × Mem × Dmp
• Mem = Addr → Val
• Val = Interval × 2^Addr × 2^Array
• Addr = PgmVar + AllocSite + AllocSite × Field
• Array = AllocSite × Base × Size
• AllocSite = PgmPt
• [a, b] ∈ Interval = Base = Size ...
Techniques Used
• Accuracy improvement by
  • narrowing after widening
  • flow-sensitivity
  • context pruning (limited to linear expressions)
  • static inlining (parameterized)
  • static loop unrolling (parameterized)
• Cost reduction by
  • careful worklist order: lazy at join points
  • selective join/compare
  • stack obviation
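The "narrowing after widening" item can be illustrated with a minimal sketch on the interval domain, applied to the head of a loop of the form `for (i = 0; i < bound; i++)`. The type `Interval`, the helper names, and the use of `LONG_MIN`/`LONG_MAX` as stand-ins for infinite bounds are all assumptions made for this example. They are not taken from Airac, whose actual operators may differ.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical interval type; LONG_MIN / LONG_MAX stand for -inf / +inf. */
typedef struct { long lo, hi; } Interval;

long min_l(long a, long b) { return a < b ? a : b; }
long max_l(long a, long b) { return a > b ? a : b; }

/* Join: smallest interval containing both arguments. */
Interval itv_join(Interval a, Interval b) {
    Interval r = { min_l(a.lo, b.lo), max_l(a.hi, b.hi) };
    return r;
}

/* Widening: a bound that is still moving jumps to infinity. */
Interval itv_widen(Interval old, Interval next) {
    Interval r = { next.lo < old.lo ? LONG_MIN : old.lo,
                   next.hi > old.hi ? LONG_MAX : old.hi };
    return r;
}

/* Narrowing: only infinite bounds are refined by the new value. */
Interval itv_narrow(Interval old, Interval next) {
    Interval r = { old.lo == LONG_MIN ? next.lo : old.lo,
                   old.hi == LONG_MAX ? next.hi : old.hi };
    return r;
}

/* One abstract step at the head of "for (i = 0; i < bound; i++)":
   join the initial value [0, 0] with the value coming around the back edge. */
Interval loop_head_step(Interval head, long bound) {
    Interval init = { 0, 0 };
    Interval body = { head.lo, min_l(head.hi, bound - 1) };  /* filter i < bound */
    assert(bound > LONG_MIN);
    if (body.lo > body.hi) return init;                      /* body unreachable */
    if (body.lo != LONG_MIN) body.lo += 1;                   /* effect of i++ */
    body.hi += 1;                                            /* finite after the filter */
    return itv_join(init, body);
}

/* Ascending iteration with widening, then one descending step with narrowing. */
Interval analyze_loop_head(long bound) {
    Interval head = { 0, 0 };
    for (;;) {
        Interval next = itv_widen(head, loop_head_step(head, bound));
        if (next.lo == head.lo && next.hi == head.hi) break;
        head = next;
    }
    return itv_narrow(head, loop_head_step(head, bound));
}

/* Value of i after the loop: the head value filtered by i >= bound. */
Interval loop_exit(Interval head, long bound) {
    Interval r = { max_l(head.lo, bound), head.hi };
    return r;
}
```

With `bound = 10`, widening alone leaves the loop head at [0, +inf]; the narrowing step recovers [0, 10], so the value of `i` after the loop is [10, 10].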
Stack Obviation
• Size of Stk is proportional to program size
• Most of the analysis time = join + compare
• OK to skip join/compare for Stk
  • if changes of Stk are always reflected in Mem
• By simple syntactic transformation
  • e1 ? e2 : e3  ⇒  { if (e1) t = e2 else t = e3; t }
  • e[f()]  ⇒  t = f(); e[t]
• 3 to 5 times speedup
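The two rewrites on this slide can be shown at source level with a small sketch. The functions `f`, `before`, and `after` are hypothetical and exist only for this illustration; the analyzer presumably applies the transformation to its own program representation rather than to C source text.

```c
#include <assert.h>

/* Hypothetical callee used only for illustration. */
int f(void) { return 2; }

/* Before: the value of the conditional and the result of f()
   exist only as intermediate values on the evaluation stack. */
int before(int c, const int *e) {
    assert(e != 0);
    int x = c ? e[0] : e[1];
    return x + e[f()];
}

/* After: each intermediate value is stored in a named temporary,
   so every change that used to live on the stack is visible in memory. */
int after(int c, const int *e) {
    assert(e != 0);
    int t1, t2;
    if (c) t1 = e[0]; else t1 = e[1];   /* e1 ? e2 : e3 */
    t2 = f();                           /* e[f()] becomes t = f(); e[t] */
    return t1 + e[t2];
}
```

Both versions compute the same result; the second simply names the temporaries, which is what lets an analysis track them through Mem alone.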
Error Recovery During Analysis
    int a[10], i, j;
    for (i = 0; i < 10; i++) {
        a[i] = 2 * i;
    }
    j = a[i];
    a[i] = …
    …
• Buffer overrun at j = a[i], since i ∈ [10, 10]
• Optimistic assumption afterwards: i ∈ [0, 9], j ∈ [0, 18]
Warnings about Performance
• Assume typeful C programs
  • arrays must be used at the type they were declared with
• Artificial semantics after errors
  • e.g. overrun, null dereference
• No side effects for library functions
• If there is no main(), analyze procedures in their defined order
• No alarms about buffers whose size is top
• Top value for free variables
Performance 1/2
• Measured on a Linux 2.6 box with a Pentium 4 3.2GHz and 4GB RAM
Statistical Post Analysis
• We collect
  • Samples of true and false alarms
  • Symptoms of each alarm
• From them, compute the trueness of alarms
  • i.e. the probability of an alarm being true given its symptoms
• With trueness we can
  • Sift out false alarms
  • Report truer alarms first
Symptoms
• Syntactic symptoms
  • AfterLoop, AfterBranch, AfterReturn, InNestedLoopBody, InNestedBranchBody
  • InLoopCond, InBranchCond, InFunParam, InNestedFunParam, InRightOfAnd
• Semantic symptoms
  • JoinN, NotNarrowed, ComplexData, InCyclicCallChain
  • Prunning, PassedValue, ConstantVariable, ConstantIndex, ConstantArrayConstantIndex
• Result symptoms
  • TopIndex, HalfInfiniteIndex
  • FiniteOffsetFiniteArray, FiniteIndex
• Common-sense + shallow inside info [9, 10]
Bayesian Analysis
• For each alarm, we compute its conditional probability of being true given its symptoms
• Numbers from “learning samples”
• Estimated using the Monte-Carlo method
• We assume symptoms occur independently (naïve Bayesian filtering)
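The independence assumption means the probability of an alarm being true can be computed by multiplying one factor per symptom. The sketch below shows only that step. The function name and parameter layout are assumptions made for illustration, and the likelihoods are passed in as plain numbers, whereas the slides estimate them from learning samples with a Monte-Carlo method that is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Posterior P(true | symptoms) under the naive independence assumption.
   p_true:        prior probability that an alarm is true
   like_true[i]:  P(symptom i present | alarm is true)
   like_false[i]: P(symptom i present | alarm is false)
   present[i]:    1 if the alarm shows symptom i, 0 otherwise
   All names are hypothetical. */
double naive_bayes_trueness(double p_true,
                            const double *like_true,
                            const double *like_false,
                            const int *present, size_t n) {
    assert(p_true >= 0.0 && p_true <= 1.0);
    double w_true = p_true;         /* unnormalized weight of "true"  */
    double w_false = 1.0 - p_true;  /* unnormalized weight of "false" */
    for (size_t i = 0; i < n; i++) {
        w_true  *= present[i] ? like_true[i]  : 1.0 - like_true[i];
        w_false *= present[i] ? like_false[i] : 1.0 - like_false[i];
    }
    double total = w_true + w_false;
    /* Fall back to the prior if both weights vanish. */
    return total > 0.0 ? w_true / total : p_true;
}
```

For instance, with a prior of 0.5 and a single symptom that appears in 80% of true alarms and 20% of false alarms, an alarm showing that symptom gets a trueness of about 0.8, and one lacking it about 0.2.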
Sifting Out Threshold
• User’s knob: his/her risk ratio (Rs/Rr)
• Minimize risk expectation
• Risk expectation of an alarm with probability p when
  • Silencing = Rs × p
  • Reporting = Rr × (1 - p)
• We silence if Rs × p < Rr × (1 - p), i.e. p × (Rs + Rr) < Rr
• Hence, sift out when p < Rr / (Rr + Rs) = 1 / (1 + Rs/Rr)
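The decision rule is small enough to write out directly. The following is a minimal sketch; the function names are hypothetical, and `rs` and `rr` correspond to Rs and Rr above.

```c
#include <assert.h>

/* Probability threshold below which silencing has the lower expected risk.
   rs: risk weight of silencing, rr: risk weight of reporting. */
double sift_threshold(double rs, double rr) {
    assert(rs > 0.0 && rr > 0.0);
    return rr / (rr + rs);
}

/* Returns 1 if an alarm with trueness p should be silenced, 0 if reported.
   This compares the two risk expectations directly. */
int should_silence(double p, double rs, double rr) {
    assert(p >= 0.0 && p <= 1.0);
    return rs * p < rr * (1.0 - p);
}
```

With `rs = 3.0` and `rr = 1.0`, `sift_threshold` returns 0.25, so an alarm with trueness 0.2 is silenced and one with trueness 0.3 is reported.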
Experiments
• With alarms from
  • Parts of the Linux kernel
  • Programs in algorithm textbooks
• Learning and testing
  • 50%/50% randomly chosen
  • repeated 15 times
Sifting Out Alarms
• Rs = 3 × Rr ⇒ threshold = 0.25
• 74.84% of false alarms filtered out :-)
• 31.40% of true alarms were also swept out :-(
Ranking Alarms
• Show user “truer” alarms first
• 15.17% of false alarms are mixed in until the user sees 50% of the true alarms
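One way to read this measurement: sort alarms by decreasing trueness, walk down the list until half of the true alarms have been shown, and count what fraction of all false alarms appeared along the way. The sketch below implements that reading. The `Alarm` record, the function names, and the exact definition of "mixed in" are assumptions, since the slides do not spell the metric out.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical alarm record: estimated trueness plus the known label. */
typedef struct { double trueness; int is_true; } Alarm;

/* qsort comparator: higher trueness first. */
int by_trueness_desc(const void *a, const void *b) {
    double x = ((const Alarm *)a)->trueness;
    double y = ((const Alarm *)b)->trueness;
    return (x < y) - (x > y);
}

/* Sorts the alarms in place by decreasing trueness and returns the fraction
   of all false alarms shown before half of the true alarms have been shown. */
double false_seen_at_half_true(Alarm *alarms, size_t n) {
    assert(alarms != NULL || n == 0);
    size_t total_true = 0, total_false = 0;
    for (size_t i = 0; i < n; i++) {
        if (alarms[i].is_true) total_true++; else total_false++;
    }
    if (total_true == 0 || total_false == 0) return 0.0;

    qsort(alarms, n, sizeof *alarms, by_trueness_desc);

    size_t seen_true = 0, seen_false = 0;
    for (size_t i = 0; i < n && 2 * seen_true < total_true; i++) {
        if (alarms[i].is_true) seen_true++; else seen_false++;
    }
    return (double)seen_false / (double)total_false;
}
```

A perfect ranking yields 0.0, because no false alarm precedes the first half of the true ones.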
Binary Logistic Regression
• Trueness of an alarm given its binary symptom vector
• Generalized linear model
• Coefficients from learning set
• For example,
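The formula from this slide did not survive in the transcript. For orientation, the standard form of a binary logistic regression model is given below; the symbols are assumptions: x_1, ..., x_n are the 0/1 symptom indicators and the β are the coefficients fitted on the learning set. This is the textbook model, not necessarily the exact expression shown on the slide.

```latex
\[
  P(\text{true} \mid x_1, \dots, x_n)
    = \frac{1}{1 + \exp\!\bigl(-(\beta_0 + \sum_{i=1}^{n} \beta_i x_i)\bigr)},
  \qquad x_i \in \{0, 1\}
\]
```

Equivalently, the log-odds of an alarm being true are linear in the symptom vector, which is what makes this a generalized linear model.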
Bayesian vs. Logistic Regression 1/2
• Alarms that can be sifted out with threshold 0.25
  • Bayesian: 74.84% of false alarms, 31.40% of true alarms
  • Logistic Regression: 90.05% of false alarms, 20.85% of true alarms
Bayesian vs. Logistic Regression 2/2
• False alarms mixed in until the user sees 50% of the true alarms
  • Bayesian: 15.17%
  • Logistic Regression: 4.10%
• Conjecture: the logistic regression model respects symptom dependency?
Related Work
• Buffer overrun detection (slide callouts: unsound, requires annotation, domain-aware)
  • ARCHER [Xie, Chou & Engler 2003]
  • SPLINT [Zitser, Lippmann & Leek 2004]
  • CSSV [Dor, Rodeh & Sagiv 2003]
  • ASTRÉE [Cousot et al. 2005, 2003]
• Statistical approach
  • Z-ranking [Kremenek & Engler 2003]
  • Error Correlation [Kremenek et al. 2004]
Conclusion
• Our “sound” static analyzer, Airac, is realistic
• False alarms are inevitable in a domain-unaware situation
• Statistical approaches helped
  • viable approach to handle false alarms
  • natural symptoms seem to work
  • orthogonal to other static analysis techniques
  • generic, depends on learning set
Thank you
• Questions?
• Demo available at http://ropas.snu.ac.kr/airac