Generating Analyses for Detecting Faults in Path Segments
Wei Le* and Mary Lou Soffa, University of Virginia
*currently with Rochester Institute of Technology
Motivation • Static analysis: an integral part of fault detection • High code coverage • No executables required • Find faults early, so cheaper to fix
Challenges of Current Static Analysis • Precision: many false positives and little support for diagnosis • Scalability: manual annotations sometimes required • Generality: hard-coded heuristics, new tools for different types of faults • Important to achieve all three
Precision: Path-Sensitive Analyses • Heuristics based: ESP [das02] (based on an assumption of typestate fault) • Summary based: Saturn [xie07] (lack of interprocedural path-sensitivity) • Partially exploring the state space: Prefix [bush00] • Exhaustive analysis based on the structure of a program
Framework: Athena • Automatically generate analyses from specifications: • precise: low false positives and rich diagnostic info • interprocedural path-sensitive analysis • reports path-segments of a fault • scalable: only covers code relevant to the fault • demand-driven analysis • general: data- and control-centric, liveness and safety • a specification technique and a generation algorithm
Faults • Commonality of the faults - Generality • The violations are always observable at certain statements • We are able to construct constraints to express violations • Locality of a fault - Scalability • Only the segments along the paths are relevant to the fault • Only a limited number of statements on the paths contribute to the fault • Fault locality holds for a variety of faults
Athena: Components [Figure: component diagram] • Generate Analyses: Specification Language, Specification Repository, Parser, Analyzer Generator • Precision and Scalability of the Analyses: Path-Sensitive Demand-Driven Template
Athena: Workflow [Figure: workflow diagram] • Step 1: Specifying Faults. The Spec holds the definition of a fault and the information for detecting the fault • Step 2: Generating Analysis. Spec, Parser, syntax trees, Analyzer Generator with code modules and the Demand-Driven Template, Analyzer for the Spec • Step 3: Analyzing programs with Generated Analysis. Program, Generated Analysis, Path Classification of each path segment: Infeasible, Safe, Faulty (severity, root cause), Don't-know
Component I: Specification and Language • Spec: <program point, constraints> <program point, actions> • Language: attributes and operators on attributes • Attributes: abstractions on program objects, e.g. len(s) • Operators: comparison (>, <), computation (+, -), command (:=)
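The <program point, constraint> pairing above can be illustrated with a small C sketch. Everything here is hypothetical and not taken from Athena: the `Stmt` model, the function names, and the use of concrete run-time values in place of the symbolic attributes a static analysis would track. The comparison follows the slides' query `size(s) >= len(t)`; whether Len counts the string terminator is not stated in the slides.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical statement model: an opcode name plus the two operands
 * that the buffer-overflow attributes need. */
typedef struct {
    const char *opcode;   /* e.g. "strcpy" */
    size_t dst_size;      /* bytes available in the destination buffer */
    const char *src;      /* source string */
} Stmt;

/* Attributes: abstractions over program objects. */
size_t attr_size(const Stmt *s) { return s->dst_size; }
size_t attr_len(const Stmt *s)  { return strlen(s->src); }

/* Program point: does this statement match the code signature? */
bool at_strcpy(const Stmt *s) { return strcmp(s->opcode, "strcpy") == 0; }

/* Constraint: a comparison operator applied to two attributes. */
bool size_ge_len(const Stmt *s) { return attr_size(s) >= attr_len(s); }

/* A spec entry pairs a program point with a constraint. */
typedef struct {
    bool (*program_point)(const Stmt *);
    bool (*constraint)(const Stmt *);
} Spec;

/* True when the spec applies at s and its safety constraint is violated. */
bool violates(const Spec *spec, const Stmt *s) {
    return spec->program_point(s) && !spec->constraint(s);
}
```

A caller would build `Spec spec = { at_strcpy, size_ge_len };` and pass each statement to `violates`. Keeping the program point and the constraint as separate slots mirrors the idea that expressive power comes from composing a small set of attributes.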
Grammar of the Language
Specification → Vars VarList DefineFault FaultSigList DetectFault DetectSigList
VarList → Var*
Var → VarType namelist;
VarType → Vbuffer | Vint | Vany | Vptr | ...
FaultSigList → FaultSigItem <or FaultSigItem>*
DetectSigList → DetectSigItem <or DetectSigItem>* | # include < ExistentSpec >
FaultSigItem → CodeSignature ProgramPoint S-Constraint Condition | CodeSignature ProgramPoint L-Constraint Condition
DetectSigItem → CodeSignature ProgramPoint Update Action
ProgramPoint → $LangSyntax$ | Condition | $LangSyntax$ && Condition
Condition → Attribute Comparator Attribute | !Condition | [Condition] | Condition && Condition | Condition || Condition
Action → Attribute := Attribute | ^ Condition | [Action] | Action && Action | Action || Action | Condition → Action
Attribute → PrimitiveAttribute(var, ...) | Constant | !Attribute | ¬Attribute | [Attribute] | Attribute ° Attribute | Attribute Op Attribute | min(Attribute, Attribute) | [Attribute, Attribute]
PrimitiveAttribute → Size | Len | Value | MatchOperand | TMax | TMin | ...
Constant → 0 | true | false | ...
Comparator → = | | > | < | | | |
Op → + | − | * | |
Specification [Figure: Buffer Overflow Specification]
Component II: Demand-Driven Template • Formulate fault detection problems into queries about program facts, e.g., variable relations • Scalable: Buffer overflow detection [le08]
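Demand-Driven Template [Figure: worked example with Program and Resolution columns] • Template steps: Raise Queries, Propagate Queries, Update Queries, Evaluate Queries (resolved: yes/no) • Program bar(): 1 entry; 2 s = (char*)malloc(80); 3 x[10] = '0'; 4 branch strlen(t) < 8 (yes/no); 5 strcpy(s,t); 6 strcat(x,t) • Query raised at strcpy(s,t): size(s) >= len(t) • At the branch: size(s) >= len(t), len(t) < 8 • At x[10] = '0': size(s) >= len(t), len(t) < 8 • At malloc(80): 80/8 >= len(t), len(t) < 8: safe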
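The worked example above can be approximated by a short backward walk over one path. This is a minimal sketch under strong assumptions, not Athena's implementation: the path is a flat array, there is a single buffer `s` and a single string `t`, and the statement kinds (`ST_MALLOC`, `ST_ASSUME_LEN_LT`, `ST_ASSUME_LEN_GE`) and function names are invented for illustration.

```c
#include <limits.h>
#include <stddef.h>

typedef enum { SAFE, FAULTY, DONT_KNOW } Verdict;

/* Hypothetical statement kinds, just enough for the slide's example. */
typedef enum { ST_MALLOC, ST_ASSUME_LEN_LT, ST_ASSUME_LEN_GE,
               ST_STRCPY, ST_OTHER } Kind;

typedef struct {
    Kind kind;
    long value;   /* ST_MALLOC: allocated size; ST_ASSUME_*: the bound */
} Stmt;

/* State of the query "size(s) >= len(t)" while it moves backward. */
typedef struct {
    long size_s;      /* -1 until the allocation of s is found */
    long len_t_max;   /* LONG_MAX until an upper bound on len(t) is found */
    long len_t_min;   /* -1 until a lower bound on len(t) is found */
} Query;

/* Update Queries: fold the facts contributed by one statement. */
static void update(Query *q, const Stmt *s) {
    switch (s->kind) {
    case ST_MALLOC:          /* nearest allocation going backward wins */
        if (q->size_s < 0) q->size_s = s->value;
        break;
    case ST_ASSUME_LEN_LT:   /* branch taken with len(t) < value */
        if (s->value - 1 < q->len_t_max) q->len_t_max = s->value - 1;
        break;
    case ST_ASSUME_LEN_GE:   /* branch taken with len(t) >= value */
        if (s->value > q->len_t_min) q->len_t_min = s->value;
        break;
    default:                 /* irrelevant statements are skipped */
        break;
    }
}

/* Evaluate Queries: decide if the collected facts resolve the query. */
static Verdict evaluate(const Query *q) {
    if (q->size_s < 0) return DONT_KNOW;
    if (q->len_t_max != LONG_MAX && q->size_s >= q->len_t_max) return SAFE;
    if (q->len_t_min >= 0 && q->size_s < q->len_t_min) return FAULTY;
    return DONT_KNOW;
}

/* Raise a query at the final strcpy and propagate it toward the entry. */
Verdict check_strcpy(const Stmt *path, size_t n) {
    if (n == 0 || path[n - 1].kind != ST_STRCPY) return DONT_KNOW;
    Query q = { -1, LONG_MAX, -1 };
    for (size_t i = n - 1; i-- > 0; ) {
        update(&q, &path[i]);
        Verdict v = evaluate(&q);
        if (v != DONT_KNOW) return v;   /* stop as soon as resolved */
    }
    return DONT_KNOW;
}
```

For the path `malloc(80)`, `x[10] = '0'`, `strlen(t) < 8` taken, `strcpy(s,t)`, the walk skips the unrelated assignment and resolves the query as safe once it reaches the allocation. The early return is what keeps the walk limited to the statements relevant to the fault.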
Demand-Driven Template • Rules for propagating queries • Interprocedural, path-sensitive, context-sensitive • Branch, loop, call, infeasible path • Evaluating queries (integer constraints) • Algebra rules, inequalities • Integer constraint solver
Component III: Parser and Code Generator
Parsing Specification (YACC) • CodeSignature: GetOp(s) = strcpy • S_Constraint: Size(Src1(s)) ≥ Len(Src2(s)) • Syntax trees: non-leaf nodes are operators, leaves are attributes • Tree A (CodeSignature): = (GetOp, strcpy) • Tree B (S_Constraint): ≥ (Size ° Src1, Len ° Src2)
Code Generation • Code signature tree: = (GetOp, strcpy) • Find the function that implements the semantics of leaf attributes:
const char *GetOp(statement t) { C_Syntax(t); return t.opcode; }
• Construct a function that implements the semantics of the tree based on the semantics of operators:
bool IsStrcpy(statement t) { return strcmp(GetOp(t), "strcpy") == 0; }
• Create the instance of the call: IsStrcpy(n)
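The code-signature tree can also be evaluated directly, which may help show how operator semantics are layered on leaf semantics. The sketch below interprets the tree at run time, whereas the slides describe generating a function such as `IsStrcpy`; the `Node` and `Statement` types and all names are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical statement model carrying only an opcode name. */
typedef struct { const char *opcode; } Statement;

typedef enum { N_EQ, N_GETOP, N_CONST } NodeKind;

typedef struct Node {
    NodeKind kind;
    const char *text;                  /* N_CONST: the literal, e.g. "strcpy" */
    const struct Node *left, *right;   /* N_EQ: the two operands */
} Node;

/* Leaf semantics: each leaf yields a string for statement t. */
static const char *eval_leaf(const Node *n, const Statement *t) {
    switch (n->kind) {
    case N_GETOP: return t->opcode;    /* attribute: opcode of t */
    case N_CONST: return n->text;      /* constant from the spec */
    default:      return NULL;         /* not a leaf */
    }
}

/* Non-leaf semantics: "=" compares the values of its two leaves. */
bool eval_signature(const Node *n, const Statement *t) {
    if (n->kind != N_EQ || !n->left || !n->right) return false;
    const char *a = eval_leaf(n->left, t);
    const char *b = eval_leaf(n->right, t);
    return a && b && strcmp(a, b) == 0;
}
```

Building the tree `= (GetOp, strcpy)` and calling `eval_signature` on a statement gives the same answer as the generated `IsStrcpy`. Generating a dedicated function instead avoids walking the tree for every statement.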
Generating Analysis [Figure: generation pipeline and generated template] • Pipeline: Spec, Parser, syntax trees, Analyzer Generator with code modules and the Demand-Driven Template, Analyzer for the Spec • Generated code modules in the template: • Raise Queries: if (isnode(s)) q = raiseQ(s) • Update Queries: if (isnode(s)) updateQ(q) • Template steps: Raise Queries, Propagate Queries, Update Queries, Evaluate Queries (resolved: yes/no)
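One way to picture generated code modules plugging into a fixed template is a table of function pointers. This is a hedged sketch, not Athena's generated code: the `Modules` struct, the toy `Query`, and the example module functions are all hypothetical, and propagation is reduced to a backward scan of a flat statement array.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct { const char *opcode; } Statement;
typedef struct { bool resolved; } Query;   /* toy query state */

/* Slots that the generated code modules would fill in (hypothetical). */
typedef struct {
    bool (*raises)(const Statement *);      /* node check for raising */
    bool (*updates)(const Statement *);     /* node check for updating */
    void (*update_q)(Query *, const Statement *);
    bool (*evaluate)(const Query *);
} Modules;

/* Fixed template: raise, propagate backward, update, evaluate.
 * Returns how many raised queries were resolved. */
size_t run_template(const Modules *m, const Statement *stmts, size_t n) {
    size_t resolved = 0;
    for (size_t i = 0; i < n; i++) {
        if (!m->raises(&stmts[i])) continue;          /* Raise Queries */
        Query q = { false };
        for (size_t j = i; j-- > 0; ) {               /* Propagate Queries */
            if (m->updates(&stmts[j]))
                m->update_q(&q, &stmts[j]);           /* Update Queries */
            if (m->evaluate(&q)) { resolved++; break; }  /* Evaluate Queries */
        }
    }
    return resolved;
}

/* Example modules for a toy buffer spec: a query raised at strcpy is
 * treated as resolved once the allocation is reached. */
static bool is_strcpy(const Statement *s) { return strcmp(s->opcode, "strcpy") == 0; }
static bool is_malloc(const Statement *s) { return strcmp(s->opcode, "malloc") == 0; }
static void note_alloc(Query *q, const Statement *s) { (void)s; q->resolved = true; }
static bool is_resolved(const Query *q) { return q->resolved; }

const Modules buffer_modules = { is_strcpy, is_malloc, note_alloc, is_resolved };
```

Swapping in a different `Modules` table changes which fault is detected while `run_template` stays untouched, which is the separation the slide's figure suggests between generated modules and the template.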
Experimental Setup • Athena (analyzes C/C++/C#): YACC, Phoenix and Disolver
Can We Generate Analyses for Different Faults? • Detection: 84 faults of four types from 9 benchmarks, 68 new. Many of the new faults are located along the same paths; dynamic tools would halt • False positives/negatives: 18 false positives, 3 faults missed. Main sources of imprecision: infeasible paths and pointers • Path segments: generally relevant to 1-4 procedures, maximum 35 procedures. Locality helped achieve scalability; without guidance, manual inspection is hard • Scalability: apache (268.9 k): 4 hours; ffmpeg (48.1 k): 2.3 hours. Code complexity matters; generality does compromise scalability, but the analysis is still scalable
Comparable with Manually Customized Detectors? • Heuristics designed for suppressing false positives may hurt the detection rate • Lack of interprocedural path-sensitivity • Heuristics of applying consistency rules
Related Work • Static fault detection: type based, model checking, data flow analysis • Path-sensitive fault detection: Prefix, Metal, ESP, Archer, Saturn, Calysto (exhaustive static analysis) • Athena is demand-driven, more precise, scalable and general • Slicing and other demand-driven analyses • Athena is the first to use demand-driven analysis for computing path segments of faults
Conclusions • Athena generates demand-driven, path-based, symbolic analyses for detecting specified faults: • Faults develop along paths but exhibit locality, so demand-driven, path-based analysis is more precise and scalable • Specification provides a way of mapping fault detection problems to constraints on program objects at the program points • To specify different faults, the required attributes are limited, and the expressive power comes from the composition of the attributes
Branch Analysis and Fault Detection [Figure: worked example] • Program: p[10]; 1 scanf(%s, t); 3 i = strlen(t); 4 branch i < 10 (yes); 5 strcpy(p,t) • Fault detection annotations: Size(p) ≥ Len(t); Len(t) < 10; IsEntry(t); 10 ≥ Len(t) [Safe] • Branch analysis annotations: Value(i) < 10; Len(t) < 10; IsEntry(t); Feasible