CS527 Advanced Topics in Software Engineering
Lecture 20, 1 Nov 2007
Lorinc Hever (lhever2)

Introduction
• The paper: A System and Language for Building System-Specific, Static Analyses
• PLDI 2002, Berlin, Germany
• A static analysis system for the C language
• Extensible rules
• No source code annotation
• Focuses on important bugs
• The technology evolved into a commercial product, Coverity Prevent

The Analysis Challenge: a simple example
• Something easy to evaluate:
    x = 1;
    y = 1;
    assert(x < y);
• But how can you evaluate the following code snippet?
    x = v;
    if (x < y) {
        y = v;
    }
    assert(x < y);
• Following the path where x < y is true (known facts in braces):
    x = v;            {no fact}
    if (x < y) {      {x=v}
        y = v;        {x=v, x<y}
    }
    assert(x < y);    {x=v, x<y, y=v}
  The assert fails, since it reduces to v < v.
• Following the path where x < y is false:
    x = v;            {no fact}
    if (x < y)        {x=v}
    assert(x < y);    {x=v, !(x<y)}
  The assert fails again.
Source: Secure Programming with Static Analysis, Brian Chess and Jacob West, Addison-Wesley Professional (June 29, 2007)
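
The two path walks above can be reproduced mechanically. The following is a minimal sketch, assuming values are tracked as symbolic names and comparisons as recorded facts; the function `check_assert` and the symbols `x0` and `y0` (the unknown initial values) are illustrative and do not come from the book or the paper.

```python
def check_assert(take_branch):
    """Follow one path through the snippet and decide whether assert(x < y) can hold.

    Returns False if the assert must fail, True if it must hold, None if unknown.
    """
    env = {"x": "x0", "y": "y0"}      # variable -> symbolic value (x0, y0 are assumed names)
    facts = set()                     # (a, b, holds): whether "a < b" is known true or false

    env["x"] = "v"                    # x = v;
    cond = (env["x"], env["y"])       # if (x < y)
    facts.add((*cond, take_branch))   # record the outcome of the comparison on this path
    if take_branch:
        env["y"] = "v"                # y = v;

    a, b = env["x"], env["y"]         # assert(x < y);
    if a == b:
        return False                  # same symbolic value: v < v can never hold
    if (a, b, False) in facts:
        return False                  # the path already established !(a < b)
    if (a, b, True) in facts:
        return True
    return None


for taken in (True, False):
    print("branch taken:", taken, "-> assert can hold:", check_assert(taken))
```

Both calls return False, matching the slide: the assert fails on either path.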

The system and language: xgcc + metal
• xgcc: a modified gcc compiler front end that generates the abstract syntax tree (AST) from the source and drives the analysis
• metal: a high-level language for describing checkers as state machines (interpreted by YACC?)
Source: http://metacomp.stanford.edu/osdi2000/node3.html
• High-level diagram from the Coverity manual: [diagram: source, frontend, AST, analyzer (checker 1 ... checker n), report]

Metal features, simplified
• Global state: start
• Variable-specific state: one for every variable in the program
  • Declaration: state decl any_pointer v
  • Example state: v.freed
• Patterns: match source code actions, e.g. { kfree(v) } or { *v }
• Transitions: define a state change, e.g. v.freed: { *v } ==> v.stop
• The free checker:
    state decl any_pointer v;
    start:   { kfree(v); } ==> v.freed ;
    v.freed: { *v } ==> v.stop, { err("err1"); }
           | { kfree(v) } ==> v.stop, { err("err2"); }
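
The transitions of the free checker can be mimicked outside metal. This is a minimal sketch, assuming the program has already been reduced to a straight-line trace of (action, variable) events; the event names `kfree` and `deref`, the function `run_free_checker` and the message texts are illustrative and are not part of metal or xgcc.

```python
def run_free_checker(events):
    """Replay (action, variable) events through the free-checker state machine.

    Hypothetical helper that mirrors the slide's transitions, not metal's real engine.
    """
    state = {}      # variable name -> "freed"; absent means untracked (or stopped)
    errors = []
    for action, var in events:
        if state.get(var) == "freed":
            # v.freed: { *v } ==> v.stop  |  { kfree(v) } ==> v.stop
            if action == "deref":
                errors.append(f"err1: using {var} after free")
                del state[var]          # v.stop: stop tracking this variable
            elif action == "kfree":
                errors.append(f"err2: double free of {var}")
                del state[var]
        elif action == "kfree":
            # start: { kfree(v) } ==> v.freed
            state[var] = "freed"
    return errors


trace = [("kfree", "p"), ("deref", "p"), ("kfree", "q"), ("kfree", "q")]
print(run_free_checker(trace))
```

For this trace the sketch reports one use after free for `p` and one double free for `q`. Going to the stop state after the first error keeps one bug from producing a cascade of reports.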

Executing the free checker
    int contrived(int *p, int *w, int x) {
        int *q;
        if (x)
        {
            kfree(w);
            q = p;
            p = 0;
        }
        if (!x)
            return *w;  // safe
        return *q;      // using 'q' after free!
    }
    int contrived_caller(int *w, int x, int *p) {
        kfree(p);
        contrived(p, w, x);
        return *w;      // using 'w' after free!
    }
• Checker state at each step of the traversal:
    {}                                 contrived_caller(*p,*w,x)
    {}                                 kfree(p)
    {p.freed}                          contrived(p,w,x)
    {p.freed}                          if(x)
    {x!=0, p.freed}                    kfree(w)
    {x!=0, p.freed, w.freed}           q=p
    {x!=0, p.freed, w.freed, q.freed}  p=0
    {x==0, p.freed}                    prune
    {x!=0, w.freed, q.freed}           prune
    {x==0, p.freed}                    return *w
    {x!=0, w.freed, q.freed}           return *q
    {p.freed, w.freed}                 return *w

Intraprocedural analysis
• Concept: computes the final state within a single function
• Traversal: extensions are applied to the AST nodes, visited in order with depth-first search (DFS)
• Transition: at every point, the transition rules are checked for the variable
• Assumes the extension is deterministic: applying the extension to the same program produces the same result
• Caching: based on whether a traversal changes an existing state or adds a new state
• Based on the independence condition, all state tuples reached in a block are combined into a single set
• Stop condition: meet over paths, reached when the block summary contains all the tuples that can reach that block along any control path
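
The caching idea can be sketched as a depth-first traversal that stops whenever a block is reached in a state it has already seen. This is a simplified illustration, not the paper's algorithm: it caches whole state sets per block rather than individual tuples, and the names `analyze`, `transfer` and the example `cfg` are assumptions made for the sketch.

```python
def analyze(cfg, transfer, entry, initial_state):
    """Depth-first traversal that records, per block, every state seen on entry."""
    summaries = {block: set() for block in cfg}
    stack = [(entry, initial_state)]
    while stack:
        block, state = stack.pop()
        if state in summaries[block]:
            continue                      # cache hit: this (block, state) pair was already explored
        summaries[block].add(state)
        out_state = transfer(block, state)
        for successor in cfg[block]:
            stack.append((successor, out_state))
    return summaries


# Hypothetical control flow graph with a loop: entry -> loop -> (body -> loop | exit)
cfg = {"entry": ["loop"], "loop": ["body", "exit"], "body": ["loop"], "exit": []}


def transfer(block, state):
    # Assumed effect for the example: the loop body frees p.
    if block == "body":
        return state | {"p.freed"}
    return state


summaries = analyze(cfg, transfer, "entry", frozenset())
for block, states in summaries.items():
    print(block, sorted(sorted(s) for s in states))
```

The traversal terminates on the loop because the second trip around the body adds no new state, so the cache cuts it off. At that point the summary of `exit` holds both the empty state and the state with `p.freed`.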

Interprocedural analysis
• Concept: carries state from the caller to the callee and back
• Creates the control flow graph (CFG) and determines the entry points
• Dynamic programming approach: the extension doesn't need a finite state space, and only the states that can be reached along a path are explored
• Refine: when the algorithm follows a function call, each passed variable should remain in the same state
• Restore: when it returns from a function call, the state of each passed variable should reflect what the callee did to it
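
Refine and restore can be pictured as two mappings between the caller's and the callee's variable names. The sketch below is a simplification under stated assumptions: states are plain dictionaries, the functions `refine` and `restore` are illustrative names, and a formal that the callee stopped tracking leaves the caller's actual untouched. That last rule matches the trace on the free-checker slide but is not claimed to be the paper's exact rule.

```python
def refine(caller_state, actuals, formals):
    """Caller -> callee: each formal starts in the state of its actual argument."""
    return {
        formal: caller_state[actual]
        for actual, formal in zip(actuals, formals)
        if actual in caller_state
    }


def restore(caller_state, callee_state, actuals, formals):
    """Callee -> caller: copy each formal's final state back onto its actual."""
    result = dict(caller_state)
    for actual, formal in zip(actuals, formals):
        if formal in callee_state:
            result[actual] = callee_state[formal]
    return result


# contrived_caller has freed p, then calls contrived(p, w, x).
caller = {"p": "freed"}
actuals = formals = ["p", "w", "x"]

callee_entry = refine(caller, actuals, formals)
print(callee_entry)

# Final callee state on the x != 0 path: w and the local q are freed, p was killed by p = 0.
callee_exit = {"w": "freed", "q": "freed"}
print(restore(caller, callee_exit, actuals, formals))
```

After the call the caller sees both `p` and `w` as freed, while the callee's local `q` is not mapped back because it has no corresponding actual.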

Increasing the accuracy (false positive suppression)
• Killing variables: once a variable goes to the stop state it is removed from the list, e.g. after p=0; but what about a[i]=0 followed by a redefinition of i?
• Synonyms: after p=q=malloc(…), every operation applies to both synonyms
• False path pruning: value tracking + congruence closure algorithm
  • Tracks assignments and comparisons
  • Evaluates expressions along the way
  • After a loop, all loop variables go to unknown
  • Equal values go into the same equivalence class
  • Block summary entries are removed for the pruned path
• Targeted suppression: handled explicitly in metal
• History: remember false positives from the past and suppress them (matched on file name, function name, variable name and the actual error; no line number!)
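
The effect of killing, synonyms and false path pruning can be seen on the `contrived` function from the free-checker slide. The sketch below is a hand-written model of that one function, not a general analysis: `explore` is a hypothetical name, and pruning is done with a direct contradiction check on x instead of the congruence closure algorithm.

```python
def explore(prune):
    """Enumerate the paths through `contrived` and collect free-checker errors."""
    errors = set()
    for first_taken in (True, False):        # outcome of if (x)
        for second_taken in (True, False):   # outcome of if (!x)
            # if (x) taken means x != 0; if (!x) taken means x == 0.
            # Equal outcomes therefore give contradictory facts about x.
            if prune and first_taken == second_taken:
                continue                     # infeasible path: skip it
            freed = {"p"}                    # state on entry from contrived_caller
            if first_taken:
                freed.add("w")               # kfree(w)
                if "p" in freed:
                    freed.add("q")           # q = p: q becomes a synonym of p
                freed.discard("p")           # p = 0: kill p
            if second_taken:
                if "w" in freed:             # return *w
                    errors.add("using w after free")
            elif "q" in freed:               # return *q
                errors.add("using q after free")
    return errors


print("with pruning:   ", sorted(explore(prune=True)))
print("without pruning:", sorted(explore(prune=False)))
```

With pruning only the genuine use of `q` after free is reported. Without it, the infeasible path that takes both branches adds a false positive for `*w` inside `contrived`.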

Reporting (ranking)
• Approach: first list the errors that are difficult to diagnose with testing
• Generic ranking
  • Distance between the source and the sink
  • Number of conditionals: each counts as 10 lines of distance
  • Aliasing/synonyms
  • Local errors first
  • The more analysis steps, the more likely it is a false positive
  • High-density areas are less important
• Z-statistic: count the violations (c) and the cases where the rule is kept (e); a rule that is almost always followed is probably correct
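
The z-statistic ranking can be illustrated with the standard z-test statistic for a proportion, z = (e/n - p0) / sqrt(p0 (1 - p0) / n) with n = e + c. This is a sketch under assumptions: the slide gives only the counts c and e, so the exact formula, the reference probability `p0 = 0.9` and the rule names and counts below are illustrative choices, not values taken from the slide.

```python
import math


def z_statistic(e, c, p0=0.9):
    """z-test statistic for the proportion of checks that obey a rule.

    e: number of times the rule was followed, c: number of violations.
    p0 is an assumed reference probability, chosen here for illustration.
    """
    n = e + c
    if n == 0:
        raise ValueError("the rule was never checked")
    return (e / n - p0) / math.sqrt(p0 * (1 - p0) / n)


# Hypothetical rules with (followed, violated) counts.
rules = {"lock/unlock pairing": (99, 1), "check return value": (5, 5)}
ranked = sorted(rules, key=lambda name: z_statistic(*rules[name]), reverse=True)
for name in ranked:
    e, c = rules[name]
    print(f"{name}: z = {z_statistic(e, c):.2f}")
```

The rule followed 99 times out of 100 scores about 3.0, while the rule followed 5 times out of 10 scores about -4.2. Violations of the first rule are therefore listed first, since a rule that is almost always followed is probably a real one.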

Soundness, Performance
• Unsound analysis tool: not all the defects it reports are guaranteed to be genuine; the focus is on "good" results
• The interprocedural algorithm doesn't follow recursive loops
• Vulnerable to both false positives and false negatives
• Incomplete: doesn't report all the bugs; a bug is detected only if there is a checker for it
• Coverity Prevent performance: varies with the code, from 15 sec (12,000 lines) to under 1 hour for MySQL (600,000 lines)
• Coverity Prevent defect density: 478 defects over 1,352,343 lines of code, an average of 0.353 defects per 1K lines
• Coverity Prevent accuracy: overall false positive rate between 12.7% and 35.7%, with a mean of 24.2%
Source: Analysis Tool Evaluation: Coverity Prevent; Ali Almossawi, Kelvin Lim, Tanmay Sinha, Carnegie Mellon University (May 1, 2006)