
ESP: Program Verification Of Millions of Lines of Code

Manuvir Das, Researcher, Software Productivity Tools Group, Microsoft Corporation; PPRC Reliability Team, Microsoft Corporation.


Presentation Transcript


  1. ESP: Program Verification Of Millions of Lines of Code • Manuvir Das, Researcher • Software Productivity Tools Group, Microsoft Corporation • PPRC Reliability Team, Microsoft Corporation

  2. Motivation • No Buffer Overruns! • No Resource Leaks! • No Privilege Misuse!

  3. Approach • Redundancy is good • Redundancy exposes inconsistency • Inconsistency points to errors • Compare • what programmer should do • what code actually does

  4. Lightweight specifications (rules) • Describe correct/incorrect behavior • readable/writable by programmers • Specify limited properties • not total correctness/verification • Compare rules against code • tools mechanically and systematically find inconsistencies (errors) • tools can ignore areas of code that are not relevant to the rule being checked

  5. Rule-based programming • (Diagram labels: Rules; Development; Testing; Static Verification Tool; Read for understanding; Drive testing tools; Precise Rules; New API rules; Program Analysis Engine; Source Code; Defects; 100% path coverage)

  6. Types are the only successful specifications • Programmers routinely read and write type declarations • document interface syntax • basis for program abstractions • static (compile-time) error detection • Type checking is fast & routine • finds errors

  7. Can we extend this approach? • Specify and check other properties • languages to express rules • tools to check that code obeys rules • Goal is partial correctness • detect and report important classes of errors • no guarantee of program correctness • Systematic • sound, complete analysis • flag absence of errors • false positives OK

  8. Motivation • No Buffer Overruns! • No Resource Leaks! • No Privilege Misuse!

  9. Requirements • Scalability • Complete coverage • Millions of lines of code • All features of C/C++ • Usability • Low number of false positives • Simple rule description language • Informative error reports

  10. ESP • (Diagram labels: Rules; Development; Testing; ESP; Read for understanding; Drive testing tools; OPAL Rules; New API rules; Path-sensitive Dataflow Analysis; C/C++ Code; Defects; 100% path coverage)

  11. ESP • (Diagram labels: Rules; ESP; OPAL Rules; Path-sensitive Dataflow Analysis; C/C++ Code; Defects; 100% path coverage)

  12. The bottom line • Can ESP verify a million lines of code? • We’re not sure …. yet • We’ve done 150 KLOC in 70s and 50MB • So, we’re cautiously optimistic

  13. Are we running into a wall? • Verification demands precision • Need to avoid false error reports • Must analyze each execution path • Big programs demand scalability • Exponentially/infinitely many paths • Cannot analyze each execution path • Must use approximate analysis

  14. Precision vs scalability? • (Chart with axes Precision and Scalability, plotting: Path-sensitive analysis; Type inference; Dataflow analysis)

  15. Precision vs scalability • (Chart with axes Precision and Scalability)

  16. Research problem • Can we invent a verification method that • is always conservative, • is always scalable, • is almost always precise, and • matches our intuition? • Yes, for a certain class of rules • Finite state, temporal safety properties

  17. Finite state safety properties • Property is described by an FSA • As the program executes, a monitor • tracks the current state of the FSA • updates the current state • signals an error when the FSA transitions into special error states • Goal of verification: • Is there some execution path that would cause the monitor to signal an error?

  18. Example: stdio usage in gcc • (FSA diagram: states Closed, Opened, Error; edge labels Open (twice), Close, Print, Print/Close, *) • Abstracted code: void main () { if (dump) Open; if (p) x = 0; else x = 1; if (dump) Close; } • Source code: void main () { if (dump) fil = fopen(dumpFile,"w"); if (p) x = 0; else x = 1; if (dump) fclose(fil); }
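
The monitor idea behind this FSA can be sketched as a small transition table in C. This is only an illustration, not ESP code: the names `fsa_step` and `fsa_run` are hypothetical, and the exact edges (in particular that a second Open on an opened file leads to Error) are an assumption read off the diagram labels.

```c
#include <assert.h>
#include <stddef.h>

/* Property FSA states for stdio usage (state names from the slide). */
typedef enum { CLOSED, OPENED, ERROR, NUM_STATES } State;
/* Events that drive the FSA. */
typedef enum { EV_OPEN, EV_PRINT, EV_CLOSE, NUM_EVENTS } Event;

/* Assumed transition table: rows are current states, columns are events.
   Error is a sink state (the "*" self-loop in the diagram). */
static const State NEXT[NUM_STATES][NUM_EVENTS] = {
    /*            EV_OPEN  EV_PRINT EV_CLOSE */
    /* CLOSED */ { OPENED,  ERROR,   ERROR  },
    /* OPENED */ { ERROR,   OPENED,  CLOSED },
    /* ERROR  */ { ERROR,   ERROR,   ERROR  },
};

/* One monitor step: look up the successor state. */
State fsa_step(State s, Event e) {
    assert(s < NUM_STATES && e < NUM_EVENTS);
    return NEXT[s][e];
}

/* Run the monitor over one concrete event trace, starting in Closed. */
State fsa_run(const Event *trace, size_t n) {
    State s = CLOSED;
    for (size_t i = 0; i < n; i++)
        s = fsa_step(s, trace[i]);
    return s;
}
```

A trace Open, Print, Close ends in Closed, while Open, Close, Close ends in Error. Verification asks whether any feasible program path produces a trace of the second kind.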

  19. Path-sensitive property analysis • Symbolically evaluate the program • Track FSA state and execution state • At branch points: • Execution state implies branch direction? • Yes: process appropriate branch • No: split state and process both branches

  20. Example • CFG: entry; dump (T/F); Open; p (T/F); x = 0; x = 1; dump (T/F); Close; exit • State annotations (slide order): [Closed] [Closed|dump=T] [Opened|dump=T] [Opened|dump=T,p=T] [Opened|dump=T,p=F] [Opened|dump=T,p=T,x=0] [Opened|dump=T,p=F,x=1] [Opened|dump=T,p=T,x=0] [Opened|dump=T,p=F,x=1] [Closed|dump=T,p=T,x=0] [Closed|dump=T,p=F,x=1]
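
The state splitting described for path-sensitive analysis can be sketched in C for the three-statement example. This is a toy model under stated assumptions, not ESP's implementation: the program is encoded as a list of `if (guard) ... else ...` statements, and `SymState`, `Stmt`, `EXAMPLE` and `path_sensitive` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { CLOSED, OPENED, ERROR } Fsa;
typedef enum { NOP, OPEN, CLOSE, SET_X0, SET_X1 } Action;
enum { DUMP, P, X, NVARS };          /* program variables */
enum { UNK = -1 };                   /* value not known */
#define MAX_STATES 64

/* Symbolic state: FSA state plus what is known about each variable. */
typedef struct { Fsa fsa; int val[NVARS]; } SymState;
/* One statement: if (guard) then_act; else else_act; */
typedef struct { int guard; Action then_act, else_act; } Stmt;

static const Stmt EXAMPLE[] = {
    { DUMP, OPEN,   NOP    },        /* if (dump) Open;           */
    { P,    SET_X0, SET_X1 },        /* if (p) x = 0; else x = 1; */
    { DUMP, CLOSE,  NOP    },        /* if (dump) Close;          */
};

static SymState apply(SymState s, Action a) {
    switch (a) {
    case OPEN:   s.fsa = (s.fsa == CLOSED) ? OPENED : ERROR; break;
    case CLOSE:  s.fsa = (s.fsa == OPENED) ? CLOSED : ERROR; break;
    case SET_X0: s.val[X] = 0; break;
    case SET_X1: s.val[X] = 1; break;
    case NOP:    break;
    }
    return s;
}

/* Returns the number of symbolic states at exit and copies them to out. */
size_t path_sensitive(const Stmt *prog, size_t n, SymState *out) {
    SymState cur[MAX_STATES], next[MAX_STATES];
    size_t ncur = 1;
    cur[0] = (SymState){ CLOSED, { UNK, UNK, UNK } };
    for (size_t i = 0; i < n; i++) {
        size_t nnext = 0;
        for (size_t j = 0; j < ncur; j++) {
            int g = cur[j].val[prog[i].guard];
            for (int dir = 1; dir >= 0; dir--) {
                /* If the state already implies the branch direction,
                   skip the other arm; otherwise split on both. */
                if (g != UNK && g != dir) continue;
                SymState s = cur[j];
                s.val[prog[i].guard] = dir;
                assert(nnext < MAX_STATES);
                next[nnext++] = apply(s, dir ? prog[i].then_act
                                             : prog[i].else_act);
            }
        }
        ncur = nnext;
        for (size_t j = 0; j < ncur; j++) cur[j] = next[j];
    }
    for (size_t j = 0; j < ncur; j++) out[j] = cur[j];
    return ncur;
}
```

On `EXAMPLE` this yields four exit states, all Closed: precise, but the state count doubles with every branch on an unknown condition.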

  21. Dataflow property analysis • Track only FSA state • Ignore non-state-changing code • At control flow join points: • Accumulate FSA states

  22. Example • CFG: entry; dump (T/F); Open; p (T/F); x = 0; x = 1; dump (T/F); Close; exit • FSA state sets (slide order): {Closed} {Closed,Opened} {Closed,Opened} {Error,Closed,Opened}
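
The dataflow variant can be sketched in C as a bit-set of FSA states that is unioned at every join. This is a minimal illustration, assuming the program is a straight sequence of `if (?) event;` statements whose conditions are ignored; `StateSet`, `transfer` and `dataflow` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { CLOSED, OPENED, ERROR, NUM_STATES } Fsa;
typedef enum { NOP, OPEN, CLOSE } Event;   /* NOP: non-state-changing code */
typedef unsigned StateSet;                 /* bit i set: FSA state i possible */
#define BIT(s) (1u << (s))

static Fsa fsa_step(Fsa s, Event e) {
    if (s == ERROR || e == NOP) return s;
    if (e == OPEN) return s == CLOSED ? OPENED : ERROR;
    return s == OPENED ? CLOSED : ERROR;   /* e == CLOSE */
}

/* Transfer function: apply event e to every state in the set. */
static StateSet transfer(StateSet in, Event e) {
    StateSet out = 0;
    for (int s = 0; s < NUM_STATES; s++)
        if (in & BIT(s)) out |= BIT(fsa_step((Fsa)s, e));
    return out;
}

/* Analyze "if (?) ev[0]; if (?) ev[1]; ..." while ignoring conditions.
   At each join, accumulate the states of the taken and skipped arms. */
StateSet dataflow(const Event *ev, size_t n) {
    StateSet cur = BIT(CLOSED);
    for (size_t i = 0; i < n; i++) {
        assert(ev[i] <= CLOSE);
        cur = transfer(cur, ev[i]) | cur;
    }
    return cur;
}
```

For the events Open, NOP, Close the result is {Error, Closed, Opened}: the analysis stays small but reports a false error, because it cannot see that both `if (dump)` tests agree.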

  23. Why is this code correct? • (FSA diagram: states Closed, Opened, Error; edge labels Open (twice), Close, Print, Print/Close, *) • void main () { if (dump) Open; if (p) x = 0; else x = 1; if (dump) Close; }

  24. When is a branch relevant? • Precise answer • When the value of the branch condition determines the property FSA state • Heuristic answer • When the property FSA is driven to different states along the arms of the branch statement

  25. Property simulation • Modification of path-sensitive analysis • At control flow join points: • States agree on property FSA state? • Yes: merge states • No: process states separately

  26. Example • CFG: entry; dump (T/F); Open; p (T/F); x = 0; x = 1; dump (T/F); Close; exit • State annotations (slide order): [Opened|dump=T,p=T,x=0] [Opened|dump=T,p=F,x=1] [Closed] [Opened|dump=T] [Closed|dump=F] [Opened|dump=T] [Closed|dump=F] [Closed|dump=T] [Closed|dump=F] [Closed|dump=T] [Closed]
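
Property simulation's merge rule (merge symbolic states at a join only when they agree on the FSA state) can be sketched in C on the same example. This is a toy model under assumptions, not ESP itself: each `if` statement is followed immediately by its join, merging keeps only the variable facts both states share, and all names (`SymState`, `add_merged`, `property_simulation`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { CLOSED, OPENED, ERROR, NUM_FSA } Fsa;
typedef enum { NOP, OPEN, CLOSE, SET_X0, SET_X1 } Action;
enum { DUMP, P, X, NVARS };
enum { UNK = -1 };                   /* value not known */

typedef struct { Fsa fsa; int val[NVARS]; } SymState;
typedef struct { int guard; Action then_act, else_act; } Stmt;

static const Stmt EXAMPLE[] = {
    { DUMP, OPEN,   NOP    },        /* if (dump) Open;           */
    { P,    SET_X0, SET_X1 },        /* if (p) x = 0; else x = 1; */
    { DUMP, CLOSE,  NOP    },        /* if (dump) Close;          */
};

static SymState apply(SymState s, Action a) {
    switch (a) {
    case OPEN:   s.fsa = (s.fsa == CLOSED) ? OPENED : ERROR; break;
    case CLOSE:  s.fsa = (s.fsa == OPENED) ? CLOSED : ERROR; break;
    case SET_X0: s.val[X] = 0; break;
    case SET_X1: s.val[X] = 1; break;
    case NOP:    break;
    }
    return s;
}

/* Add s to the set; if a state with the same FSA state exists, merge
   into it by forgetting every variable value the two disagree on. */
static size_t add_merged(SymState *set, size_t n, SymState s) {
    for (size_t i = 0; i < n; i++) {
        if (set[i].fsa != s.fsa) continue;
        for (int v = 0; v < NVARS; v++)
            if (set[i].val[v] != s.val[v]) set[i].val[v] = UNK;
        return n;
    }
    assert(n < (size_t)NUM_FSA);     /* at most one state per FSA state */
    set[n] = s;
    return n + 1;
}

size_t property_simulation(const Stmt *prog, size_t n, SymState *out) {
    SymState cur[NUM_FSA], next[NUM_FSA];
    size_t ncur = 1;
    cur[0] = (SymState){ CLOSED, { UNK, UNK, UNK } };
    for (size_t i = 0; i < n; i++) {
        size_t nnext = 0;
        for (size_t j = 0; j < ncur; j++) {
            int g = cur[j].val[prog[i].guard];
            for (int dir = 1; dir >= 0; dir--) {
                if (g != UNK && g != dir) continue;   /* infeasible arm */
                SymState s = cur[j];
                s.val[prog[i].guard] = dir;
                s = apply(s, dir ? prog[i].then_act : prog[i].else_act);
                nnext = add_merged(next, nnext, s);   /* join point */
            }
        }
        ncur = nnext;
        for (size_t j = 0; j < ncur; j++) cur[j] = next[j];
    }
    for (size_t j = 0; j < ncur; j++) out[j] = cur[j];
    return ncur;
}
```

On `EXAMPLE` the states after each statement are [Opened|dump=T] [Closed|dump=F], then the same pair again (the `p` and `x` facts are merged away), then a single [Closed]. The branch on `p` never multiplies states, yet the correlation on `dump` is kept, so no false error is reported.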

  27. Loop example • CFG: entry; new = old; Open; * (T/F); Close; new++; new != old (T/F); Close; exit • State annotations (slide order): [Closed] [Closed|new=old+1] [Opened|new=old] [Opened|new=old] [Closed|new=old+1] [Closed|new=old]

  28. Making property simulation work • Real programs are complex • Multiple FSAs • Aliasing • Real code bases are very large • Well beyond a million lines • ESP = Property Simulation + Multiple FSAs + Aliasing + Component-wise Analysis

  29. Problem: Multiple FSAs • Source code: void main () { if (dump1) fil1 = fopen(dumpFile1,"w"); if (dump2) fil2 = fopen(dumpFile2,"w"); if (dump1) fclose(fil1); if (dump2) fclose(fil2); } • With transitions: void main () { if (dump1) Open(fil1); if (dump2) Open(fil2); if (dump1) Close(fil1); if (dump2) Close(fil2); } • Source code pattern → Transition: e = fopen(_) → Open; fclose(e) → Close • (FSA diagram: states Closed, Opened, Error; edge labels Open (twice), Close, Print, Print/Close, *)

  30. Property simulation, bit by bit • Problem: property state can be exponential • Solution: track one FSA at a time void main () { if (dump1) Open; if (dump2) ID; if (dump1) Close; if (dump2) ID; } void main () { if (dump1) ID; if (dump2) Open; if (dump1) ID; if (dump2) Close; }

  31. Property simulation, bit by bit • One FSA at a time + Avoids exponential property state + Fewer branches are relevant + Lifetimes are often short + Smaller memory footprint + Embarrassingly parallel − Cannot correlate FSAs
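
The "one FSA at a time" projection can be illustrated in C on a single concrete trace: when tracking one file handle, events on every other handle are treated as the identity (ID). This is only a sketch of the projection idea on one execution, not of ESP's static analysis; `TaggedEvent` and `track_one` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { CLOSED, OPENED, ERROR } State;
typedef enum { EV_ID, EV_OPEN, EV_CLOSE } Event;  /* EV_ID: no state change */

/* An event together with the handle it applies to. */
typedef struct { int handle; Event ev; } TaggedEvent;

static State fsa_step(State s, Event e) {
    if (s == ERROR || e == EV_ID) return s;
    if (e == EV_OPEN) return s == CLOSED ? OPENED : ERROR;
    return s == OPENED ? CLOSED : ERROR;          /* e == EV_CLOSE */
}

/* Track only the FSA of `handle`; events on other handles act as ID. */
State track_one(const TaggedEvent *trace, size_t n, int handle) {
    State s = CLOSED;
    for (size_t i = 0; i < n; i++) {
        assert(trace[i].ev <= EV_CLOSE);
        s = fsa_step(s, trace[i].handle == handle ? trace[i].ev : EV_ID);
    }
    return s;
}
```

Each handle is checked by an independent call, which is why the per-FSA runs need no shared state and can proceed in parallel; the price is that a property relating two handles cannot be expressed.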

  32. Problem: Aliasing void main () { if (dump1) fil1 = fopen(dumpFile1,"w"); if (dump2) fil2 = fopen(dumpFile2,"w"); fil3 = fil1; if (dump1) fclose( fil3 ); if (dump2) fclose( fil2 ); }

  33. ESP Model: Values Have State • During execution, the program • creates stateful values • changes the state of stateful values • The programmer defines • how values are created (syntactic patterns) • how values change state (syntactic patterns) • Syntactic expressions are aliases for values

  34. OPAL Rule Descriptions • Object Property Automata Language State Closed State Opened State Error Initial Event Open { _object_ ASTFUNCTIONCALL { ASTSYMBOL "fopen" } { _anyargs_ } } Event Close { ASTFUNCTIONCALL { ASTSYMBOL "fclose" } { _object_ } } Transition _ -> Opened on Open Transition Opened -> Closed on Close Transition Closed -> Error on Close "File already closed"

  35. Parameterized transitions void main () { if (dump1) fil1 = fopen(dumpFile1,"w"); if (dump2) fil2 = fopen(dumpFile2,"w"); fil3 = fil1; if (dump1) fclose( fil3 ); if (dump2) fclose( fil2 ); }

  36. Parameterized transitions void main () { if (dump1) { t1 = fopen(dumpFile1,"w"); Open(t1); fil1 = t1; } if (dump2) { t2 = fopen(dumpFile2,"w"); Open(t2); fil2 = t2; } fil3 = fil1; if (dump1) { fclose( fil3 ); Close(fil3); } if (dump2) { fclose( fil2 ); Close(fil2); } }

  37. Expressions are value aliases void main () { if (dump1) { t1 = fopen(dumpFile1,"w"); Open(t1); fil1 = t1; } if (dump2) { t2 = fopen(dumpFile2,"w"); Open(t2); fil2 = t2; } fil3 = fil1; if (dump1) { fclose( fil3 ); Close(fil3); } if (dump2) { fclose( fil2 ); Close(fil2); } }

  38. Value-alias analysis • Is expression e an alias for value v? • ESP uses GOLF to answer this query • Generalized One Level Flow • Context-sensitive • Largely flow-insensitive • Millions of lines of code, in seconds
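
The value-alias query ("is expression e an alias for value v?") can be illustrated with a much simpler stand-in than GOLF: a flow-insensitive, context-insensitive union-find over copy assignments. The sketch below is not GOLF and loses precision GOLF retains; it only shows the shape of the query on the `t1`/`fil1`/`fil3` example, and every name in it is hypothetical.

```c
#include <assert.h>

/* Variables of the example, used as indices into the union-find forest. */
enum { T1, T2, FIL1, FIL2, FIL3, NVARS };

static int parent[NVARS];

/* Start with every variable in its own alias class. */
void alias_init(void) {
    for (int v = 0; v < NVARS; v++) parent[v] = v;
}

/* Find the class representative, with path halving. */
int alias_find(int v) {
    assert(v >= 0 && v < NVARS);
    while (parent[v] != v) {
        parent[v] = parent[parent[v]];
        v = parent[v];
    }
    return v;
}

/* Record a copy "dst = src" by unifying the two alias classes.
   Statement order is ignored (flow-insensitive). */
void alias_assign(int dst, int src) {
    parent[alias_find(dst)] = alias_find(src);
}

/* May a and b name the same value? */
int may_alias(int a, int b) {
    return alias_find(a) == alias_find(b);
}
```

After recording `fil1 = t1`, `fil2 = t2` and `fil3 = fil1`, the query `may_alias(FIL3, T1)` holds and `may_alias(FIL3, T2)` does not, so `fclose(fil3)` is attributed to the value created by the first `fopen`.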

  39. Putting it all together • Property simulation • Identify and track relevant execution state • Syntactic patterns + value-alias analysis • Identify and isolate individual FSAs • One FSA at a time • Bit vector analysis for safety properties

  40. ESP block diagram

  41. Case study: stdio usage in gcc • cc1 from gcc version 2.5.3 (Spec95) • Does cc1 always print to opened files? • cc1 is a complex program: • 140K non-blank, non-comment lines of C • 2149 functions, 66 files, 1086 globals • Call graph includes one 450-function SCC

  42. Skeleton of cc1 source FILE *f1, … , *f15; int p1, … , p15; void compileFile() { if (p1) f1 = fopen(…); … if (p15) f15 = fopen(…); restOfComp(); if (p1) fclose(f1); … if (p15) fclose(f15); } void restOfComp() { if (p1) printRtl(f1); … if (p15) printRtl(f15); restOfComp(); } void printRtl(FILE *f) { fprintf(f); }

  43. OPAL rules for stdio usage State Uninit State Closed State Opened State Error Initial Event Decl {ASTDECLARATION {_object_ ASTSYMBOL _any_}} Initial Event Open {_object_ ASTFUNCTIONCALL {ASTSYMBOL "fopen"} {_anyargs_}} Event Print {ASTFUNCTIONCALL {ASTSYMBOL "fprintf"} {_object_,_anyargs_}} Event Close {ASTFUNCTIONCALL {ASTSYMBOL "fclose"} {_object_}} Transition _ -> Uninit on Decl Transition _ -> Opened on Open Transition Uninit -> Error on Print "File not opened" Transition Opened -> Opened on Print Transition Closed -> Error on Print "Printing to closed file" Transition Opened -> Closed on Close Transition Closed -> Error on Close "File already closed"
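
The transitions of this rule can be mirrored as a C table, which makes the error messages and the wildcard `_` explicit. This is a reading of the slide, not OPAL's actual semantics: treating `_` as "any state", picking the first matching rule, and leaving the state unchanged when no rule matches are all assumptions, and `Transition`, `RULES` and `rule_step` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { UNINIT, CLOSED, OPENED, ERROR } State;
typedef enum { DECL, OPEN, PRINT, CLOSE } Event;
#define ANY_STATE (-1)               /* stands in for the "_" wildcard */

typedef struct {
    int from;                        /* a State or ANY_STATE */
    Event on;
    State to;
    const char *msg;                 /* error message, or NULL */
} Transition;

/* One entry per Transition line of the rule, in slide order. */
static const Transition RULES[] = {
    { ANY_STATE, DECL,  UNINIT, NULL },
    { ANY_STATE, OPEN,  OPENED, NULL },
    { UNINIT,    PRINT, ERROR,  "File not opened" },
    { OPENED,    PRINT, OPENED, NULL },
    { CLOSED,    PRINT, ERROR,  "Printing to closed file" },
    { OPENED,    CLOSE, CLOSED, NULL },
    { CLOSED,    CLOSE, ERROR,  "File already closed" },
};

/* Apply the first matching rule; *msg receives its message (or NULL). */
State rule_step(State s, Event e, const char **msg) {
    assert(msg != NULL);
    *msg = NULL;
    for (size_t i = 0; i < sizeof RULES / sizeof RULES[0]; i++) {
        if (RULES[i].on == e &&
            (RULES[i].from == ANY_STATE || RULES[i].from == (int)s)) {
            *msg = RULES[i].msg;
            return RULES[i].to;
        }
    }
    return s;  /* assumption: an unmatched event leaves the state as is */
}
```

Printing in state Uninit yields Error with "File not opened", while Open followed by Print and Close walks Opened, Opened, Closed without a message.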

  44. Experimental results • Precision • Verification succeeds for every file handle • No transitions to Error; no false errors • Scalability • Ave. per handle: 72.9 seconds, 49.7 MB • Single 1GHz PIII laptop with 512 MB RAM • We have proved that: • Each of the 646 calls to fprintf in the source code prints to a valid, open file

  45. Ongoing research • Better property simulation • More liberal merge criteria • More precise tracking of symbolic states • Better value-alias analysis • Track value-aliases during simulation • Add value-alias sets to property state • Component-wise analysis • Identify and analyze components • Link using less precise analysis

  46. Scalable and precise analysis

  47. Collaborators • Group Members • Stephen Adams, Muthu Jagannathan • Advisors • Manuel Fahndrich, Jakob Rehof • Summer Interns (2001) • Sorin Lerner, Univ. of Washington • Mark Seigle, Univ. of Washington • Summer Interns (2002) • Hao Chen, UC Berkeley • Nurit Dor, Tel Aviv Univ. • Seth Hallem, Stanford Univ.

  48. Related work • Verification tools • SLAM, Blast, CQual • Bug-finding tools • xgcc, PREfix, PREfast, LCLint • Language approaches • ESC-Java, Vault, CCured, Roles • Rule generators • Engler et al., Ammons et al., Chen et al. • Typestate

  49. What is different in ESP? • Property simulation • Syntactic patterns + value-alias analysis • One FSA at a time • Scale, scale, scale ….
