Dynamically Discovering Likely Program Invariants to Support Program Evolution Michael D. Ernst, Jake Cockrell, William G. Griswold, David Notkin Presented by: Nick Rutar
Program Invariants • Useful in software development • Protect programmers from making errant changes • Verify properties of a program • Can be explicitly stated in programs • Programmers can annotate code with invariants • This can take time and effort • Many important invariants will be missed
Daikon - Dynamic Invariant Detector • Dynamic -- From Program Executions • Step 1: Instrument Source Program • Trace Variables of Interest • Step 2: Run Instrumented Program Over Test Suite • Step 3: Infer Invariants from • Instrumented Variables • Derived Variables
Example Program (taken from “The Science of Programming”) • i, s := 0, 0; do i ≠ n → i, s := i + 1, s + b[i] od • Precondition: n ≥ 0 • Postcondition: s = (Σ j : 0 ≤ j < n : b[j]) • Loop Invariant: 0 ≤ i ≤ n and s = (Σ j : 0 ≤ j < i : b[j])
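The example can be transcribed into Python with the precondition, loop invariant, and postcondition written as run-time assertions. This is only a sketch: the function name `array_sum` is an assumption, and the loop body updates i and s together, so the tuple assignment reads b[i] with the old value of i.

```python
def array_sum(b, n):
    """Sum b[0..n-1], checking the stated conditions at run time."""
    assert n >= 0                                 # precondition
    i, s = 0, 0
    while i != n:
        # loop invariant, checked at the loop head
        assert 0 <= i <= n and s == sum(b[:i])
        # both right-hand sides are evaluated before either name is rebound
        i, s = i + 1, s + b[i]
    assert s == sum(b[:n])                        # postcondition
    return s
```

Hand-written assertions like these are the annotations that Daikon tries to recover automatically from executions.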
Daikon results from the program (100 randomly generated input arrays of length 7-13) • ENTER • N = size(B) • N in [7 … 13] • B: all elements ≥ -100 • EXIT • N = I = orig(N) = size(B) • B = orig(B) • S = sum(B) • N in [7 … 13] • B: all elements ≥ -100 • LOOP • N = size(B) • S = sum(B[0 … I-1]) • N in [7 … 13] • I in [0 … 13] • I ≤ N • B: all elements in [-100 … 100] • sum(B) in [-556 … 539] • B[0] nonzero in [-99 … 96] • B[-1] in [-88 … 99] • N != B[-1] (negative invariant) • B[0] != B[-1] (negative invariant)
Instrumentation • Insert instrumentation points • Procedure Entry • Procedure Exit • Loop Heads • At each point, write to a file the values of • All variables in scope • Global Variables • Procedure arguments • Local Variables • Procedure’s return value • Available for Platforms • LISP • C/C++ • Java (from Daikon website) • Eclipse plug-in available • Perl (from Daikon website)
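The idea of an instrumentation point can be illustrated with a small Python decorator that records argument values at procedure entry and exit, plus the return value. This is a hypothetical run-time analogue, not how the Daikon front ends work (they instrument the program itself); the names `TRACE`, `instrument`, and `total` are assumptions, an in-memory list stands in for the trace file, and loop heads and local variables are not covered.

```python
import functools
import inspect

TRACE = []  # hypothetical in-memory stand-in for the trace file


def instrument(func):
    """Record argument values at procedure entry and exit, plus the return value."""
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        bound.apply_defaults()
        # Shallow copy: mutable arguments are stored by reference.
        TRACE.append((func.__name__, "ENTER", dict(bound.arguments)))
        result = func(*args, **kwargs)
        exit_vars = dict(bound.arguments)
        exit_vars["return"] = result
        TRACE.append((func.__name__, "EXIT", exit_vars))
        return result

    return wrapper


@instrument
def total(b, n):
    s = 0
    for i in range(n):
        s += b[i]
    return s


total([5, 7], 2)  # appends one ENTER record and one EXIT record to TRACE
```

Running the instrumented function over a whole test suite would fill `TRACE` with the samples that the inference step consumes.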
Inferring invariants • System checks for the following (x, y, z are variables; a, b, c are computed constants): • Any variable • constant or small number of values • Numeric variable • range (a ≤ x ≤ b) • modulus & nonmodulus • Multiple numbers • linear relationship (such as x = ay + bz + c) • functions (all those in standard lib, e.g. x = abs(y)) • comparisons (x < y, x ≥ y, x == y) • invariants over x + y and x - y • Sequence: • sortedness • invariants over all elements (e.g., every element < 100) • Multiple sequences • subsequence & lexicographic relationship • Sequence and scalar • membership
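A few of these checks can be sketched in Python over a list of samples. The function below covers only three of the listed forms (constant, range, and pairwise comparison); the name `infer_invariants` and the sample format (one dict per sample) are assumptions made for illustration, not Daikon's actual data structures.

```python
from itertools import combinations


def infer_invariants(samples):
    """Report simple invariants that hold over every sample (a list of dicts)."""
    invariants = []
    names = sorted(samples[0])
    # Unary checks: constant value, otherwise the observed range.
    for x in names:
        values = [s[x] for s in samples]
        if len(set(values)) == 1:
            invariants.append(f"{x} == {values[0]}")
        else:
            invariants.append(f"{min(values)} <= {x} <= {max(values)}")
    # Binary checks: report the strongest comparison that holds on all samples.
    for x, y in combinations(names, 2):
        pairs = [(s[x], s[y]) for s in samples]
        if all(a == b for a, b in pairs):
            invariants.append(f"{x} == {y}")
        elif all(a <= b for a, b in pairs):
            invariants.append(f"{x} <= {y}")
        elif all(a >= b for a, b in pairs):
            invariants.append(f"{x} >= {y}")
    return invariants


samples = [{"i": 0, "n": 7}, {"i": 3, "n": 7}, {"i": 7, "n": 7}]
print(infer_invariants(samples))
# prints ['0 <= i <= 7', 'n == 7', 'i <= n']
```

The reported properties are only as general as the samples: a different test suite could widen the range or falsify the comparison.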
Inferring invariants (continued) • Each potential invariant is tested • Once an invariant fails to hold, it is not tested again • Negative Invariants • Relationships that are expected but never occur in the input • A probability limit decides whether such invariants are reported • Derived Variables • Expressions treated the same as regular variables • Include: • From any array: first and last elements, length • From numeric array: sum, min, max • From array and scalar: element at that index (a[i]), subarray up to, and subarray beyond, that index • From function invocation: number of calls so far
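Both ideas on this slide, derived variables and dropping a candidate at its first violation, can be sketched as follows. The sample layout (an array `b` and an index `i`), the key names such as `"size(b)"`, and the helpers `derive` and `surviving` are all assumptions for illustration.

```python
def derive(sample):
    """Extend a sample holding array b and index i with derived variables."""
    b, i = sample["b"], sample["i"]
    derived = dict(sample)
    derived["size(b)"] = len(b)
    derived["sum(b)"] = sum(b)
    if b:  # first/last/min/max only exist for non-empty arrays
        derived["b[0]"], derived["b[-1]"] = b[0], b[-1]
        derived["min(b)"], derived["max(b)"] = min(b), max(b)
    if 0 <= i < len(b):
        derived["b[i]"] = b[i]
    derived["b[0..i-1]"] = b[:i]
    return derived


def surviving(candidates, samples):
    """Test each candidate on each sample; drop it at the first violation."""
    alive = dict(candidates)
    for sample in samples:
        d = derive(sample)
        for name in list(alive):
            if not alive[name](d):
                del alive[name]  # falsified: never tested again
    return sorted(alive)


candidates = {
    "i <= size(b)": lambda d: d["i"] <= d["size(b)"],
    "sum(b) >= 0": lambda d: d["sum(b)"] >= 0,
}
samples = [{"b": [1, 2], "i": 1}, {"b": [-5, 1], "i": 2}]
print(surviving(candidates, samples))  # prints ['i <= size(b)']
```

Because derived variables are stored alongside the originals, the candidate predicates treat them exactly like regular variables.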
Using Invariants • Modified the Siemens replace program (~500 LOC) • Takes a regular expression and a replacement string as input • Copies the input stream to the output stream, replacing matched strings • Added the input pattern <pat>+, equivalent to <pat><pat>* • Used invariants to gain insight into how the program runs • Found cases where an initial belief was contradicted • Prevented introducing bugs based on flawed knowledge of the code • Found an instance of an unreported array bounds error
Using invariants (continued) • Everything learned from “replace” could have been learned by a combination of • Reading the code • Static analyses • Selected program instrumentation • Invariants give benefits that other approaches do not • Inferred invariants are an abstraction of a larger amount of data • Unexpected invariants, or expected invariants that do not appear, raise flags • Queries against the database build intuition about the source of an invariant • Inferred invariants provide a basis for programmer inferences • Invariants provide a beneficial degree of serendipity
Results - Time • Ran tests with between 500 and 3000 test inputs for replace • ~71 variables per instrumentation point in replace • 6 original, 65 derived; 52 scalars, 19 sequences • On average, 10 derived for every original • 1000 test cases • Produce 10,120 samples per instrumentation point • System takes 220 seconds to infer invariants • 3000 test cases • 33,801 samples • Processing takes 540 seconds • Invariant detection time grows quadratically with the number of variables over which invariants are checked • Time grows linearly with test suite size
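One way to build intuition for the quadratic growth is to count the variable pairs that a binary invariant (such as x ≤ y) could relate. The sketch below assumes that pairwise checks dominate the cost, which is an assumption of this example rather than a measured breakdown; the helper name `pair_count` is hypothetical.

```python
import math


def pair_count(num_variables):
    """Number of unordered variable pairs a binary invariant could relate."""
    return math.comb(num_variables, 2)


original_only = pair_count(6)   # 15 pairs over the 6 original variables
with_derived = pair_count(71)   # 2485 pairs over all 71 variables
print(original_only, with_derived)
```

Under this assumption, adding derived variables multiplies the number of candidate pairs far faster than it multiplies the number of variables, which is why the number of derived variables is a performance lever.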
Invariant Stability • Relationship between test suite size and invariants • Across test suites, an invariant is • Identical - the same in both test suites • Missing - present in one test suite but not the other • Different - differs between the two test suites • Interesting - worthy of further study to determine relevance • Uninteresting - a peculiarity in the data • S1 in [0 … 98] (99 values) • S1 >= 0 (96 values)
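The identical / missing / different classification can be sketched as a comparison of two invariant sets. Keying each invariant by the variable it describes is an assumption made here to keep the example small, as are the names `compare`, `suite_1`, and `suite_2`; deciding whether a difference is interesting still requires human judgment.

```python
def compare(inv_a, inv_b):
    """Classify invariants from two test suites, keyed by the variable they describe."""
    report = {"identical": [], "missing": [], "different": []}
    for key in sorted(set(inv_a) | set(inv_b)):
        a, b = inv_a.get(key), inv_b.get(key)
        if a is None or b is None:
            report["missing"].append(key)      # present in only one suite
        elif a == b:
            report["identical"].append(key)
        else:
            report["different"].append(key)
    return report


suite_1 = {"S1": "S1 in [0..98]", "N": "N = size(B)", "I": "I <= N"}
suite_2 = {"S1": "S1 >= 0", "N": "N = size(B)"}
print(compare(suite_1, suite_2))
# prints {'identical': ['N'], 'missing': ['I'], 'different': ['S1']}
```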
Invariants and Program Correctness • Compare invariants detected across programs • Correct versions of programs have more invariants than incorrect ones • Examination of 424 intro C programs from U of Washington • Given the number of students, the amount of money, and the number of pizzas, each program calculates whether the students can afford the pizzas • Chose eight relevant invariants • people in [1 … 50] • pizzas in [1 … 10] • pizza_price in {9, 11} • excess_money in [0 … 40] • slices = 8 * pizzas • slices = 0 (mod 8) • slices_per in {0, 1, 2, 3} • slices_left ≤ people - 1
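The eight goal invariants can be written as predicates over one program run. The sketch below is hypothetical: `pizza_record` is an invented stand-in for a student solution, the definition of excess_money as money minus total price is an assumption, and the last invariant is read as slices_left ≤ people - 1. Daikon would report an invariant only if it held over all runs, whereas this checks a single record.

```python
def pizza_record(people, pizzas, pizza_price, money):
    """Compute the quantities the goal invariants mention (hypothetical solution)."""
    slices = 8 * pizzas
    return {
        "people": people,
        "pizzas": pizzas,
        "pizza_price": pizza_price,
        "excess_money": money - pizzas * pizza_price,  # assumed definition
        "slices": slices,
        "slices_per": slices // people,
        "slices_left": slices % people,
    }


def goal_invariants(r):
    """Evaluate the eight goal invariants on one record."""
    return {
        "people in [1..50]": 1 <= r["people"] <= 50,
        "pizzas in [1..10]": 1 <= r["pizzas"] <= 10,
        "pizza_price in {9,11}": r["pizza_price"] in {9, 11},
        "excess_money in [0..40]": 0 <= r["excess_money"] <= 40,
        "slices = 8 * pizzas": r["slices"] == 8 * r["pizzas"],
        "slices = 0 (mod 8)": r["slices"] % 8 == 0,
        "slices_per in {0,1,2,3}": r["slices_per"] in {0, 1, 2, 3},
        "slices_left <= people - 1": r["slices_left"] <= r["people"] - 1,
    }


record = pizza_record(people=10, pizzas=3, pizza_price=9, money=32)
held = goal_invariants(record)
print(sum(held.values()), "of", len(held), "goal invariants hold")
```

A buggy solution, for example one that computes slices as 6 * pizzas, would fail some of these predicates, which is the sense in which incorrect programs exhibit fewer of the goal invariants.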
Relationship of Grade and Goal Invariants • [Figure not reproduced; label: Invariants Detected]
Future Work (from 2001 paper) • Increasing Relevance • An invariant is relevant if it assists the programmer • Suppress invariants logically implied by others • Viewing and Managing Invariants • The full set is overwhelming for a programmer to sort through • Various tools for selective reporting of invariants • Improving Performance • Balance between invariant quality and runtime • Number of derived variables used • Richer Invariants • Invariants over pointer-based data structures • Computing conditional invariants
Resources • Daikon website • http://pag.csail.mit.edu/daikon/ • Contains links to • Papers • Source Code • User Manual • Developers Manual