Finding the Weakest Characterization of Erroneous Inputs
Dzintars Avots and Benjamin Livshits

The Art of Hiding Your Sources
• Our approach: fleece as many papers as possible
• You will most likely find similarities with:
  • Korat: Automated Testing Based on Java Predicates
  • Automatic Predicate Abstraction of C Programs
  • From Symptom to Cause: Localizing Errors in Counterexample Traces
  • Parametric Shape Analysis via 3-Valued Logic
  • Weakest precondition reasoning, etc.

Problem Statement
• Many static analysis tools produce error traces
  • Metal
  • Intrinsa
  • Others
• However, testing for false positives in error traces is often hard
• Why?
  • Need to determine whether the error trace is feasible
  • How to trigger that particular path?
  • What conditions on the input and environment need to hold?

More Concrete Examples
• Comes from (real) research motivation
• Buffer overruns (last year’s FSE)
  • A buffer overrun is a “tainted” user value copied to a statically sized buffer
  • Generated buffer overruns across many procedure invocations
  • How to test if it may actually be exploitable?
• Fault injection in Java (current research)
  • Introduce “bad” values into the system
  • Start with HttpRequest
  • Populate its fields
  • Push the request through the system
  • See if we get an exception thrown

Exploring Possibilities
• Assume: varying the input influences the outcome
• Input:
  • string buffers
  • elements of Java structures
• Korat:
  • try “small” inputs and see what happens
• Want:
  • weakest condition on the input that always causes a failure

Stores describe program input
[Figure: input store with nodes stdin, u1, u2, u3]
• Properties: Int_val(u1) > 0, char_val(u2) > 0, char_val(u3) = 0
• Edges: “is followed by”
• Represents: 5 “abcde\0”, 1 “x\0”, etc.
• Current stream position also represented
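
To make the store properties concrete, here is a minimal sketch that checks whether a concrete input (an integer followed by a byte string) satisfies the three node properties listed on this slide. The function name `matches_store` is hypothetical, and the sketch assumes that u2 is a summary node standing for one or more characters, which the slide does not state explicitly.

```python
def matches_store(int_val, data):
    """Check a concrete input against the node properties from the slide.

    Assumption: u2 summarizes one or more characters, u3 is the final one.
    """
    if int_val <= 0:                 # Int_val(u1) > 0
        return False
    if len(data) < 2:                # need at least one u2 character plus u3
        return False
    *body, last = data               # iterating over bytes yields integers
    if last != 0:                    # char_val(u3) = 0
        return False
    return all(b > 0 for b in body)  # char_val(u2) > 0 for each summarized char
```

Under these assumptions, both examples from the slide, `matches_store(5, b"abcde\0")` and `matches_store(1, b"x\0")`, are accepted, while an input with a non-positive integer or an embedded zero byte is rejected.
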

Imitating Predicate Abstraction
• Define the predicate update formula using predicates satisfying the weakest precondition
• pred’ = WP(pred) ∨ (¬WP(¬pred) ∧ 1/2)
• The enforce construct is taken care of by TVLA’s coerce optimization
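
One way to read the update formula on this slide is in Kleene's three-valued logic, as WP(pred) ∨ (¬WP(¬pred) ∧ 1/2). The connectives in that reading are an assumption, as are the helper names below. The sketch encodes the truth values as 0, 1/2 and 1, with conjunction as minimum, disjunction as maximum, and negation as 1 − a.

```python
from fractions import Fraction

HALF = Fraction(1, 2)  # the "unknown" truth value


def k_not(a):
    return 1 - a


def k_and(a, b):
    return min(a, b)


def k_or(a, b):
    return max(a, b)


def update(wp_pred, wp_not_pred):
    """Assumed reading: pred' = WP(pred) OR (NOT WP(NOT pred) AND 1/2)."""
    return k_or(wp_pred, k_and(k_not(wp_not_pred), HALF))
```

With this reading, `update(1, 0)` is 1 (the predicate definitely holds afterwards), `update(0, 1)` is 0 (it definitely does not), and `update(0, 0)` is 1/2 (neither weakest precondition holds, so the new value is unknown).
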

Problems
• Length properties
  • How to compare lengths of summarized lists with the iterator position
• Deriving input shape
  • Input store properties are initially unknown
  • Reads “create” or reuse input nodes
  • Branch conditions assert properties of the input shape, which isn’t that interesting if it is “unknown”

Where do we need precision?
• Local pointer relations (same as before)
• Current stream position
• Relevant branch condition predicates
  • y is relevant, x is not?
  • What if (¬x, y) and (x, ¬y) are both infeasible?

  if (x) {
    if (y) { …; FAIL(); } else { …; }
  } else {
    if (y) { …; FAIL(); } else { …; }
  }
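
The question on this slide can be explored by brute force. The sketch below is illustrative only: `fails` mirrors the branch structure of the snippet (assuming FAIL() is reached exactly when y holds), and `determines` is a hypothetical helper that asks whether a single predicate fixes the outcome across all feasible valuations.

```python
from itertools import product


def fails(x, y):
    # Assumed reading of the snippet: FAIL() is reached exactly when y
    # holds, in both the x branch and the not-x branch.
    return y


def determines(index, feasible):
    """True if the predicate at `index` alone fixes the outcome on `feasible`."""
    outcome_for = {}
    for valuation in feasible:
        outcome = fails(*valuation)
        if outcome_for.setdefault(valuation[index], outcome) != outcome:
            return False
    return True


X, Y = 0, 1
all_valuations = set(product([False, True], repeat=2))
# Drop the two combinations the slide asks about: (not x, y) and (x, not y).
restricted = all_valuations - {(False, True), (True, False)}
```

When every combination is feasible, `determines(Y, all_valuations)` holds and `determines(X, all_valuations)` does not, so only y is relevant. On `restricted`, both calls return `True`: x and y coincide, and either predicate characterizes the failure equally well.
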

Classifying Predicates
• Classify all paths through the program:
  • Erroneous “evil” paths
  • Good paths
• Classify all predicates in the program:
  • P1: located on erroneous paths only
  • P0: located on good paths only
  • P1/2: located on both types of paths
  • (most fall into the last category)
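
The classification above amounts to two set differences and an intersection. A minimal sketch, assuming each path is given as a pair of the predicates located on it and a flag saying whether the path is erroneous (this representation and the name `classify` are illustrative, not from the slides):

```python
def classify(paths):
    """Split predicates into P1, P0 and P1/2.

    paths: iterable of (predicates_on_path, is_erroneous) pairs.
    """
    on_bad, on_good = set(), set()
    for preds, is_erroneous in paths:
        (on_bad if is_erroneous else on_good).update(preds)
    return {
        "P1": on_bad - on_good,    # located on erroneous paths only
        "P0": on_good - on_bad,    # located on good paths only
        "P1/2": on_bad & on_good,  # located on both types of paths
    }
```

For example, with one erroneous path carrying predicates a and b and one good path carrying b and c, a lands in P1, c in P0, and b in P1/2.
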

Iteratively Run TVLA

  I = P0 ∪ P1;   // set of instrumentation predicates
  do {
    1. use I as instrumentation predicates
    2. run TVLA on the program
    3. add input TVLA structures leading to error to S
    4. include more predicates into I if they have ½ values
  } while (I changes && not tired yet);

  // simplify structures leading to error
  w = empty
  foreach (configuration c in S) {
    OR c with w   // w is the weakest input leading to error
  }
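
The refinement loop on this slide can be sketched as follows. This is not TVLA's interface: `analyze` is a hypothetical callback standing in for one TVLA run, assumed to return the input structures found to lead to the error together with the predicates that still evaluate to 1/2. The sketch adds every such predicate at once, and it represents w simply as the set of collected disjuncts.

```python
def weakest_erroneous_input(p0, p1, analyze, max_rounds=10):
    """Iterate an analysis until the predicate set stops growing.

    analyze(preds) -> (error_inputs, unknown_preds) is a hypothetical
    stand-in for one TVLA run; it is not part of TVLA itself.
    """
    preds = set(p0) | set(p1)        # I = P0 union P1
    error_inputs = set()             # S
    for _ in range(max_rounds):      # "not tired yet"
        found, unknown = analyze(frozenset(preds))
        error_inputs |= set(found)   # step 3: collect structures leading to error
        new_preds = set(unknown) - preds
        if not new_preds:            # I did not change, so stop
            break
        preds |= new_preds           # step 4: refine with the 1/2-valued predicates
    # w is the disjunction (OR) of everything in S, kept here as a set of disjuncts.
    return preds, error_inputs


def toy_analyze(preds):
    # Toy stand-in: the error input is only found once predicate "y" is tracked.
    if "y" not in preds:
        return set(), {"y"}
    return {"y=1"}, set()
```

Calling `weakest_erroneous_input({"a"}, {"b"}, toy_analyze)` takes two rounds: the first reports y as unknown, and the second, with y tracked, finds the single erroneous input. A heuristic for choosing which predicates to add next would replace the `preds |= new_preds` step.
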

Bottom Line
• Identify the weakest input w leading to errors
• TVLA provides a sound proof that it will always lead to an error
• Have a choice of which predicates to add to I next; can try heuristics
• Get a qualitatively much stronger answer than Korat