Finding the Weakest Characterization of Erroneous Inputs

Dzintars Avots and Benjamin Livshits

The Art of Hiding Your Sources
• Our approach: fleece as many papers as possible
• You will most likely find similarities with:
  • Korat: Automated Testing Based on Java Predicates
  • Automatic Predicate Abstraction of C Programs
  • From Symptom to Cause: Localizing Errors in Counterexample Traces
  • Parametric shape analysis via 3-valued logic
  • Weakest precondition reasoning, etc.

Problem Statement
• Many static tools produce error traces
  • Metal
  • Intrinsa
  • Others
• However, checking whether a reported error trace is a false positive is often hard
• Why?
  • Need to determine if the error trace is feasible
  • How to trigger that particular path?
  • What conditions on the input and environment need to hold?

More Concrete Examples
• Comes from (real) research motivation
• Buffer overruns (last year’s FSE)
  • A buffer overrun is a “tainted” user value copied to a statically sized buffer
  • Generated buffer overruns across many procedure invocations
  • How to test if it may actually be exploitable?
• Fault injection in Java (current research)
  • Introduce “bad” values into the system
  • Start with HttpRequest
  • Populate its fields
  • Push the request through the system
  • See if we get an exception thrown

Exploring Possibilities
• Assume: varying the input influences the outcome
• Input:
  • string buffers
  • elements of Java structures
• Korat:
  • try “small” inputs and see what happens
• Want:
  • weakest condition on the input that always causes a failure

Stores describe program input
[Figure: input store with nodes stdin, u1, u2, u3]
• Properties: Int_val(u1) > 0, char_val(u2) > 0, char_val(u3) = 0
• Edges: “is followed by”
• Represents: 5 “abcde\0”, 1 “x\0”, etc.
• Current stream position also represented
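
To make the store concrete, here is a minimal sketch of how such an abstract input description could be matched against a concrete input. It assumes the nodes are ordered u1, u2, u3 along the "is followed by" edges and that u2 is a summary node standing for one or more characters; the `Node`, `STORE`, `encode`, and `represents` names are hypothetical and not part of TVLA.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Node:
    name: str
    holds: Callable[[int], bool]  # property every concrete value must satisfy
    summary: bool = False         # True: stands for one or more concrete cells


# Hypothetical encoding of the slide's store, assuming the order u1 -> u2 -> u3.
STORE = [
    Node("u1", lambda v: v > 0),                # Int_val(u1) > 0
    Node("u2", lambda v: v > 0, summary=True),  # char_val(u2) > 0 (assumed summary)
    Node("u3", lambda v: v == 0),               # char_val(u3) = 0
]


def encode(length: int, text: str) -> List[int]:
    """Turn a length-prefixed string into the sequence of values read from stdin."""
    return [length] + [ord(c) for c in text]


def represents(store: List[Node], values: List[int]) -> bool:
    """Check whether a concrete value sequence is an instance of the store.

    Summary nodes are matched greedily, which is enough here because the
    properties of adjacent nodes (> 0 versus == 0) are mutually exclusive.
    """
    i = 0
    for node in store:
        if i >= len(values) or not node.holds(values[i]):
            return False
        i += 1
        if node.summary:
            while i < len(values) and node.holds(values[i]):
                i += 1
    return i == len(values)
```

Under these assumptions, `represents(STORE, encode(5, "abcde\0"))` and `represents(STORE, encode(1, "x\0"))` both hold. The sketch deliberately does not relate the integer to the number of characters, since the listed properties do not constrain that either.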

Imitating Predicate Abstraction
• Define the predicate update formula using predicates satisfying the weakest precondition
• pred’ = WP(pred) ∨ (¬WP(¬pred) ∧ 1/2)
• The enforce construct is taken care of by the TVLA coerce optimization
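
A small sketch of how the update behaves in three-valued (Kleene) logic, reading the formula as pred’ = WP(pred) ∨ (¬WP(¬pred) ∧ 1/2). That reading of the operators is an assumption, and the function names below are illustrative only. Truth values are 0, 1/2, and 1, with OR as maximum, AND as minimum, and NOT as 1 − v.

```python
from fractions import Fraction

FALSE, HALF, TRUE = Fraction(0), Fraction(1, 2), Fraction(1)


def k_not(a):
    return 1 - a


def k_and(a, b):
    return min(a, b)


def k_or(a, b):
    return max(a, b)


def update(wp_pred, wp_not_pred):
    """New value of pred after a statement.

    wp_pred     : value of WP(pred) before the statement
    wp_not_pred : value of WP(not pred) before the statement
    Assumed reading: pred' = WP(pred) OR (NOT WP(not pred) AND 1/2).
    """
    return k_or(wp_pred, k_and(k_not(wp_not_pred), HALF))
```

With this reading, the predicate becomes 1 when WP(pred) holds, 0 when WP(¬pred) holds, and 1/2 (unknown) when neither can be established.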

Problems
• Length properties
  • How to compare lengths of summarized lists with the iterator position
• Deriving input shape
  • Input store properties are initially unknown
  • Reads “create” or reuse input nodes
  • Branch conditions assert properties of the input shape, which isn’t that interesting if “unknown”

Where do we need precision?
• Local pointer relations (same as before)
• Current stream position
• Relevant branch condition predicates
  • y is relevant, x is not?
  • What if (¬x, y) and (x, ¬y) are both infeasible?

    if (x) {
        if (y) { …; FAIL(); } else …;
    } else {
        if (y) { …; FAIL(); } else …;
    }
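
The question about relevance can be explored by brute force on the two-predicate example. The sketch below is an illustration only, not the analysis itself: `fails` mirrors the slide's branching (failure exactly when y holds), and the hypothetical helper `determines` asks whether the outcome is a function of a single predicate over a given set of feasible states.

```python
from itertools import product


def fails(x, y):
    # Mirrors the slide: both branches on x reach FAIL() exactly when y holds.
    if x:
        return y
    else:
        return y


def determines(pred, feasible_states):
    """True if the failure outcome depends only on `pred` over the feasible states."""
    seen = {}
    for state in feasible_states:
        outcome = fails(**state)
        if seen.setdefault(state[pred], outcome) != outcome:
            return False
    return True


# All four combinations of x and y.
all_states = [dict(x=x, y=y) for x, y in product([False, True], repeat=2)]

# The case raised on the slide: (not x, y) and (x, not y) are infeasible.
correlated_states = [s for s in all_states if s["x"] == s["y"]]
```

When every combination is feasible, only y determines the failure. Once the two mixed combinations are ruled out, x and y are perfectly correlated and either one characterizes the erroneous inputs, which is why relevance cannot be judged from the branch structure alone.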

Classifying Predicates
• Classify all paths through the program:
  • Erroneous “evil” paths
  • Good paths
• Classify all predicates in the program:
  • P1: located on erroneous paths only
  • P0: located on good paths only
  • P1/2: located on both types of paths
  • (most fall in the last category)
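
The classification can be sketched with plain set operations, assuming each path is given as the set of predicates it tests plus a flag saying whether it ends in an error. The input representation and the predicate names in the usage note are hypothetical.

```python
def classify_predicates(paths):
    """Split predicates into P1, P0, and P1/2.

    paths: iterable of (predicates_on_path, is_error) pairs, where
           predicates_on_path is a set of predicate names.
    """
    on_error, on_good = set(), set()
    for predicates, is_error in paths:
        (on_error if is_error else on_good).update(predicates)
    return {
        "P1": on_error - on_good,    # erroneous paths only
        "P0": on_good - on_error,    # good paths only
        "P1/2": on_error & on_good,  # both kinds of paths
    }
```

For example, with paths `[({"x", "y"}, True), ({"x", "z"}, False), ({"y"}, True)]`, predicate `y` lands in P1, `z` in P0, and `x` in P1/2.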

Iteratively Run TVLA

    I = P0 ∪ P1;  // set of instrumentation predicates
    do {
        1. use I as instrumentation predicates
        2. run TVLA on the program
        3. add input TVLA structures leading to error to S
        4. include more predicates into I if they have ½ values
    } while (I changes && not tired yet);

    // simplify structures leading to error
    w = empty
    foreach (configuration c in S) {
        OR c with w  // w is the weakest input leading to error
    }
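
The refinement loop can be sketched as follows. This is not a TVLA binding: `run_analysis` is a hypothetical callable standing in for a TVLA run, assumed to return the input structures that lead to the error together with the predicates that came out with value ½. The `max_rounds` bound plays the role of "not tired yet", and w is represented simply as the list of disjuncts.

```python
def weakest_erroneous_input(p0, p1, run_analysis, max_rounds=10):
    """Iterate the analysis, growing the instrumentation predicate set I.

    Returns (w, I), where w is the list of error-inducing input structures
    (read as a disjunction) and I is the final set of predicates.
    """
    instrumentation = set(p0) | set(p1)  # I = P0 ∪ P1
    error_inputs = []                    # S, kept duplicate-free
    for _ in range(max_rounds):          # "not tired yet"
        structures, half_valued = run_analysis(frozenset(instrumentation))
        for s in structures:
            if s not in error_inputs:
                error_inputs.append(s)
        new_predicates = set(half_valued) - instrumentation
        if not new_predicates:           # I did not change: stop
            break
        instrumentation |= new_predicates
    return error_inputs, instrumentation


def toy_analysis(predicates):
    # Hypothetical stand-in for a TVLA run, for illustration only: the error
    # is found only once the predicate "len_gt_buf" is tracked precisely.
    if "len_gt_buf" in predicates:
        return ["input longer than buffer"], []
    return [], ["len_gt_buf"]
```

Calling `weakest_erroneous_input({"a"}, {"b"}, toy_analysis)` takes two rounds: the first reports `len_gt_buf` as ½-valued, the second tracks it and finds the error-inducing input. Which ½-valued predicates to add in each round is a heuristic choice; this sketch adds all of them.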

Bottom Line
• Identify the weakest input w leading to errors
• TVLA provides a sound proof that it will always lead to an error
• Have a choice of which predicates to add to I next; can try heuristics
• Get a qualitatively much stronger answer than Korat
