Constraint-Based Analysis
Lecture 16, CS 6340 (adapted from a lecture by Alex Aiken)
Code Example
    void f(state *x, state *y) {
        result = spin_trylock(&x->lock);
        spin_lock(&y->lock);
        …
        if (!result)
            spin_unlock(&x->lock);
        spin_unlock(&y->lock);
    }
• Aspects highlighted on the code: Flow Sensitivity, Path Sensitivity, Pointers & Heap, Inter-procedural
[Figure: lock state machine. States: Unlocked, Locked, Error. Transitions: lock, unlock.]
Saturn
• What?
  • SAT-based approach to static bug detection
• How?
  • SAT-based approach
  • Program constructs → Boolean constraints
  • Inference → SAT solving
• Why SAT?
  • Lots of reasons, but for now:
  • Program states are naturally expressed as bits
  • The theory for bits is SAT
  • Efficient solvers are widely available
Intuition
• Analyzing in one direction is problematic
  • Forwards or backwards
• Consider null dereference analysis
  • No null pointer assignments: forwards is best
  • No dereferences: backwards is best
• Constraints
  • Give a global picture of the program
  • Allow a more efficient order of solution
Straight-line Code
    void f(int x, int y) {
        int z = x & y;
        assert(z == x);
    }
• x: [x31 … x0], y: [y31 … y0]
• Bitwise-AND gives z: [x31∧y31 … x0∧y0]
• The comparison z == x yields the result bit R
Straight-line Code
    void f(int x, int y) {
        int z = x & y;
        assert(z == x);
    }
• Query: Is-Satisfiable(¬R)
• Answer: Yes, with x = [00…1], y = [00…0]
• The negated assertion is satisfiable. Therefore, the assertion may fail.
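The query on this slide can be mimicked without a SAT solver by enumerating small bit-vectors. The sketch below is only an illustration under that assumption: the function name `negated_assertion_satisfiable` and the restriction to at most 8 bits are hypothetical, and exhaustive search stands in for the solver Saturn actually uses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of Saturn): brute-force stand-in for the SAT
 * query on the negated assertion !(z == x), where z = x & y, over values of
 * `width` bits. Returns 1 and stores a witness if the query is satisfiable. */
int negated_assertion_satisfiable(unsigned width, uint32_t *wx, uint32_t *wy)
{
    assert(width >= 1 && width <= 8);   /* keep the enumeration tiny */
    uint32_t limit = 1u << width;
    for (uint32_t x = 0; x < limit; x++) {
        for (uint32_t y = 0; y < limit; y++) {
            uint32_t z = x & y;         /* the straight-line code */
            if (z != x) {               /* negated assertion holds */
                *wx = x;
                *wy = y;
                return 1;
            }
        }
    }
    return 0;
}
```

Called with a width of 4, the first witness the enumeration reaches is x = 1, y = 0, the same shape as the assignment reported on the slide.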
Control Flow – Preparation
• Approach
  • Assumes a loop-free program
  • Unroll loops, drop backedges
  • May miss errors that are deeply buried
  • Bug finding, not verification
  • Many errors surface in a few iterations
• Advantages
  • Simplicity, reduces false positives
Control Flow – Example
    if (c)
        x = a;
    else
        x = b;
    res = x;
• Guards and values along each path
  • then branch: G = c, x: [a31…a0]
  • else branch: G = ¬c, x: [b31…b0]
  • after the merge: G = c ∨ ¬c = true, x: [v31…v0], where vi = (c ∧ ai) ∨ (¬c ∧ bi)
• Merges
  • preserve path sensitivity
  • select bits based on the values of incoming guards
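As a rough illustration of the merge rule vi = (c ∧ ai) ∨ (¬c ∧ bi), the sketch below applies it bit by bit to concrete 32-bit values. The name `merge_bits` is hypothetical, and Saturn builds a Boolean formula for each bit rather than computing a value.

```c
#include <assert.h>
#include <stdint.h>

/* Merge two 32-bit values at a control-flow join, one bit at a time:
 * v_i = (c AND a_i) OR (NOT c AND b_i). `c` is the branch guard, 0 or 1. */
uint32_t merge_bits(int c, uint32_t a, uint32_t b)
{
    assert(c == 0 || c == 1);
    uint32_t v = 0;
    for (int i = 0; i < 32; i++) {
        uint32_t ai = (a >> i) & 1u;
        uint32_t bi = (b >> i) & 1u;
        uint32_t vi = ((uint32_t)c & ai) | ((uint32_t)(!c) & bi);
        v |= vi << i;
    }
    return v;
}
```

With c = 1 the result equals a, and with c = 0 it equals b, which is what `res = x` should observe on each path.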
Pointers – Overview
• May point to different locations…
  • Thus, use points-to sets: p: { l1, …, ln }
• … but path sensitive
  • Use guards on points-to relationships: p: { (g1, l1), …, (gn, ln) }
Pointers – Example
    p = &x;        G = true, p: { (true, x) }
    if (c)
        p = &y;    G = c, p: { (true, y) }
    res = *p;      G = true, p: { (c, y); (¬c, x) }
• The dereference becomes conditional assignments:
    if (c) res = y; else if (¬c) res = x;
Pointers – Recap
• Guarded Location Sets: { (g1, l1), …, (gn, ln) }
• Guards
  • Condition under which the points-to relationship holds
  • Collected from statement guards
• Pointer Dereference
  • Conditional Assignments
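The pointer example can be sketched as follows, assuming the guards have already been evaluated under one concrete value of c. The names `guarded_loc`, `deref_guarded` and `example_res` are hypothetical. The sketch only shows how a dereference through a guarded location set unfolds into conditional assignments.

```c
#include <assert.h>
#include <stddef.h>

/* One entry of a guarded location set: p may point to `loc` when `guard` holds. */
struct guarded_loc {
    int guard;       /* guard evaluated under a concrete assignment: 0 or nonzero */
    const int *loc;  /* the location pointed to */
};

/* Dereference through a guarded location set as a chain of conditional
 * assignments: the first entry whose guard holds supplies the value. */
int deref_guarded(const struct guarded_loc *set, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (set[i].guard)
            return *set[i].loc;
    }
    assert(!"no guard holds: the guards should cover every path");
    return 0;
}

/* The example: p = &x; if (c) p = &y; res = *p;
 * After the merge, p: { (c, y); (not c, x) }. */
int example_res(int c, int x, int y)
{
    struct guarded_loc p[] = { { c, &y }, { !c, &x } };
    return deref_guarded(p, 2);
}
```

For instance, `example_res(1, 10, 20)` takes the entry guarded by c and reads y, while `example_res(0, 10, 20)` reads x.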
Not Covered
• Other Constructs
  • Structs, …
• Modeling of the environment
• Optimizations
  • several to reduce the size of formulas
  • some form of program slicing is important
What can we do with Saturn?
    int f(lock_t *l) {
        lock(l);
        …
        unlock(l);
    }
• lock:
    if (l->state == Unlocked) l->state = Locked;
    else l->state = Error;
• unlock:
    if (l->state == Locked) l->state = Unlocked;
    else l->state = Error;
[Figure: lock state machine. States: Unlocked, Locked, Error. Transitions: lock, unlock.]
General FSM Checking
• Encode FSM in the program
  • State → Integer
  • Transition → Conditional Assignments
• Check code behavior
  • SAT queries
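A minimal sketch of this encoding, using the lock and unlock transitions from the preceding slide: the state is an integer (here an enum) and each transition is a conditional assignment. The identifiers `fsm_lock_t`, `fsm_lock`, `fsm_unlock` and `run_lock_unlock` are hypothetical names chosen for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for the three states of the lock FSM. */
enum lock_state { ST_UNLOCKED, ST_LOCKED, ST_ERROR };

typedef struct { enum lock_state state; } fsm_lock_t;

/* lock transition: Unlocked -> Locked, anything else -> Error. */
void fsm_lock(fsm_lock_t *l)
{
    assert(l != NULL);
    if (l->state == ST_UNLOCKED) l->state = ST_LOCKED;
    else l->state = ST_ERROR;
}

/* unlock transition: Locked -> Unlocked, anything else -> Error. */
void fsm_unlock(fsm_lock_t *l)
{
    assert(l != NULL);
    if (l->state == ST_LOCKED) l->state = ST_UNLOCKED;
    else l->state = ST_ERROR;
}

/* Run lock(); unlock(); from a given initial state and report the final state. */
enum lock_state run_lock_unlock(enum lock_state initial)
{
    fsm_lock_t l = { initial };
    fsm_lock(&l);
    fsm_unlock(&l);
    return l.state;
}
```

Starting from Unlocked the sequence ends in Unlocked, and starting from Locked it ends in Error. A checker would ask the solver whether the Error state is reachable instead of running the code.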
How are we doing so far?
• Precision:
• Scalability:
  • SAT limit is 1M clauses
  • About 10 functions
• Solution:
  • Divide and conquer
  • Function summaries
Function Summaries (1st try)
• Function behavior can be summarized with a set of state transitions
    int f(lock_t *l) {
        lock(l);
        …
        …
        unlock(l);
        return 0;
    }
• Summary:
  • *l: Unlocked → Unlocked, Locked → Error
A Difficulty
    int f(lock_t *l) {
        lock(l);
        …
        if (err) return -1;
        …
        unlock(l);
        return 0;
    }
• Problem
  • two possible output states
  • distinguished by return value (retval == 0)…
• Summary
  1. (retval == 0): *l: Unlocked → Unlocked, Locked → Error
  2. ¬(retval == 0): *l: Unlocked → Locked, Locked → Error
FSM Function Summaries
• Summary representation (simplified): { Pin, Pout, R }
• User gives:
  • Pin: predicates on initial state
  • Pout: predicates on final state
  • Express interprocedural path sensitivity
• Saturn computes:
  • R: guarded state transitions
  • Used to simulate function behavior at call site
Lock Summary (2nd try)
    int f(lock_t *l) {
        lock(l);
        …
        if (err) return -1;
        …
        unlock(l);
        return 0;
    }
• Output predicate: Pout = { (retval == 0) }
• Summary (R):
  1. (retval == 0): *l: Unlocked → Unlocked, Locked → Error
  2. ¬(retval == 0): *l: Unlocked → Locked, Locked → Error
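One way to picture how such a summary is used at a call site is a lookup table indexed by the output predicate and the entry state. The sketch below is an illustration only: the names `summary_f` and `apply_summary_f` are hypothetical, and the Error → Error entries are an assumption, since the slide lists transitions only for Unlocked and Locked.

```c
#include <assert.h>

enum lock_state { ST_UNLOCKED, ST_LOCKED, ST_ERROR };

/* Guarded transitions of f, indexed by [retval == 0][entry state].
 * Guard (retval == 0):     Unlocked -> Unlocked, Locked -> Error.
 * Guard not (retval == 0): Unlocked -> Locked,   Locked -> Error.
 * Error -> Error is assumed here; the slide does not list it. */
static const enum lock_state summary_f[2][3] = {
    /* retval != 0 */ { ST_LOCKED,   ST_ERROR, ST_ERROR },
    /* retval == 0 */ { ST_UNLOCKED, ST_ERROR, ST_ERROR },
};

/* Simulate a call to f at a call site using only its summary. */
enum lock_state apply_summary_f(enum lock_state entry, int retval)
{
    assert(entry == ST_UNLOCKED || entry == ST_LOCKED || entry == ST_ERROR);
    return summary_f[retval == 0][entry];
}
```

A caller that checks the return value can then be analyzed path-sensitively: on the `retval == 0` path the lock is back to Unlocked, while on the other path it is still Locked.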
Lock checker for Linux
• Parameters:
  • States: { Locked, Unlocked, Error }
  • Pin = {}
  • Pout = { (retval == 0) }
• Experiment:
  • Linux Kernel 2.6.5: 4.8 MLOC
  • ~40 lock/unlock/trylock primitives
  • 20 hours to analyze
  • 3.0 GHz Pentium IV, 1 GB memory
Double Locking/Unlocking
    static void sscape_coproc_close(…) {
        spin_lock_irqsave(&devc->lock, flags);
        if (…)
            sscape_write(devc, DMAA_REG, 0x20);
        …
    }

    static void sscape_write(struct … *devc, …) {
        spin_lock_irqsave(&devc->lock, flags);
        …
    }
Ambiguous Return State
    int i2o_claim_device(…) {
        down(&i2o_configuration_lock);
        if (d->owner) {
            up(&i2o_configuration_lock);
            return -EBUSY;
        }
        if (…) {
            return -EBUSY;
        }
        …
    }
Bugs
• Previous Work: MC (31), CQual (18), <20% Bugs
Function Summary Database
• 63,000 functions in Linux
  • More than 23,000 are lock related
  • 17,000 with locking constraints on entry
  • Around 9,000 affect more than one lock
  • 193 lock wrappers
  • 375 unlock wrappers
  • 36 with return value/lock state correlation
• Available on the web . . .
Another Checker
• Memory leaks
  • Common, esp. in error handling code
  • Hard to find
  • Problematic in long running applications
• Current techniques
  • Escape analysis
  • Ownership types
  • Region based analysis…
Simple Leak
    char *f() {
        char *p;
        p = (char*)malloc(…);
        …
        if (err) return NULL;
        …
        return p;
    }
Scenario 1 – Malloc Wrappers
    char *f() {
        char *p;
        p = (char*)strdup(…);
        …
        if (err) return NULL;
        …
        return p;
    }
Scenario 2 – External References
    char *f(struct state *s) {
        char *p;
        p = (char*)malloc(…);
        s->name = p;
        if (err) return NULL;
        …
        return p;
    }
Scenario 3 – Function Calls
    char *f(struct state *s) {
        char *p;
        p = (char*)malloc(…);
        g(s, p);
        if (err) return NULL;
        …
        return p;
    }

    void g(struct state *s, char *p) {
        s->name = p;
    }
Scenario 4 – Data dependency
    void f(int len) {
        char fastbuf[10], *p;
        if (len < 10)
            p = fastbuf;
        else
            p = (char *)malloc(len);
        …
        if (p != fastbuf)
            free(p);
    }
Requirements
• Track points-to relationships precisely
• Infer escaping functions
  • ones that create external references to objects passed in via parameters
• Infer allocation functions
Analysis Part I – Points-to Rule
• PointsTo(p, l)
  • condition under which p points to l
• Given the guarded location set of p: { (g0, l0), …, (gn-1, ln-1) }
    PointsTo(p, l) = gi       (if li = l)
                     false    (otherwise)
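The rule amounts to a lookup in the guarded location set. In the sketch below, guards are assumed to be already evaluated to 0 or 1 under one concrete assignment, locations are identified by name, and each location appears at most once in the set. The names `guarded_loc` and `points_to` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One entry (g_i, l_i) of a guarded location set. */
struct guarded_loc {
    int guard;        /* g_i, evaluated under a concrete assignment */
    const char *loc;  /* l_i, identified by name for this illustration */
};

/* PointsTo(p, l): the guard g_i of the entry with l_i == l, or false (0)
 * when l does not occur in the set. */
int points_to(const struct guarded_loc *set, size_t n, const char *l)
{
    assert(set != NULL && l != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(set[i].loc, l) == 0)
            return set[i].guard;
    }
    return 0;
}
```

For the set { (c, y); (¬c, x) } with c true, the lookup yields 1 for y, 0 for x, and 0 for any location not in the set.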
Analysis Part II – EscapeVia
• EscapeVia(l, p, X)
  • the condition under which location l escapes via pointer p, excluding references in set X
• Access Roots
  • Every object in the function body is accessed through one of the following “roots”
    • Parameters (p1…pn)
    • The Return Value (ret_val)
    • Global Variables
    • Local Variables
    • Heap Allocated Objects
Analysis Part II – EscapeVia
• Never escape through local variables
    RootOf(p) ∈ Locals ∪ X  ⟹  EscapeVia(l, p, X) = false
• Always escape through global variables
    RootOf(p) ∈ Globals  ⟹  EscapeVia(l, p, X) = PointsTo(p, l)
Analysis Part II – EscapeVia
• Escaping through parameters/return
    RootOf(p) ∈ (Params ∪ { ret_val }) – X  ⟹  EscapeVia(l, p, X) = PointsTo(p, l)
• Escaping via another allocated location
    RootOf(p) ∈ NewLocs – X  ⟹  EscapeVia(l, p, X) = PointsTo(p, l) ∧ Escaped(p, X ∪ {RootOf(l)})
Analysis Part III – Escape/Leak
• Escape Condition
    Escaped(l, X) = ⋁_p EscapeVia(l, p, X)
• Leak Condition
    Leaked(l, X) = ¬Escaped(l, X)
• Leak Checker
    For all new locations l, there is a leak if Satisfiable(Leaked(l, {}))
The Leak Checker
• For all new locations l, there is a leak if Satisfiable(Leaked(l, {}))
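To make the leak condition concrete, the sketch below hand-encodes the Simple Leak function, which returns NULL when err is set and returns p otherwise. It is an assumption-laden illustration: the function names are hypothetical, the single Boolean err is the only path condition modeled, and trying both values of err stands in for the SAT query.

```c
#include <assert.h>
#include <stddef.h>

/* PointsTo(ret_val, l) for the Simple Leak function: the return value points
 * to the malloc'ed location l only on the path where err is false. */
static int points_to_retval(int err) { return !err; }

/* EscapeVia for the local variable p: locals never let a location escape. */
static int escape_via_local(void) { return 0; }

/* Escaped(l, {}) = OR over all pointers of EscapeVia(l, p, {}). */
static int escaped(int err)
{
    return escape_via_local() || points_to_retval(err);
}

/* Leaked(l, {}) = NOT Escaped(l, {}). */
static int leaked(int err) { return !escaped(err); }

/* Brute-force stand-in for Satisfiable(Leaked(l, {})): try both values of
 * err. Returns 1 and stores the witness if a leaking path exists. */
int leak_satisfiable(int *witness_err)
{
    assert(witness_err != NULL);
    for (int err = 0; err <= 1; err++) {
        if (leaked(err)) {
            *witness_err = err;
            return 1;
        }
    }
    return 0;
}
```

The query is satisfiable with err = 1: on the error path the allocation is reachable only through the local p, so it leaks.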
Why SAT? (Revisited …)
• Moore’s Law
• Uniform modeling of constructs as bits
• Constraints
  • Local specification
  • Global solution
• Incremental SAT solving
  • makes multiple queries efficient
Why SAT? (Cont.)
• Path sensitivity is important
  • To find bugs
  • To reduce false positives
  • Much easier to model precisely with SAT
• Compositionality is important
  • Function summaries critical for scalability
  • Easy to construct with SAT queries