The SPA Project GOLF and ESP

Manuvir Das, Microsoft Research (joint work with Manuel Fahndrich and Jakob Rehof)


Software Productivity Tools • Jim Larus runs the group • research.microsoft.com/spt • SLAM, Vault, Behave, PipelineServer … • Focus on software reliability

What’s wrong with analysis? • A: We don’t write or look at real code • B: We don’t solve real problems

Why does this happen? • Analysis is a mix of theory and practice • But • Math and theory are elegant • experimentation needs infrastructure • engineering is boring

Today we’ll talk about … • Doing analysis research the right way • My day job • Slicing and Partial Evaluation • Pointer analysis • Error detection

Slicing and Partial Evaluation • PE: Which computations depend only on known inputs? Do these early. • Or, which computations may depend on unknown inputs? Don’t do these early. • Insight: If a computation depends on unknown input, there must be an unknown input in its slice.

Forward slicing and BTA • Binding-time analysis • identify static computations • BTA via slicing • mark all unknown input nodes • forward slice from marked nodes and mark • all unmarked nodes are static computations

Why is this interesting? • Slicing incorporates control dependence • Previous work used reaching definitions
    read(y);
    x = 0;
    while (y != 0) { y--; x++; }
    z = x;
• We can now prove correctness
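The slide's example can be sketched as a small dependence graph. The following is a minimal illustration, not the authors' implementation: the edge lists `DATA` and `CONTROL` are hand-written assumptions about the six statements above, and `forward_slice` and `static_nodes` are hypothetical helper names.

```python
from collections import deque

def forward_slice(edges, seeds):
    """Return every node reachable from `seeds` along dependence edges."""
    succs = {}
    for src, dst in edges:
        succs.setdefault(src, set()).add(dst)
    marked = set(seeds)
    work = deque(seeds)
    while work:
        node = work.popleft()
        for nxt in succs.get(node, ()):
            if nxt not in marked:
                marked.add(nxt)
                work.append(nxt)
    return marked

def static_nodes(nodes, edges, unknown_inputs):
    """BTA via slicing: whatever the forward slice does not mark is static."""
    return set(nodes) - forward_slice(edges, unknown_inputs)

NODES = ["read(y)", "x = 0", "while (y != 0)", "y--", "x++", "z = x"]

# Assumed data dependences (definition -> use).
DATA = [
    ("read(y)", "while (y != 0)"), ("read(y)", "y--"),
    ("y--", "while (y != 0)"), ("y--", "y--"),
    ("x = 0", "x++"), ("x++", "x++"),
    ("x = 0", "z = x"), ("x++", "z = x"),
]

# Assumed control dependences (loop predicate -> loop body).
CONTROL = [("while (y != 0)", "y--"), ("while (y != 0)", "x++")]

data_only = static_nodes(NODES, DATA, ["read(y)"])
with_control = static_nodes(NODES, DATA + CONTROL, ["read(y)"])
```

With data dependences alone, `x++` and `z = x` look static even though the loop trip count depends on the unknown `y`. Adding control dependence leaves only `x = 0` static.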

This project had flaws … • A: We don’t write or look at real code • cubic algorithm, ran on 2k lines in 30 minutes • only one benchmark (ray tracer) • B: We don’t solve real problems • who uses PE in practice? • was the lack of safety critical? • why not use a timer?

Then I visited MSR … • Daniel Weise – 1.5 million lines of real code • Real problems – software reliability • I was hooked! • find buffer overflows using static analysis • oops, need pointer analysis

Papers don’t tell the whole truth! • Implemented Ste96, engineered it • lightning fast, but poor results • Lots of papers on how to improve • structures, signatures, SH97 • Tried it all, nothing worked on real code • Needed Andersen (subtyping) on real code

Frameworks are good • A spectrum from Ste96 to And94 • DGC POPL 98 : unification vs flow • SH POPL 97 : buckets within ECRs • Frameworks • give us a way of tuning precision vs efficiency • help us understand the problem

Frameworks are bad • The real issue: how do you find the best trade-off point in a principled manner? • What if the parameter being varied is not the key concept? • CFA varies control depth rather than data • SH 97 picks random categories • DGC 98 alters the behaviour of the same statement

Back to pointer analysis … • No way to run Andersen on MLOC

So, I hid in my office … • Stared at SPEC code, wrote perl scripts • every feature is used • code is idiomatic • pointers are never assigned, except heap • most pointers arise through parameter passing • some code is just too hard for any analysis • Result: new algorithm driven by real code

Pointer Analysis Landscape (axes: Precision, Cost) • FICI: Flow-insensitive Context-insensitive • FSCI: Flow-sensitive Context-insensitive • FICS: Flow-insensitive Context-sensitive • FSCS: Flow-sensitive Context-sensitive

FICI Pointer Analysis • Steensgaard (almost linear): imprecise, cheap; 1.5 MLOC in 1 minute, 100 MB • Andersen (cubic): precise, expensive; 500 KLOC in several minutes, 2GB • One level flow (quadratic)

Andersen’s Algorithm • p = &q; • p = q; [Figure: points-to graphs for each statement]

Andersen’s Algorithm • p = *q; • *p = q; [Figure: points-to graphs for each statement]
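The four statement forms on these two slides can be read as inclusion constraints on points-to sets. Below is a naive fixpoint sketch under that reading. The tuple encoding (`"addr"`, `"copy"`, `"load"`, `"store"`) and the function name `andersen` are assumptions made for illustration, and no attempt is made at the engineering needed to scale.

```python
def andersen(stmts):
    """Naive inclusion-based points-to analysis.

    stmts is a list of (kind, lhs, rhs) tuples:
      ("addr",  p, q)  for  p = &q
      ("copy",  p, q)  for  p = q
      ("load",  p, q)  for  p = *q
      ("store", p, q)  for  *p = q
    """
    pts = {}

    def get(var):
        return pts.setdefault(var, set())

    changed = True
    while changed:
        changed = False
        for kind, lhs, rhs in stmts:
            if kind == "addr":
                updates = [(lhs, {rhs})]
            elif kind == "copy":
                updates = [(lhs, set(get(rhs)))]
            elif kind == "load":
                incoming = set()
                for target in list(get(rhs)):
                    incoming |= get(target)
                updates = [(lhs, incoming)]
            elif kind == "store":
                updates = [(target, set(get(rhs))) for target in list(get(lhs))]
            else:
                raise ValueError(f"unknown statement kind: {kind}")
            for var, new in updates:
                if not new <= get(var):
                    get(var).update(new)
                    changed = True
    return pts

example = andersen([
    ("addr", "p", "s1"), ("addr", "p", "s2"),
    ("addr", "q", "s3"), ("copy", "q", "p"),
])
```

The copy `q = p` only adds the targets of `p` to `q`. Nothing flows back into `p`, which is the directional behaviour that unification gives up.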

Steensgaard’s Algorithm • p = q; [Figure: points-to graph before and after unification]
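A unification-based analysis can be sketched with union-find, where each equivalence class has at most one pointee class and an assignment merges the pointees of both sides. The class below is a minimal sketch under that assumption. The names `Steensgaard`, `apply` and `points_to` are hypothetical, and the statement encoding matches no particular tool.

```python
class Steensgaard:
    """Unification-based points-to sketch: one pointee class per class."""

    def __init__(self):
        self.parent = {}
        self.pointee = {}  # class representative -> node it points to

    def find(self, n):
        self.parent.setdefault(n, n)
        while self.parent[n] != n:
            self.parent[n] = self.parent[self.parent[n]]  # path halving
            n = self.parent[n]
        return n

    def target(self, n):
        """Class that n points to, creating a fresh placeholder if needed."""
        r = self.find(n)
        if r not in self.pointee:
            self.pointee[r] = self.find(("fresh", len(self.parent)))
        return self.find(self.pointee[r])

    def unify(self, a, b):
        a, b = self.find(a), self.find(b)
        if a == b:
            return
        ta, tb = self.pointee.pop(a, None), self.pointee.pop(b, None)
        self.parent[a] = b
        if ta is not None and tb is not None:
            self.pointee[b] = tb
            self.unify(ta, tb)  # merging two pointers merges their pointees
        elif ta is not None or tb is not None:
            self.pointee[b] = ta if ta is not None else tb

    def apply(self, kind, lhs, rhs):
        if kind == "addr":      # lhs = &rhs
            self.unify(self.target(lhs), rhs)
        elif kind == "copy":    # lhs = rhs
            self.unify(self.target(lhs), self.target(rhs))
        elif kind == "load":    # lhs = *rhs
            self.unify(self.target(lhs), self.target(self.target(rhs)))
        elif kind == "store":   # *lhs = rhs
            self.unify(self.target(self.target(lhs)), self.target(rhs))
        else:
            raise ValueError(f"unknown statement kind: {kind}")

    def points_to(self, p, variables):
        r = self.find(p)
        if r not in self.pointee:
            return set()
        t = self.find(self.pointee[r])
        return {v for v in variables if self.find(v) == t}

s = Steensgaard()
for stmt in [("addr", "p", "s1"), ("addr", "p", "s2"),
             ("addr", "q", "s3"), ("copy", "q", "p")]:
    s.apply(*stmt)
```

On these four statements, `p` and `q` end up pointing to a single merged class containing `s1`, `s2` and `s3`, because `q = p` unifies both targets rather than adding a one-way flow.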

Motivation for One Level Flow
    foo(&s1); foo(&s2); bar(&s3);
    foo(struct s *p) { *p.a = 3; bar(p); }
    bar(struct s *q) { *q.b = 4; }

Simplified Example p = &s1; p = &s2; q = &s3; q = p; *p.a = 3; *q.b = 4; [Figure: points-to graphs for p and q; one graph merges s1, s2, s3 into a single node]

One Level Flow • p = q; [Figure: flow edge between the targets of p and q]

Simplified Example p = &s1; p = &s2; q = &s3; q = p; *p.a = 3; *q.b = 4; [Figure: step-by-step points-to graph with nodes p, q, s1, s2, s3]

OLF: Simple Reachability • Single query: Linear • All queries: Quadratic
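The reachability view can be pictured on the simplified example: each pointer has a target node holding the symbols whose address it was assigned, and `q = p` adds a flow edge from the target of `p` to the target of `q`. The sketch below covers only `p = &x` and `p = q` on top-level pointers, so it never exercises the unification that one level flow applies below the top level. `build_flow_graph` and `points_to` are hypothetical names.

```python
def build_flow_graph(stmts):
    """stmts: ("addr", p, x) for p = &x, ("copy", p, q) for p = q."""
    symbols = {}  # pointer -> symbols stored directly in its target node
    preds = {}    # pointer -> pointers whose target node flows into its target
    for kind, lhs, rhs in stmts:
        symbols.setdefault(lhs, set())
        preds.setdefault(lhs, set())
        if kind == "addr":
            symbols[lhs].add(rhs)
        elif kind == "copy":
            symbols.setdefault(rhs, set())
            preds.setdefault(rhs, set())
            preds[lhs].add(rhs)
        else:
            raise ValueError(f"unsupported statement kind: {kind}")
    return symbols, preds

def points_to(p, symbols, preds):
    """One query: backward reachability over flow edges (linear in graph size)."""
    seen, stack, result = {p}, [p], set()
    while stack:
        node = stack.pop()
        result |= symbols[node]
        for pred in preds[node]:
            if pred not in seen:
                seen.add(pred)
                stack.append(pred)
    return result

symbols, preds = build_flow_graph([
    ("addr", "p", "s1"), ("addr", "p", "s2"),
    ("addr", "q", "s3"), ("copy", "q", "p"),
])
```

Each call to `points_to` walks the graph once, which is where the linear cost per query comes from. Answering the query for every pointer repeats that walk, which gives the quadratic total.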

OLF: Cached Reachability • MS Word: from 1 hour to 30 seconds for all queries

Running time (seconds)

Average sizes of points-to sets

This project had flaws too … • B: We don’t solve problems • solved an open problem in pointer analysis • But • never got around to buffer overflow • didn’t use PTA for optimization • addressed these issues later, but • should have been driven by the problem

Since then … • Others have made And94 fast • Heintze PLDI 01 • suggested by OLF results • But what about context-sensitivity? • crucial for value flow analysis • GOLF (DLFR SAS 01) • combines OLF and one level of instantiation constraints (Rehof’s lecture) • context-sensitive value flow on MLOC

OLF: Call Example
    id(r) {return r;}
    p = id(&x); q = id(&y); *p = 3;
    r = &x; p = r; r = &y; q = r; *p = 3;

OLF: Call Example r = &x; p = r; r = &y; q = r; *p = 3; [Figure: flow graph over r, p, q, x, y, *r, *p, *q]

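GOLF: Call Example id(r) {return r;} p = id(&x); q = id(&y); *p = 3; [Figure: the same flow graph, with edges into and out of *r labeled by parentheses, ( ) for one call site and [ ] for the other]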
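The parenthesis labels suggest matched-call reachability: a value that enters `id` through one call site may only leave through the same one. The sketch below illustrates that idea on the example and is not GOLF itself. The edge encoding, the site numbers 1 and 2, and the name `flows_to` are assumptions, and the search assumes a non-recursive program so that call stacks stay bounded.

```python
def flows_to(edges, source, context_sensitive=True):
    """Nodes reachable from `source` along flow edges.

    edges: (src, label, dst) where label is None, ("open", site) for a
    value entering a callee, or ("close", site) for a value returning.
    Assumes no recursion; otherwise the set of call stacks is unbounded.
    """
    succs = {}
    for src, label, dst in edges:
        succs.setdefault(src, []).append((label, dst))
    start = (source, ())
    seen, stack = {start}, [start]
    while stack:
        node, calls = stack.pop()
        for label, dst in succs.get(node, ()):
            nxt = calls
            if context_sensitive and label is not None:
                kind, site = label
                if kind == "open":
                    nxt = calls + (site,)
                elif calls:
                    if calls[-1] != site:
                        continue  # return does not match the pending call
                    nxt = calls[:-1]
                # a return with no pending call goes back to some caller
            state = (dst, nxt)
            if state not in seen:
                seen.add(state)
                stack.append(state)
    return {node for node, _ in seen} - {source}

# p = id(&x) is call site 1, q = id(&y) is call site 2.
EDGES = [
    ("&x", ("open", 1), "r"), ("r", ("close", 1), "p"),
    ("&y", ("open", 2), "r"), ("r", ("close", 2), "q"),
]
sensitive = flows_to(EDGES, "&x")
insensitive = flows_to(EDGES, "&x", context_sensitive=False)
```

With matching enabled, `&x` reaches `p` but not `q`, so `*p = 3` can only write `x`. Ignoring the labels lets `&x` reach `q` as well.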

We have an analysis that is … • fast enough to run on MLOC • good enough for static optimization • who cares; leave it to the chip makers! • not good enough for dynamic optimization (MDCE PASTE 01) • not good enough to track interesting correctness properties in real code

Correctness: the killer app • Hardware can • speed up programs • enforce correctness at run-time • Hardware cannot • enforce correctness before product is shipped • Testers can • find errors on some paths • Testers cannot • find errors on all paths • So, use static analysis to find errors

ESP Vision • Error Detection via Scalable Program Analysis • Must be driven by real code • Must be sound (report all errors) • Must report few false positives • Use knowledge of tradeoffs in analysis • Let user help the analysis

Step 1: Identify the problem • Solve a realistic problem: • partial correctness • user specified, finite-state properties • Solve a non-trivial problem: • don’t check uninits, NULL pointers • check locking protocols, resource usage

Parameterized Protocol Tracking • User specified • FSM with parameterized actions • patterns • Rest is automatic [Figure: FSM with states INIT(l), LOCKED(l), ERROR(l) and transitions labeled Lock(l), Unlock(l), Ret]
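A parameterized protocol of this kind can be sketched as one FSM instance per lock. The transition table below is a reading of the slide's figure labels and is partly an assumption. In particular, treating `Unlock` in `INIT` as an error and `Ret` in `INIT` as harmless is not stated in the source. `run` is a hypothetical helper that replays a concrete trace.

```python
# Assumed transition table; ERROR is absorbing.
TRANSITIONS = {
    ("INIT", "Lock"): "LOCKED",
    ("INIT", "Unlock"): "ERROR",   # assumption: unlocking an unheld lock
    ("INIT", "Ret"): "INIT",       # assumption: returning without the lock is fine
    ("LOCKED", "Lock"): "ERROR",   # double acquire
    ("LOCKED", "Unlock"): "INIT",
    ("LOCKED", "Ret"): "ERROR",    # returning while still holding the lock
}

def run(trace):
    """Replay (action, lock) events; lock=None means the event hits every lock."""
    state = {}
    for action, lock in trace:
        locks = list(state) if lock is None else [lock]
        for l in locks:
            cur = state.get(l, "INIT")
            state[l] = "ERROR" if cur == "ERROR" else TRANSITIONS[(cur, action)]
    return state

ok = run([("Lock", "a"), ("Unlock", "a"), ("Ret", None)])
double = run([("Lock", "a"), ("Lock", "a")])
leaked = run([("Lock", "a"), ("Lock", "b"), ("Unlock", "a"), ("Ret", None)])
```

Because the state is keyed by lock, acquiring `b` while holding `a` is not a double acquire. Only the lock that is still held at `Ret` ends in `ERROR`.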

Step 2: Examine real code • Find common idioms • Understand level of precision needed • Windows device drivers • mostly control dominated protocols • global data flow needs CS, but not FS/PS • path feasibility seems to matter

Sample driver code
    STATUS Initialize(Object o) {
        Object p = o;
        if (p->needLock) KeAcquireSpinLock(p);
        p->data = 0;
        if (p->needLock) KeReleaseSpinLock(p);
        return OK;
    }

Step 3: Break up the problem • Three distinct entities to be tracked • the temporal sequence of actions along a particular control flow path • the data involved in the actions • the data involved in path feasibility • Can use different levels of static analysis to track each entity

Data analysis vs control analysis • RHS 95: Cost is O(ED^3). What is D? • dataflow: D is generally related to program size • program size grows because of pointers, globals • What if there is only a single global FSM? • D is just the #states in the FSM! • Control is cheap, data is expensive

Step 4: Design static analyses • track the temporal sequence of actions along a particular control flow path • cannot use flow-insensitive analysis • RHS95 is too expensive • eliminate the data involved in the actions • use GOLF value flow • now we have a control property, use RHS95 • both analyses are context-sensitive

Data elimination
    STATUS Initialize(Object o) {
        Object p = o;
        if (p->needLock) KeAcquireSpinLock(p);
        p->data = 0;
        if (p->needLock) KeReleaseSpinLock(p);
        return OK;
    }

Data elimination
    Initialize() {
        if (*) Lock;
        if (*) Unlock;
    }
[Figure: FSM states I, L, E]
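Once the data is eliminated, the abstracted `Initialize` can be analyzed by pushing a set of FSM states through the two nondeterministic branches. The sketch below does that and compares the result with the two paths that are feasible in the original driver code, where both tests read the same `p->needLock`. The transition table is an assumed reading of the lock protocol, and `analyze` and `run_path` are hypothetical helpers.

```python
# Assumed lock protocol; ERROR is absorbing.
TRANSITIONS = {
    ("INIT", "Lock"): "LOCKED",
    ("INIT", "Unlock"): "ERROR",   # assumption
    ("INIT", "Ret"): "INIT",       # assumption
    ("LOCKED", "Lock"): "ERROR",
    ("LOCKED", "Unlock"): "INIT",
    ("LOCKED", "Ret"): "ERROR",
}

def step(state, action):
    return "ERROR" if state == "ERROR" else TRANSITIONS[(state, action)]

def analyze(body):
    """Path-insensitive: ("maybe", a) models `if (*) a;`, ("do", a) models `a;`."""
    states = {"INIT"}
    for kind, action in body:
        after = {step(s, action) for s in states}
        states = (states | after) if kind == "maybe" else after
    return {step(s, "Ret") for s in states}

def run_path(actions):
    """Follow one concrete path and return the state after Ret."""
    state = "INIT"
    for action in list(actions) + ["Ret"]:
        state = step(state, action)
    return state

ABSTRACT_INITIALIZE = [("maybe", "Lock"), ("maybe", "Unlock")]
FEASIBLE_PATHS = [["Lock", "Unlock"], []]  # needLock true, needLock false

exit_states = analyze(ABSTRACT_INITIALIZE)
feasible_states = {run_path(p) for p in FEASIBLE_PATHS}
```

The abstract program reaches `ERROR` on the paths that lock without unlocking or unlock without locking, yet neither path is feasible in the driver. This is the sense in which path feasibility matters for the false-positive rate.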

Do we need context-sensitivity? • What if GOLF cannot provide MUST info?
    void Initialize(Object o1, Object o2) {
        LockWrapper(o1);
        LockWrapper(o2);
        KeReleaseSpinLock(o1);
        KeReleaseSpinLock(o2);
    }
    void LockWrapper(Object p) { KeAcquireSpinLock(p); }

Interface nodes • Limit scope of value flow to interface nodes • Produce RHS summaries for interface nodes
    void LockWrapper(Object p) { KeAcquireSpinLock(p); }
    p: INIT -> LOCKED, LOCKED -> ERROR
• Copy summaries to callers

Back to our example …
    void Initialize(Object o1, Object o2) {
        i: LockWrapper(o1);
        j: LockWrapper(o2);
        KeReleaseSpinLock(o1);
        KeReleaseSpinLock(o2);
    }
    void LockWrapper(Object p) { KeAcquireSpinLock(p); }
[Figure: value flow graph with nodes o1, o2, p and call-site labels i, j]
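One way to picture summaries being copied to callers is to treat the `LockWrapper` summary as a map from the entry state of `p` to its exit state, and to apply it at each call site to whichever object flows into `p` there. The sketch below contrasts that with a merged view in which `p` may be either object and every update is weak. It is an illustration under an assumed transition table, not ESP's algorithm. `summarize` and `analyze_caller` are hypothetical names.

```python
# Assumed lock protocol; ERROR is absorbing.
TRANSITIONS = {
    ("INIT", "Lock"): "LOCKED",
    ("INIT", "Unlock"): "ERROR",   # assumption
    ("LOCKED", "Lock"): "ERROR",
    ("LOCKED", "Unlock"): "INIT",
}

def step(state, action):
    return "ERROR" if state == "ERROR" else TRANSITIONS[(state, action)]

def summarize(body, param):
    """Summary for `param`: entry state -> exit state of the callee."""
    summary = {}
    for entry in ("INIT", "LOCKED", "ERROR"):
        state = entry
        for action, arg in body:
            if arg == param:
                state = step(state, action)
        summary[entry] = state
    return summary

def analyze_caller(body, summary, may_alias=None):
    """may_alias=None binds the formal to the actual at each call site.
    Otherwise the formal may denote any object in may_alias (weak update)."""
    states = {}
    for stmt in body:
        if stmt[0] == "call":
            actual = stmt[1]
            if may_alias is None:
                states[actual] = {summary[s] for s in states.get(actual, {"INIT"})}
            else:
                for obj in may_alias:
                    old = states.get(obj, {"INIT"})
                    states[obj] = old | {summary[s] for s in old}
        else:
            action, obj = stmt
            states[obj] = {step(s, action) for s in states.get(obj, {"INIT"})}
    return states

LOCK_WRAPPER = summarize([("Lock", "p")], "p")
INITIALIZE = [("call", "o1"), ("call", "o2"), ("Unlock", "o1"), ("Unlock", "o2")]

per_call_site = analyze_caller(INITIALIZE, LOCK_WRAPPER)
merged = analyze_caller(INITIALIZE, LOCK_WRAPPER, may_alias={"o1", "o2"})
```

The computed summary matches the slide's `INIT -> LOCKED, LOCKED -> ERROR`. Applied per call site, both objects end in `INIT`. In the merged view the second call looks like a possible double acquire, so `ERROR` appears for both objects.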

Consider the abstraction! • ESP makes an upfront abstraction • interface nodes in the GOLF graph • Plus: linear size, controls overall cost • Minus: may be too coarse • SLAM allows tuning of abstraction • but now we are back in the framework game
