520 likes | 717 Views
Tolerating and Correcting Memory Errors in C and C++. Ben Zorn Microsoft Research In collaboration with: Emery Berger and Gene Novark, Umass - Amherst Karthik Pattabiraman, UIUC Vinod Grover and Ted Hart, Microsoft Research. Focus on Heap Memory Errors.
E N D
Tolerating and Correcting Memory Errors in C and C++ Ben ZornMicrosoft ResearchIn collaboration with: Emery Berger and Gene Novark, Umass - Amherst Karthik Pattabiraman, UIUC Vinod Grover and Ted Hart, Microsoft Research Tolerating and Correcting Memory Errors in C and C++
Focus on Heap Memory Errors • Buffer overflowchar *c = malloc(100);c[101] = ‘a’; • Dangling referencechar *p1 = malloc(100);char *p2 = p1;free(p1);p2[0] = ‘x’; c a 0 99 p1 p2 x 0 99 Tolerating and Correcting Memory Errors in C and C++
Approaches to Memory Corruptions • Rewrite in a safe language • Static analysis / safe subset of C or C++ • SAFECode [Adve], etc. • Runtime detection, fail fast • Jones & Lin, CRED [Lam], CCured [Necula], etc. • Debugging possible, security advantage? • Runtime toleration • Failure oblivious [Rinard] (unsound) • Rx, Boundless Memory Blocks, ECC memoryDieHard / Exterminator, Samurai Tolerating and Correcting Memory Errors in C and C++
Fault Tolerance and Platforms • Platforms necessary in computing ecosystem • Extensible frameworks provide lattice for 3rd parties • Tremendously successful business model • Examples numerous: Window, iPod, browser, etc. • Platform power derives from extensibility • Tension between isolation for fault tolerance, integration for functionality • Platform only as reliable as weakest plug-in • Tolerating bad plug-ins necessary by design Tolerating and Correcting Memory Errors in C and C++
Research Vision • Increase robustness of installed code base • Potentially improve millions of lines of code • Minimize effort – ideally no source mods, no recompilation • Reduce requirement to patch • Patches are expensive (detect, write, deploy) • Patches may introduce new errors • Enable trading resources for robustness • E.g., more memory implies higher reliability Tolerating and Correcting Memory Errors in C and C++
Outline • Motivation • Exterminator • Collaboration with Emery Berger, Gene Novark • Automatically corrects memory errors • Suitable for large scale deployment • Critical Memory / Samurai • Collaboration with Karthik Pattabiraman, Vinod Grover • New memory semantics • Source changes to explicitly identify and protect critical data • Conclusion Tolerating and Correcting Memory Errors in C and C++
DieHard Allocator in a Nutshell • Joint work with Emery Berger • Existing heaps are packed tightly to minimize space • Tight packing increases likelihood of corruption • Predictable layout is easier for attacker to exploit • We randomize and overprovision the heap • Expansion factor determines how much empty space • Semantics are identical • Replication increases benefits • Enables analytic reasoning Normal Heap DieHard Heap Tolerating and Correcting Memory Errors in C and C++
DieHard in Practice • DieHard (non-replicated) • Windows, Linux version implemented by Emery Berger • Try it right now! (http://www.diehard-software.org/) • Adaptive, automatically sizes heap • Mechanism automatically redirects malloc calls to DieHard DLL • Application: Firefox & Mozilla • Known buffer in version 1.7.3 overflow crashes browser • Experience • Usable in practice – no perceived slowdown • Roughly doubles memory consumption • 20.3 Mbytes vs. 44.3 Mbytes with DieHard Tolerating and Correcting Memory Errors in C and C++
Caveats • Primary focus is on protecting heap • Techniques applicable to stack data, but requires recompilation and format changes • DieHard trades space, extra processors for memory safety • Not applicable to applications with large footprint • Applicability to server apps likely to increase • DieHard requires non-deterministic behavior to be made deterministic (on input, gettimeofday(), etc.) • DieHard is a brute force approach • Improvements possible (efficiency, safety, coverage, etc.) Tolerating and Correcting Memory Errors in C and C++
Exterminator Motivation • DieHard limitations • Tolerates errors probabilistically, doesn’t fix them • Memory and CPU overhead • Provides no information about source of errors • “Ideal” solution addresses the limitations • Program automatically detects and fixes memory errors • Corrected program has no memory, CPU overhead • Sources of errors are pinpointed, easier for human to fix • Exterminator = correcting allocator • Joint work with Emery Berger, Gene Novark • Random allocation => isolates bugs while tolerating them Tolerating and Correcting Memory Errors in C and C++
Exterminator Components • Architecture of Exterminator dictated by solving specific problems • How to detect heap corruptions effectively? • DieFast allocator • How to isolate the cause of a heap corruption precisely? • Heap differencing algorithms • How to automatically fix buggy C code without breaking it? • Correcting allocator + hot allocator patches Tolerating and Correcting Memory Errors in C and C++
DieFast Allocator • Randomized, over-provisioned heap • Canary = random bit pattern fixed at startup • Leverage extra free space by inserting canaries • Inserting canaries • Initialization – all cells have canaries • On allocation – no new canaries • On free – put canary in the freed object with prob. P • Remember where canaries are (bitmap) • Checking canaries • On allocation – check cell returned • On free – check adjacent cells 100101011110 Tolerating and Correcting Memory Errors in C and C++
Installing and Checking Canaries Initially, heap full of canaries Free Allocate Allocate 1 1 2 Install canaries with probability P Check canary Check canary Tolerating and Correcting Memory Errors in C and C++
Heap Differencing • Strategy • Run program multiple times with different randomized heaps • If detect canary corruption, dump contents of heap • Identify objects across runs using allocation order • Key insight: Relation between corruption and object causing corruption is invariant across heaps • Detect invariant across random heaps • More heaps => higher confidence of invariant Tolerating and Correcting Memory Errors in C and C++
Attributing Buffer Overflows corrupted canary delta is constant but unknown ? Run 1 1 2 4 X 3 Run 3 Run 2 Which object caused? 2 1 2 4 4 X 3 3 4 1 X Now only 2 candidates 2 4 4 One candidate! Precision increases exponentially with number of runs Tolerating and Correcting Memory Errors in C and C++
Detecting Dangling Pointers (2 cases) • Dangling pointer read/written (easy) • Invariant = canary in freed object X has same corruption in all runs • Dangling pointer only read (harder) • Sketch of approach (paper explains details) • Only fill freed object X with canary with probability P • Requires multiple trials: ≈ log2(number of callsites) • Look for correlations, i.e., X filled with canary => crash • Establish conditional probabilities • Have: P(callsite X filled with canary | program crashes) • Need: P(crash | filled with canary), guess “prior” to compute Tolerating and Correcting Memory Errors in C and C++
Correcting Allocator • Group objects by allocation site • Patch object groups at allocate/free time • Associate patches with group • Buffer overrun => add padding to size request • malloc(32) becomes malloc(32 + delta) • Dangling pointer => defer free • free(p) becomes defer_free(p, delta_allocations) • Fixes preserve semantics, no new bugs created • Correcting allocation may != DieFast or DieHard • Correction allocator can be space, CPU efficient • “Patches” created separately, installed on-the-fly Tolerating and Correcting Memory Errors in C and C++
Deploying Exterminator • Exterminator can be deployed in different modes • Iterative – suitable for test environment • Different random heaps, identical inputs • Complements automatic methods that cause crashes • Replicated mode • Suitable in a multi/many core environment • Like DieHard replication, except auto-corrects, hot patches • Cumulative mode – partial or complete deployment • Aggregates results across different inputs • Enables automatic root cause analysis from Watson dumps • Suitable for wide deployment, perfect for beta release • Likely to catch many bugs not seen in testing lab Tolerating and Correcting Memory Errors in C and C++
DieFast Overhead Tolerating and Correcting Memory Errors in C and C++
Exterminator Effectiveness • Squid web cache buffer overflow • Crashes glibc 2.8.0 malloc • 3 runs sufficient to isolate 6-byte overflow • Mozilla 1.7.3 buffer overflow (recall demo) • Testing scenario - repeated load of buggy page • 23 runs to isolate overflow • Deployed scenario – bug happens in middle of different browsing sessions • 34 runs to isolate overflow Tolerating and Correcting Memory Errors in C and C++
Outline • Motivation • Exterminator • Collaboration with Emery Berger, Gene Novark • Automatically corrects memory errors • Suitable for large scale deployment • Critical Memory / Samurai • Collaboration with Karthik Pattabiraman, Vinod Grover • New memory semantics • Source changes to explicitly identify and protect critical data • Conclusion Tolerating and Correcting Memory Errors in C and C++
Critical Memory Motivation • C/C++ programs vulnerable to memory errors • Software errors: buffer overflows, etc. • Hardware transient errors: bit flips, etc. • Increasingly a problem due to process shrinking, power • Critical memory goals: • Harden programs from both SW and HW errors • Allow local reasoning about memory state • Allow selective, incremental hardening of apps • Provide compatibility with existing libraries, apps Tolerating and Correcting Memory Errors in C and C++
Main Idea: Data-centric Robustness • Critical memory • Some data is more important than other data • Selectively protect that data from corruption • Examples • Account data, document contents are critical // UI data is not • Game score information, player stats, critical // rendering data structures are not code thatreferencescritical data balance += 100;if (balance < 0) { chargeCredit();} else { x += 10; y += 10;} Data Code balance x, y critical data Tolerating and Correcting Memory Errors in C and C++
Critical Memory Semantics • Conceptually, critical memory is parallel and independent of normal memory • Critical memory requires special allocate/deallocate and read/write operations • critical_store (cstore) – only way to consistently update critical memory • critical_load (cload) – only way to consistently read critical memory • Critical load/store have priority over normal load/store • Normal loads still see the value of critical memory Tolerating and Correcting Memory Errors in C and C++
Example cstore balance, 100 store balance, 10000(applications should not do this) … load balance returns 10000(compatibility semantics) cload balance returns 100 (possibly triggers exception) cstore balance, 100 … cload balance returns 100 load balance returns 100 cstore100 cstore 100 load load store 10000 cload normalmemory normalmemory 100 10000 criticalmemory criticalmemory 100 100 Tolerating and Correcting Memory Errors in C and C++
Critical Memory Deployment • Define “critical” type specifier: • Like “const” • Easy to use, minimal source mods • Allows local reasoning • External libraries, code cannot modify critical data • Tolerates memory errors • Non-critical overflows cannot corrupt critical values • Alllows static analysis of program subset • Critical subset of program can be statically checked independently • Additional checking on critical data possible int x, y, buffer[10];critical int balance = 100; third_party_lib(&x, &y);buffer[10] = 10000; // Bug!// balance still == 100if (balance < 0) { chargeCredit();} else { x += 10; y += 10;} Tolerating and Correcting Memory Errors in C and C++
Third-party Libraries/Untrusted Code • Library code does not need to be critical memory aware • If library does not mod critical data, no changes required • If library needs to modify critical data • Allow normal stores to critical memory in library • Explicitly “promote” on return • Copy-in, copy-out semantics critical intbalance = 100;…library_foo(&balance); promote balance; …__________________ // arg is not critical int * void library_foo(int *arg) { *arg = 10000; return;} Tolerating and Correcting Memory Errors in C and C++
Samurai: SCM Prototype • Software critical memory for heap objects • Critical objects allocated with crit_malloc, crit_free • Approach • Replication – base copy + 2 shadow copies • Redundant metadata • Stored with base copy, copy in hash table • Checksum, size data for overflow detection • Robust allocator as foundation • DieHard, unreplicated • Randomizes locations of shadow copies Tolerating and Correcting Memory Errors in C and C++
Samurai Implementation cstore balance, 100 store balance, 10000… load balance returns 10000 cload balance returns 100 cstore balance, 100 … cload balance returns 100 load balance returns 100 cload cstore100 cload store 10000 =? =? =? load metadata metadata basecopy basecopy cs 100 cs 10000 shadowcopies shadowcopies 100 100 100 100 Tolerating and Correcting Memory Errors in C and C++
Samurai Experimental Results • Prototype implementation of critical memory • Fault-tolerant runtime system for C/C++ • Applied to heap objects • Automated Phoenix compiler pass • Identified critical data for five SPECint applications • Low overheads for most applications (less than 10%) • Conducted fault-injection experiments • Fault tolerance significantly improved over based code • Low probability of fault-propagation from non-critical data to critical data for most applications • No new assertions or consistency checks added Tolerating and Correcting Memory Errors in C and C++
Performance Results Tolerating and Correcting Memory Errors in C and C++
Fault Injection into Critical Data (vpr) Tolerating and Correcting Memory Errors in C and C++
Fault Injection into Non-Critical Data Tolerating and Correcting Memory Errors in C and C++
Experience with CM for Components • Hardened two libraries with critical memory • CM-STL – hardened List Class data and metadata • CM-malloc – hardened allocator metadata • Effort • Relatively small changes to existing code • Performance impact • CM-STL in Web Server thread list => 10% slower • CM-malloc in cfrac benchmark => 22% slower Tolerating and Correcting Memory Errors in C and C++
Conclusion • Programs written in C / C++ can execute safely and correctly despite memory errors • Research vision • Improve existing code without source modifications • Reduce human generated patches required • Increase reliability, security by order of magnitude • Current projects • DieHard / Exterminator: automatically detect and correct memory errors (with high probability) • Critical Memory / Samurai: enable local reasoning, allow selective hardening, compatibility • ToleRace: replication to hide data races Tolerating and Correcting Memory Errors in C and C++
Hardware Trends (1) Reliability • Hardware transient faults are increasing • Even type-safe programs can be subverted in presence of HW errors • Academic demonstrations in Java, OCaml • Soft error workshop (SELSE) conclusions • Intel, AMD now more carefully measuring • “Not practical to protect everything” • Faults need to be handled at all levels from HW up the software stack • Measurement is difficult • How to determine soft HW error vs. software error? • Early measurement papers appearing Tolerating and Correcting Memory Errors in C and C++
Hardware Trends (2) Multicore • DRAM prices dropping • 2Gb, Dual Channel PC 6400 DDR2 800 MHz $85 • Multicore CPUs • Quad-core Intel Core 2 Quad, AMD Quad-core Opteron • Eight core Intel by 2008? • Challenge:How should we use all this hardware? Tolerating and Correcting Memory Errors in C and C++
Additional Information • Web sites: • Ben Zorn: http://research.microsoft.com/~zorn • DieHard: http://www.diehard-software.org/ • Exterminator: http://www.cs.umass.edu/~gnovark/ • Publications • Emery D. Berger and Benjamin G. Zorn, "DieHard: Probabilistic Memory Safety for Unsafe Languages", PLDI’06. • Karthik Pattabiraman, Vinod Grover, and Benjamin G. Zorn, "Samurai - Protecting Critical Data in Unsafe Languages", Microsoft Research, MSR-TR-2006-127, September 2006. • Gene Novark, Emery D. Berger and Benjamin G. Zorn, “Exterminator: Correcting Memory Errors with High Probability", PLDI’07. Tolerating and Correcting Memory Errors in C and C++
Backup Slides Tolerating and Correcting Memory Errors in C and C++
DieHard: Probabilistic Memory Safety • Collaboration with Emery Berger • Plug-compatible replacement for malloc/free in C lib • We define “infinite heap semantics” • Programs execute as if each object allocated with unbounded memory • All frees ignored • Approximating infinite heaps – 3 key ideas • Overprovisioning • Randomization • Replication • Allows analytic reasoning about safety Tolerating and Correcting Memory Errors in C and C++
Overprovisioning, Randomization Expand size requests by a factor of M (e.g., M=2) 1 2 3 4 5 Pr(write corrupts) = ½ ? 1 2 3 4 5 Randomize object placement 4 2 3 1 5 Pr(write corrupts) = ½ ! Tolerating and Correcting Memory Errors in C and C++
Replication (optional) Replicate process with different randomization seeds P1 1 3 2 5 4 P2 input 4 3 1 5 2 P3 5 2 1 4 3 Broadcast input to all replicas Voter Compare outputs of replicas, kill when replica disagrees Tolerating and Correcting Memory Errors in C and C++
DieHard Implementation Details • Multiply allocated memory by factor of M • Allocation • Segregate objects by size (log2), bitmap allocator • Within size class, place objects randomly in address space • Randomly re-probe if conflicts (expansion limits probing) • Separate metadata from user data • Fill objects with random values – for detecting uninit reads • Deallocation • Expansion factor => frees deferred • Extra checks for illegal free Tolerating and Correcting Memory Errors in C and C++
Over-provisioned, Randomized Heap Segregated size classes - Static strategy pre-allocates size classes- Adaptive strategy grows each size class incrementally L = max live size≤ H/2 F= free = H-L … 2 4 5 3 1 6 object size = 8 object size = 16 H = max heap size, class i Tolerating and Correcting Memory Errors in C and C++
Randomness enables Analytic ReasoningExample: Buffer Overflows • k = # of replicas, Obj = size of overflow • With no replication, Obj = 1, heap no more than 1/8 full:Pr(Mask buffer overflow), = 87.5% • 3 replicas: Pr(ibid) = 99.8% Tolerating and Correcting Memory Errors in C and C++
DieHard CPU Performance (no replication) Tolerating and Correcting Memory Errors in C and C++
DieHard CPU Performance (Linux) Tolerating and Correcting Memory Errors in C and C++
Correctness Results • Tolerates high rate of synthetically injected errors in SPEC programs • Detected two previously unreported benign bugs (197.parser and espresso) • Successfully hides buffer overflow error in Squid web cache server (v 2.3s5) • But don’t take my word for it… Tolerating and Correcting Memory Errors in C and C++
Experiments / Benchmarks • vpr: Does place and route on FPGAs from netlist • Made routing-resource graph critical • crafty: Plays a game of chess with the user • Made cache of previously-seen board positions critical • gzip: Compress/Decompresses a file • Made Huffman decoding table critical • parser: Checks syntactic correctness of English sentences based on a dictionary • Made the dictionary data structures critical • rayshade: Renders a scene file • Made the list of objects to be rendered critical Tolerating and Correcting Memory Errors in C and C++
Related Work • Conservative GC (Boehm / Demers / Weiser) • Time-space tradeoff (typically >3X) • Provably avoids certain errors • Safe-C compilers • Jones & Kelley, Necula, Lam, Rinard, Adve, … • Often built on BDW GC • Up to 10X performance hit • N-version programming • Replicas truly statistically independent • Address space randomization (as in Vista) • Failure-oblivious computing [Rinard] • Hope that program will continue after memory error with no untoward effects Tolerating and Correcting Memory Errors in C and C++