Computer Systems Principles: Dynamic Memory Management Emery Berger and Mark Corner University of Massachusetts Amherst
Dynamic Memory Management • How the heap manager is implemented • malloc, free • new, delete
Memory Management • Programs ask the memory manager • to allocate/free objects (or multiple objects) • The memory manager asks the OS • to allocate/free pages (or multiple pages) • User Program → Objects (new, malloc) → Allocator (java, libc) → Pages (mmap, brk) → Operating System
Memory Management • Ideal memory manager: • Fast • Raw time, asymptotic runtime, locality • Memory efficient • Low fragmentation • With multicore & multiprocessors: • Scalable to multiple processors • New issues: • Secure from attack • Reliable in face of errors
Memory Manager Functions • Not just malloc/free • realloc • Change size of object, copying old contents • ptr = realloc (ptr, 10); • But: realloc(ptr, 0) = ? • How about realloc (NULL, 16)? • Other functions • calloc • memalign • Needs ability to locate size & object start
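The realloc corner cases above can be exercised in code. This is a minimal sketch (the helper name is mine, not part of any API); note that realloc(p, 0) is implementation-defined in C17, so the sketch deliberately avoids relying on it:

```cpp
#include <cstdlib>
#include <cstring>

// realloc(NULL, n) behaves exactly like malloc(n); growing a block may
// move it, so the returned pointer must always replace the old one.
// realloc(p, 0) may free p and return NULL, or return a unique pointer —
// portable code should not depend on either outcome.
char *dup_and_grow(const char *s, std::size_t newsize) {
    // Fresh allocation via realloc(NULL, n) — same as malloc(n).
    char *p = static_cast<char *>(std::realloc(nullptr, std::strlen(s) + 1));
    std::strcpy(p, s);
    // Grow: old contents are preserved even if the block moves.
    p = static_cast<char *>(std::realloc(p, newsize));
    return p;
}
```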
Fragmentation • Intuitively, fragmentation stems from “breaking” up heap into unusable spaces • More fragmentation = worse utilization • External fragmentation • Wasted space outside allocated objects • Internal fragmentation • Wasted space inside an object
Classical Algorithms • First-fit • find first chunk of desired size
Classical Algorithms • Best-fit • find chunk that fits best • Minimizes wasted space
Classical Algorithms • Worst-fit • find chunk that fits worst • name is a misnomer! • keeps large holes around • Reclaim space: coalesce free adjacent objects into one big object
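The fit policies above can be sketched as searches over a list of free-chunk sizes (an array of sizes is a simplification of a real free list; worst-fit is symmetric to best-fit, picking the largest chunk instead):

```cpp
#include <cstddef>

// First fit: return the index of the first free chunk large enough.
int first_fit(const std::size_t *chunks, int n, std::size_t request) {
    for (int i = 0; i < n; i++)
        if (chunks[i] >= request) return i;
    return -1;                      // no chunk fits
}

// Best fit: return the index of the smallest chunk that still fits,
// minimizing the leftover hole.
int best_fit(const std::size_t *chunks, int n, std::size_t request) {
    int best = -1;
    for (int i = 0; i < n; i++)
        if (chunks[i] >= request &&
            (best < 0 || chunks[i] < chunks[best]))
            best = i;
    return best;
}
```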
Quick Activity • Program asks for: 300,25,25,100 • First-fit and best-fit allocations go where? • Which ones cannot be fulfilled? • What about: 110,54,25,70,50?
Implementation Techniques • Freelists • Linked lists of objects in same size class • Range of object sizes • First-fit, best-fit in this context? • Which is faster?
Implementation Techniques • Segregated size classes • Use free lists, but never coalesce or split • Choice of size classes • Exact • Powers-of-two
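A minimal sketch of segregated free lists with powers-of-two size classes, never coalescing or splitting. The class count and fallback to the system heap are illustrative choices, not any particular allocator's design:

```cpp
#include <cstdlib>

// Freed objects' own first words serve as free-list links.
struct FreeObj { FreeObj *next; };

static const int NCLASSES = 8;          // classes hold 8, 16, ..., 1024 bytes
static FreeObj *freelist[NCLASSES];

static int size_class(std::size_t sz) { // smallest class that holds sz
    int c = 0;
    std::size_t cap = 8;
    while (cap < sz && c < NCLASSES - 1) { cap <<= 1; c++; }
    return c;
}

void *seg_alloc(std::size_t sz) {
    int c = size_class(sz);
    if (freelist[c]) {                  // reuse a freed object: O(1)
        FreeObj *p = freelist[c];
        freelist[c] = p->next;
        return p;
    }
    return std::malloc(std::size_t(8) << c);  // fall back to the system heap
}

void seg_free(void *ptr, std::size_t sz) {    // never coalesce or split
    FreeObj *p = static_cast<FreeObj *>(ptr);
    p->next = freelist[size_class(sz)];
    freelist[size_class(sz)] = p;
}
```

Internal fragmentation is the price: a 20-byte request occupies a 32-byte slot.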
Implementation Techniques • Big Bag of Pages (BiBOP) • Page or pages (multiples of 4K) • Usually segregated size classes • Header contains metadata • Locate with bitmasking • Limits external fragmentation • Can be very fast • Secret Sauce for project • Use free objects to track free objects
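The bitmask trick works because each BiBOP page is page-aligned and starts with its metadata, so clearing the low bits of any interior pointer yields the header. A sketch (the header fields are hypothetical):

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t PAGE_SIZE = 4096;

// Per-page metadata: every object on a page shares one size class,
// so free() and realloc() can find an object's size in O(1).
struct PageHeader {
    std::size_t obj_size;   // size of every object on this page
    // ... free bitmap, allocation counts, etc. (omitted)
};

static PageHeader *page_header(void *obj) {
    // Mask off the low bits: the header sits at the page boundary.
    return reinterpret_cast<PageHeader *>(
        reinterpret_cast<std::uintptr_t>(obj) &
        ~static_cast<std::uintptr_t>(PAGE_SIZE - 1));
}

static std::size_t object_size(void *obj) {
    return page_header(obj)->obj_size;
}
```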
Runtime Analysis • Key components • Cost of malloc (best, worst, average) • Cost of free • Cost of size lookup (for realloc & free)
Space Bounds • Fragmentation worst case for an “optimal” allocator: O(log (M/m)) • M = largest object size • m = smallest object size • Best-fit: O(M · m)!
Performance Issues • Goal: perform well for typical programs • Considerations: • Internal fragmentation • External fragmentation • Headers (metadata) • Scalability (later) • Reliability, too • “Canned” allocator often seen as slow
Custom Memory Allocation • “Use custom allocators” widely recommended • Programmers replace new/delete • Reduce runtime: often • Expand functionality: sometimes • Reduce space: rarely • Very common: Apache, gcc, lcc, STL, database servers… • Language-level support in C++
Drawbacks of Custom Allocators • Avoiding system allocator: • More code to maintain & debug • Can’t use memory debuggers • Not modular or robust: • Mix memory from customand general-purpose allocators → crash! • Increased burden on programmers
(I) Per-Class Allocators • Recycle freed objects from a free list • a = new Class1; b = new Class1; c = new Class1; delete a; delete b; delete c; a = new Class1; b = new Class1; c = new Class1; • Fast: linked-list operations • Simple: identical semantics • C++ language support • Possibly space-inefficient
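The C++ language support mentioned above is a class-specific operator new/delete. A sketch of a per-class allocator that recycles freed instances through a free list (the payload size is illustrative):

```cpp
#include <cstddef>

class Class1 {
    // When the object is free, its storage holds the free-list link.
    union { Class1 *next_free; char payload[32]; };
    static Class1 *freelist;
public:
    static void *operator new(std::size_t sz) {
        if (freelist) {                       // pop a recycled object: O(1)
            Class1 *p = freelist;
            freelist = p->next_free;
            return p;
        }
        return ::operator new(sz);            // fall back to the global heap
    }
    static void operator delete(void *ptr) {  // push onto the free list: O(1)
        Class1 *p = static_cast<Class1 *>(ptr);
        p->next_free = freelist;
        freelist = p;
    }
};
Class1 *Class1::freelist = nullptr;
```

Semantics are identical to plain new/delete; the space cost is that freed objects are never returned to the general heap.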
(II) Custom Patterns • Tailor-made to fit allocation patterns • Example: 197.parser (natural language parser) • char[MEMORY_LIMIT]; a = xalloc(8); b = xalloc(16); c = xalloc(8); xfree(b); xfree(c); d = xalloc(8); • Fast: pointer-bumping allocation • Brittle: fixed memory size, requires stack-like lifetimes
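A simplified sketch of this pattern (xalloc/xfree are the slide's names; this version of xfree rolls the end_of_array cursor back to the freed object, discarding it and everything allocated after it — an aggressive reading of the stack-like-lifetime requirement):

```cpp
#include <cstddef>

// One fixed buffer; allocation just bumps a cursor. Brittle by design:
// the memory limit is fixed and lifetimes must be stack-like.
static const std::size_t MEMORY_LIMIT = 1 << 20;
static char arena[MEMORY_LIMIT];
static std::size_t end_of_array = 0;        // the bump cursor

void *xalloc(std::size_t sz) {
    sz = (sz + 7) & ~std::size_t(7);        // keep 8-byte alignment
    if (end_of_array + sz > MEMORY_LIMIT) return nullptr;  // out of arena
    void *p = arena + end_of_array;
    end_of_array += sz;
    return p;
}

void xfree(void *p) {
    // Roll the cursor back to p, freeing p and all later allocations.
    std::size_t off = static_cast<std::size_t>(static_cast<char *>(p) - arena);
    if (off < end_of_array) end_of_array = off;
}
```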
(III) Regions • Separate areas, deletion only en masse • regioncreate(r); regionmalloc(r, sz); regiondelete(r) • Fast: pointer-bumping allocation, deletion of whole chunks • Convenient: one call frees all memory • Risky: dangling references, too much space • Increasingly popular custom allocator
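A sketch of a region (arena) allocator using the slide's function names. Assumptions of mine: objects are carved from chained fixed-size blocks, and no single request exceeds one block:

```cpp
#include <cstdlib>
#include <cstddef>

struct Block { Block *next; /* payload follows the header */ };

struct Region {
    Block *blocks = nullptr;   // chain of blocks, freed all at once
    char  *cursor = nullptr;   // bump pointer into the current block
    std::size_t left = 0;      // bytes left in the current block
};

static const std::size_t BLOCK_BYTES = 4096;

// Assumes sz <= BLOCK_BYTES (a real region allocator handles big requests).
void *regionmalloc(Region *r, std::size_t sz) {
    sz = (sz + 7) & ~std::size_t(7);
    if (sz > r->left) {                     // current block full: chain a new one
        Block *b = static_cast<Block *>(
            std::malloc(sizeof(Block) + BLOCK_BYTES));
        b->next = r->blocks;
        r->blocks = b;
        r->cursor = reinterpret_cast<char *>(b + 1);
        r->left = BLOCK_BYTES;
    }
    void *p = r->cursor;                    // pointer-bumping allocation
    r->cursor += sz;
    r->left -= sz;
    return p;
}

void regiondelete(Region *r) {              // one call frees all memory
    while (r->blocks) {
        Block *next = r->blocks->next;
        std::free(r->blocks);
        r->blocks = next;
    }
    r->cursor = nullptr;
    r->left = 0;
}
```

The risk is visible in the interface: nothing stops a caller from keeping a pointer into a deleted region.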
Custom Allocators Are Faster… • As good as and sometimes much faster than Win32
Not So Fast… • DLmalloc (Linux): as fast or faster for most benchmarks
Are custom allocators a win? • Generally not worth the trouble • Just use good general-purpose allocator • Alternative: reaps (hybrid of regions & heaps) • However… • Sometimes worth it for specialized apps • Especially pool allocation, as in Apache
Problems with Unsafe Languages • C, C++: pervasive applications, but the languages are unsafe • Numerous opportunities for security vulnerabilities and errors • Double frees • Invalid frees • Uninitialized reads • Dangling pointers • Buffer overflows (stack & heap) • Can the memory allocator help?
Soundness for Erroneous Programs • Normally memory errors lead to crashes, but… consider an infinite-heap allocator: • All news fresh; ignore delete • No dangling pointers, invalid frees, double frees • Every object infinitely large • No buffer overflows, data overwrites • Transparent to correct programs • “Erroneous” programs become sound
Probabilistic Memory Safety • Fully-randomized M-heap • Approximates the infinite heap with a heap M times larger than needed, e.g., M = 2 • Increases odds of benign errors • Probabilistic memory safety: P(no error) increases with M • Errors independent across heaps • E(users with no error) = P(no error) × |users|
DieHard • Key ideas: • Isolate heap metadata • Randomize allocation • Trade space for robustness • Replication (optional) • Key influence on the design of Windows 7’s Fault-Tolerant Heap
Implementation Issues • Conventional, freelist-based heaps • Hard to randomize, protect from errors • Double frees, heap corruption • What about bitmaps? (one bit per word) • Catastrophic fragmentation! • Each small object likely to occupy one page
Randomized Heap Layout 00000001 1010 metadata heap • Bitmap-based, segregated size classes • Bit represents one object of given size • i.e., one bit = 2i+3 bytes, etc. • Prevents fragmentation
Randomized Allocation • malloc(8): • compute size class = ceil(log2 sz) − 3 • randomly probe the bitmap for a zero (free) bit and set it • Fast: O(1) expected runtime • M = 2 means E[# of probes] = 2
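A toy sketch of the probe loop for a single size class (one 64-slot bitmap; a real randomized heap has one bitmap per size class and a better random source). With the heap at most half full (M = 2), each probe succeeds with probability ≥ 1/2, so the expected number of probes is at most 2:

```cpp
#include <cstdlib>
#include <cstdint>
#include <cstddef>

static const int NSLOTS = 64;               // slots in this toy size class
static std::uint64_t bitmap = 0;            // 1 = allocated

// size class = ceil(log2(sz)) - 3, so class 0 holds 8-byte objects.
static int size_class(std::size_t sz) {
    int c = 0;
    std::size_t cap = 8;
    while (cap < sz) { cap <<= 1; c++; }
    return c;
}

// Randomly probe the bitmap for a zero (free) bit; set it and return
// the slot index, or -1 if the class is completely full.
int random_probe_alloc() {
    if (bitmap == ~std::uint64_t(0)) return -1;
    for (;;) {
        int i = std::rand() % NSLOTS;       // random probe
        std::uint64_t bit = std::uint64_t(1) << i;
        if (!(bitmap & bit)) {
            bitmap |= bit;                  // mark allocated
            return i;
        }
    }
}
```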
Randomized Deallocation • free(ptr): • ensure the object is valid: aligned to the right address • ensure it is allocated: bit set • reset the bit • Prevents invalid frees and double frees
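The validity checks can be sketched as follows (slot size and heap geometry are illustrative). Because metadata lives in the bitmap rather than next to objects, a bad pointer is rejected instead of corrupting the heap:

```cpp
#include <cstdint>
#include <cstddef>

static const std::ptrdiff_t SLOT = 16;      // object size in this class
static std::uint64_t alloc_bits = 0;        // 1 = allocated
static char heap_base[64 * SLOT];           // 64 slots

// Returns false (doing nothing) for invalid frees and double frees.
bool checked_free(void *ptr) {
    std::ptrdiff_t off = static_cast<char *>(ptr) - heap_base;
    if (off < 0 || off >= (std::ptrdiff_t)sizeof(heap_base))
        return false;                       // pointer outside this heap
    if (off % SLOT != 0)
        return false;                       // not aligned to an object start
    std::uint64_t bit = std::uint64_t(1) << (off / SLOT);
    if (!(alloc_bits & bit))
        return false;                       // bit clear: double/invalid free
    alloc_bits &= ~bit;                     // reset the bit
    return true;
}
```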
Randomized Heaps & Reliability • Objects randomly spread across the heap, separately within each size class (2^(i+3) bytes, 2^(i+4) bytes, …) • Different run = different heap layout • Errors across heaps are independent • My Mozilla: “malignant” overflow; your Mozilla: “benign” overflow
Increasing Reliability • Space Shuttle • 3 copies of everything (hw & sw) • Votes on every action • Failure: majority rules
DieHard Replication • Replication-based fault tolerance • Execute replicas in separate processes, each with its own random seed (seed1 → replica1, seed2 → replica2, seed3 → replica3) • Broadcast input to all replicas; vote on their outputs • Requires randomization! Makes errors independent
DieHard Results • Empirical results • Runtime overhead • Error avoidance • Injected faults & actual applications • Analytical results (if time, pictures!) • Buffer overflows • Uninitialized reads • Dangling pointer errors (the best)
Analytical Results: Buffer Overflows • Model overflow as a random write of live data • Heap half full (max occupancy)
Analytical Results: Overflows • Replicas increase the odds of avoiding the overflow in at least one replica • P(overflow in all replicas) = (1/2)^3 = 1/8 • P(no overflow in ≥ 1 replica) = 1 − (1/2)^3 = 7/8
Analytical Results: Buffer Overflows • Overflow of one object • F = free space • H = heap size • N = # objects’ worth of overflow • k = # replicas
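With those symbols, one derivation consistent with the (1/2)^3 example above goes as follows, assuming each of the N overflowed slots independently lands on free space with probability F/H (the independence assumption is mine, not stated on the slide):

```latex
% One replica: the overflow is benign iff every clobbered slot is free.
P(\text{benign in one replica}) \approx \left(\frac{F}{H}\right)^{N}

% k replicas with independent random layouts: the error is masked
% if the overflow is benign in at least one replica.
P(\text{masked}) = 1 - \left(1 - \left(\frac{F}{H}\right)^{N}\right)^{k}

% Slide example: F/H = 1/2,\ N = 1,\ k = 3
% \Rightarrow P(\text{masked}) = 1 - (1/2)^3 = 7/8.
```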
Error Avoidance • Injected faults: • Dangling pointers (@50%, 10 allocations): glibc crashes; DieHard: 9/10 correct • Overflows (@1%, 4 bytes over): glibc crashes 9/10, infinite loop; DieHard: 10/10 correct • Real faults: • Avoids Squid web cache overflow that crashes the Boehm-Demers-Weiser (BDW) collector & glibc • Avoids dangling pointer error in Mozilla (DoS in glibc & Windows)