270 likes | 399 Views
CSE P501 – Compiler Construction. Conventional Heap Storage Garbage Collection. Conventional Heap Storage. C. ... char* s = (char*) malloc (50 ); ... free(s );. In Use. C Runtime Heap Memory. Developer must remember to free memory when no longer required
E N D
CSE P501 – Compiler Construction Conventional Heap Storage Garbage Collection Jim Hogg - UW - CSE P501
Conventional Heap Storage C ... char* s = (char*) malloc(50); ... free(s); In Use C Runtime Heap Memory • Developer must remember to free memory when no longer required • Eventual fragmentation => slow to malloc, slow to free Jim Hogg - UW - CSE P501
Heap Storage Fragmentation In Use C Runtime Heap Memory • malloc: walk the freelist to find a slot big enough for current request • free: adjust freelist; collapse contiguous freespace • fragmentation: plenty free chunks but none big enough for request • cannot compact the used space - may contain pointers; may be pointed-at Jim Hogg - UW - CSE P501
Bugs • Forget to free => eventually run out of memory • called a "memory leak" • Call free, but continue to use! • called "use-after-free", or "dangling pointer" • memory corruption - wrong answers; crash if lucky! • major source of security issues • detect via "pool poisoning" 2 pointers free via 1 free malloc; corruption! Jim Hogg - UW - CSE P501
Solving Memory Leaks C++ int f() { ... Car c = Car("Chevy"); ... } • object c is destructed [calls ~Car() ]automatically when c it falls out of scope • called RAII ("Resource Acquisition Is Initialization") • more general case still requires Developer to delete c when no longer needed • "smart pointers" can help Jim Hogg - UW - CSE P501
Solving Leaks and Use-After-Free public static int Main(String[] args) { ... if (...) { ArrayList cars = new ArrayList(); for (inti = 0; i < 1000; i++) { cars.Add(new Car()); } ... } ... Boat b = new Boat(); ... } A B • At point A I have 1000 Car objects allocated on the heap • At some later time I ask for more heap memory. May trigger a "garbage collection" - automatic, and silent • Compiler realizes that cars is not referenced later by the program; so it's "garbage"; GC recycles it for use by other objects • Magically solves leaks and use-after-free!
Garbage Collection next Allocate an object; fast! next Allocate more objects; and one more, please? next Jim Hogg - UW - CSE P501
Garbage Collection, 1 Allocate another object next "roots" Trace reachable objects next "roots" Compact unreachables; update all pointers next GC does not find garbage: it finds live objects and ignores all other memory Jim Hogg - UW - CSE P501
Roots? • "Roots" are locations that hold a pointer to any object in the heap: • Method-local variables • Method Arguments • Global variables • Class-static fields • Registers Jim Hogg - UW - CSE P501
GC Start root root Jim Hogg - UW - CSE P501
GC Mark Phase root root Unreachable Reachable Jim Hogg - UW - CSE P501
GC Sweep Phase root root Reachable With memory free, now allocate space for object that provoked the GC Jim Hogg - UW - CSE P501
No Bugs • GC removes control over free'ing memory from the hands of the Developer • GC will find all live objects and reclaim all other memory each time it is invoked. So no "memory leak"s • An object is garbage-collected only if it was unreachable by the program; being unreachable, GC removes "use-after-free" bugs • So what's a "memory leak" in C# or Java? - holding on to objects that really could be freed - eg, by setting the object-ref to null. Particularly troublesome in a Generational Garbage Collector • Forget to free => eventually run out of memory • called a "memory leak" • Call free, but continue to use! • called "use-after-free" Jim Hogg - UW - CSE P501
C# Finalizers class C { private StreamWritersw; ... ~C() { sw.Close(); } // flush output buffer ... } • Class C above defines a "Finalize" method. Syntax is ~C() • Syntactic sugar for: protected override void Finalize() • As each object of class C is created, the CLR links it onto a Finalization Queue • When the object is about to be reclaimed, it is moved onto the FreachableQueue; a Finalizer background thread will run that Finalize method later Jim Hogg - UW - CSE P501
Finalization root Finalization Queue root Finalization Queue Freachable Queue Not live; not dead; zombies Jim Hogg - UW - CSE P501
Threads • Compiler needs to create GC info - at each point in program, where are current roots? - variables, registers • GC info is bulky - cannot store for every instruction! • So store GC info for selected points in the program - "GC-safepoints" • Typical GC-safepoint is a function return • Need to stop all threads to carry out a GC • For each thread stack-frame, stomp its return address; when it reaches there, the thread will return, not to its caller, but into GC code Jim Hogg - UW - CSE P501
Hijacking a Thread Thread continues to run Eventually returns and hits the hijack Save & Stomp retaddr Top of Stack retaddr gc gc gc GC GC GC caller Jim Hogg - UW - CSE P501
Threads Run to Their Safepoints Start GC After each thread has hit its safepoint, GC can start its sweep phase Jim Hogg - UW - CSE P501
Object Age Distribution • Performing GC over large heap consumes lots of cpu time (and trashes the caches!) • Experiment reveals: • Many objects live a very short time (high "infant mortality") • Some object live a very long time • Bi-modal number Object Ages age Jim Hogg - UW - CSE P501
Generational Garbage Collection Gen0 Gen1 Gen2 'Nursery' • Divide heap into 3 areas • Allocate new objects from Gen0 • At next GC • move all Gen0 survivors into Gen1 • move all Gen1 survivors into Gen2 Jim Hogg - UW - CSE P501
Migration thru Generations Gen0 Gen1 Gen2 Survivors Gen0 Gen1 Gen2 Gen0 Gen1 Gen2 Gen0 0 Gen1 Gen2 Jim Hogg - UW - CSE P501
Generational GC • Garbage-collecting entire heap is expensive • A process of diminishing returns • Instead, Garbage-Collect only the nursery (Gen0) • Smaller, so faster • Yields lots of free space (objects die young) • Maximum 'bang for the buck' • Sometime collect Gen0+Gen1 • When the going gets tough, collect Gen0+Gen1+Gen2 ("full" collection) • Some old objects may become unreachable - just ignore them in a Gen0 collect • Above picture assumes no writes to old objects kept Gen0 objects reachable Jim Hogg - UW - CSE P501
Card Table • In practice, older objects are written-to between GCs • Might update a pointer that results in keeping young object alive; missing this is a disaster Gen0 Gen1 Gen2 Gen1 Card Table Gen2 Card Table • Card Tables: bitmap, with 1 bit per 128 Bytes of Gen2 heap • Create a "write barrier" - any write into Gen1 or Gen2 heap sets corresponding bit in Card Table • During GC sweep, check Card Tables for any "keep alive" pointers • In practice, Card Tables contain few 1 bits Jim Hogg - UW - CSE P501
Debugging the GC • What if a GC goes wrong? • Fails to free some memory that is unreachable => memory leak • Frees some memory that is still reachable - impending disaster • Frees some non-objects - disaster • Colloquially called a GC 'hole' • Causes • Bug in GC - mark, sweep, compaction • Bug in GC info • Symptoms • Wrong answers • Process Crash • If we're 'lucky' failure comes soon after error • Otherwise, failure may occur seconds, minutes or hours later • Testing • GC-Stress • eg: maximally randomize the heap; force frequent GCs; check results Jim Hogg - UW - CSE P501
Further Design Complexities • Objects with Finalizers: • take 2 GCs to die • extend the lifetime of other connected objects • run at some later time => "non-deterministic" finalization • no guarantee on order that finalizers runs • uncaught exceptions escape and disappear! • Strong refs • Weak refs - "short" and "long" • "Dispose" pattern • Resurrection • Large Object Heap (objects > 85 kB in size) • Concurrent Garbage Collection • Per-processor heaps; processor affinity • Fully-interruptible GC • Safe-point on back-edge of loop Jim Hogg - UW - CSE P501
References for CLR GC Garbage Collection: Automatic Memory Management in the Microsoft .NET Framework – by Jeffrey Richter Garbage Collection—Part 2: Automatic Memory Management in the Microsoft .NET Framework - by Jeffrey Richter Garbage Collector Basics and Performance Hints – by Rico Mariani Using GC Efficiently – Part 1 by Maoni Using GC Efficiently – Part 2 by Maoni Using GC Efficiently – Part 3 by Maoni Using GC Efficiently – Part 4 by Maoni GC Performance Counters - by Maoni Tools that help diagnose managed memory related issues – by Maoni Clearing up some confusion over finalization and other areas in GC – by Maoni Jim Hogg - UW - CSE P501
And a bit of perspective… • Automatic GC has been around since LISP I in 1958 • Ubiquitous in functional and object-oriented programming communities for decades • Mainstream since Java(?) (mid-90s) • Now conventional? - nope! • Specialized patterns of allocate/free are still better coded by-hand - eg "Arena" storage inside compiler optimization phases Jim Hogg - UW - CSE P501