1 / 27

CSE P501 – Compiler Construction

CSE P501 – Compiler Construction. Conventional Heap Storage Garbage Collection. Conventional Heap Storage. C. ... char* s = (char*) malloc (50 ); ... free(s );. In Use. C Runtime Heap Memory. Developer must remember to free memory when no longer required

tejano
Download Presentation

CSE P501 – Compiler Construction

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CSE P501 – Compiler Construction Conventional Heap Storage Garbage Collection Jim Hogg - UW - CSE P501

  2. Conventional Heap Storage C ... char* s = (char*) malloc(50); ... free(s); In Use C Runtime Heap Memory • Developer must remember to free memory when no longer required • Eventual fragmentation => slow to malloc, slow to free Jim Hogg - UW - CSE P501

  3. Heap Storage Fragmentation In Use C Runtime Heap Memory • malloc: walk the freelist to find a slot big enough for current request • free: adjust freelist; collapse contiguous freespace • fragmentation: plenty free chunks but none big enough for request • cannot compact the used space - may contain pointers; may be pointed-at Jim Hogg - UW - CSE P501

  4. Bugs • Forget to free => eventually run out of memory • called a "memory leak" • Call free, but continue to use! • called "use-after-free", or "dangling pointer" • memory corruption - wrong answers; crash if lucky! • major source of security issues • detect via "pool poisoning" 2 pointers free via 1 free malloc; corruption! Jim Hogg - UW - CSE P501

  5. Solving Memory Leaks C++ int f() { ... Car c = Car("Chevy"); ... } • object c is destructed [calls ~Car() ]automatically when c it falls out of scope • called RAII ("Resource Acquisition Is Initialization") • more general case still requires Developer to delete c when no longer needed • "smart pointers" can help Jim Hogg - UW - CSE P501

  6. Solving Leaks and Use-After-Free public static int Main(String[] args) { ... if (...) { ArrayList cars = new ArrayList(); for (inti = 0; i < 1000; i++) { cars.Add(new Car()); } ... } ... Boat b = new Boat(); ... } A B • At point A I have 1000 Car objects allocated on the heap • At some later time I ask for more heap memory. May trigger a "garbage collection" - automatic, and silent • Compiler realizes that cars is not referenced later by the program; so it's "garbage"; GC recycles it for use by other objects • Magically solves leaks and use-after-free!

  7. Garbage Collection next Allocate an object; fast! next Allocate more objects; and one more, please? next Jim Hogg - UW - CSE P501

  8. Garbage Collection, 1 Allocate another object next "roots" Trace reachable objects next "roots" Compact unreachables; update all pointers next GC does not find garbage: it finds live objects and ignores all other memory Jim Hogg - UW - CSE P501

  9. Roots? • "Roots" are locations that hold a pointer to any object in the heap: • Method-local variables • Method Arguments • Global variables • Class-static fields • Registers Jim Hogg - UW - CSE P501

  10. GC Start root root Jim Hogg - UW - CSE P501

  11. GC Mark Phase root root Unreachable Reachable Jim Hogg - UW - CSE P501

  12. GC Sweep Phase root root Reachable With memory free, now allocate space for object that provoked the GC Jim Hogg - UW - CSE P501

  13. No Bugs • GC removes control over free'ing memory from the hands of the Developer • GC will find all live objects and reclaim all other memory each time it is invoked. So no "memory leak"s • An object is garbage-collected only if it was unreachable by the program; being unreachable, GC removes "use-after-free" bugs • So what's a "memory leak" in C# or Java? - holding on to objects that really could be freed - eg, by setting the object-ref to null. Particularly troublesome in a Generational Garbage Collector • Forget to free => eventually run out of memory • called a "memory leak" • Call free, but continue to use! • called "use-after-free" Jim Hogg - UW - CSE P501

  14. C# Finalizers class C { private StreamWritersw; ... ~C() { sw.Close(); } // flush output buffer ... } • Class C above defines a "Finalize" method. Syntax is ~C() • Syntactic sugar for: protected override void Finalize() • As each object of class C is created, the CLR links it onto a Finalization Queue • When the object is about to be reclaimed, it is moved onto the FreachableQueue; a Finalizer background thread will run that Finalize method later Jim Hogg - UW - CSE P501

  15. Finalization root Finalization Queue root Finalization Queue Freachable Queue Not live; not dead; zombies Jim Hogg - UW - CSE P501

  16. Threads • Compiler needs to create GC info - at each point in program, where are current roots? - variables, registers • GC info is bulky - cannot store for every instruction! • So store GC info for selected points in the program - "GC-safepoints" • Typical GC-safepoint is a function return • Need to stop all threads to carry out a GC • For each thread stack-frame, stomp its return address; when it reaches there, the thread will return, not to its caller, but into GC code Jim Hogg - UW - CSE P501

  17. Hijacking a Thread Thread continues to run Eventually returns and hits the hijack Save & Stomp retaddr Top of Stack retaddr gc gc gc GC GC GC caller Jim Hogg - UW - CSE P501

  18. Threads Run to Their Safepoints Start GC After each thread has hit its safepoint, GC can start its sweep phase Jim Hogg - UW - CSE P501

  19. Object Age Distribution • Performing GC over large heap consumes lots of cpu time (and trashes the caches!) • Experiment reveals: • Many objects live a very short time (high "infant mortality") • Some object live a very long time • Bi-modal number Object Ages age Jim Hogg - UW - CSE P501

  20. Generational Garbage Collection Gen0 Gen1 Gen2 'Nursery' • Divide heap into 3 areas • Allocate new objects from Gen0 • At next GC • move all Gen0 survivors into Gen1 • move all Gen1 survivors into Gen2 Jim Hogg - UW - CSE P501

  21. Migration thru Generations Gen0 Gen1 Gen2 Survivors Gen0 Gen1 Gen2 Gen0 Gen1 Gen2 Gen0 0 Gen1 Gen2 Jim Hogg - UW - CSE P501

  22. Generational GC • Garbage-collecting entire heap is expensive • A process of diminishing returns • Instead, Garbage-Collect only the nursery (Gen0) • Smaller, so faster • Yields lots of free space (objects die young) • Maximum 'bang for the buck' • Sometime collect Gen0+Gen1 • When the going gets tough, collect Gen0+Gen1+Gen2 ("full" collection) • Some old objects may become unreachable - just ignore them in a Gen0 collect • Above picture assumes no writes to old objects kept Gen0 objects reachable Jim Hogg - UW - CSE P501

  23. Card Table • In practice, older objects are written-to between GCs • Might update a pointer that results in keeping young object alive; missing this is a disaster Gen0 Gen1 Gen2 Gen1 Card Table Gen2 Card Table • Card Tables: bitmap, with 1 bit per 128 Bytes of Gen2 heap • Create a "write barrier" - any write into Gen1 or Gen2 heap sets corresponding bit in Card Table • During GC sweep, check Card Tables for any "keep alive" pointers • In practice, Card Tables contain few 1 bits Jim Hogg - UW - CSE P501

  24. Debugging the GC • What if a GC goes wrong? • Fails to free some memory that is unreachable => memory leak • Frees some memory that is still reachable - impending disaster • Frees some non-objects - disaster • Colloquially called a GC 'hole' • Causes • Bug in GC - mark, sweep, compaction • Bug in GC info • Symptoms • Wrong answers • Process Crash • If we're 'lucky' failure comes soon after error • Otherwise, failure may occur seconds, minutes or hours later • Testing • GC-Stress • eg: maximally randomize the heap; force frequent GCs; check results Jim Hogg - UW - CSE P501

  25. Further Design Complexities • Objects with Finalizers: • take 2 GCs to die • extend the lifetime of other connected objects • run at some later time => "non-deterministic" finalization • no guarantee on order that finalizers runs • uncaught exceptions escape and disappear! • Strong refs • Weak refs - "short" and "long" • "Dispose" pattern • Resurrection • Large Object Heap (objects > 85 kB in size) • Concurrent Garbage Collection • Per-processor heaps; processor affinity • Fully-interruptible GC • Safe-point on back-edge of loop Jim Hogg - UW - CSE P501

  26. References for CLR GC Garbage Collection: Automatic Memory Management in the Microsoft .NET Framework – by Jeffrey Richter Garbage Collection—Part 2: Automatic Memory Management in the Microsoft .NET Framework - by Jeffrey Richter Garbage Collector Basics and Performance Hints – by Rico Mariani Using GC Efficiently – Part 1 by Maoni Using GC Efficiently – Part 2 by Maoni Using GC Efficiently – Part 3 by Maoni Using GC Efficiently – Part 4 by Maoni GC Performance Counters - by Maoni Tools that help diagnose managed memory related issues – by Maoni Clearing up some confusion over finalization and other areas in GC – by Maoni Jim Hogg - UW - CSE P501

  27. And a bit of perspective… • Automatic GC has been around since LISP I in 1958 • Ubiquitous in functional and object-oriented programming communities for decades • Mainstream since Java(?) (mid-90s) • Now conventional? - nope! • Specialized patterns of allocate/free are still better coded by-hand - eg "Arena" storage inside compiler optimization phases Jim Hogg - UW - CSE P501

More Related