Conservative Garbage Collection
Stephan Lesch, January 9, 2002, slesch@studcs.uni-sb.de
Contents
• Intro
• Conservative GC
• Mostly Copying Collection
• Hidden Pointer Problems
• GC for C++
So Far
Type-accurate GC:
• locations of pointers are known
• no pointer arithmetic
• often tailored to one software product
• usually supported by compiler/runtime system

Ambiguous Roots Collection
• every register/word is a potential pointer
• non-supportive environment: little/no knowledge about register usage and object/stack layout
• should work with any C/C++ program
• programmers don't want to pay for GC unless needed
• must coexist with explicit memory management
• the middle way: programmer/compiler provide information to recognize pointers
Conservative GC
Boehm/Demers/Weiser (Xerox PARC) [1988]
• non-moving mark-and-deferred-sweep collector
• fully conservative, no reliance on compiler
  • no extra bits to distinguish pointer/non-pointer
  • no additional object headers
• for C and C++
• for Unix, OS/2, Mac, Win95/NT
• supports incremental/generational collection
• can function as space leak detector
Heap Layout
Two logically distinct heaps:
Standard heap
• malloc / free
• compatible with existing code
• no pointers to collected heap!
Collected heap
• GC_malloc
• GC_free to free known garbage
• pointers to standard heap ignored
Layout of Collected Heap
• made up of blocks (e.g. 4 KB, aligned to 4 KB boundaries)
• one object size per block
• for each object size:
  • bitmap to mark allocated objects
  • free-list (linked list of heap block slots)
  • reclaimable blocks queue (deferred sweep)
• heap-block free-list
Allocation
for objects > 1/2 block:
• allocate chunk of blocks (heap-block free-list)
• none available → GC
• not enough space reclaimed → expand heap
for small objects:
• pop free-list for this size
• free-list is empty → resume sweep phase
• still empty → GC
• not enough space reclaimed → expand heap
Clear object after allocation!
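The small-object path can be sketched in C as follows. This is a minimal illustration, not the BDW implementation: the names `size_class`, `add_block`, and `alloc_small` are hypothetical, and the fallback steps (resume sweep, GC, expand heap) are reduced to returning NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 4096   /* assumed block size, as in the slide's 4 KB example */

/* A free slot stores the link to the next free slot in its first word. */
typedef struct slot { struct slot *next; } slot;

/* One size class: every object carved from a block has the same size. */
typedef struct {
    size_t obj_size;   /* bytes per object, must be >= sizeof(slot) */
    slot  *free_list;  /* linked list of free slots of this size */
} size_class;

/* Carve a fresh, pointer-aligned block into slots and thread them onto the free-list. */
static void add_block(size_class *sc, void *block) {
    assert(sc->obj_size >= sizeof(slot));
    char *p = block;
    for (size_t off = 0; off + sc->obj_size <= BLOCK_SIZE; off += sc->obj_size) {
        slot *s = (slot *)(p + off);
        s->next = sc->free_list;
        sc->free_list = s;
    }
}

/* Pop one object from the free-list. NULL stands in for the point where a
 * real collector would resume sweeping, collect, or expand the heap. */
static void *alloc_small(size_class *sc) {
    slot *s = sc->free_list;
    if (s == NULL) return NULL;
    sc->free_list = s->next;
    memset(s, 0, sc->obj_size);   /* clear object after allocation */
    return s;
}
```

Clearing the slot on allocation removes the stale free-list link, which would otherwise look like a pointer to the collector.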
Finding Roots & Pointers
• possible roots: registers, stack, static areas
• no cooperation from compiler
• treat every word as potential pointer
• ignore interior pointers (standard)
• prefer marking from false pointers over ignoring valid pointers

Conservative Pointer Identification
given word p:
• does p refer to the collected heap?
• does it point into a heap block allocated by the collector?
• does it point to the beginning of an object in that block?
if yes:
• mark object in block header
• push object onto mark stack
finally: reset mark bits of objects on free-lists
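The three validity tests can be sketched in C over simulated addresses. The types `block_header` and `heap` and the function `mark_if_pointer` are assumptions made for illustration; the mark stack push is left out, and one `bool` per object stands in for the mark bitmap.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 4096
#define MAX_OBJS   (BLOCK_SIZE / 16)   /* assumes a minimum object size of 16 bytes */

typedef struct {
    uintptr_t start;             /* address of the first byte of the block */
    size_t    obj_size;          /* the one object size of this block; 0 = block not allocated */
    bool      marked[MAX_OBJS];  /* mark bits, one per object slot */
} block_header;

typedef struct {
    uintptr_t     lo, hi;        /* collected heap spans [lo, hi), a multiple of BLOCK_SIZE */
    block_header *blocks;        /* one header per block, in address order */
} heap;

/* Treat `word` as a potential pointer; mark the object and return true
 * only if it passes all three tests from the slide. */
static bool mark_if_pointer(heap *h, uintptr_t word) {
    if (word < h->lo || word >= h->hi) return false;            /* 1. inside collected heap? */
    block_header *b = &h->blocks[(word - h->lo) / BLOCK_SIZE];
    if (b->obj_size == 0) return false;                         /* 2. block allocated by collector? */
    size_t off = (size_t)(word - b->start);
    if (off % b->obj_size != 0) return false;                   /* 3. beginning of an object? */
    if (off + b->obj_size > BLOCK_SIZE) return false;           /* partial slot at the block end */
    size_t idx = off / b->obj_size;
    if (idx >= MAX_OBJS) return false;
    b->marked[idx] = true;       /* a real collector would also push the object on the mark stack */
    return true;
}
```

Note that the test does not consult the allocation state, so objects on free-lists can be marked too, which is why their mark bits are reset at the end.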
Misidentification
• integers accidentally fulfilling validity tests
• avoid need to trace from interior pointers...
• ... or unaligned pointers: the adjacent words 00000009 0000000A contain the value 00090000 at a halfword offset
• avoid heap addresses with lots of trailing 0's
• try to avoid generating false references:
  • collector clears non-atomic objects after alloc
  • GC_malloc_atomic for objects without pointers
  • programmer should initialize structures
  • programmer should clear obsolete pointers ("dead pointers on stack are often the most significant source of leaks")
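The unaligned-pointer case can be reproduced with a few lines of C. The sketch below assumes big-endian byte order, and the helper names `word_at` and `count_candidates` are made up for this example: two small integers stored side by side yield a heap-like address when memory is scanned at halfword offsets.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit big-endian word starting at byte offset `off`. */
static uint32_t word_at(const unsigned char *bytes, size_t off) {
    return ((uint32_t)bytes[off]     << 24) |
           ((uint32_t)bytes[off + 1] << 16) |
           ((uint32_t)bytes[off + 2] <<  8) |
            (uint32_t)bytes[off + 3];
}

/* Count the words that fall into the address range [lo, hi) when the
 * buffer is scanned with the given stride (4 = aligned words only,
 * 2 = halfword offsets as well). */
static size_t count_candidates(const unsigned char *bytes, size_t len,
                               size_t stride, uint32_t lo, uint32_t hi) {
    size_t n = 0;
    for (size_t off = 0; off + 4 <= len; off += stride) {
        uint32_t w = word_at(bytes, off);
        if (w >= lo && w < hi) n++;
    }
    return n;
}
```

With the bytes of the integers 9 and 10 and a heap assumed at [0x00080000, 0x000A0000), an aligned scan finds no candidate, while a halfword scan finds 0x00090000.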
Black Listing
Idea: don't allocate in heap blocks at addresses likely to collide with invalid pointers:
• black-list references to the vicinity of the heap which fail validity tests
• extra run before first allocation finds false references in static data
• additional space overhead < 10%
• but: difficult to allocate >100K without spanning black-listed blocks
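A minimal C sketch of the idea, under assumed names (`block_map`, `note_false_reference`, `find_blocks`) and a fixed address range of 64 blocks: words that fail the validity tests but point near the heap black-list their block, and the block allocator skips those blocks.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 4096
#define N_BLOCKS   64    /* assumed size of the address range the heap may occupy */

typedef struct {
    uintptr_t base;                    /* start of the address range */
    bool      in_use[N_BLOCKS];        /* block currently holds objects */
    bool      black_listed[N_BLOCKS];  /* a false reference was seen pointing here */
} block_map;

/* Called for a scanned word that failed the validity tests: if it points
 * into an unused block in the heap's vicinity, black-list that block. */
static void note_false_reference(block_map *m, uintptr_t word) {
    if (word < m->base || word >= m->base + (uintptr_t)N_BLOCKS * BLOCK_SIZE) return;
    size_t i = (size_t)(word - m->base) / BLOCK_SIZE;
    if (!m->in_use[i]) m->black_listed[i] = true;
}

/* Find n consecutive blocks that are neither in use nor black-listed.
 * Returns the index of the first block, or -1 if no such run exists. */
static int find_blocks(const block_map *m, size_t n) {
    if (n == 0) return -1;
    size_t run = 0;
    for (size_t i = 0; i < N_BLOCKS; i++) {
        run = (m->in_use[i] || m->black_listed[i]) ? 0 : run + 1;
        if (run == n) return (int)(i + 1 - n);
    }
    return -1;
}
```

A single black-listed block splits the free range, which illustrates why large objects spanning many blocks become hard to place.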
Influence of Data Structures
Problems with:
• large structures + interior pointers
• strongly connected structures
Lisp:
• small disjoint garbage structures
• lists constructed of cons-cells
=> conservative GC worked well, memory leaks remain bounded (<8% leakage, constant amount)
KRC:
• large, strongly connected structures
• next pointers in objects
=> collector thrashed
[Wentworth, 1990]
Efficiency (1)
Comparative studies by Zorn, 1992; Detlefs et al., 1994
• "real-world" C programs (perl, xfig, GhostScript)
• comparing BDW with explicit managers
• replace malloc() with GC_malloc(), remove free()
• no further adaptation
• used outdated versions (4.3 vs. 1.6/2.6)
Efficiency (2)
• realistic alternative to explicit memory management (20% avg execution time overhead over best managers, up to 57% in worst case)
• marks 3 MB/s on SparcStation II
• up to 3 times heap usage for small heaps (fixed cost for collector's internal structs)
• needs substantially more space to avoid over-frequent GC
• works best with programs using very small objects
• might co-exist poorly with cache management (heap blocks aligned on 4K boundaries)
Incremental/Generational Mode
• marking in small steps interleaved with mutator
• need to detect later changes to connectivity in traced parts of graph:
  • read dirty bits for pages
  • write-protect memory and catch faults
• when mark stack is empty: trace from all marked objects on dirty heap blocks
• reduces avg. pause times, increases total execution time
• generational: GC uses knowledge of which pages were recently modified
Mostly Copying Collection
• Joel Bartlett, 1988 (Digital)
• hybrid conservative / copying collector:
  • roots are treated conservatively (don't move referenced objects)
  • objects only accessible from heap-allocated objects are copied (assumes pointers in heap-allocated data can be found accurately)
=> faster allocation
=> fewer problems with pointer identification
=> more accurate GC
Object layout
• programmer has no control over object layout
• what if object layout should match hardware registers or file structures?
[Figure: object layout: header (size, #pointers) followed by user data (pointers, then non-pointers)]
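The layout in the figure can be written down as a C struct. This is only a sketch: the names `object`, `object_new`, and `object_data` are hypothetical, the header fields are assumed to be word-sized, and `calloc` stands in for the collector's own allocator.

```c
#include <stddef.h>
#include <stdlib.h>

/* Header (size, #pointers) followed by user data: all pointer slots
 * first, then the non-pointer data. */
typedef struct object {
    size_t         size;        /* total size in bytes, header included */
    size_t         n_pointers;  /* number of pointer slots that follow */
    struct object *pointers[];  /* n_pointers slots, then non-pointer bytes */
} object;

/* Total bytes needed for an object with the given shape. */
static size_t object_bytes(size_t n_pointers, size_t data_bytes) {
    return sizeof(object) + n_pointers * sizeof(object *) + data_bytes;
}

/* Allocate a zeroed object; calloc is a stand-in for the GC allocator. */
static object *object_new(size_t n_pointers, size_t data_bytes) {
    size_t bytes = object_bytes(n_pointers, data_bytes);
    object *o = calloc(1, bytes);
    if (o == NULL) return NULL;
    o->size = bytes;
    o->n_pointers = n_pointers;
    return o;
}

/* The non-pointer user data starts right after the last pointer slot. */
static void *object_data(object *o) {
    return (void *)&o->pointers[o->n_pointers];
}
```

Because every pointer sits in a known slot behind the header, the collector can find and update all of them when it copies the object, at the price of dictating the layout to the programmer.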
Heap layout
[Figure: heap blocks with space identifiers; a root points into the heap; current_space = 1, next_space = 1; blocks labelled 1, 0 (currently unused), 1, 42 (currently unused)]
Allocation
• within a block:
  • increment free-pointer
  • decrement free-slots-count
• if necessary: search for a free block (space_id ≠ current_space/next_space) and set its space_id to next_space
• current_space = next_space during allocation
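The allocation steps above can be sketched as a bump allocator over blocks tagged with space identifiers. The names (`mc_heap`, `find_free_block`, `mc_alloc`), the four-block heap, and the word-index return value are assumptions for illustration; the trigger for a collection is reduced to returning -1.

```c
#include <stddef.h>

#define N_BLOCKS        4     /* tiny heap, for illustration only */
#define WORDS_PER_BLOCK 128   /* assumed block size in words */

typedef struct {
    unsigned space_id[N_BLOCKS];   /* one space identifier per block */
    unsigned current_space, next_space;
    int      cur;                  /* block being allocated into, -1 if none */
    size_t   free_ptr;             /* index of the next free word in `cur` */
    size_t   free_slots;           /* words left in `cur` */
} mc_heap;

/* A block is free if its space_id is neither current_space nor next_space. */
static int find_free_block(const mc_heap *h) {
    for (int i = 0; i < N_BLOCKS; i++)
        if (h->space_id[i] != h->current_space && h->space_id[i] != h->next_space)
            return i;
    return -1;
}

/* Allocate n words. Returns the heap-wide word index of the object,
 * or -1 where a real collector would start a collection. */
static long mc_alloc(mc_heap *h, size_t n) {
    if (n == 0 || n > WORDS_PER_BLOCK) return -1;
    if (h->cur < 0 || h->free_slots < n) {
        int b = find_free_block(h);
        if (b < 0) return -1;
        h->space_id[b] = h->next_space;   /* claim the block */
        h->cur = b;
        h->free_ptr = 0;
        h->free_slots = WORDS_PER_BLOCK;
    }
    long at = (long)h->cur * WORDS_PER_BLOCK + (long)h->free_ptr;
    h->free_ptr += n;      /* increment free-pointer */
    h->free_slots -= n;    /* decrement free-slots-count */
    return at;
}
```

Allocation inside a block is just two arithmetic updates, which is where the faster allocation compared with free-list allocators comes from.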
Collection
• GC when heap is half full (half of heap blocks have space_id = current_space)
• next_space = (current_space + 1) mod n
• Fromspace = current_space blocks
• Tospace = next_space blocks
• scan roots conservatively for pointers into heap
• move potentially referenced objects to Tospace:
  • change space_id of their blocks to next_space
  • add block to Tospace scan list
• copy graphs accessible from blocks on scan list
Heap after Collection
[Figure: the same heap after a collection; current_space = 2, next_space = 2; blocks labelled 2, 2, 1 (currently unused), 42 (currently unused)]
Bartlett's GC algorithm (1)
    gc() =
        next_space = (current_space + 1) mod 077777
        Tospace_queue = empty
        for R in Roots
            promote(block(R))
        while Tospace_queue != empty
            blk = pop(Tospace_queue)
            for obj in blk
                for S in Children(obj)
                    S = copy(S)
        current_space = next_space

Bartlett's GC algorithm (2)
    promote(block) =
        if Heap_bottom <= block <= Heap_top and space(block) == current_space
            space(block) = next_space
            allocatedBlocks = allocatedBlocks + 1
            push(block, Tospace_queue)

    copy(p) =
        if p == nil or space(p) == next_space
            return p
        if forwarded(p)
            return forwarding_address(p)
        np = move(p, free)
        free = free + size(p)
        forwarding_address(p) = np
        return np
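The copy() step with forwarding addresses can be sketched in runnable C. This simplifies the pseudocode in several assumed ways: the space identifier is kept per object rather than per block, all objects have the same fixed-size `node` type, and Tospace is a plain array with a bump pointer. The names `node`, `tospace`, `copy`, and `scan` are chosen for this example.

```c
#include <assert.h>
#include <stddef.h>

typedef struct node {
    unsigned     space;     /* space id (per object here, per block in the slides) */
    struct node *forward;   /* forwarding address, NULL if not yet moved */
    struct node *child[2];  /* the object's pointer fields */
    int          value;     /* non-pointer data */
} node;

typedef struct {
    unsigned next_space;
    node    *free_ptr;      /* bump pointer into Tospace */
    node    *limit;         /* end of Tospace */
} tospace;

/* Return the Tospace address of p, moving it there on the first visit. */
static node *copy(tospace *to, node *p) {
    if (p == NULL || p->space == to->next_space) return p;  /* nil or already in Tospace */
    if (p->forward != NULL) return p->forward;               /* already moved */
    assert(to->free_ptr < to->limit);
    node *np = to->free_ptr++;   /* np = move(p, free); free = free + size(p) */
    *np = *p;
    np->space = to->next_space;
    np->forward = NULL;
    p->forward = np;             /* leave the forwarding address behind */
    return np;
}

/* Update the children of every copied object from `start` up to the
 * bump pointer; copying more objects extends the range being scanned. */
static void scan(tospace *to, node *start) {
    for (node *obj = start; obj < to->free_ptr; obj++)
        for (int i = 0; i < 2; i++)
            obj->child[i] = copy(to, obj->child[i]);
}
```

An object referenced from a root is promoted in place by setting its space to next_space, so copy() returns it unchanged and the ambiguous root never needs to be updated.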
Generational Mode (1)
• one bit in space_id indicates young/old generation
• other bits approximate age of objects/blocks
• minor collection:
  • when 50% of free space after last GC is full
  • young objects reachable from roots/remembered set are promoted en masse (change space_id/copy)
  • remembered set: maintained via memory protection
Generational Mode (2)
• major collection (mark-compact):
  • when old generation occupies >85% of heap
  • mark accessible objects in old generation
  • pass 1: find old generation blocks <1/3 filled, copy objects to free space leaving forwarding addresses
  • pass 2: rescan old generation, correct pointers using forwarding addresses
  • expand heap if >75% full
• maintaining remembered set costs time, but often saves more time during GC (20% time improvement on Scheme compiler)
• also reduces pause times in interactive programs
Efficiency (1)
• no thorough studies
• space overhead: space_ids, type info, block links, promotion bits
  • 2% for 512 byte blocks; tagging data increases overhead
• Mostly Copying vs. BDW: Mostly Copying probably better with many short-lived objects, which benefit from faster allocation
Experiences
• generational version: 20% runtime improvement for Scheme-to-C compiler
• significant performance increase in CAD program (reduced paging)
• bad results for non-generational collector for Modula-2 with very large heaps (10s of megabytes)
• choose GC strategy that fits behaviour of mutator
The Optimising Compiler/User Devil
• conservative GC is defeated by temporarily hidden pointers: parts of the graph may be unreachable during a GC
  • pointer arithmetic
  • adding tag bits
• e.g. optimized array traversal:
    as written:
        for (i = 0; i < SIZE; i++) ...x[i]...;
        ...x...;
    as optimized:
        xend = x + SIZE;
        for (; x < xend; x++) ...*x...;
        x -= SIZE;
        ...x...;
• inside the loop x is an interior pointer; afterwards x points one past the end
Machine-specific Optimizations
    struct l_thing {
        char thing[35000];
        struct l_thing *next;
    };
    struct l_thing *tail(struct l_thing *x) { return x->next; }
on IBM RISC System/6000, tail() translates to
    AIU r3=r3,1               ; r3 += 65536
    L   r3=SHADOW(r3,-30536)  ; load from original r3 + 35000
    BA  lr
• between the first two instructions r3 points 65536 bytes past the start of the object, i.e. outside it
Boehm and Chase's Solution (1)
• local root set of function f at any point in execution:
  • register/auto variables
  • previously computed values of direct sub-expressions of incompletely evaluated expressions, e.g. malloc's return value in malloc(size) + 4
• global root set:
  • declared static and extern variables
  • local root sets of all call sites in call chain
  • any values stored in other areas scanned by collector
• valid base pointer:
  • pointer to anywhere inside an object or one past its end
  • BDW can handle such pointers
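The difference between recognizing only object starts and recognizing valid base pointers can be made concrete with a small predicate. This is an illustration over plain integer addresses, not collector code; `policy` and `keeps_alive` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    START_ONLY,     /* only a pointer to the first byte counts */
    BASE_POINTERS   /* anywhere inside the object, or one past its end */
} policy;

/* Would the value p keep the object [obj, obj + size) alive? */
static bool keeps_alive(uintptr_t obj, size_t size, uintptr_t p, policy pol) {
    if (pol == START_ONLY) return p == obj;
    return p >= obj && p <= obj + size;
}
```

Under these assumptions, the one-past-the-end pointer left by the optimized array loop and a pointer with a low tag bit set are both missed by START_ONLY but accepted as base pointers, whereas the intermediate value of the RS/6000 example (object start + 65536 for a smaller object) is rejected by both, which is why the compiler itself has to keep a base pointer live.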
Boehm and Chase's Solution (2)
• every object on the garbage collected heap must be accessible from the global root set through a chain of base pointers
=> conservative collection is safe with strictly ANSI-compatible programs
• suggested implementation:
  • preprocess source using macros that prevent the code generator from discarding live base pointers prematurely
  • compile normally
  • post-process assembly code, removing macro artifacts
• transparent to programmer & compiler
• may interfere with instruction scheduling
• may increase register pressure
Ellis and Detlefs' Solution
• annotate operations on pointers with names of base pointers from which they're derived
• compiler treats these operations as uses of the original base pointers, extending their live ranges
• code generation must respect live ranges
• requires changes to compiler
• does not alter sources
• does not rely on behaviour of volatile declarations
GC for C++
• object-oriented languages often use more heap-allocated data
• generate more complex data structures
• GC uncouples memory management from class interfaces instead of dispersing it through code
Conservative GC for C++
• requires no changes to language
• restriction on coding style holds: no hidden pointers (converted to int)
  • existing code may violate the restriction
  • aggressive optimisers may as well
  • safety must be enforced in code generator
• some support for finalization (GC_register_finalizer), assuming few objects need finalization
Mostly Copying for C++
• storing all pointers at beginning of objects interferes with inheritance (fast field lookup)
• here: user supplies callback methods to identify pointers
    class Tree {
    public:
        Tree* left;
        Tree* right;
        int data;
        Tree(int x);
        GCCLASS(Tree);
        ...
    };
    GCPOINTERS(Tree) {
        gcpointer(left);
        gcpointer(right);
    }
• GCPOINTERS macro generates callback method Tree::GCPointers
• currently no support for finalisation
Benefits of pointer locating methods
• programmer can resolve the unsure reference problem: union { int n; thing *ptr; } x;
• enables semantically accurate marking, e.g. stacks, queues:
  • automatic GC retains uncleared references to removed elements
  • programmer can omit them
=> even better than type-accurate GC
Using Object Descriptors
• Detlefs, 1991: extension to Mostly Copying
• insert descriptor into object headers
• Bitmap format:
  • 1 word with 32 bits indicating pointer/non-pointer words
  • use if only the first 32 words of user data contain pointers; can't handle unsure references
• Indirect format:
  • pointer to byte array encoding sure/unsure references and non-pointer values
  • array can be compressed using repeat counts
• Fast indirect format:
  • array of ints; 1st number indicates repetitions of rest
  • subsequent numbers = number of words to skip to reach next pointer; negative number indicates unsure reference
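Decoding the bitmap format is a simple bit scan. The sketch below assumes that bit i (counting from the least significant bit) describes word i of the user data; the slides do not state the bit order, and `pointer_words` is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode a bitmap descriptor: write the indices of all user-data words
 * that hold pointers into `out` and return how many there are. */
static size_t pointer_words(uint32_t bitmap, unsigned out[32]) {
    size_t n = 0;
    for (unsigned i = 0; i < 32; i++)
        if (bitmap & (UINT32_C(1) << i))
            out[n++] = i;
    return n;
}
```

For a three-word object such as the Tree class with left, right, and data, the descriptor under this bit order would be 0x3: words 0 and 1 are pointers, word 2 is not.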
Conclusion
• GC effective for traditional imperative languages
• realistic alternative to explicit memory management for most applications
• not yet suitable for real-time / safety-critical applications
• no big constraints on coding style, except hidden pointer problem
• gc'ing allocators competitive even with code not written for GC
• GC should have hooks for client/programmer to communicate their knowledge:
  • explicit deallocation calls
  • atomic objects
  • hints of appropriate times to collect