1 / 50

Parallel and Concurrent Real-time Garbage Collection Part II: Memory Allocation and Sweeping

Parallel and Concurrent Real-time Garbage Collection Part II: Memory Allocation and Sweeping. David F. Bacon T.J. Watson Research Center. Page Data Synchronization, Take 1. 16. 64. 256. free. Page Data, Take 2. 16. 64. 256. Thread 1. 16. 64. 256. Thread 2. 16. 64. 256. free.

Download Presentation

Parallel and Concurrent Real-time Garbage Collection Part II: Memory Allocation and Sweeping

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Parallel and ConcurrentReal-time Garbage CollectionPart II: Memory Allocation and Sweeping David F. Bacon T.J. Watson Research Center

  2. Page Data Synchronization, Take 1 16 64 256 free

  3. Page Data, Take 2 16 64 256 Thread 1 16 64 256 Thread 2 16 64 256 free

  4. free 16 free 64 free 256 temp full full 256 16 64 ? free

  5. Compare-and-Swap* boolean CAS(int* addr, int old, int new) { atomic { if (* addr == old) { *addr = new; return true; } else { return false; } } } • IBM 370 since 1970, Intel 80486 since 1989 • Load-Reserved: ARM 6 since 1991, IBM POWER2 since 1993

  6. top Non-locking* Stack push(Page page) { do { Page t = top; page->next = t; } while (! CAS(&top, t, page); } Page pop() { Page t = null; do { t = top; if (t == null) return; Page n = t->next; } while (! CAS(&top, t, n); return t; }

  7. A B C A top top top top Thread 2 a=pop() b=pop() Thread 1 x=pop() start C B n n b A t a t 1 2 C C B n A Thread 1 pop() finish t B b Thread 2 push(A) 3 4

  8. Correct Non-locking Stack push(PageId page) { do { <PageId t, short v> = top; page->next = t; } while (! CAS(&top, <t,v>, <page,(v+1) % 0xffff>); } PageId pop() { PageId t = 0; do { <PageId t, short version> = top; if (t==0) return 0; } while (! CAS(&top, <t,v>, <next(t),(v+1) % 0xffff>); return t; }

  9. Really Correct Non-locking Stack push(Page page) { do { Page t = LOAD_AND_RESERVE(&top); page->next = t; } while (! STORE_CONDITIONAL(&top, page)); } Page pop() { Page t = null; do { Page t = LOAD_AND_RESERVE(&top); if (t==null) return null; } while (! STORE_CONDITIONAL(&top, t->next)); return t; }

  10. Coalescing ? free • Ideal: • getMultiplePages(n) returns best fit • Does not cause mutator activities to block • Does not cause collection activities to block • Does not suffer pathology under contention • Reality: tradeoffs between • Quality of fit • Probability of blocking • Overhead (number of CAS’s)

  11. Cartesian Tree + Try-lock ? 44 45 46 47 33 34 35 50 51 21 22 62 63 11 24 59 25 x.left.size < x.size >= x.right.size x.left.address < x.address < x.right.address

  12. Non-locking Rebalancing Tree • Homework… • …but I won’t use it. • Solution would be • Too complex to be reliable • Require too many CAS operations to be fast

  13. The Fix: Transactional Memory AtomicCartesianTree extends CartesianTree { atomic Block getBlock() { return super.getBlock(); } atomic void releaseBlock(Block b) { super.releaseBlock(b); } atomic Block[] getMultipleBlocks(int n) { return super.getMultipleBlocks(n); } }

  14. Hierarchical Non-locking Buddy? 21 22 24 25 11 33 34 35 50 51 44 45 46 47 59 62 63

  15. So much for allocation…Garbage Collection

  16. ? p Z T Y X W U r Y X a a a a a a a a b b b b b b b b Handling the Concurrency Application (“mutator”) Threads GC Threads • Trace • Collect • Format • Load pointer • Store pointer • Allocate

  17. Yuasa Snapshot Algorithm (1990) • Logically • Mark-and-Sweep Style Algorithm • Take a “copy-on-write” heap snapshot • Collect the garbage in that snapshot • Physically • Stop all threads • Copy their stacks (“roots”) • Force them to save over-written pointers • Trace roots and over-written pointers * Dijktstra’75, Steele’76

  18. U Y T W X Z a a a a a a b b b b b b 1: Take Logical Snapshot Stack p r

  19. r U Y T X W Z a a a a a a b b b b b b 2(a): Copy Over-written Pointers Stack p

  20. U T W X Z Y Z Y W X a a a a a a a a a a b b b b b b b b b b 2(b): Trace Stack p * Color is per-object mark bit

  21. s W Z U X Y T V X Z Y W a a a a a a a a a a a b b b b b b b b b b b 2(c): Allocate “Black” Stack p

  22. s W Z U X Y T V W Z Y X a a a a a a a a a a a b b b b b b b b b b b 3(a): Sweep Garbage free Stack p

  23. t s V2 T Y X Z U Y V Z X W a a a a a a a a a a a b b b b b b b b b b b 3(b): Allocate “White” free Stack p

  24. t s X V2 Z T U W Y V X Y W Z V a a a a a a a a a a a a a b b b b b b b b b b b b b 4: Clear Marks free Stack p

  25. Yuasa Algorithm Phases • Snapshot stack and global roots • Trace • Flip • Sweep • Flip • Clear Marks * Synchronous

  26. Metronome-2 Concurrency GC Master Thread GC Worker Threads Application Threads (may do GC work)

  27. Metronome-2 Phases • Clearing • Monitor Table clearing • JNI Weak Global clearing • Debugger Reference clearing • JVMTI Table clearing • Phantom Reference clearing • Re-Trace 2 • Trace Master • (Trace*) • (Trace Terminate***) • Class Unloading • Completion • Finalizer Wakeup • Class Unloading Flush • Clearable Compaction** • Book-keeping • Initiation • Setup • turn double barrier on • Root Scan • Active Finalizer scan • Class scan • Thread scan** • switch to single barrier, color to black • Debugger, JNI, Class Loader scan • Trace • Trace* • Trace Terminate*** • Re-materialization 1 • Weak/Soft/Phantom Reference List Transfer • Weak Reference clearing** (snapshot) • Re-Trace 1 • Trace Master • (Trace*) • (Trace Terminate***) • Re-materialization 2 • Finalizable Processing • Flip • Move Available Lists to Full List* (contention) • turn write barrier off • Flush Per-thread Allocation Pages** • switch allocation color to white • switch to temp full list • Sweeping • Sweep* • Switch to regular Full List** • Move Temp Full List to regular Full List* (contention) * Parallel ** Callback *** Single actor symmetric

  28. Part 3: Sweep • Clearing • Monitor Table clearing • JNI Weak Global clearing • Debugger Reference clearing • JVMTI Table clearing • Phantom Reference clearing • Re-Trace 2 • Trace Master • (Trace*) • (Trace Terminate***) • Class Unloading • Completion • Finalizer Wakeup • Class Unloading Flush • Clearable Compaction** • Book-keeping • Initiation • Setup • turn double barrier on • Root Scan • Active Finalizer scan • Class scan • Thread scan** • switch to single barrier, color to black • Debugger, JNI, Class Loader scan • Trace • Trace* • Trace Terminate*** • Re-materialization 1 • Weak/Soft/Phantom Reference List Transfer • Weak Reference clearing** (snapshot) • Re-Trace 1 • Trace Master • (Trace*) • (Trace Terminate***) • Re-materialization 2 • Finalizable Processing • Flip • Move Available Lists to Full List* (contention) • turn write barrier off • Flush Per-thread Allocation Pages** • switch allocation color to white • switch to temp full list • Sweeping • Sweep* • Switch to regular Full List** • Move Temp Full List to regular Full List* (contention) * Parallel ** Callback *** Single actor symmetric

  29. s W Z U X Y T V X Z Y W a a a a a a a a a a a b b b b b b b b b b b Assume Trace Has Finished Stack p

  30. thread list pthread pthread_t writebuf 43 color BarrierOn BarrierOn false true pthread pthread_t writebuf 41 color epoch epoch 43 37 JVM Application Thread Structure 16 64 256 sweep  dblb  Thread 1 next inVM  16 64 256 sweep  dblb  Thread 2 next inVM 

  31. “Move Available” Phase

  32. free 16 free 64 free 256 temp full full 256 16 64 available ? free

  33. free 16 free 64 free 256 temp full full 256 16 64 contention? available ? free

  34. Generic Structure of Move Phase:Shared Monotonic Work Pool GC Master Thread GC Worker Threads Application Threads (may do GC work) pool of work

  35. Performing a GC Work Quantum every (500us) tryToDoSomeWorkForGC(); tryToDoSomeWorkForGC() { short phase = enterPhase(); if (phase > 0) doQuantum(phase); leavePhase(); }

  36. PhaseInfo phase workers moveavail 3 Phase Entry short enterPhase() { thread->epoch = Epoch.newest; while (true) <short phase, short workers> = PhaseInfo; if (workers == phaseTable[phase].maxWorkers) return -1; if (CAS(PhaseInfo,<phase,workers>,<phase,workers+1>)) return phase; } ABA?

  37. PhaseInfo phase workers moveavail 3 Phase Exit short leavePhase() { while (true) <short phase, short workers> = PhaseInfo; if (workers > 1 || moreWorkForPhase(phase)) newPhase = <phase,workers-1>; else if (phase == TRACE_PHASE) newPhase = <phaseTable[phase].nextPhase, 1>; else newPhase = <phaseTable[phase].nextPhase, 0>; if (CAS(PhaseInfo, <phase,workers>, newPhase)) return newPhase; }

  38. “Per-Thread Flush” Phase

  39. thread list pthread pthread_t PhaseInfo phase workers writebuf flush 1 43 color color BarrierOn BarrierOn false true pthread pthread_t writebuf 41 color color epoch epoch 37 43 Flush Per-Thread Pages 16 64 256 sweep  sweep  dblb  Thread 1 next inVM  16 64 256 sweep  sweep  dblb  Thread 2 next inVM 

  40. free 16 free 64 free 256 temp full full 256 16 64 Put Full White Pages on temp-full List available ? free

  41.   How is this Phase Different? Per-Thread State Update GC Master Thread GC Worker Threads Application Threads (may do GC work) pool of work

  42. Callback Mechanism bool doCallbacks(int callbackId) { bool alldone = true; LOCK(threadlist); for each (Thread thread in threadlist) if (hasCallbackWork(thread, callbackId); if (! thread.inVM && TRY_LOCK(thread)) doCallbackWorkForThread(thread, callbackId); UNLOCK(thread); else postCallback(thread, callbackId); alldone = false; UNLOCK(threadlist); return alldone; } * Non-locking list with cursor?

  43. “Sweep” Phase

  44. free 16 free 64 free 256 temp full full 64 256 16 Sweep Phase available ? free

  45. “Switch Full List” Phase

  46. thread list pthread pthread_t PhaseInfo phase workers writebuf switchfull 1 43 color color BarrierOn BarrierOn false true pthread pthread_t writebuf 41 color epoch epoch 43 37 Switch Back to Normal Full List 16 64 256 color sweep  sweep  dblb  Thread 1 next inVM  16 64 256 sweep  sweep  dblb  Thread 2 next inVM 

  47. “Drain Temp Full” Phase

  48. free 16 free 64 free 256 temp full full 64 256 16 Drain Temp Full available ? free

  49. Done with Collection!! • But… • That Was the Easy Part • And there are still some problems • What if threads go out to lunch?

  50. http://www.research.ibm.com/metronomehttps://sourceforge.net/projects/tuningforkvphttp://www.research.ibm.com/metronomehttps://sourceforge.net/projects/tuningforkvp

More Related