Wait-Free Reference Counting and Memory Management Håkan Sundell, Ph.D.
Outline • Shared Memory • Synchronization Methods • Memory Management • Garbage Collection • Reference Counting • Memory Allocation • Performance • Conclusions IPDPS 2005
Shared Memory [Diagram: Uniform Memory Access (UMA) – several CPUs, each with its own cache, sharing one memory. Non-Uniform Memory Access (NUMA) – groups of CPUs and caches on separate buses, each group with its own memory.] IPDPS 2005
Synchronization • Shared data structures need synchronization! • Accesses and updates must be coordinated to establish consistency. IPDPS 2005
Hardware Synchronization Primitives • Weak • Atomic Read/Write • Stronger • Atomic Test-And-Set (TAS), Fetch-And-Add (FAA), Swap • Universal • Atomic Compare-And-Swap (CAS) • Atomic Load-Linked/Store-Conditional [Diagram: Read, Write, M = f(M, …)] IPDPS 2005
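To make the primitives concrete, here is a minimal sketch of how they surface through C++ std::atomic; the function names are illustrative only and not part of the talk.

```cpp
#include <atomic>

std::atomic<int> counter{0};

// Fetch-And-Add: atomically add and return the previous value.
int faa_example() {
    return counter.fetch_add(1);
}

// Compare-And-Swap: update M only if it still holds the expected value.
bool cas_example(std::atomic<int>& M, int expected, int desired) {
    return M.compare_exchange_strong(expected, desired);
}

// Test-And-Set: set a flag and learn whether it was already set.
bool tas_example(std::atomic_flag& flag) {
    return flag.test_and_set();
}
```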
Mutual Exclusion • Access to shared data is atomic because of the lock • Reduced parallelism by definition • Blocking; danger of priority inversion and deadlocks • Solutions exist, but with high overhead, especially on multi-processor systems IPDPS 2005
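For contrast with the non-blocking approaches on the following slides, a minimal lock-based sketch (names illustrative): every update is serialized through one mutex, which is exactly the loss of parallelism noted above.

```cpp
#include <mutex>

std::mutex shared_lock;   // guards shared_value
int shared_value = 0;

void locked_update(int delta) {
    std::lock_guard<std::mutex> guard(shared_lock);  // other threads block here
    shared_value += delta;                            // only one thread at a time
}
```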
Non-blocking Synchronization • Perform operations/changes using atomic primitives • Lock-Free Synchronization • Optimistic approach • Retries until it succeeds • Wait-Free Synchronization • Always finishes in a finite number of its own steps • Coordination with all participants IPDPS 2005
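A minimal sketch of the lock-free (optimistic) pattern described above: read, compute, then CAS, retrying if another thread interfered. Lock-freedom guarantees that some thread always makes progress, but an individual thread may retry indefinitely under contention, which is exactly what wait-freedom rules out. The function name is illustrative.

```cpp
#include <atomic>

void lock_free_add(std::atomic<int>& M, int delta) {
    int old_val = M.load();
    // compare_exchange_weak refreshes old_val with the current value on
    // failure, so the loop simply retries with the new snapshot.
    while (!M.compare_exchange_weak(old_val, old_val + delta)) {
        // another thread changed M between the read and the CAS; retry
    }
}
```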
Memory Management • Dynamic data structures need dynamic memory management • Concurrent data structures need concurrent memory management! IPDPS 2005
Concurrent Memory Management • Concurrent Memory Allocation • i.e. malloc/free functionality • Concurrent Garbage Collection • Questions (among many): • When to re-use memory? • How to de-reference pointers safely? IPDPS 2005
Lock-Free Memory Management • Memory Allocation • Valois 1995, fixed block-size, fixed purpose • Michael 2004, Gidenstam et al. 2004, any size, any purpose • Garbage Collection • Valois 1995, Detlefs et al. 2001; reference counting • Michael 2002, Herlihy et al. 2002; hazard pointers IPDPS 2005
Wait-Free Memory Management • Hesselink and Groote, “Wait-free concurrent memory management by create and read until deletion (CaRuD)”, Distributed Computing, 2001 • limited to the problem of shared static terms • New Wait-Free Algorithm: • Memory Allocation – fixed block-size, fixed purpose • Garbage Collection – reference counting IPDPS 2005
Wait-Free Reference Counting • De-referencing links • 1. Read the link contents, i.e. a pointer. • 2. Increment (FAA) the reference count of the corresponding object. • What if the link is changed between step 1 and step 2? • Wait-Free solution: • The de-referencing operation should announce the link before reading. • The operations that change that link should help the de-referencing operation. IPDPS 2005
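The race described above, shown as a minimal sketch (the Node type and names are illustrative): between reading the link and incrementing the count, the link may be redirected and the old object reclaimed, so the FAA can touch freed memory.

```cpp
#include <atomic>

struct Node {
    std::atomic<int> ref_count{0};
    // ... payload ...
};

// Naive de-reference: unsafe, because the link may change between
// step 1 and step 2 and the old node may already be reclaimed.
Node* naive_deref(std::atomic<Node*>& link) {
    Node* node = link.load();          // step 1: read the link contents
    if (node != nullptr)
        node->ref_count.fetch_add(1);  // step 2: FAA the reference count
    return node;
}
```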
Wait-Free Reference Counting • Announcing • Writes the link address to a shared variable (one per thread and per new de-reference). • Atomically removes the announcement and retrieves a possible answer (from helping) by Swapping with null. • Helping • If the announcement matches the changed link, atomically answer with a proper pointer using CAS. IPDPS 2005
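A simplified structural sketch of this announce/help handshake, assuming a single announcement slot and illustrative names. It shows only how the slot is used (store to announce, Swap with null to withdraw and collect an answer, CAS to answer) and omits the reference-count reconciliation and reclamation details that the full algorithm needs for correctness.

```cpp
#include <atomic>

struct Node { std::atomic<int> ref_count{0}; };

// One announcement slot per thread and per ongoing de-reference; a single
// global slot is shown here for brevity.  It holds the address of the link
// being de-referenced, an answer written by a helper, or nullptr when idle.
std::atomic<void*> announce_slot{nullptr};

Node* wf_deref(std::atomic<Node*>& link) {
    announce_slot.store(static_cast<void*>(&link));   // announce before reading
    Node* node = link.load();                         // read the link contents
    if (node != nullptr)
        node->ref_count.fetch_add(1);                 // FAA the reference count
    void* seen = announce_slot.exchange(nullptr);     // Swap the announcement with null
    if (seen != static_cast<void*>(&link))            // a helper answered instead
        node = static_cast<Node*>(seen);              // (count reconciliation omitted)
    return node;
}

// Updater of `changed_link`: if a de-referencer announced this link, answer
// with a pointer whose reference count the helper has already raised.
void help_deref(std::atomic<Node*>* changed_link, Node* answered_node) {
    void* expected = static_cast<void*>(changed_link);
    announce_slot.compare_exchange_strong(expected, static_cast<void*>(answered_node));
}
```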
Wait-Free Memory Allocation • Solution (lock-free), IBM freelists: • Create a linked list of the free nodes; allocate/reclaim using CAS • How to guarantee that the CAS of an alloc/free operation eventually succeeds? [Diagram: freelist Head → Mem 1 → Mem 2 → … → Mem i, with Allocate taking nodes from the head and Reclaim returning used nodes] IPDPS 2005
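A minimal sketch of the IBM-freelist idea (types and names are illustrative): free blocks form a linked list whose head is popped and pushed with CAS. A real implementation pairs the head with a version tag, or relies on safe reclamation, to avoid the ABA problem; that is omitted here.

```cpp
#include <atomic>

struct Block {
    Block* next;
    // ... the memory handed out to the user ...
};

std::atomic<Block*> free_head{nullptr};

// Alloc: pop the head of the freelist with CAS, retrying on contention.
Block* lock_free_alloc() {
    Block* head = free_head.load();
    while (head != nullptr &&
           !free_head.compare_exchange_weak(head, head->next)) {
        // CAS failed: head has been refreshed; retry with the new head
    }
    return head;  // nullptr means the freelist is empty
}

// Free: push the block back onto the freelist with CAS.
void lock_free_free(Block* block) {
    Block* head = free_head.load();
    do {
        block->next = head;
    } while (!free_head.compare_exchange_weak(head, block));
}
```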
Wait-Free Memory Allocation • Wait-Free Solution: • Create 2*N freelists. • Alloc operations concurrently try to allocate from the current (globally agreed on) freelist. • When the current freelist is empty, the current is changed in a round-robin manner. • The Free operation of thread i works only on freelist i or N+i. • Alloc operations announce their interest. • All Free and Alloc operations try to help announced Alloc operations in round-robin order. IPDPS 2005
Wait-Free Memory Allocation [Diagram: per-thread announcement variables, some null and some answered, updated with Swap and CAS] • Announcing • A value of null in the per-thread shared variable indicates interest. • Alloc atomically announces and receives a possible answer by using Swap. • Helping • Which thread to help is globally agreed on; the choice advances in round-robin order once agreed. • Free atomically answers the selected thread of interest with a free node using CAS. • The first time an Alloc succeeds in getting a node from the current freelist, it tries to atomically answer the selected thread of interest with the node using CAS. IPDPS 2005
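A structural sketch of the announcement variables and the helping step from the two slides above, under assumed names (N, NOT_INTERESTED, and the functions are illustrative); the 2*N freelists and the round-robin selection of both the current freelist and the thread to help are omitted.

```cpp
#include <atomic>

struct Block { Block* next; };

constexpr int N = 8;  // number of threads (illustrative)

// Sentinel meaning "no pending allocation"; an illustrative encoding.
Block* const NOT_INTERESTED = reinterpret_cast<Block*>(1);

// One announcement variable per thread; null means "interested in a node".
std::atomic<Block*> announcement[N];

void init_announcements() {
    for (int i = 0; i < N; ++i)
        announcement[i].store(NOT_INTERESTED);
}

// Alloc side of thread my_id: announce interest before trying to allocate.
void announce_interest(int my_id) {
    announcement[my_id].store(nullptr);
}

// Alloc side: withdraw the announcement with Swap; a non-null result is a
// node handed over by a helper (nullptr means no help arrived).
Block* withdraw_announcement(int my_id) {
    return announcement[my_id].exchange(NOT_INTERESTED);
}

// Helping side (run by Free, and by Alloc after its first successful CAS on
// the current freelist): answer the globally selected thread, if it is still
// interested, by CASing its announcement from null to a free node.
bool help_alloc(int selected_id, Block* free_node) {
    Block* expected = nullptr;
    return announcement[selected_id].compare_exchange_strong(expected, free_node);
}
```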
Performance • Worst-case • Needs analysis of the maximum execution path and application of known WCET techniques. • e.g. at most 2*N² CAS retries for alloc. • Average and Overhead • Experiments in the scope of dynamic data structures (e.g. a lock-free skip list) • H. Sundell and P. Tsigas, “Fast and Lock-Free Concurrent Priority Queues for Multi-Thread Systems”, IPDPS 2003 • Performed on a NUMA architecture (SGI Origin 2000) with full concurrency. IPDPS 2005
Average Performance IPDPS 2005
Conclusions • New algorithms for concurrent & dynamic Memory Management • Wait-Free & Linearizable. • Reference counting. • Fixed-size memory allocation. • To the best of our knowledge, the first wait-free memory management scheme that supports implementing arbitrary dynamic concurrent data structures. • Will be available as part of the NOBLE software library, http://www.noble-library.org • Future work • Implement new wait-free dynamic data structures. • Provide upper bounds on memory usage. IPDPS 2005
Questions? • Contact Information: • Address: Håkan Sundell, Computing Science, Chalmers University of Technology • Email: phs@cs.chalmers.se • Web: http://www.cs.chalmers.se/~phs IPDPS 2005