McRT-Malloc: A Scalable Non-Blocking Transaction Aware Memory Allocator


Presentation Transcript


  1. McRT-Malloc: A Scalable Non-Blocking Transaction Aware Memory Allocator
     Ali Adl-Tabatabai, Ben Hertzberg, Rick Hudson, Bratin Saha

  2. Goals of McRT-Malloc
     • Scalable
        • Performance linear in the number of processors, then flat as more software threads are added
     • Preemption safety
        • Implies a lock-free approach for all structures
        • Allows other scalable McRT algorithms to use malloc and remain scalable
     • Transactional memory awareness
        • Avoid memory blowup within a transaction
        • Avoid freeing the bits needed to validate other transactions
        • Enable object-level conflict detection in STM
     • Best of class

  3. Block Data Structure
     • Heap divided into aligned 16K blocks
        • 18 significant bits identify a block
     • Block owned by a single thread during allocation
     • Blocks segregated into bins according to object size
     • Metadata header: free lists, bump pointer, next/previous block, object size and usage info
     • No per-object headers
     • Free blocks kept on a non-blocking LIFO queue
        • 46 bits for the update timestamp
     • Layout of one block: metadata header at 0xABCD0000, objects start at 0xABCD0040, ..., next block starts at 0xABCD4000

  4. Object Allocation and Freeing
     • A thread owns the blocks it allocates in
     • Trick: free uses two linked free lists per block
        • Private free list for the block owner avoids atomic instructions
        • Public free list for other threads uses atomic instructions and a non-blocking algorithm
     • Trick: a fresh block uses a frontier pointer to avoid free-list initialization
        • Then allocates from the private free list
        • Privatizes the entire public list as needed with an atomic xchg

  5. McRT-Malloc: A Transaction Aware Memory Allocator
     • Three problems
        • Speculative memory allocation and de-allocation inside transactions can cause space blowup
        • Transactional conflict detection and frees
        • Object-based conflict detection in C/C++
     • Garbage collection also solves these issues

  6. Allocation with STM
     • Speculatively allocate or free inside a transaction
        • Valid at commit, rolled back on abort
     • Balanced: both malloc and free within the same transaction
        • Memory is transaction-local and must be reused to prevent memory blowup

        transaction {
            for (i = 0; i < big_number; i++) {
                foo = malloc(size);
                …
                free(foo);
            }
        }

  7. Solution
     • Use sequence numbers to track allocation relationships
     • Sequence counter per thread (thread-local)
     • Every transaction (even a nested one) takes a new (incremented) sequence number when it starts
     • Every allocation in the transaction is tagged with that sequence number
     • When an object is freed in a given transaction, its relationship to that transaction is determined by sequence number:
        • seq(object) < seq(transaction) → speculative free
        • seq(object) == seq(transaction) → balanced free

  8. Monitors != Transactions
     • STM uses bits in the object to validate at commit
     • Monitors (locks) are pessimistic: they allow only one thread inside a critical section
     • Transactions are optimistic: they allow multiple threads inside a critical section
     • This causes problems when freeing an object

  9. Thread 1 deletes node 2 while Thread 2 deletes node 3 from the same linked list. Both threads run the same code:

        nodeDelete(int key) {
            ptr = head of list;
            transaction {
                while (ptr->next->key != key) {
                    ptr = ptr->next;
                } /* end while */
                temp = ptr->next;
                ptr->next = ptr->next->next;
            } /* validate & end transaction */
            free(temp); /* Anyone using? */
        }

  10. (Same nodeDelete code as slide 9, running in both threads.)

  11. (Same nodeDelete code as slide 9, running in both threads.) At this point there is only a read/read conflict, which is not a real conflict.

  12. (Same nodeDelete code as slide 9, running in both threads.) Now there is a read/write conflict: Thread 1 commits, and Thread 2 will abort.

  13. (Same nodeDelete code as slide 9, running in both threads.) The STM version information needed for validation is destroyed along with object 2.

  14. (Same nodeDelete code as slide 9, running in both threads.) Thread 2 wakes up.

  15. (Same nodeDelete code as slide 9, running in both threads.) The bits Thread 2 relies on to detect the conflict and resolve it by aborting are now garbage.

  16. Solution
     • Delay the actual free and reuse until a consistent state is reached
     • A global epoch (timestamp) is maintained and incremented periodically
     • Each thread locally remembers the global epoch at the last time it entered or exited a top-level transaction
        • Set as part of TransactionBegin and TransactionAbort/Commit
     • Each free is noted, together with the current global epoch, in a thread-local buffer
     • When the buffer fills, each thread's epoch is queried
        • All frees recorded before the minimum epoch are freed "for real"
     • Cost is O(number of frees), not O(number of memory accesses)

  17. McRT-Malloc Beats Hoard. Machias benchmark: mimics the consumer-producer pattern with a minimal workload (normalized so that the X axis indicates linear scaling)

  18. McRT STM Malloc Running Machias

  19. McRT STM vs. McRT Malloc Running Machias

  20. McRT STM vs. McRT Malloc Memory Usage Running Machias

  21. Conclusion
     • Best-of-class scalable malloc implementation
     • Non-blocking, so other McRT algorithms can be non-blocking and still use malloc
     • Solved memory blowup within a transaction
     • Solved the premature-freeing problem for STM with optimistic concurrency
     • Enabled object-granularity conflict detection in C

  22. Questions

More Related