Carnegie Trust for the Universities of Scotland Efficient Dynamic Heap Allocation of Scratch-Pad Memory Ross McIlroy, Peter Dickman and Joe Sventek
Scratch-Pad Memory Allocator SMA: A dynamic memory allocator targeting extremely small memories (< 1MB in size) • Why target such tiny memories? • Why provide dynamic memory allocation for such small memories?
Outline • Rationale for SMA • SMA Approach • Results • Concurrent SMA • Conclusion / Future work
What Tiny Memories? • Embedded Systems • Sensor Network Motes • Vehicular Devices • Scratch-Pad Memories • Network Processors • Heterogeneous Multi-Core Processors
Scratch-Pad Memories • Memory structured as a hierarchy • Small fast memories, large slow memories • Usually hidden by hardware caches • Some processor architectures employ scratch-pad memories instead • Similar size and speed as caches, but explicitly accessible by software • Examples • IBM Cell processor • Intel IXP network processors • Intel PXA mobile phone processors
Why Dynamic Management? • Developers want as much useful data in the fast Scratch-Pad memory as possible • They don’t want to deal with the fragmented memory hierarchy
Why SMA? Managing 4kB Scratch-Pad memory on an Intel IXP processor
Outline • Rationale for SMA • SMA Approach • Results • Concurrent SMA • Conclusion / Future work
Basic Approach • By default, memory is represented coarsely as a series of fixed-size blocks • Can employ a very simple bitmap-based allocation / free algorithm • When required, blocks are split into variable-sized regions • Prevents excessive internal fragmentation
Large Block Allocation • Each block in memory is represented by a bit in a free-block bitmap (figure: example free-block bitmap before and after allocation) • rem_blocks = blocks_bm & ~mask; next_pos = ffs(rem_blocks); • in_use = mask & ~blocks_bm; next_pos = fls(in_use) + 1;
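The slide's fragments use ffs() (find first set) to locate the next free bit. A minimal single-threaded sketch of the bitmap allocate/free path, assuming a 32-block scratch-pad where a set bit means "free" (the names alloc_block/free_block are illustrative, not from the paper):

```c
#include <stdint.h>
#include <strings.h>  /* ffs() */

#define NUM_BLOCKS 32

/* Free-block bitmap: 1 = free, 0 = allocated (assumed convention). */
static uint32_t blocks_bm = 0xFFFFFFFFu;

/* Allocate one block; returns its index, or -1 if none are free. */
static int alloc_block(void)
{
    int pos = ffs((int)blocks_bm);   /* 1-based index of lowest set bit */
    if (pos == 0)
        return -1;                   /* no free blocks */
    blocks_bm &= ~(1u << (pos - 1)); /* mark the block as allocated */
    return pos - 1;
}

static void free_block(int idx)
{
    blocks_bm |= 1u << idx;          /* mark the block as free again */
}
```

The appeal is the low state overhead the conclusion slide mentions: one bit per block, and both operations are a couple of bitwise instructions.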
Small Region Allocation • Unused parts of an allocated block can be reused by sub-block-sized allocations • Blocks are split into power-of-two-sized regions, in a binary-buddy-style approach • Free regions are stored in per-size free lists
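The split-and-free-list scheme can be sketched as follows. All the parameters here (256-byte block, 16-byte minimum region, five size classes) and the helper names are assumptions for illustration, not values from the paper:

```c
#include <stddef.h>

#define BLOCK_SIZE  256   /* hypothetical fixed block size in bytes */
#define MIN_REGION   16   /* smallest region; classes: 16,32,64,128,256 */
#define NUM_CLASSES   5

/* Free regions are linked through their first word. */
struct region { struct region *next; };

/* One free list per power-of-two size class. */
static struct region *free_lists[NUM_CLASSES];

static char block_mem[BLOCK_SIZE]; /* one block being carved up */

/* Seed the allocator with the whole block as one free region. */
static void seed_block(void)
{
    struct region *r = (struct region *)block_mem;
    r->next = NULL;
    free_lists[NUM_CLASSES - 1] = r;
}

/* Map a request size to the smallest class that fits it. */
static int size_class(size_t n)
{
    int c = 0;
    size_t sz = MIN_REGION;
    while (sz < n && c < NUM_CLASSES - 1) { sz <<= 1; c++; }
    return c;
}

/* Pop a region of class c, splitting a larger region buddy-style
   when that class's list is empty. */
static void *alloc_region(int c)
{
    if (free_lists[c]) {
        struct region *r = free_lists[c];
        free_lists[c] = r->next;
        return r;
    }
    if (c + 1 >= NUM_CLASSES)
        return NULL;                    /* would fall back to a whole block */
    char *big = alloc_region(c + 1);    /* recursively split a larger region */
    if (!big)
        return NULL;
    /* Push the upper half (the buddy) onto this class's free list. */
    struct region *buddy = (struct region *)(big + (MIN_REGION << c));
    buddy->next = free_lists[c];
    free_lists[c] = buddy;
    return big;                         /* hand out the lower half */
}
```

A first 16-byte request splits the block 256 → 128 → 64 → 32 → 16, leaving one free buddy on each smaller list for later requests.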
Coalescing Freed Regions • We wanted to avoid boundary tags • Instead, the orderly way in which regions are split is exploited • A word-sized coalesce tag stores the coalesce details for all regions in a block (figure: coalesce tag example)
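The slide does not spell out the tag encoding, but the property the orderly splitting gives is that a region's buddy lies at its offset XOR its size, so a single per-block word of free bits is enough to test whether a merge is possible. A simplified sketch (a real tag would also need to record region sizes so only equal-sized buddies merge; names are illustrative):

```c
#include <stdint.h>

#define MIN_REGION 16u

/* Because regions are split in strict binary-buddy fashion, the buddy of
   a region at byte offset `off` with power-of-two size `sz` is found by
   flipping the bit corresponding to sz. */
static uint32_t buddy_offset(uint32_t off, uint32_t sz)
{
    return off ^ sz;
}

/* Hypothetical coalesce tag: one bit per MIN_REGION-sized chunk of the
   block, set when that chunk starts a currently free region. Try to merge
   a freed region with its buddy; returns the merged region's offset and
   doubles *sz on success. */
static uint32_t coalesce(uint32_t tag, uint32_t off, uint32_t *sz)
{
    uint32_t b = buddy_offset(off, *sz);
    if (tag & (1u << (b / MIN_REGION))) { /* buddy chunk is free */
        *sz *= 2;
        return off < b ? off : b;         /* merged region starts lower */
    }
    return off;
}
```

This is the saving over boundary tags: the coalesce metadata for the whole block lives in one word rather than in a header on every region.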
Deferred Coalescing • SMA (CAM) • Coalescing can be deferred for any size • Content-addressable memory is used to associate the size of deferred-coalesce regions with the regions themselves • SMA (LM) • The sizes for which coalescing can be deferred are chosen at compile time • Deferred regions are stored in an array in local memory
Outline • Rationale for SMA • SMA Approach • Results • Concurrent SMA • Conclusion / Future work
Experimental Setup • Intel IXP 2350 • Network processor • 4 microengine cores with 4kB local scratch-pad each • Access to another 16kB of shared scratch-pad • Compared against Doug Lea’s malloc
Outline • Rationale for SMA • SMA Approach • Results • Concurrent SMA • Conclusion / Future work
Lock-Free Block Allocation • State for large blocks is stored in the free-block bitmap • A simple lock-free update algorithm can be used to protect this bitmap • Uses the test-and-clear primitive (figure: two threads racing to test-and-clear bits in the global free-block bitmap, with freed blocks returned by an atomic set)
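The retry logic behind the figure can be sketched with C11 atomics standing in for the IXP's hardware test-and-clear primitive. A thread picks a free bit, atomically clears it, and uses the returned old value to learn whether it won the race; a loser simply retries (function names are illustrative):

```c
#include <stdatomic.h>
#include <strings.h>   /* ffs() */

/* Free-block bitmap shared by all cores: 1 = free, 0 = allocated. */
static _Atomic unsigned blocks_bm = 0xFFFFFFFFu;

/* Lock-free allocation via test-and-clear. */
static int alloc_block_lockfree(void)
{
    for (;;) {
        unsigned bm = atomic_load(&blocks_bm);
        int pos = ffs((int)bm);
        if (pos == 0)
            return -1;                   /* no free blocks */
        unsigned mask = 1u << (pos - 1);
        /* Atomically clear the bit and observe its previous value. */
        unsigned old = atomic_fetch_and(&blocks_bm, ~mask);
        if (old & mask)
            return pos - 1;              /* bit was still set: we won */
        /* Another thread cleared it first; retry with a fresh snapshot. */
    }
}

static void free_block_lockfree(int idx)
{
    atomic_fetch_or(&blocks_bm, 1u << idx);  /* atomic set */
}
```

Because the bitmap is the only shared state for large blocks, no lock is ever needed on this path; locks are reserved for the small-region free lists on the next slide.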
Protecting Small Region Lists • Locks are used to protect the free-lists used for small size allocation • SMA Coarse uses one lock • SMA Fine uses one lock per size class • In SMA Fine, when regions are being coalesced, two locks must be held briefly
Outline • Rationale for SMA • SMA Approach • Results • Concurrent SMA • Conclusion / Future work
Future Work • Provide the illusion of a single memory • Let the runtime worry about data placement • Data can be annotated to give hints to the runtime system
Conclusion • Tiny memories need to be managed too • SMA is a simple and efficient algorithm for dynamic management of small memories • Fixed size block allocation is simple and has low state overheads • Splitting partially used blocks to be reused by small allocations limits fragmentation • SMA can be augmented to support concurrent requests from multiple cores