Chromed Metal Safe and Fast C++ Andrei Alexandrescu andrei@metalanguage.com
Agenda • Modularity and speed: a fundamental tension • Example: memory allocation • Policies • Eager Computation • Segregate functionality • Costless refinements • Based on “Composing High-Performance Memory Allocators” by Berger et al: www.heaplayers.org
Modularity: good • Developing systems from small parts is good • Best known way to manage complexity • Abstraction is good • Modularity and abstraction go hand in hand • Separate development is good • Separate testing is good • Confinement of bugs is good
Speed: good • Getting work done is good (?) • Libraries that don’t exact penalties are good • Lossless growth is good • Compounded inefficiency: abstraction’s worst enemy
Modularity and Speed • Fundamental tension: • Modularity asks for separation, hiding, abstraction, and uniform interfaces • Speed asks for coalescing, transparency, specialization, and non-uniformity • How to resolve the tension?
Two Approaches • Defer compilation/optimization • Develop subsystems separately, have the runtime optimize when it sees them all • Various JIT approaches • Expedite computation/exposure • Develop subsystems separately, have the compiler see them all early • Various macro and compilation systems
Example: Memory Allocation • Memory allocation: • Very hard to modularize/componentize • Highly competitive: • General-purpose allocators: 100 cycles/alloc • Specialized allocators: < 12 cycles/alloc • Templates: • Compute things early • Expose modular code early
Idea #1: mixins/policies • Create uncommitted, “for adoption” derived classes
template <class Base>
struct Heap : public Base {
  void* Alloc(size_t);
  void Dealloc(void*);
};
• Exposes modular code early
Top Class • Can’t defer forever, so without further ado…
struct MallocHeap {
  void* AllocImpl(size_t s) { return malloc(s); }
  void Dealloc(void* p) { free(p); }
};
Idea #2: Eager Computation • Avoid redundant runtime computation safely!
class TopHeap {
  void* AllocImpl(size_t) { ... }
  void Dealloc(void*) { ... }
  friend void* Alloc(TopHeap& h, size_t s) {
    return h.AllocImpl(
        (s + AlignBytes - 1) & ~(AlignBytes - 1));
  }
  friend void Dealloc(TopHeap& h, void* p) {
    return h.Dealloc(p);
  }
};
Idea #3: Segregate Representation
template <class Base>
class SzHeap : public Base {
  void* AllocImpl(size_t s) {
    size_t* pS = static_cast<size_t*>(
        Base::AllocImpl(s + sizeof(size_t)));
    *pS = s;
    return pS + 1;
  }
  void Dealloc(void* p) {
    Base::Dealloc(static_cast<size_t*>(p) - 1);
  }
  size_t SizeOf(void* p) {
    return static_cast<size_t*>(p)[-1];
  }
};
Free Lists • Unbeatable specialized allocation method • Put deallocated blocks on a free list • Consult the free list when allocating • Disadvantages: fixed size, no coalescing, no reallocation
Free Lists Layer
template <size_t S, class Base>
class FLHeap : public Base {
  void* AllocImpl(size_t s) {
    if (s != S || !list_) {
      return Base::AllocImpl(s);
    }
    void* p = list_;
    list_ = list_->next_;
    return p;
  }
  ...
(continued)
  ...
  void Dealloc(void* p) {
    if (this->SizeOf(p) != S) return Base::Dealloc(p);
    List* pL = static_cast<List*>(p);
    pL->next_ = list_;
    list_ = pL;
  }
  ~FLHeap() { ... }
private:
  struct List { List* next_; };
  List* list_;
};
Remarks • No source-level coupling between how the size is stored and computed (SzHeap) and FLHeap • Combinatorial advantage • There is coupling at the object-code level • + Optimization • - Separate linking, dynamic loading…
Building a Layered Allocator typedef FLHeap<64, FLHeap<32, SzHeap<MallocHeap> > > MyHeap; • Modular • Easy to understand • Easy to change • Efficient
Idea #4: Costless Refinements
template <class Heap>
struct CanResize { enum { value = 0 }; };
template <class Heap>
bool Resize(Heap&, void*, size_t&) { return false; }
• Refined implementations will “hide” the default and specialize CanResize
• Can test for resizing capability at compile time or run time
Range Allocators
template <size_t S1, size_t S2, class Base>
class RHeap : public Base {
  void* AllocImpl(size_t s) {
    static_assert(S1 < S2);
    if (s >= S1 && s < S2) s = S2;
    return Base::AllocImpl(s);
  }
  ...
};
• Improved speed at the cost of slack memory
• User-controlled tradeoff
Idea #2 again: Eager computation
template <size_t S1, size_t S2, size_t S3, class B>
class RHeap<S1, S2, RHeap<S2, S3, B> >
    : public RHeap<S2, S3, B> {
  typedef RHeap<S2, S3, B> Base;
  void* AllocImpl(size_t s) {
    static_assert(S1 < S2 && S2 < S3);
    if (s >= S1 && s < S3) {
      s = s < S2 ? S2 : S3;
    }
    return Base::AllocImpl(s);
  }
  ...
};
Further Building Blocks • Profiling and debug heaps • MT heaps • Locked • Lock-free • Region-based • Alloc bumps a pointer • Dealloc doesn’t do a thing • Destructor deallocates everything
Performance • 1%-8% speed improvement over gcc’s obstack • 2%-3% slower than the Kingsley allocator • 2% faster to 20% slower than Lea’s allocator • Lea: a monolithic general-purpose allocator, optimized over 7 years • Memory consumption similar, within 5%
Conclusions • Modularity and efficiency are at odds • Templates offer black-box source, white-box compilation • A few idioms for efficient, safe code: • Policies • Eager Computation • Segregate functionality • Costless refinements
Bibliography • Emery Berger et al., “Composing High-Performance Memory Allocators”, PLDI 2001 • Yours Truly and Emery Berger, “Policy-Based Memory Allocation”, C/C++ Users Journal, Dec 2005