
Chromed Metal

Chromed Metal. Safe and Fast C++ Andrei Alexandrescu andrei@metalanguage.com. Agenda. Modularity and speed: a fundamental tension Example: memory allocation Policies Eager Computation Segregate functionality Costless refinements


Presentation Transcript


  1. Chromed Metal Safe and Fast C++ Andrei Alexandrescu andrei@metalanguage.com

  2. Agenda • Modularity and speed: a fundamental tension • Example: memory allocation • Policies • Eager Computation • Segregate functionality • Costless refinements • Based on “Composing High-Performance Memory Allocators” by Berger et al.: www.heaplayers.org

  3. Modularity: good • Developing systems from small parts is good • Best known way to manage complexity • Abstraction is good • Modularity and abstraction go hand in hand • Separate development is good • Separate testing is good • Confinement of bugs is good

  4. Speed: good • Getting work done is good (?) • Libraries that don’t exact penalties are good • Lossless growth is good • Compounded inefficiency: abstraction’s worst enemy

  5. Modularity and Speed • Fundamental tension: • Modularity asks for separation, hiding, abstraction, and uniform interfaces • Speed asks for coalescing, transparency, specialization, and non-uniformity • How to resolve the tension?

  6. Two Approaches • Defer compilation/optimization • Develop subsystems separately, have the runtime optimize when it sees them all • Various JIT approaches • Expedite computation/exposure • Develop subsystems separately, have the compiler see them all early • Various macro and compilation systems

  7. Example: Memory Allocation • Memory allocation: • Very hard to modularize/componentize • Highly competitive: • General-purpose allocators: 100 cycles/alloc • Specialized allocators: < 12 cycles/alloc • Templates: • Compute things early • Expose modular code early

  8. Idea #1: mixins/policies • Create uncommitted, “for adoption” derived classes
       template <class Base>
       struct Heap : public Base {
         void* Alloc(size_t);
         void Dealloc(void*);
       };
     • Exposes modular code early

  9. Top Class • Can’t defer forever, so without further ado…
       struct MallocHeap {
         void* Alloc(size_t s) { return malloc(s); }
         void Dealloc(void* p) { free(p); }
       };
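A minimal sketch of how an uncommitted layer (slide 8) stacks on the top class (slide 9). `CountingHeap` and its `Live()` member are hypothetical names introduced only for illustration; they do not appear in the talk. Because the layer is a template, the compiler sees both classes at once and can inline the forwarding calls.

```cpp
#include <cstddef>
#include <cstdlib>

// Top class: ends the inheritance chain by forwarding to the C runtime.
struct MallocHeap {
    void* Alloc(std::size_t s) { return std::malloc(s); }
    void Dealloc(void* p) { std::free(p); }
};

// Hypothetical layer (not from the slides): counts live blocks,
// then delegates the real work to whatever Base it is stacked on.
template <class Base>
struct CountingHeap : public Base {
    void* Alloc(std::size_t s) {
        void* p = Base::Alloc(s);
        if (p) ++live_;
        return p;
    }
    void Dealloc(void* p) {
        if (p) --live_;
        Base::Dealloc(p);
    }
    std::size_t Live() const { return live_; }

private:
    std::size_t live_ = 0;
};
```

A heap is then chosen by naming the stack, for example `CountingHeap<MallocHeap> h;`.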

  10. Idea #2: Eager Computation • Avoid redundant and runtime computation safely!
       class TopHeap {
         void* Alloc(size_t) { ... }
         void Dealloc(void*) { ... }
         // Round the size up to a multiple of AlignBytes (a power of two)
         friend void* Alloc(Heap & h, size_t s) {
           return h.AllocImpl((s + AlignBytes - 1) & ~(AlignBytes - 1));
         }
         friend void Dealloc(Heap & h, void* p) { return h.Dealloc(p); }
       };
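The rounding on this slide can be sketched as a `constexpr` function behind a free-function front end. The value of `AlignBytes`, the helper `RoundUp`, and the `RecordingHeap` type are assumptions made for this example; the slides do not define them. When the requested size is a compile-time constant and the call is inlined, the compiler can fold the rounding away, which is one reading of "eager computation".

```cpp
#include <cstddef>
#include <cstdlib>

// Assumed value; the slides do not say what AlignBytes is.
// The bit trick below only works for powers of two.
constexpr std::size_t AlignBytes = 8;
static_assert((AlignBytes & (AlignBytes - 1)) == 0,
              "AlignBytes must be a power of two");

// Round s up to the next multiple of AlignBytes.
constexpr std::size_t RoundUp(std::size_t s) {
    return (s + AlignBytes - 1) & ~(AlignBytes - 1);
}
static_assert(RoundUp(1) == 8 && RoundUp(8) == 8 && RoundUp(9) == 16,
              "rounding is evaluated at compile time");

// Hypothetical heap that remembers the last size it was asked for,
// so the effect of the front end is observable.
struct RecordingHeap {
    std::size_t last = 0;
    void* Alloc(std::size_t s) { last = s; return std::malloc(s); }
    void Dealloc(void* p) { std::free(p); }
};

// Free-function front end: the size is aligned once, at the entry point,
// so no layer underneath has to repeat the computation.
template <class Heap>
void* Alloc(Heap& h, std::size_t s) { return h.Alloc(RoundUp(s)); }

template <class Heap>
void Dealloc(Heap& h, void* p) { h.Dealloc(p); }
```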

  11. Idea #3: Segregate Representation
       template <class Base>
       class SzHeap : public Base {
         void* Alloc(size_t s) {
           // Store the block size in a header just before the user's memory
           size_t * pS = static_cast<size_t*>(Base::AllocImpl(s + sizeof(size_t)));
           return *pS = s, pS + 1;
         }
         void Dealloc(void* p) { Base::Dealloc(static_cast<size_t*>(p) - 1); }
         size_t SizeOf(void* p) { return (static_cast<size_t*>(p))[-1]; }
       };
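A compilable sketch of the size-tracking layer, under two assumptions: the base exposes `Alloc`/`Dealloc` (the slides alternate between `Alloc` and `AllocImpl`), and null checks are added that the slide omits. Note that a `size_t` header shifts the returned pointer, so the user block may be less strictly aligned than what `malloc` returned.

```cpp
#include <cstddef>
#include <cstdlib>

struct MallocHeap {
    void* Alloc(std::size_t s) { return std::malloc(s); }
    void Dealloc(void* p) { std::free(p); }
};

// Keeps the requested size in a header placed just before the user block.
// This is the only layer that knows how sizes are represented.
template <class Base>
struct SzHeap : public Base {
    void* Alloc(std::size_t s) {
        auto* pS = static_cast<std::size_t*>(
            Base::Alloc(s + sizeof(std::size_t)));
        if (!pS) return nullptr;   // added check, not on the slide
        *pS = s;                   // write the header
        return pS + 1;             // hand out the memory after it
    }
    void Dealloc(void* p) {
        if (p) Base::Dealloc(static_cast<std::size_t*>(p) - 1);
    }
    std::size_t SizeOf(void* p) {
        return static_cast<std::size_t*>(p)[-1];
    }
};
```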

  12. Free Lists • Unbeatable specialized allocation method • Put deallocated blocks in a freelist • Consult the freelist when allocating • Disadvantage: fixed size, no coalescing, no reallocation

  13. Free Lists Layer
       template <size_t S, class Base>
       class FLHeap : public Base {
         void* Alloc(size_t s) {
           if (s != S || !list_) { return Base::AllocImpl(s); }
           void * p = list_;
           list_ = list_->next_;
           return p;
         }
         ...

  14. (continued)
         ...
         void Dealloc(void * p) {
           if (this->SizeOf(p) != S) return Base::Dealloc(p);
           List * pL = static_cast<List*>(p);
           pL->next_ = list_;
           list_ = pL;
         }
         ~FLHeap() { ... }
       private:
         struct List { List * next_; };
         List * list_;
       };

  15. Remarks • There is no source-level coupling between the way the size is maintained and computed, and FLHeap • Combinatorial advantage • There is coupling at the object code level • + Optimization • - Separate linking, dynamic loading…

  16. Building a Layered Allocator
       typedef FLHeap<64, FLHeap<32, SzHeap<MallocHeap> > > MyHeap;
     • Modular • Easy to understand • Easy to change • Efficient
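The layered typedef can be tried end to end with the sketch below. It is an illustration under stated assumptions rather than the talk's implementation: layers call `Base::Alloc` instead of `AllocImpl`, the destructor body (elided on the slide) simply returns cached blocks to the base, and copying is disabled to avoid double frees.

```cpp
#include <cstddef>
#include <cstdlib>

struct MallocHeap {
    void* Alloc(std::size_t s) { return std::malloc(s); }
    void Dealloc(void* p) { std::free(p); }
};

// Size header layer, as on slide 11.
template <class Base>
struct SzHeap : public Base {
    void* Alloc(std::size_t s) {
        auto* pS = static_cast<std::size_t*>(Base::Alloc(s + sizeof(std::size_t)));
        if (!pS) return nullptr;
        *pS = s;
        return pS + 1;
    }
    void Dealloc(void* p) { if (p) Base::Dealloc(static_cast<std::size_t*>(p) - 1); }
    std::size_t SizeOf(void* p) { return static_cast<std::size_t*>(p)[-1]; }
};

// Free list layer for blocks of exactly S bytes.
template <std::size_t S, class Base>
class FLHeap : public Base {
    static_assert(S >= sizeof(void*), "a free block must be able to hold the link");
public:
    FLHeap() = default;
    FLHeap(const FLHeap&) = delete;
    FLHeap& operator=(const FLHeap&) = delete;
    ~FLHeap() {  // assumed behavior: give cached blocks back to the base
        while (list_) {
            List* next = list_->next_;
            Base::Dealloc(list_);
            list_ = next;
        }
    }
    void* Alloc(std::size_t s) {
        if (s != S || !list_) return Base::Alloc(s);
        void* p = list_;           // pop the head of the free list
        list_ = list_->next_;
        return p;
    }
    void Dealloc(void* p) {
        if (!p) return;
        // SizeOf comes from some layer below; this-> is needed because
        // Base is a dependent type.
        if (this->SizeOf(p) != S) { Base::Dealloc(p); return; }
        List* pL = static_cast<List*>(p);  // reuse the block as a list node
        pL->next_ = list_;
        list_ = pL;
    }
private:
    struct List { List* next_; };
    List* list_ = nullptr;
};

typedef FLHeap<64, FLHeap<32, SzHeap<MallocHeap> > > MyHeap;
```

With this stack, a freed 64-byte or 32-byte block is handed back on the next request of the same size, while other sizes fall through to `malloc`.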

  17. Idea #4: Costless Refinements
       template <class Heap>
       struct CanResize { enum { value = 0 }; };
       template <class Heap>
       bool Resize(Heap &, void*, size_t &) { return false; }
     • Refined implementations will “hide” the default and specialize CanResize • Can test for resizing capability at compile time or runtime
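One way the refinement might look in practice is sketched below. `SlotHeap` and `Grow` are hypothetical names invented for this example: `SlotHeap` hands out fixed 128-byte slots, so any resize that fits the slot succeeds in place. It specializes `CanResize` and supplies a non-template `Resize` overload, which overload resolution prefers over the generic default.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Defaults from the slide: heaps cannot resize unless they say otherwise.
template <class Heap>
struct CanResize { enum { value = 0 }; };

template <class Heap>
bool Resize(Heap&, void*, std::size_t&) { return false; }

struct MallocHeap {
    void* Alloc(std::size_t s) { return std::malloc(s); }
    void Dealloc(void* p) { std::free(p); }
};

// Hypothetical refined heap: every block is a 128-byte slot.
struct SlotHeap {
    static constexpr std::size_t SlotBytes = 128;
    void* Alloc(std::size_t s) {
        return s <= SlotBytes ? std::malloc(SlotBytes) : nullptr;
    }
    void Dealloc(void* p) { std::free(p); }
};

// The refinement: specialize the trait and hide the default Resize.
template <>
struct CanResize<SlotHeap> { enum { value = 1 }; };

inline bool Resize(SlotHeap&, void* p, std::size_t& s) {
    return p != nullptr && s <= SlotHeap::SlotBytes;
}

// Hypothetical client: grows a block in place when the heap allows it,
// otherwise allocates, copies, and frees. Assumes oldSize <= newSize.
template <class Heap>
void* Grow(Heap& h, void* p, std::size_t oldSize, std::size_t newSize) {
    if (CanResize<Heap>::value && Resize(h, p, newSize)) return p;
    void* q = h.Alloc(newSize);
    if (q) {
        std::memcpy(q, p, oldSize);
        h.Dealloc(p);
    }
    return q;
}
```

For heaps that keep the default, `CanResize<Heap>::value` is a compile-time zero, so the in-place branch can be discarded by the optimizer and costs nothing.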

  18. Range Allocators
       template <size_t S1, size_t S2, class Base>
       class RHeap : public Base {
         void* Alloc(size_t s) {
           static_assert(S1 < S2);
           if (s >= S1 && s < S2) s = S2;
           return Base::AllocImpl(s);
         }
         ...
       };
     • Improved speed at the cost of slack memory • User-controlled tradeoff
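A small sketch of the range layer in isolation. `RecordingHeap` is a hypothetical base added so the rounded size can be observed, and `Base::Alloc` stands in for the slide's `AllocImpl`. Rounding every request in [S1, S2) up to S2 is what lets a fixed-size layer underneath, such as a free list for S2-byte blocks, serve the whole range.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical base that remembers the size it was last asked for.
struct RecordingHeap {
    std::size_t last = 0;
    void* Alloc(std::size_t s) { last = s; return std::malloc(s); }
    void Dealloc(void* p) { std::free(p); }
};

// Requests with S1 <= s < S2 are rounded up to S2; others pass through.
template <std::size_t S1, std::size_t S2, class Base>
struct RHeap : public Base {
    static_assert(S1 < S2, "the range must be non-empty");
    void* Alloc(std::size_t s) {
        if (s >= S1 && s < S2) s = S2;  // trade slack memory for uniform sizes
        return Base::Alloc(s);
    }
};
```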

  19. Idea #2 again: Eager computation template <size_t S1, size_t S2, size_t S3, class B> void* RHeap<S1, S2, RHeap<S2, S3, B> >:: Alloc(size_t s) { static_assert(S1 < S2 && S2 < S3); if (s >= S1 && s < S3) { s = s < S2 ? S2 : S3; } return Base::AllocImpl(s); } ... };

  20. Further Building Blocks • Profiling and debug heaps • MT heaps • Locked • Lock-free • Region-based • Alloc bumps a pointer • Dealloc doesn’t do a thing • Destructor deallocates everything
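The region-based bullets can be sketched as one more layer. `RegionHeap`, its single fixed-size buffer, and its behavior of returning null when the buffer is exhausted are simplifying assumptions for this example; a production region would typically chain several chunks.

```cpp
#include <cstddef>
#include <cstdlib>

struct MallocHeap {
    void* Alloc(std::size_t s) { return std::malloc(s); }
    void Dealloc(void* p) { std::free(p); }
};

// Hypothetical region layer: one buffer of Bytes bytes taken from Base.
template <std::size_t Bytes, class Base>
class RegionHeap : public Base {
public:
    RegionHeap() = default;
    RegionHeap(const RegionHeap&) = delete;
    RegionHeap& operator=(const RegionHeap&) = delete;

    // Destructor deallocates everything at once.
    ~RegionHeap() { if (buf_) Base::Dealloc(buf_); }

    // Alloc bumps a pointer.
    void* Alloc(std::size_t s) {
        const std::size_t a = alignof(std::max_align_t);
        if (s > Bytes) return nullptr;      // also keeps the rounding from overflowing
        s = (s + a - 1) & ~(a - 1);         // keep every block maximally aligned
        if (!buf_) {
            buf_ = static_cast<char*>(Base::Alloc(Bytes));
            if (!buf_) return nullptr;
        }
        if (s > Bytes - used_) return nullptr;  // region exhausted
        void* p = buf_ + used_;
        used_ += s;
        return p;
    }

    // Dealloc doesn't do a thing.
    void Dealloc(void*) {}

private:
    char* buf_ = nullptr;
    std::size_t used_ = 0;
};
```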

  21. Performance • 1%-8% speed improvement over gcc’s ObStack • 2%-3% slower than the Kingsley allocator • Between 2% faster and 20% slower than Lea’s allocator • Lea: monolithic general-purpose allocator • Optimized for 7 years • Memory consumption similar, within 5%

  22. Conclusions • Modularity and efficiency are at odds • Templates offer black-box source, white-box compilation • A few idioms for efficient, safe code: • Policies • Eager Computation • Segregate functionality • Costless refinements

  23. Bibliography • Emery Berger et al., “Composing High-Performance Memory Allocators”, PLDI 2001 • Yours Truly and Emery Berger, “Policy-Based Memory Allocation”, CUJ Dec 2005
