
Memory Management for High-Performance Applications


Presentation Transcript


  1. Memory Management for High-Performance Applications
  Ph.D. defense
  Emery Berger
  Advisor: Kathryn S. McKinley
  Department of Computer Sciences

  2. High-Performance Applications
  • Web servers, search engines, scientific codes
  • Written in C or C++
  • Run on one server box or a cluster of them
  • Need support at every level: software, compiler, runtime system, operating system, hardware

  3. New Applications, Old Memory Managers
  • Applications and hardware have changed
    • Multiprocessors now commonplace
    • Object-oriented, multithreaded
    • Increased pressure on the memory manager (malloc, free)
  • But memory managers have not changed
    • Inadequate support for modern applications

  4. Current Memory Managers Limit Scalability
  • As we add processors, the program slows down
  • Caused by heap contention
  [Figure: Larson server benchmark on a 14-processor Sun]

  5. The Problem
  • Current memory managers inadequate for high-performance applications on modern architectures
  • Limit scalability, application performance, and robustness

  6. Contributions
  • Building memory managers
    • Heap Layers framework
  • Problems with current memory managers
    • Contention, false sharing, space
  • Solution: provably scalable memory manager
    • Hoard
  • Extended memory manager for servers
    • Reap

  7. Implementing Memory Managers
  • Memory managers must be
    • Space efficient
    • Very fast
  • Heavily-optimized code
    • Hand-unrolled loops
    • Macros
    • Monolithic functions
  • Hard to write, reuse, or extend

  8. Building Modular Memory Managers
  • Classes
    • Rigid hierarchy
    • Overhead
  • Mixins
    • Flexible hierarchy
    • No overhead

  9. A Heap Layer
  • A mixin with malloc & free methods
  • template <class SuperHeap>
    class GreenHeapLayer : public SuperHeap {…};
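
A minimal sketch of the mixin idiom this slide describes: a layer adds behavior to malloc and free and then delegates to its parent ("super") heap. The layer and base-heap names below are illustrative, not part of the actual Heap Layers library.

    #include <cstdlib>
    #include <cstddef>

    // Bottom layer: wraps the system allocator.
    class MallocHeap {
    public:
      void * malloc (size_t sz) { return std::malloc (sz); }
      void free (void * ptr)    { std::free (ptr); }
    };

    // A heap layer: a mixin parameterized by its superheap.
    template <class SuperHeap>
    class CountingHeapLayer : public SuperHeap {
    public:
      void * malloc (size_t sz) {
        ++_allocs;                      // add behavior...
        return SuperHeap::malloc (sz);  // ...then delegate to the superheap
      }
      void free (void * ptr) {
        ++_frees;
        SuperHeap::free (ptr);
      }
      size_t allocations () const { return _allocs; }
      size_t frees () const       { return _frees; }
    private:
      size_t _allocs = 0;
      size_t _frees = 0;
    };

    // Layers compose by template instantiation, so there is no
    // virtual-call overhead:
    using MyHeap = CountingHeapLayer<MallocHeap>;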

  10. Example: Thread-Safe Heap Layer
  • LockedHeap: protects the superheap with a lock
  • Layered over a malloc heap, this yields LockedMallocHeap
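
A sketch of the thread-safe layer described here: a mixin that serializes access to its superheap with a lock. It hard-codes std::mutex for brevity; the LockedHeap in the actual library is also parameterized over the lock type.

    #include <mutex>
    #include <cstddef>

    template <class SuperHeap>
    class LockedHeap : public SuperHeap {
    public:
      void * malloc (size_t sz) {
        std::lock_guard<std::mutex> guard (_lock);
        return SuperHeap::malloc (sz);
      }
      void free (void * ptr) {
        std::lock_guard<std::mutex> guard (_lock);
        SuperHeap::free (ptr);
      }
    private:
      std::mutex _lock;
    };

    // The slide's composition, assuming a MallocHeap base layer like the
    // one sketched above:
    // using LockedMallocHeap = LockedHeap<MallocHeap>;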

  11. Empirical Results
  • Heap Layers vs. originals:
    • KingsleyHeap vs. BSD allocator
    • LeaHeap vs. DLmalloc 2.7
  • Competitive runtime and memory efficiency

  12. Overview
  • Building memory managers
    • Heap Layers framework
  • Problems with memory managers
    • Contention, space, false sharing
  • Solution: provably scalable allocator
    • Hoard
  • Extended memory manager for servers
    • Reap

  13. Problems with General-Purpose Memory Managers
  • Previous work for multiprocessors
    • Concurrent single heap [Bigler et al. 85, Johnson 91, Iyengar 92]
      • Impractical
    • Multiple heaps [Larson 98, Gloger 99]
      • Reduce contention but cause other problems (we show):
        • P-fold or even unbounded increase in space
        • Allocator-induced false sharing

  14. Multiple Heap Allocator: Pure Private Heaps
  • One heap per processor:
    • malloc gets memory from its local heap
    • free puts memory on its local heap
  • Used by STL, Cilk, and ad hoc allocators
  [Figure: example trace in which each processor mallocs (x1…x4) and frees onto its own heap]

  15. Problem: Unbounded Memory Consumption
  • Producer-consumer:
    • Processor 0 allocates
    • Processor 1 frees
  • Unbounded memory consumption
  • Crash!
  [Figure: processor 0 repeatedly mallocs x1, x2, x3, …; processor 1 frees each one onto its own heap]
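
The producer-consumer workload sketched on this slide, written out as a small C++ program (the names and iteration count are illustrative). Under a pure-private-heaps allocator, every free lands on the consumer's heap while the producer keeps requesting fresh memory, so the footprint grows without bound even though only a few objects are live at once.

    #include <thread>
    #include <queue>
    #include <mutex>
    #include <condition_variable>
    #include <cstdlib>

    std::queue<void *> pipe_;
    std::mutex m_;
    std::condition_variable cv_;

    void producer (int iterations) {        // "processor 0": only allocates
      for (int i = 0; i < iterations; ++i) {
        void * p = std::malloc (1);
        std::lock_guard<std::mutex> g (m_);
        pipe_.push (p);
        cv_.notify_one ();
      }
    }

    void consumer (int iterations) {        // "processor 1": only frees
      for (int i = 0; i < iterations; ++i) {
        std::unique_lock<std::mutex> g (m_);
        cv_.wait (g, [] { return !pipe_.empty (); });
        void * p = pipe_.front ();
        pipe_.pop ();
        g.unlock ();
        std::free (p);   // with pure private heaps, this memory is stranded here
      }
    }

    int main () {
      std::thread t1 (producer, 10000000), t2 (consumer, 10000000);
      t1.join ();
      t2.join ();
    }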

  16. Multiple Heap Allocator: Private Heaps with Ownership
  • free returns memory to the original heap
  • Bounded memory consumption
  • No crash!
  • Examples: Ptmalloc (Linux), LKmalloc
  [Figure: each processor mallocs and frees; freed memory returns to its owner's heap]

  17. Problem: P-fold Memory Blowup
  • Occurs in practice
  • Round-robin producer-consumer:
    • Processor i mod P allocates
    • Processor (i+1) mod P frees
  • Footprint = 1 (2GB), but space = 3 (6GB)
  • Exceeds 32-bit address space: Crash!
  [Figure: three processors allocate and free in a ring (x1, x2, x3)]

  18. Problem: Allocator-Induced False Sharing
  • False sharing: non-shared objects on the same cache line
    • Bane of parallel applications
    • Extensively studied
  • All these allocators cause false sharing!
  [Figure: x1 and x2, allocated to different processors, share one cache line; both processors thrash]
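
A sketch of how allocator-induced false sharing arises: if two threads receive objects that the allocator placed on the same cache line, each thread's writes invalidate the other's cached copy of that line, even though no data is logically shared. The object sizes and iteration count below are arbitrary.

    #include <thread>
    #include <cstdlib>

    void worker (char * counter) {
      for (long i = 0; i < 100000000L; ++i) {
        ++(*counter);   // every write dirties the shared cache line
      }
    }

    int main () {
      // Two one-byte objects; a single shared heap can easily place them
      // back-to-back on the same cache line.
      char * x1 = (char *) std::malloc (1);
      char * x2 = (char *) std::malloc (1);
      std::thread t1 (worker, x1), t2 (worker, x2);
      t1.join ();
      t2.join ();
      std::free (x1);
      std::free (x2);
    }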

  19. So What Do We Do Now?
  • Where do we put free memory?
    • on a central heap: heap contention
    • on our own heap (pure private heaps): unbounded memory consumption
    • on the original heap (private heaps with ownership): P-fold blowup
  • How do we avoid false sharing?

  20. Overview
  • Building memory managers
    • Heap Layers framework
  • Problems with memory managers
    • Contention, space, false sharing
  • Solution: provably scalable allocator
    • Hoard
  • Extended memory manager for servers
    • Reap

  21. Hoard: Key Insights
  • Bound local memory consumption
    • Explicitly track utilization
    • Move free memory to a global heap
    • Provably bounds memory consumption
  • Manage memory in large chunks
    • Avoids false sharing
    • Reduces heap contention

  22. Overview of Hoard
  • Manage memory in heap blocks
    • Page-sized
    • Avoids false sharing
  • Allocate from the local heap block
    • Avoids heap contention
  • On low utilization, move the heap block to the global heap
    • Avoids space blowup
  [Figure: per-processor heaps (processor 0 … processor P-1) backed by a global heap]

  23. Hoard: Under the Hood
  [Figure: Hoard's layered structure: select a heap based on size; malloc from the local heap, free to the owning heap block; get memory from, or return it to, the global heap]

  24. Summary of Analytical Results
  • Space consumption: near-optimal worst case
    • Hoard: O(n log M/m + P), with P ≪ n
    • Optimal: O(n log M/m) [Robson 70]; ≈ bin-packing
    • Private heaps with ownership: O(P n log M/m)
  • Provably low synchronization
  • Notation: n = memory required, M = biggest object size, m = smallest object size, P = number of processors

  25. Empirical Results
  • Measure runtime on a 14-processor Sun
  • Allocators:
    • Solaris (system allocator)
    • Ptmalloc (GNU libc)
    • mtmalloc (Sun's "MT-hot" allocator)
  • Micro-benchmarks:
    • Threadtest: no sharing
    • Larson: sharing (server-style)
    • Cache-scratch: mostly reads & writes (tests for false sharing)
  • Real-application experience is similar

  26. Runtime Performance: threadtest
  • Many threads, no sharing
  • Hoard achieves linear speedup
  • speedup(x, P) = runtime(Solaris allocator, one processor) / runtime(x on P processors)

  27. Runtime Performance: Larson
  • Many threads, sharing (server-style)
  • Hoard achieves linear speedup

  28. Runtime Performance: false sharing
  • Many threads, mostly reads & writes of heap data
  • Hoard achieves linear speedup

  29. Hoard in the "Real World"
  • Open source code: www.hoard.org
    • 13,000 downloads
    • Solaris, Linux, Windows, IRIX, …
  • Widely used in industry
    • AOL, British Telecom, Novell, Philips
    • Reports: 2x-10x, "impressive" improvement in performance
    • Search server, telecom billing systems, scene rendering, real-time messaging middleware, text-to-speech engine, telephony, JVM
  • A scalable general-purpose memory manager

  30. Custom Memory Allocation
  • Programmers replace malloc/free
    • Attempt to increase performance
    • Provide extra functionality (e.g., for servers)
    • Reduce space (rarely)
  • Empirical study of custom allocators:
    • Lea allocator often as fast or faster
    • Custom allocation ineffective, except for regions

  31. Overview of Regions
  • Regions: separate areas, deletion only en masse
  • Interface: regioncreate(r), regionmalloc(r, sz), regiondelete(r)
  • Used in parsers, server applications
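
A minimal, self-contained sketch of the region idiom, using the call names from this slide (regioncreate, regionmalloc, regiondelete). Allocation just bumps a pointer within the current chunk, and deleting the region frees every chunk at once; real region libraries (e.g., Apache's pools) are more elaborate, and error handling is omitted here.

    #include <cstdlib>
    #include <cstddef>
    #include <vector>

    struct Region {
      std::vector<char *> chunks;   // every chunk allocated for this region
      char * cursor = nullptr;      // bump pointer into the current chunk
      size_t remaining = 0;         // bytes left in the current chunk
    };

    Region * regioncreate () { return new Region; }

    void * regionmalloc (Region * r, size_t sz) {
      sz = (sz + 15) & ~size_t (15);            // keep allocations aligned
      if (sz > r->remaining) {                  // start a fresh chunk
        size_t chunkSize = (sz > 4096) ? sz : 4096;
        char * chunk = (char *) std::malloc (chunkSize);
        r->chunks.push_back (chunk);
        r->cursor = chunk;
        r->remaining = chunkSize;
      }
      void * p = r->cursor;                     // pointer-bump allocation
      r->cursor += sz;
      r->remaining -= sz;
      return p;
    }

    void regiondelete (Region * r) {            // deletion only en masse
      for (char * chunk : r->chunks) {
        std::free (chunk);
      }
      delete r;
    }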

  32. Overview
  • Building memory managers
    • Heap Layers framework
  • Problems with memory managers
    • Contention, space, false sharing
  • Solution: provably scalable allocator
    • Hoard
  • Extended memory manager for servers
    • Reap

  33. Server Support
  • Certain servers need additional support
  • "Process" isolation
    • Multiple threads, many transactions per thread
    • Minimize accidental overwrites of unrelated data
  • Avoid resource leaks
    • Tear down all memory associated with terminated connections or transactions
  • Current approach (e.g., Apache): regions

  34. Regions: Pros and Cons
  • Regions: separate areas, deletion only en masse
  • Interface: regioncreate(r), regionmalloc(r, sz), regiondelete(r)
  • Pros:
    • Fast: pointer-bumping allocation, deletion of chunks
    • Convenient: one call frees all memory
  • Cons:
    • Space: can't free objects (drag)
    • Can't use for all allocation patterns

  35. Regions Are Limited
  • Can't reclaim memory in regions → unbounded memory consumption
    • Long-running computations
    • Producer-consumer patterns
  • Current situation for Apache:
    • Vulnerable to denial-of-service
    • Limits runtime of connections
    • Limits module programming
  • Regions: wrong abstraction

  36. Reap Hybrid Allocator
  • Reap = region + heap
    • Adds individual object deletion & a heap
  • Interface: reapcreate(r), reapmalloc(r, sz), reapfree(r, p), reapdelete(r)
  • Can reduce memory consumption
  • Fast
    • Adapts to its use (region style or heap style)
    • Cheap deletion
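
A toy sketch of the semantics this slide adds, using its call names (reapcreate, reapmalloc, reapfree, reapdelete): individual frees plus en-masse teardown. This is not how Reap is actually implemented; the real allocator adapts between region-style and heap-style allocation, whereas this version simply tracks outstanding objects so reapdelete can reclaim whatever was never freed individually.

    #include <cstdlib>
    #include <cstddef>
    #include <unordered_set>

    struct Reap {
      std::unordered_set<void *> live;   // allocated but not yet freed
    };

    Reap * reapcreate () { return new Reap; }

    void * reapmalloc (Reap * r, size_t sz) {
      void * p = std::malloc (sz);
      r->live.insert (p);
      return p;
    }

    void reapfree (Reap * r, void * p) {   // individual deletion (heap-style)
      r->live.erase (p);
      std::free (p);
    }

    void reapdelete (Reap * r) {           // en-masse deletion (region-style)
      for (void * p : r->live) {
        std::free (p);
      }
      delete r;
    }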

  37. Using Reap as Regions
  • Reap performance nearly matches regions

  38. Reap: In Progress
  • Incorporate Reap in Apache
    • Rewrite modules to use Reap
    • Measure space savings
  • Simplifies module programming & adds robustness against denial-of-service

  39. Overview
  • Building memory managers
    • Heap Layers framework
  • Problems with memory managers
    • Contention, space, false sharing
  • Solution: provably scalable allocator
    • Hoard
  • Extended memory manager for servers
    • Reap

  40. Open Questions
  • Grand Unified Memory Manager?
    • Hoard + Reap
    • Integration with garbage collection
  • Effective Custom Allocators?
    • Exploit sizes, lifetimes, locality, and sharing
  • Challenges of newer architectures
    • SMT/CMP

  41. Contributions
  Memory management for high-performance applications:
  • Framework for building high-quality memory managers (Heap Layers) [Berger, Zorn & McKinley, PLDI-01]
  • Provably scalable memory manager (Hoard) [Berger, McKinley, Blumofe & Wilson, ASPLOS-IX]
  • Study of custom memory allocation; hybrid high-performance memory manager for server applications (Reap) [Berger, Zorn & McKinley, OOPSLA-2002]

  42. Backup Slides

  43. Empirical Results, Runtime

  44. Empirical Results, Space

  45. Robust Resource Management
  • User processes can bring down systems (= DoS)
    • Fork bomb: use all process ids
    • Malloc all memory: exhaust swap
  • Current solutions
    • Kill processes (Linux)
    • Die (Linux, Solaris, Windows)
  • Proposed solutions limit utilization
    • Quotas, proportional shares
  • Insight: as resources become scarce, make them "cost" more (apply an economic model)

  46. Future Work
  "Performance, scalability, and robustness"
  • Short term
    • Memory management
      • False sharing
      • Robust garbage collection for multiprogrammed systems (with McKinley, Blackburn & Stylos)
      • Locality: self-reorganizing data structures
    • Compiler-based static error detection [Guyer, Berger & Lin, in preparation]
  • Longer term
    • Safety & security as dataflow problems
    • Integration of OS/runtime/compiler

  47. Rockall: Larson

  48. Rockall: Threadtest

  49. Hoard Conclusions
  • As fast as uniprocessor allocators
  • Performance linear in the number of processors
  • Avoids false sharing
  • Worst case provably near optimal
  • A scalable general-purpose memory manager

  50. Conceptually Modular
  [Figure: malloc and free]
