1.34k likes | 1.6k Views
Memory Management Strategies Master Class. Game Connection 2012. About myself. Studied computer science at VUT, Austria Working in the games industry since 2004 PC, XBox360, PS2, PS3, Wii, DS Specialization in low-level programming (threading, debugging, optimization) Teaching
E N D
Memory Management Strategies Master Class Game Connection 2012
About myself • Studied computer science at VUT, Austria • Working in the games industry since 2004 • PC, XBox360, PS2, PS3, Wii, DS • Specialization in low-level programming (threading, debugging, optimization) • Teaching • Founder & CTO @ Molecular Matters • Middleware for the games industry
Master class • Participation • Exchange of experiences • Discussion • There is no perfect way of doing things • There are many „rights“ & „wrongs“ • Let us talk about past experiences, mistakes, improvements • Share ideas! • Ask questions!
Agenda • C++ new/delete/placement syntax • Virtual memory • Allocators • Allocation strategies • Debugging facilities • Fill patterns • Bounds checking • Memory tracking
Agenda (cont'd) • Custom memory system • Relocatable allocations • Run-time defragmentation • Debugging memory-related bugs • Stack overflow • Memory overwrites
What's wrong with that? • void* operator new(size_t size, unsigned int align){ // align memory by some means return _aligned_malloc(size, align);}NonPOD* nonPod = new (32) NonPOD;NonPOD* nonPodAr = new (32) NonPOD[10]; • Addresses of nonPod and array?
C++ new • How do we allocate memory? • Using the new operator (keyword new) • T* instance = new T; • What happens behind the scenes? • Calls operator new to allocate storage for a T • Calls the constructor for non-POD types
C++ delete • How do we free memory? • Using the delete operator (keyword delete) • delete instance; • What happens behind the scenes? • Calls the destructor for non-POD types • Calls operator delete to free storage
C++ new, placement syntax • Keyword new supports placement syntax • Canonical form called placement new • Calls operator new(size_t, void*) • Returns the given pointer, does not allocate memory • Constructs an instance in-place • T* instance = new (memory) T; • Destructor needs to be called manually • instance->~T();
C++ new, placement syntax (cont'd) • Placement syntax supports N parameters • The compiler maps keyword new to the corresponding overload of operator new • T* instance = new (10, 20, 30) T;calls void* operator new(size_t, int, int, int); • First argument must always be of type size_t • sizeof(T) is inserted by the compiler
C++ new, placement syntax (cont'd) • Very powerful! • Custom overloads for operator new • Each overload must offer a corresponding operator delete • Can store arbitrary arguments for each call to new • An operator is just a function • Can be called directly if desired • Needs manual constructor call using placement new • Can use templates
C++ delete, placement syntax • Keyword delete does not support placement syntax • delete (instance, 10, 20); • Treated as a statement using the comma operator • Overloads of operator delete used when an exception is thrown upon a call to new • Overloads can also be called directly • Needs manual destructor call
C++ new[] • Creates an array of instances • Similar to keyword new, calls operator new[] • Calls the constructor for each non-POD instance • Supports placement syntax • Custom overloads of operator new[] possible • First sizeof() argument is compiler-specific • POD vs. non-POD!
C++ new[] (cont'd) • For non-PODs, constructors are called • delete[] needs to call destructors • How many destructors to call? • Compiler needs to store the number of instances • Most compilers add an extra 4 bytes to the allocation size • sizeof(T)*N + 4 (non-POD)sizeof(T)*N (POD)
C++ new[] (cont'd) • Important! • Address returned by operator new[] != address to first instance in the array • Source of confusion • Compiler-specific behaviour, makes it almost impossible to call overloads of operator delete[] directly • Do we need to go back 4 bytes or not? • Makes support for custom alignment harder
C++ delete[] • Deletes an array of instances • Similar to keyword delete, calls operator delete[] • Calls the destructors for each non-POD instance in reverse order • Again, POD vs. non-POD • Number of instances to destruct is stored by the compiler for non-POD types
C++ new vs. delete mismatch • Allocating with new, deleting with delete[] • operator delete[] expects the number of instances • May crash • Allocating with new[], deleting with delete • More subtle bugs, only one destructor will be called • Visual Studio heap implementation is smart enough to detect both mismatches
Summary • new != operator new • delete != operator delete • new[]/delete[] are compiler-specific • Never mix new/delete[] and new[]/delete • new offers powerful placement syntax
Virtual memory • Each process = virtual address space • Not to be confused with paging to hard disk • Virtual memory != physical memory • Address translation done by MMU • OS allocates/reserves memory in pages • Page sizes: 4KB, 64KB, 1MB, ...
Virtual memory (cont'd) • Virtual addresses are mapped to physical memory addresses • Contiguous virtual addresses != contiguous physical memory • A single page is the smallest amount of memory that can be allocated • Access restrictions on a per-page level • Read, write, execute, ...
Virtual memory (cont'd) • Simplest address translation: • Virtual address = page directory + offset • Page directory = physical memory page + additional info • Page directory entries set by OS • In practice: Multi-level address translation • See „What every programmer should know about memory“ by Ulrich Drepper • http://lwn.net/Articles/253361/
Virtual memory (cont'd) • Address translation is expensive • Several accesses to memory • „Page walk“ • Result of address translation is cached • Translation Look-aside Buffer (TLB) • Multiple levels, like D$ or I$ • TLB = Global resource per processor
Virtual memory (cont'd) • Allows to allocate contiguous memory even if the physical memory is not contiguous • Available on many architectures (PC, Mac, Linux, almost all consoles) • Used by CPU only • GPU, sound hardware, etc. needs contiguous physical memory • E.g. XPhysicalAlloc
Virtual memory (cont'd) • Growing allocators can account for worst-case scenarios more easily when using VM • Different address ranges for different purposes • Heap, stack, code, write-combined, ... • Helps with debugging!
Summary • Virtual memory nice to have, but not a necessity • Can help tremendously with debugging • Virtual memory made availabe to CPU, not GPU or other hardware • Virtual memory address range >> RAM
Why different allocators? • No silver bullet, many allocation qualities • Size • Fragmentation • Wasted space • Performance • Thread-safety • Cache-locality • Fixed size vs. growing
Common allocators • Linear • Stack, double-ended stack • Pool • Micro • One-frame, two-frame temporary • Double-buffered I/O • General-purpose
Linear allocator • + Supports any size and alignment • + Extremely fast, simply bumps a pointer • + No fragmentation • + No wasted space • + Lock-free implementation possible • + Allocations live next to each other • - Must free all allocations at once
Stack allocator • + Supports any size and alignment • + Extremely fast, simply bumps a pointer • + No fragmentation • + No wasted space • + Lock-free implementation possible • + Allocations live next to each other • +/- Must free allocations in reverse-order
Double-ended stack allocator • Similar to stack allocator • Can allocate from bottom or top • Bottom for resident allocations • Top for temporary allocations • Mostly used for level loading
Pool allocator • - Supports one allocation size only • + Very fast, simple pointer exchange • + Fragments, but can always allocate • + No wasted space • + Lock-free implementation possible • - Holes between allocations • + Memory can be allocated/freed in any order
Pool allocator (cont'd) • In-place free list • No extra memory for book-keeping • Re-use memory of freed allocations • Point to next free entry
Micro allocator • Similar to pool allocator, but different pools for different sizes • + Very fast, lookup & simple pointer exchange • + Fragments, but can always allocate • - Some wasted space depending on size • + Can use pool-local critical sections / lock-free • - Holes between allocations • + Memory can be allocated/freed in any order
One-frame temporary allocator • Similar to linear allocator • Used for scratchpad allocations during a frame • Another alternative is to use stack memory • Fixed-size • alloca()
Two-frame temporary allocator • Similar to one-frame temporary allocator • Ping-pong between two one-frame allocators • Results from frame N persist until frame N+1 • Useful for operations with 1 frame latency • Raycasts
Double-buffered I/O allocator • Two ping-pong buffers • Read into buffer A, consume from buffer B • Initiate reads while consuming • Useful for async. sequential reads from disk • Interface offers Consume() only • Async. reads done transparently & interleaved
General-purpose • Must cope with small & large allocations • Used for 3rd party libraries • Properties • - Slow • - Fragmentation • - Wasted memory, allocation overhead • - Must use heavy-weight synchronization
General-purpose (cont'd) • Common implementations • „High Performance Heap Allocator“ in GPG7 • Doug Lea's „dlmalloc“ • Emery Berger's „Hoard“
Growing allocators • With virtual memory • Reserve worst-case up front • Backup with physical memory when growing • Less hassle during development • Can grow without relocating allocations • Without virtual memory • Resize allocator for e.g. each level • Needs adjustment during development
Allocators • Separate how from where • How = allocator • Where = heap or stack • Offers more possibilities • Allows to use stack with different allocators
Summary • No allocator fits all purposes • Each allocator has different pros/cons • Ideally, for each allocation think about • Size • Frequency • Lifetime • Threading
Why do we need a strategy? • Using a general-purpose allocator everywhere leads to • Fragmented memory • Wasted memory • Somewhat unclear memory ownership • Excessive clean-up before shipping • We can do better!
Decision criteria • Lifetime • Application lifetime • Level lifetime • Temporary
Decision criteria (cont'd) • Purpose • Temporary while loading a level • Temporary during a frame • Purely visual (e.g. bullet holes) • LRU scheme • Streaming I/O • Gameplay critical
Decision criteria (cont'd) • Frequency • Once • Each level load • Each frame • N times per frame • Should be avoided in the first place
Where would you put those? • Application-wide singleton allocations • Render queue/command buffer allocations • Level assets • Particles • Bullets, collision points, … • 3rd party allocations • Strings