Game Connection 2012

Memory Management Strategies Master Class Game Connection 2012

About myself • Studied computer science at VUT, Austria • Working in the games industry since 2004 • PC, Xbox 360, PS2, PS3, Wii, DS • Specialization in low-level programming (threading, debugging, optimization) • Teaching • Founder & CTO @ Molecular Matters • Middleware for the games industry

Master class • Participation • Exchange of experiences • Discussion • There is no perfect way of doing things • There are many „rights“ & „wrongs“ • Let us talk about past experiences, mistakes, improvements • Share ideas! • Ask questions!

Agenda • C++ new/delete/placement syntax • Virtual memory • Allocators • Allocation strategies • Debugging facilities • Fill patterns • Bounds checking • Memory tracking

Agenda (cont'd) • Custom memory system • Relocatable allocations • Run-time defragmentation • Debugging memory-related bugs • Stack overflow • Memory overwrites

C++ new/delete/placement syntax

What's wrong with that? • void* operator new(size_t size, unsigned int align) { /* align memory by some means */ return _aligned_malloc(size, align); } • NonPOD* nonPod = new (32) NonPOD; • NonPOD* nonPodAr = new (32) NonPOD[10]; • Addresses of nonPod and array?

C++ new • How do we allocate memory? • Using the new operator (keyword new) • T* instance = new T; • What happens behind the scenes? • Calls operator new to allocate storage for a T • Calls the constructor for non-POD types

C++ delete • How do we free memory? • Using the delete operator (keyword delete) • delete instance; • What happens behind the scenes? • Calls the destructor for non-POD types • Calls operator delete to free storage

C++ new, placement syntax • Keyword new supports placement syntax • Canonical form called placement new • Calls operator new(size_t, void*) • Returns the given pointer, does not allocate memory • Constructs an instance in-place • T* instance = new (memory) T; • Destructor needs to be called manually • instance->~T();
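The placement form described on this slide can be sketched as follows. `Tracker` and `PlacementNewDemo` are hypothetical names introduced only to make constructor and destructor calls observable; they are not part of the original material.

```cpp
#include <cassert>
#include <new>      // declares operator new(size_t, void*)

// Hypothetical type that counts live instances, for illustration only.
struct Tracker
{
    static int liveCount;
    int value;
    explicit Tracker(int v) : value(v) { ++liveCount; }
    ~Tracker() { --liveCount; }
};
int Tracker::liveCount = 0;

// Constructs a Tracker in caller-provided storage, then destroys it by hand.
// Returns the number of live instances observed while the object existed.
int PlacementNewDemo()
{
    // Raw, suitably aligned storage; no heap allocation takes place.
    alignas(Tracker) unsigned char memory[sizeof(Tracker)];

    Tracker* instance = new (memory) Tracker(42);   // constructs in-place
    assert(static_cast<void*>(instance) == static_cast<void*>(memory));
    const int liveWhileConstructed = Tracker::liveCount;

    // The destructor must be called manually; keyword delete would try to
    // free memory that was never allocated by operator new.
    instance->~Tracker();
    return liveWhileConstructed;
}
```

Calling `PlacementNewDemo()` constructs and destroys exactly one instance, so `Tracker::liveCount` is back at zero afterwards.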

C++ new, placement syntax (cont'd) • Placement syntax supports N parameters • The compiler maps keyword new to the corresponding overload of operator new • T* instance = new (10, 20, 30) T; calls void* operator new(size_t, int, int, int); • First argument must always be of type size_t • sizeof(T) is inserted by the compiler

C++ new, placement syntax (cont'd) • Very powerful! • Custom overloads for operator new • Each overload must offer a corresponding operator delete • Can store arbitrary arguments for each call to new • An operator is just a function • Can be called directly if desired • Needs manual constructor call using placement new • Can use templates
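A minimal sketch of a custom overload pair, assuming a hypothetical `CountingArena` tag type that merely counts calls and forwards to `malloc`/`free`. `Widget`, `Throwing`, and the `Delete` helper are likewise invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical arena tag passed through the placement syntax.
struct CountingArena
{
    int allocations = 0;
    int frees = 0;
};

// Custom overload: the compiler inserts sizeof(T) as the first argument.
void* operator new(std::size_t size, CountingArena& arena)
{
    ++arena.allocations;
    void* p = std::malloc(size);
    if (!p) throw std::bad_alloc();
    return p;
}

// Matching overload; the compiler calls it only if the constructor throws.
void operator delete(void* ptr, CountingArena& arena) noexcept
{
    ++arena.frees;
    std::free(ptr);
}

struct Widget { int x = 7; };
struct Throwing { Throwing() { throw 1; } };

// Keyword delete cannot take extra arguments, so destroy and free by hand.
template <typename T>
void Delete(T* instance, CountingArena& arena)
{
    instance->~T();                     // manual destructor call
    operator delete(instance, arena);   // an operator is just a function
}
```

With this in place, `new (arena) Widget` allocates through the arena, `Delete(w, arena)` releases it, and `new (arena) Throwing` triggers the matching `operator delete` automatically before the exception propagates.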

C++ delete, placement syntax • Keyword delete does not support placement syntax • delete (instance, 10, 20); • Treated as a statement using the comma operator • Placement overloads of operator delete are called by the compiler only when a constructor throws during the matching call to new • Overloads can also be called directly • Needs manual destructor call

C++ new[] • Creates an array of instances • Similar to keyword new, calls operator new[] • Calls the constructor for each non-POD instance • Supports placement syntax • Custom overloads of operator new[] possible • First sizeof() argument is compiler-specific • POD vs. non-POD!

C++ new[] (cont'd) • For non-PODs, constructors are called • delete[] needs to call destructors • How many destructors to call? • Compiler needs to store the number of instances • Most compilers add an extra 4 bytes to the allocation size • sizeof(T)*N + 4 (non-POD) • sizeof(T)*N (POD)

C++ new[] (cont'd) • Important! • Address returned by operator new[] != address to first instance in the array • Source of confusion • Compiler-specific behaviour, makes it almost impossible to call overloads of operator delete[] directly • Do we need to go back 4 bytes or not? • Makes support for custom alignment harder
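The difference between the two addresses can be measured rather than guessed. The sketch below assumes a hypothetical `SizeRecorder` overload of `operator new[]` that records what the compiler requested; the actual overhead is compiler-specific, so the code reports it instead of assuming 4 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical recorder used to observe what the compiler asks for.
struct SizeRecorder
{
    std::size_t requested = 0;
    void* returned = nullptr;
};

void* operator new[](std::size_t size, SizeRecorder& rec)
{
    rec.requested = size;
    rec.returned = std::malloc(size);
    if (!rec.returned) throw std::bad_alloc();
    return rec.returned;
}

void operator delete[](void* ptr, SizeRecorder&) noexcept { std::free(ptr); }

struct Pod { int value; };
struct NonPod { int value; ~NonPod() {} };

struct ArrayLayout
{
    std::size_t requested;   // bytes passed to operator new[]
    std::size_t offset;      // first element minus address returned by operator new[]
};

template <typename T>
ArrayLayout MeasureArray(std::size_t count)
{
    SizeRecorder rec;
    T* first = new (rec) T[count];
    const std::size_t offset = static_cast<std::size_t>(
        reinterpret_cast<unsigned char*>(first) -
        static_cast<unsigned char*>(rec.returned));

    // Destroy in reverse order, then free the address operator new[] returned,
    // not the address of the first element.
    for (std::size_t i = count; i > 0; --i) first[i - 1].~T();
    std::free(rec.returned);

    return ArrayLayout{ rec.requested, offset };
}
```

Comparing `MeasureArray<Pod>(10)` with `MeasureArray<NonPod>(10)` on a given compiler shows whether, and by how much, the non-POD array is shifted.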

C++ delete[] • Deletes an array of instances • Similar to keyword delete, calls operator delete[] • Calls the destructors for each non-POD instance in reverse order • Again, POD vs. non-POD • Number of instances to destruct is stored by the compiler for non-POD types

C++ new vs. delete mismatch • Allocating with new, deleting with delete[] • delete[] looks for the stored number of instances, which new never wrote • May crash • Allocating with new[], deleting with delete • More subtle bugs, only one destructor will be called • Visual Studio heap implementation is smart enough to detect both mismatches

Summary • new != operator new • delete != operator delete • new[]/delete[] are compiler-specific • Never mix new/delete[] and new[]/delete • new offers powerful placement syntax

Virtual memory

Virtual memory • Each process = virtual address space • Not to be confused with paging to hard disk • Virtual memory != physical memory • Address translation done by MMU • OS allocates/reserves memory in pages • Page sizes: 4KB, 64KB, 1MB, ...

Virtual memory (cont'd) • Virtual addresses are mapped to physical memory addresses • Contiguous virtual addresses != contiguous physical memory • A single page is the smallest amount of memory that can be allocated • Access restrictions on a per-page level • Read, write, execute, ...

Virtual memory (cont'd) • Simplest address translation: • Virtual address = page directory + offset • Page directory = physical memory page + additional info • Page directory entries set by OS • In practice: Multi-level address translation • See „What every programmer should know about memory“ by Ulrich Drepper • http://lwn.net/Articles/253361/
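The simplest translation scheme above can be written out as arithmetic. This is a sketch only, assuming 4 KB pages and a flat, single-level table; `Split`, `Translate`, and `kPageSize` are names invented for the example, and real hardware uses multi-level tables as the slide notes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption for illustration: 4 KB pages and a single-level page table.
constexpr std::uint64_t kPageSize = 4096;

struct SplitAddress
{
    std::uint64_t pageIndex;   // selects the page table entry
    std::uint64_t offset;      // byte offset inside the page
};

constexpr SplitAddress Split(std::uint64_t virtualAddress)
{
    return SplitAddress{ virtualAddress / kPageSize, virtualAddress % kPageSize };
}

// pageTable[i] holds the physical page number backing virtual page i.
inline std::uint64_t Translate(const std::uint64_t* pageTable, std::size_t entryCount,
                               std::uint64_t virtualAddress)
{
    const SplitAddress split = Split(virtualAddress);
    assert(split.pageIndex < entryCount);
    return pageTable[split.pageIndex] * kPageSize + split.offset;
}
```

With a table of `{ 7, 3, 9 }`, virtual pages 0, 1 and 2 are contiguous while their physical pages are not, which is exactly the property that lets the OS hand out contiguous virtual ranges from scattered physical memory.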

Virtual memory (cont'd) • Address translation is expensive • Several accesses to memory • „Page walk“ • Result of address translation is cached • Translation Look-aside Buffer (TLB) • Multiple levels, like D$ or I$ • TLB = Global resource per processor

Virtual memory (cont'd) • Allows allocating contiguous virtual memory even if the physical memory is not contiguous • Available on many architectures (PC, Mac, Linux, almost all consoles) • Used by CPU only • GPU, sound hardware, etc. need contiguous physical memory • E.g. XPhysicalAlloc

Virtual memory (cont'd) • Growing allocators can account for worst-case scenarios more easily when using VM • Different address ranges for different purposes • Heap, stack, code, write-combined, ... • Helps with debugging!

Summary • Virtual memory is nice to have, but not a necessity • Can help tremendously with debugging • Virtual memory is made available to the CPU, not the GPU or other hardware • Virtual memory address range >> RAM

Allocators

Why different allocators? • No silver bullet, many allocation qualities • Size • Fragmentation • Wasted space • Performance • Thread-safety • Cache-locality • Fixed size vs. growing

Common allocators • Linear • Stack, double-ended stack • Pool • Micro • One-frame, two-frame temporary • Double-buffered I/O • General-purpose

Linear allocator • + Supports any size and alignment • + Extremely fast, simply bumps a pointer • + No fragmentation • + No wasted space • + Lock-free implementation possible • + Allocations live next to each other • - Must free all allocations at once
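A minimal, single-threaded sketch of such a linear allocator, assuming the caller supplies the backing memory. The class and method names are illustrative and not taken from any particular engine; the lock-free variant mentioned above is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

class LinearAllocator
{
public:
    LinearAllocator(void* start, std::size_t size)
        : m_start(static_cast<unsigned char*>(start))
        , m_end(m_start + size)
        , m_current(m_start)
    {
    }

    // alignment must be a power of two. Returns nullptr when out of memory.
    void* Allocate(std::size_t size, std::size_t alignment)
    {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

        const std::uintptr_t raw = reinterpret_cast<std::uintptr_t>(m_current);
        const std::uintptr_t aligned =
            (raw + alignment - 1) & ~(static_cast<std::uintptr_t>(alignment) - 1);
        const std::size_t padding = static_cast<std::size_t>(aligned - raw);
        const std::size_t remaining = static_cast<std::size_t>(m_end - m_current);

        if (padding > remaining || size > remaining - padding)
            return nullptr;

        m_current += padding + size;   // bump the pointer
        return m_current - size;
    }

    // Individual frees are not supported; everything is released at once.
    void Reset() { m_current = m_start; }

private:
    unsigned char* m_start;
    unsigned char* m_end;
    unsigned char* m_current;
};
```

Note that alignment padding is the only space this sketch can lose between allocations.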

Stack allocator • + Supports any size and alignment • + Extremely fast, simply bumps a pointer • + No fragmentation • + No wasted space • + Lock-free implementation possible • + Allocations live next to each other • +/- Must free allocations in reverse-order
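One common way to enforce reverse-order frees is to hand out markers and roll back to them. The sketch below takes that approach as an assumption; the slides do not say which mechanism the author uses, and `Marker`, `GetMarker`, and `FreeToMarker` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

class StackAllocator
{
public:
    typedef std::size_t Marker;

    StackAllocator(void* start, std::size_t size)
        : m_start(static_cast<unsigned char*>(start)), m_size(size), m_offset(0)
    {
    }

    // alignment must be a power of two. Returns nullptr when out of memory.
    void* Allocate(std::size_t size, std::size_t alignment)
    {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

        const std::uintptr_t base = reinterpret_cast<std::uintptr_t>(m_start);
        const std::uintptr_t aligned =
            (base + m_offset + alignment - 1) & ~(static_cast<std::uintptr_t>(alignment) - 1);
        const std::size_t alignedOffset = static_cast<std::size_t>(aligned - base);

        if (alignedOffset > m_size || size > m_size - alignedOffset)
            return nullptr;

        m_offset = alignedOffset + size;
        return m_start + alignedOffset;
    }

    // Remember the current top of the stack.
    Marker GetMarker() const { return m_offset; }

    // Frees every allocation made after the marker was taken.
    void FreeToMarker(Marker marker)
    {
        assert(marker <= m_offset);
        m_offset = marker;
    }

private:
    unsigned char* m_start;
    std::size_t m_size;
    std::size_t m_offset;
};
```

Taking a marker before a batch of temporary allocations and rolling back afterwards frees the whole batch in one step.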

Double-ended stack allocator • Similar to stack allocator • Can allocate from bottom or top • Bottom for resident allocations • Top for temporary allocations • Mostly used for level loading
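A sketch of the two-ended variant, under the assumption that each end is reset as a whole (for example, the top after level loading finishes). All names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

class DoubleEndedStackAllocator
{
public:
    DoubleEndedStackAllocator(void* start, std::size_t size)
        : m_start(reinterpret_cast<std::uintptr_t>(start))
        , m_end(m_start + size)
        , m_bottom(m_start)
        , m_top(m_end)
    {
    }

    // Resident allocations grow upwards from the bottom.
    void* AllocateBottom(std::size_t size, std::size_t alignment)
    {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
        const std::uintptr_t aligned = (m_bottom + mask) & ~mask;
        if (aligned > m_top || size > m_top - aligned)
            return nullptr;
        m_bottom = aligned + size;
        return reinterpret_cast<void*>(aligned);
    }

    // Temporary allocations grow downwards from the top.
    void* AllocateTop(std::size_t size, std::size_t alignment)
    {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        if (size > m_top - m_bottom)
            return nullptr;
        const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
        const std::uintptr_t aligned = (m_top - size) & ~mask;
        if (aligned < m_bottom)
            return nullptr;
        m_top = aligned;
        return reinterpret_cast<void*>(aligned);
    }

    void ResetBottom() { m_bottom = m_start; }
    void ResetTop() { m_top = m_end; }

private:
    std::uintptr_t m_start;
    std::uintptr_t m_end;
    std::uintptr_t m_bottom;
    std::uintptr_t m_top;
};
```

Both ends share one block, so temporary loading data can use whatever the resident data has not claimed, and `ResetTop()` returns that space once loading is done.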

Pool allocator • - Supports one allocation size only • + Very fast, simple pointer exchange • + Fragments, but can always allocate • + No wasted space • + Lock-free implementation possible • - Holes between allocations • + Memory can be allocated/freed in any order

Pool allocator (cont'd) • In-place free list • No extra memory for book-keeping • Re-use memory of freed allocations • Point to next free entry
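The in-place free list can be sketched as below. It assumes the caller provides memory whose base address and element size keep every slot aligned for a pointer; the names are illustrative, and the lock-free variant is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

class PoolAllocator
{
public:
    // elementSize must be at least sizeof(void*), and memory/elementSize must
    // keep every slot suitably aligned for a pointer.
    PoolAllocator(void* memory, std::size_t elementSize, std::size_t elementCount)
        : m_head(nullptr)
    {
        assert(elementSize >= sizeof(FreeNode));
        unsigned char* base = static_cast<unsigned char*>(memory);

        // Thread the free list through the slots themselves, back to front,
        // so the first allocation returns the first slot.
        for (std::size_t i = elementCount; i > 0; --i)
        {
            void* slot = base + (i - 1) * elementSize;
            m_head = new (slot) FreeNode{ m_head };
        }
    }

    // Pops the first free slot; returns nullptr when the pool is exhausted.
    void* Allocate()
    {
        if (!m_head)
            return nullptr;
        FreeNode* node = m_head;
        m_head = node->next;
        return node;
    }

    // Pushes the slot back; its memory is re-used to store the next pointer.
    void Free(void* ptr)
    {
        m_head = new (ptr) FreeNode{ m_head };
    }

private:
    struct FreeNode { FreeNode* next; };
    FreeNode* m_head;
};
```

Because the list lives inside the free slots, allocation and deallocation are a single pointer exchange each and need no separate book-keeping memory.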

Micro allocator • Similar to pool allocator, but different pools for different sizes • + Very fast, lookup & simple pointer exchange • + Fragments, but can always allocate • - Some wasted space depending on size • + Can use pool-local critical sections / lock-free • - Holes between allocations • + Memory can be allocated/freed in any order

One-frame temporary allocator • Similar to linear allocator • Used for scratchpad allocations during a frame • Another alternative is to use stack memory • Fixed-size • alloca()

Two-frame temporary allocator • Similar to one-frame temporary allocator • Ping-pong between two one-frame allocators • Results from frame N persist until frame N+1 • Useful for operations with 1 frame latency • Raycasts
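The ping-pong scheme might look like the following sketch. It assumes a 16-byte aligned backing block split into two halves and rounds every allocation up to 16 bytes for brevity; `EndFrame` and the other names are invented for the example.

```cpp
#include <cassert>
#include <cstddef>

class TwoFrameAllocator
{
public:
    // memory must hold 2 * sizePerFrame bytes and be 16-byte aligned.
    TwoFrameAllocator(void* memory, std::size_t sizePerFrame)
        : m_memory(static_cast<unsigned char*>(memory))
        , m_sizePerFrame(sizePerFrame)
        , m_current(0)
    {
        m_used[0] = 0;
        m_used[1] = 0;
    }

    // Linear allocation from the current frame's half.
    void* Allocate(std::size_t size)
    {
        const std::size_t rounded = (size + 15) & ~static_cast<std::size_t>(15);
        if (rounded < size || rounded > m_sizePerFrame - m_used[m_current])
            return nullptr;
        void* p = m_memory + m_current * m_sizePerFrame + m_used[m_current];
        m_used[m_current] += rounded;
        return p;
    }

    // Switch halves. The half written during the frame that just ended stays
    // intact; the half from two frames ago is recycled.
    void EndFrame()
    {
        m_current ^= 1u;
        m_used[m_current] = 0;
    }

private:
    unsigned char* m_memory;
    std::size_t m_sizePerFrame;
    std::size_t m_used[2];
    unsigned m_current;
};
```

A raycast result written in frame N can therefore still be read during frame N+1, and its memory is handed out again in frame N+2.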

Double-buffered I/O allocator • Two ping-pong buffers • Read into buffer A, consume from buffer B • Initiate reads while consuming • Useful for async. sequential reads from disk • Interface offers Consume() only • Async. reads done transparently & interleaved

General-purpose • Must cope with small & large allocations • Used for 3rd party libraries • Properties • - Slow • - Fragmentation • - Wasted memory, allocation overhead • - Must use heavy-weight synchronization

General-purpose (cont'd) • Common implementations • „High Performance Heap Allocator“ in GPG7 • Doug Lea's „dlmalloc“ • Emery Berger's „Hoard“

Growing allocators • With virtual memory • Reserve worst-case address range up front • Back with physical memory when growing • Less hassle during development • Can grow without relocating allocations • Without virtual memory • Resize allocator for e.g. each level • Needs adjustment during development
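The reserve-then-back pattern can be approximated on POSIX systems with `mmap` and `mprotect`; this is an assumption made for the sketch, since the slides do not name an API and consoles or Windows expose different calls. `GrowingLinearAllocator` is a hypothetical name, and alignment handling is omitted for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>
#include <unistd.h>

class GrowingLinearAllocator
{
public:
    // Reserves address space for the worst case without making it accessible.
    explicit GrowingLinearAllocator(std::size_t reserveSize)
        : m_pageSize(static_cast<std::size_t>(sysconf(_SC_PAGESIZE)))
    {
        m_reserved = RoundUpToPage(reserveSize);
        void* base = mmap(nullptr, m_reserved, PROT_NONE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (base == MAP_FAILED)
            m_reserved = 0;
        else
            m_base = static_cast<unsigned char*>(base);
    }

    ~GrowingLinearAllocator()
    {
        if (m_base)
            munmap(m_base, m_reserved);
    }

    GrowingLinearAllocator(const GrowingLinearAllocator&) = delete;
    GrowingLinearAllocator& operator=(const GrowingLinearAllocator&) = delete;

    void* Allocate(std::size_t size)
    {
        if (size > m_reserved - m_used)
            return nullptr;

        // Grow: make more pages accessible, existing allocations do not move.
        const std::size_t needed = RoundUpToPage(m_used + size);
        if (needed > m_committed)
        {
            if (mprotect(m_base + m_committed, needed - m_committed,
                         PROT_READ | PROT_WRITE) != 0)
                return nullptr;
            m_committed = needed;
        }

        void* p = m_base + m_used;
        m_used += size;
        return p;
    }

    std::size_t Committed() const { return m_committed; }

private:
    std::size_t RoundUpToPage(std::size_t n) const
    {
        return (n + m_pageSize - 1) / m_pageSize * m_pageSize;
    }

    std::size_t m_pageSize;
    unsigned char* m_base = nullptr;
    std::size_t m_reserved = 0;
    std::size_t m_committed = 0;
    std::size_t m_used = 0;
};
```

Because the address range is fixed at construction, pointers handed out earlier stay valid when the allocator grows, which is the main advantage over resizing and relocating.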

Allocators • Separate how from where • How = allocator • Where = heap or stack • Offers more possibilities • Allows using stack memory with different allocators

Summary • No allocator fits all purposes • Each allocator has different pros/cons • Ideally, for each allocation think about • Size • Frequency • Lifetime • Threading

Allocation strategies

Why do we need a strategy? • Using a general-purpose allocator everywhere leads to • Fragmented memory • Wasted memory • Somewhat unclear memory ownership • Excessive clean-up before shipping • We can do better!

Decision criteria • Lifetime • Application lifetime • Level lifetime • Temporary

Decision criteria (cont'd) • Purpose • Temporary while loading a level • Temporary during a frame • Purely visual (e.g. bullet holes) • LRU scheme • Streaming I/O • Gameplay critical

Decision criteria (cont'd) • Frequency • Once • Each level load • Each frame • N times per frame • Should be avoided in the first place

Where would you put those? • Application-wide singleton allocations • Render queue/command buffer allocations • Level assets • Particles • Bullets, collision points, … • 3rd party allocations • Strings
