
C++ on Next-Gen Consoles: Effective Code for New Architectures

Pete Isensee, Development Manager, Microsoft Game Technology Group


Presentation Transcript


  1. C++ on Next-Gen Consoles: Effective Code for New Architectures • Pete Isensee • Development Manager • Microsoft Game Technology Group

  2. Last Year at GDC • Chris Hecker ranted • What did he say? • Programmers: danger ahead • Out-of-order execution: good • In-order execution: bad • Microsoft and Sony are going to screw you • You are so hosed. Game over, man. • “There’s absolutely nothing you can do about this”

  3. Console Hardware Architectures • Optimized to do floating-point math • Optimized for multithreaded tasks • Optimized to run games • Not optimized to run general purpose code • Not optimized to do branch prediction, code reordering, instruction pipelining or other out-of-order magic • Large L2 caches • Large latencies

  4. We’re Game Programmers. We Love Challenges. • We will make games on these consoles • The solution is not assembly language • The solution is to tailor our C/C++ engines, inner loops and bottleneck functions to the realities of the hardware • Remember: C++ code can make or break your game’s performance

  5. Not Covering • Profiling (do it) • Multithreading (do it) • Memory allocation (avoid in game loop) • Compiler settings (experiment) • Exception handling (avoid it)

  6. Topics for Today • Thinking about L2 • Optimize memory access • Use CPU caches effectively • Thinking about in-order processing • Avoid function call overhead • Tips for efficient math • Avoid hidden C++ inefficiencies

  7. Optimize Memory Access • Proverb: thou shalt treat memory as if it were thy hard drive • You will be memory-bound on new consoles • Recommendations • Never read from the same place twice in a frame • Read data sequentially • Write data sequentially • Use everything you read

  8. Minimize Data Passes • Game frame loops often access data twice • Or three times • Or more • Optimize for a single pass • Consider less frequent operations • AI • Physics, collision • Networking • Particle systems • [Figure: Multiple Pass Architecture]
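A minimal sketch of what "optimize for a single pass" can look like. The `Particle` struct, its fields and both function names are hypothetical, invented for illustration; the slide does not prescribe any particular layout.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical game object; field names are illustrative only.
struct Particle {
    float position;
    float velocity;
    float age;
};

// Multiple passes: each loop streams the whole array through the cache again.
inline void updateMultiPass(std::vector<Particle>& ps, float dt) {
    for (std::size_t i = 0; i < ps.size(); ++i) ps[i].velocity -= 9.8f * dt;
    for (std::size_t i = 0; i < ps.size(); ++i) ps[i].position += ps[i].velocity * dt;
    for (std::size_t i = 0; i < ps.size(); ++i) ps[i].age += dt;
}

// Single pass: each particle is read once, fully used, and written once.
inline void updateSinglePass(std::vector<Particle>& ps, float dt) {
    for (std::size_t i = 0; i < ps.size(); ++i) {
        Particle& p = ps[i];
        p.velocity -= 9.8f * dt;
        p.position += p.velocity * dt;
        p.age += dt;
    }
}
```

Both functions compute the same result; the second touches each cache line once per frame instead of three times.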

  9. Pointer Aliasing Explained
      void init( float *a, const float *b ) {
          a[0] = 1.0f - *b;
          a[1] = 1.0f - *b;
      }
     • Nominal case: a and b refer to separate memory • Worst case: b points into a
      float a[2]={0.0f};
      init( a, &a[0] );
     [Figure: memory diagrams of a and b for the nominal case and the worst case]
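The two cases on this slide can be reproduced directly. The sketch below reuses the slide's `init`; the two helper functions are hypothetical names added only to make the cases callable.

```cpp
// Same function as on the slide: b may or may not point into a.
inline void init(float* a, const float* b) {
    a[0] = 1.0f - *b;
    a[1] = 1.0f - *b; // *b must be reloaded: the store to a[0] may have changed it
}

// Nominal case: b refers to separate storage, so both elements get the same value.
inline void runNominal(float out[2]) {
    float b = 0.0f;
    out[0] = out[1] = 0.0f;
    init(out, &b);
}

// Worst case: b aliases a[0], so the first store changes what the second line reads.
inline void runAliased(float out[2]) {
    out[0] = out[1] = 0.0f;
    init(out, &out[0]);
}
```

Because the compiler cannot rule out the worst case, it has to load `*b` twice even when the caller never aliases.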

  10. A Solution: Restrict • Restrict keyword tells the compiler there’s no aliasing • Restrict permits the compiler to generate much more efficient code
      void init( float* __restrict a, const float* __restrict b ) {
          a[0] = 1.0f - *b; // compiler can do
          a[1] = 1.0f - *b; // the right thing
      }

  11. What to Restrict • Use restrict widely • Function pointer parameters • Local pointers • Pointers in structs/classes • But not: • Function return types • Casts • Global pointers (maybe) • References (maybe)
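A short sketch of restrict on function pointer parameters, the first item in the list above. `__restrict` is a compiler extension (accepted by MSVC, GCC and Clang), not standard C++, and `scaleAdd` is a hypothetical function name.

```cpp
#include <cstddef>

// The caller promises that dst and src never overlap. With that promise the
// compiler may keep values in registers and reorder loads and stores freely.
inline void scaleAdd(float* __restrict dst,
                     const float* __restrict src,
                     float k,
                     std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] += k * src[i];
    }
}
```

The promise is not checked: passing overlapping ranges to a function declared this way is undefined behavior, so apply it only where the no-alias guarantee really holds.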

  12. Use the CPU Caches Effectively • The L2 cache is your best friend • Using the cache well is an art • Ensure you have a good profiler by your side

  13. Keep the Working Set Small • Pack commonly used data together • Frequently used data might deserve its own struct/class • Keep rarely used data separate • Example: texture file names • Consider bitfields • Bitfields are extremely efficient on PowerPC • Consider other forms of lossless compression

  14. Inefficient Structs Are Bad Mojo
      struct InefficientCar {
          bool manual;       // padding here
          wheel wheels[8];   // 8 wheels?
          bool convertible;  // more pad
          char engine;       // 4 bits used
          char file[32];     // rarely used
          double maxAccel;   // double?
      };
      sizeof(InefficientCar) = 80

  15. Carefully Design Structures
      struct EfficientCar {
          wheel wheels[4];        // 4 wheels
          wheel *moreWheels;
          char *file;             // stored elsewhere
          float maxAccel;         // float
          unsigned engine:4;      // bitfields
          unsigned manual:1;
          unsigned convertible:1;
      };
      sizeof(EfficientCar) = 32
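The two layouts can be compared at compile time. This sketch assumes a hypothetical 4-byte `Wheel` type, which is consistent with the sizes quoted on the slides but is not given in the source. Exact sizes depend on pointer width and alignment rules, so the check below only asserts that the redesigned struct is smaller. `bytesSaved` is an illustrative helper.

```cpp
#include <cstddef>

struct Wheel { float radius; }; // assumption: a 4-byte wheel type

struct InefficientCar {
    bool   manual;      // followed by padding
    Wheel  wheels[8];
    bool   convertible;
    char   engine;      // only 4 bits used
    char   file[32];    // rarely used, but stored inline
    double maxAccel;
};

struct EfficientCar {
    Wheel    wheels[4];
    Wheel*   moreWheels;   // rare extra wheels live elsewhere
    char*    file;         // rarely used data stored elsewhere
    float    maxAccel;
    unsigned engine : 4;   // bitfields share one word
    unsigned manual : 1;
    unsigned convertible : 1;
};

// The saving holds on any common ABI even though the exact byte counts vary.
static_assert(sizeof(EfficientCar) < sizeof(InefficientCar),
              "redesigned struct should be smaller");

// Memory saved across a pool of cars.
inline std::size_t bytesSaved(std::size_t carCount) {
    return carCount * (sizeof(InefficientCar) - sizeof(EfficientCar));
}
```

With 32-bit pointers the efficient layout comes to 32 bytes, matching the slide; with 64-bit pointers it is larger but still well under the original.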

  16. Choose the Right Container • Prefer contiguous containers • Or at least mostly contiguous • Examples: array, vector, deque • Avoid node-based containers • List, set/map, binary trees, hash tables • If you must use a tree, consider a custom allocator for memory locality • Vector + std::sort is often faster (and smaller) than set or map or hash tables, by an order of magnitude
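A minimal sketch of the "vector + std::sort" idea as a stand-in for `std::set`, using only standard library calls. The helper names `makeSortedUnique` and `contains` are hypothetical.

```cpp
#include <algorithm>
#include <vector>

// Build step: sort once, then drop duplicates so the vector behaves like a set.
inline void makeSortedUnique(std::vector<int>& v) {
    std::sort(v.begin(), v.end());
    v.erase(std::unique(v.begin(), v.end()), v.end());
}

// Lookup: binary search over contiguous memory, no pointer chasing.
inline bool contains(const std::vector<int>& sorted, int key) {
    return std::binary_search(sorted.begin(), sorted.end(), key);
}
```

This fits data that is built in bulk and then queried many times. Frequent single insertions into the middle of a sorted vector would shift elements and could erase the advantage.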

  17. Avoid Function Call Overhead • Function call overhead was a surprising cause of performance issues on Xbox • The same is true on Xbox 360 and PS3 • Fortunately, there are lots of solutions • Research compiler settings. On Xbox 360: • Inline “any suitable” • Enable link-time code generation • Spend time ensuring the compiler is inlining the right things

  18. Avoid Virtual Functions • Weigh the limitations of virtual functions • Adds a branch instruction • Branch is always mispredicted • Compiler is limited in how it can optimize • Consider replacing • virtual void Draw() = 0; • With • Xbox360.cpp: void Draw() { ... } • Windows.cpp: void Draw() { ... } • PS3.cpp: void Draw() { ... }

  19. Maximize Leaf Functions • Leaf functions don’t call other functions, ever • If a potential leaf function calls another function, the high-level function: • Is much less likely to be inlined • Must set up a stack frame • Must set up registers • Potential solutions • Remove the inner function completely • Inline the inner function • Provide two versions of the outer function
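One way to read "provide two versions of the outer function" is sketched below: an unchecked leaf for inner loops and a checked wrapper for everything else. All names here (`reportOutOfRange`, `healthAt`, `healthAtUnchecked`, the counter) are hypothetical and not from the talk.

```cpp
#include <cstddef>

// Hypothetical error hook. Any function that calls it is no longer a leaf.
static int g_outOfRangeReports = 0;
inline void reportOutOfRange(std::size_t /*index*/) { ++g_outOfRangeReports; }

// Leaf version: calls nothing, so it is cheap to inline and needs no stack frame.
// Intended for inner loops where the caller has already validated the index.
inline int healthAtUnchecked(const int* health, std::size_t i) {
    return health[i];
}

// Checked version: validates and may call the error hook.
inline int healthAt(const int* health, std::size_t count, std::size_t i) {
    if (i >= count) {
        reportOutOfRange(i);
        return 0;
    }
    return healthAtUnchecked(health, i);
}
```

Hot loops that iterate over a known-valid range call the unchecked version; tools and gameplay scripts can keep the safer one.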

  20. Unroll Inner Loops • Compiler can’t unroll loops where n is variable • Even unrolling from ++i to i+=4 can be a significant gain • Eliminates three branch instructions • Increases opportunity for code scheduling • Don’t forget to hoist invariants out, too

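  21. Example Unrolling
      // original
      for( i=a.beg(); i!=a.end(); ++i )
          process(i);
      // unrolled; assumes the element count is a multiple of 4,
      // otherwise i steps past e and the loop does not stop at the end
      e = a.end();
      for( i=a.beg(); i!=e; i+=4 ) {
          process(i);
          process(i+1);
          process(i+2);
          process(i+3);
      }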
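When the element count is not known to be a multiple of four, the unrolled loop needs a short cleanup loop. A sketch under that assumption, with a hypothetical `sumUnrolled` function standing in for the slide's `process` calls:

```cpp
#include <cstddef>

// Sums n floats four at a time, then handles the 0 to 3 leftover elements.
inline float sumUnrolled(const float* data, std::size_t n) {
    // Four independent accumulators give the scheduler more freedom.
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;

    const std::size_t blockEnd = n - (n % 4); // invariant hoisted out of the loop
    std::size_t i = 0;
    for (; i < blockEnd; i += 4) {
        s0 += data[i];
        s1 += data[i + 1];
        s2 += data[i + 2];
        s3 += data[i + 3];
    }

    float sum = (s0 + s1) + (s2 + s3);
    for (; i < n; ++i) { // remainder loop
        sum += data[i];
    }
    return sum;
}
```

Note that splitting the sum across accumulators changes the order of floating-point additions, so results can differ in the last bits from a strictly sequential sum.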

  22. Pass Native Types by Value • Tradition says that “large” types are passed by pointer or reference, but be careful • New consoles have really large registers • Native types include • 64-bit int (__int64) • VMX vector (__vector4) – 128 bits! • Pass structs by pointer or reference • One exception: pass structs consisting of bitfields <= 64 bits by value
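A portable sketch of the passing rules above. The VMX `__vector4` type is platform-specific, so this example sticks to a 64-bit integer and a small bitfield struct; `RenderFlags`, `Transform` and the function names are hypothetical.

```cpp
#include <cstdint>

// A struct made only of bitfields that fits in 64 bits: pass by value.
struct RenderFlags {
    unsigned visible : 1;
    unsigned castsShadow : 1;
    unsigned lod : 4;
};

inline bool shouldDraw(RenderFlags flags) {
    return flags.visible != 0 && flags.lod < 8;
}

// Native 64-bit integers fit in a register: pass and return by value.
inline std::uint64_t packId(std::uint32_t hi, std::uint32_t lo) {
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
}

// A larger struct: pass by pointer or reference, as the slide recommends.
struct Transform {
    float m[16];
};

inline float translationX(const Transform& t) {
    return t.m[12];
}
```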

  23. Know Data Type Performance • int32 and int64 have equivalent perf • float and double have equivalent perf • int8 and int16 are slower than int • They generate extra instructions • High bits cleared or sign-extended • Example: int32 adds 2X faster than int16 adds • Recommendations • Store as smallest type required • Load into int32, int64 or double for calculations
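The recommendation "store small, compute wide" can be sketched as follows. `sumSamples` is a hypothetical function name.

```cpp
#include <cstddef>
#include <cstdint>

// Storage stays int16 to keep the working set small; arithmetic happens in
// int32 so the loop body avoids the extra sign-extension or masking work
// that narrow accumulators require.
inline std::int32_t sumSamples(const std::int16_t* samples, std::size_t n) {
    std::int32_t total = 0; // native-width accumulator
    for (std::size_t i = 0; i < n; ++i) {
        total += samples[i]; // each element is widened once on load
    }
    return total;
}
```

The wide accumulator also avoids overflow that an int16 running total would hit almost immediately.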

  24. Use Native Vector Types • In CS 101, you learned to create abstract data types, such as matrices
      typedef std::vector<float,4> vec;    // pseudocode: std::vector takes no size argument
      typedef std::vector<vec,4> matrix;
     • This code is an abomination • At least on Xbox 360 and PS3 • Xbox 360 and PS3 have dedicated vector math units called VMX units • Use them!

  25. Your Math Buddies • __vector4 (4 32-bit floats; 128-bit register) • XMVECTOR (typedef for vector4) • XMMATRIX (array of 4 vector4s) • XMVECTOR operators (+,-,*,/) • Hundreds of XMVECTOR and XMMATRIX functions • Xbox 360-specific, but similar constructs in PS3 compilers

  26. Avoid Floating-Point Branches • FP branches are slow • Cache has to be flushed • ~10X slower than int branches • Avoid loops with float test expressions • Eliminate altogether if possible • Can be faster to calculate values you won’t use! • Compare integers instead • Replace with fsel when possible • 10-20X performance gain
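For "avoid loops with float test expressions" and "compare integers instead", one common rewrite is to drive the loop with an integer counter and derive the float from it. A sketch, with `sampleTimes` as a hypothetical function name:

```cpp
#include <cstddef>
#include <vector>

// Float-controlled version, for comparison:
//   for (float t = 0.0f; t < 1.0f; t += step) { ... }   // FP compare every iteration
//
// Integer-controlled version: the loop test is an int compare, and t is
// computed from the counter, which also avoids accumulated rounding error.
inline std::vector<float> sampleTimes(int steps) {
    std::vector<float> times;
    if (steps <= 0) {
        return times;
    }
    times.reserve(static_cast<std::size_t>(steps));
    for (int i = 0; i < steps; ++i) {
        times.push_back(static_cast<float>(i) / static_cast<float>(steps));
    }
    return times;
}
```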

  27. The fsel Option in Detail • Definition of hardware implementation (the PowerPC instruction selects its second operand when the first is greater than or equal to zero):
      float fsel(float a, float b, float c) {
          return ( a >= 0.0f ) ? b : c;
      }
     • You can replace expressions like
      v = ( w >= x ) ? y : z; // slow
     • With faster expressions like
      v = fsel( w - x, y, z ); // turbo
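A portable sketch of branch-free selection, assuming the PowerPC convention that fsel returns its second operand when the first is greater than or equal to zero and its third otherwise. `fselEmulated`, `maxOf` and `clampToZero` are hypothetical names; on the console the emulation would be replaced by the compiler's fsel intrinsic, whereas this stand-in still compiles to an ordinary compare on other platforms.

```cpp
// Stand-in with the same selection rule as the hardware instruction.
inline float fselEmulated(float a, float b, float c) {
    return (a >= 0.0f) ? b : c;
}

// max(x, y): x - y >= 0 means x is the larger (or equal) value.
inline float maxOf(float x, float y) {
    return fselEmulated(x - y, x, y);
}

// Clamp negative values to zero without an explicit if statement.
inline float clampToZero(float x) {
    return fselEmulated(x, x, 0.0f);
}
```

Both operands are always evaluated, which is the point of the slide's remark that it can be faster to calculate values you will not use.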

  28. Prefer Platform-Specific Funcs • The C runtime (CRT) is not usually the best option when performance matters • Xbox 360 examples • Prefer CreateFile to fopen or C++ streams • Options for asynchronous reads and other goodness • Prefer XMemCpy to memcpy • 2-6X faster • Prefer XMemSet to memset • 8-14X faster

  29. Avoid Hidden C++ Inefficiencies • C++ rocks the house! • C++ can bring your game to its knees! • Consider these innocuous snippets • Quaternion q; • s.push_back( k ); • if( (float)i > f ) • obj->Draw(); • GameObject arr[1000]; • a = b + c; • i++;

  30. C++ is Dangerous • With power comes responsibility • Beware constructors • Is initialization the right thing to do? • Beware hidden allocations • Conversion casts may have significant cost • Use virtual functions with care • Beware overloaded operators • Stick to known idioms • Operator++ should be a constant-time operation. • Really.
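Two of the hidden costs above, sketched with standard C++ only: repeated reallocation inside `push_back`, and temporaries created by `a = b + c`. `Vec3` and `buildIds` are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Hidden allocations: without reserve, push_back may reallocate and copy
// the whole buffer several times as the vector grows.
inline std::vector<int> buildIds(std::size_t count) {
    std::vector<int> ids;
    ids.reserve(count); // one allocation up front
    for (std::size_t i = 0; i < count; ++i) { // prefix ++ by habit
        ids.push_back(static_cast<int>(i));
    }
    return ids;
}

// Overloaded operators: operator+ has to produce a new object, while
// operator+= updates in place and stays cheap and predictable.
struct Vec3 {
    float x, y, z;
    Vec3& operator+=(const Vec3& r) {
        x += r.x; y += r.y; z += r.z;
        return *this;
    }
};

inline Vec3 operator+(Vec3 l, const Vec3& r) {
    l += r; // implemented in terms of += so both stay consistent
    return l;
}
```

As with memory allocation in general, the reserve call belongs outside the game loop.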

  31. Summary • There absolutely are many things you can do to efficiently program next-gen consoles • Two key issues: L2/memory and in-order processing • Treat memory as you would a hard disk • Watch out for those branches; use tricks like fsel • Prefer a light C++ touch

  32. What’s Next • Our games are only as good as the weakest member of the team • Share what you’ve learned • “The sharing of ideas allows us to stand on one another’s shoulders instead of on one another’s feet” – Jim Warren

  33. Questions • pkisensee@msn.com • Fill out your feedback forms
