Cross Platform Development Best Practices Matt Lee, Kev Gee Microsoft Game Technology Group
Agenda • Code Considerations • CPU Considerations • GPU Considerations • IO Considerations • Content Considerations • Data Build System • Geometry Formats • Texture Formats • Shaders • Audio Considerations
Compiler Comparison • VS 2005 front end used for both platforms • Preprocessor benefits both platforms • Debugger experience is the same • Full 2005 IDE support coming • Xbox 360 optimizing back end added with XDK install • Single solution / MSBuild file can target both platforms
PC CPUs • Intel Pentium D / AMD Athlon64 X2 • Programming Model • 2 Cores running @ around 3.20 GHz • 12-KB Execution trace cache • 16-KB L1 cache, 1 MB L2 cache • Deep Branch Prediction • Dynamic data flow analysis • Speculative Execution • Little-endian byte ordering • SIMD instructions • Quad Core announced for early 2007
360 Custom CPU • Custom IBM Processor • 3 64-bit PowerPC cores running at 3.2 GHz • Two hardware threads per core • 32-KB L1 instruction cache & data cache, per core • Shared 1-MB L2 cache • 128-byte cache lines on all caches • Big-endian byte ordering • VMX 128 SIMD • Lots of Registers
Performance Tools • Profiling approaches are very similar between PC and Xbox 360 • PIX for Xbox 360 & PIX for Windows • Being developed by the same team now • Use instrumented tools on Xbox 360 • XbPerfView / Tracedump • Xbox 360 does not have a sampling profiler yet • Use PC profiling tools • Intel VTune / AMD Code Analyst / VS Team System Profiler • Attend the Performance Hands on training!
Focus Your Efforts • Use performance tools to guide work • Areas where we have seen platform specific efforts reap rewards • Single Data Pass engine design • High Frequency Game API Layers • Use your profiler tools to target the hot spots • Math Library - Bespoke vs XGMath vs D3DXMath
Impact on Code Design • Designing Cross platform APIs • Use of virtual Functions • Parameter passing mechanisms • Pass by reference vs. pass by value • Typedef vector types and intrinsics • Math Library Design Case Study • Use of inlining
Use of Virtual Functions • Be careful when using virtual functions to hide platform differences • Virtual function performance on Xbox 360 • Adds branch instruction which is always mispredicted! • Compiler limited in optimizing these • Make a concrete implementation for Xbox 360 • Avoid virtual functions in inner loops
Cross Platform Render Example (ctd.)
class IRenderSystem
{
    ……
public:
#if !defined(_XBOX)
    virtual void Draw() = 0;
#else
    void Draw();
#endif
};

void IRenderSystem::Draw()
{
    // 360 Implementation
    ……
}
D3D9 & D3D10 implementations subclass for specialization
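A related way to get a concrete, non-virtual call path is to pick the whole implementation type at compile time. The following is a minimal sketch under stated assumptions, not the presenters' engine code: `RenderSystemXbox360`, `RenderSystemD3D9`, `RenderSystem` and `DrawFrame` are hypothetical names, and the `Draw` bodies only count calls.

```cpp
// Hypothetical sketch: choose the render system type at compile time so that
// calls in hot loops resolve statically instead of going through a vtable.
struct RenderSystemXbox360 {
    int drawCalls = 0;
    void Draw() { ++drawCalls; }   // stand-in for an Xbox 360 implementation
};

struct RenderSystemD3D9 {
    int drawCalls = 0;
    void Draw() { ++drawCalls; }   // stand-in for a Windows implementation
};

#if defined(_XBOX)
typedef RenderSystemXbox360 RenderSystem;
#else
typedef RenderSystemD3D9 RenderSystem;
#endif

// Draw() here is an ordinary member call that the compiler is free to inline.
inline int DrawFrame(RenderSystem& rs, int batchCount) {
    for (int i = 0; i < batchCount; ++i) {
        rs.Draw();
    }
    return rs.drawCalls;
}
```

The trade-off is that this form cannot switch implementations at run time, which is why the slide keeps the virtual interface for the D3D9 / D3D10 split on Windows.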
Beware Big Constructors • Ctors can dominate execution time • Ctors often hidden to casual observer • Copy ctors add objects to containers • Arrays of C++ objects are constructed • Overloaded operators may construct temporaries • Consider: should ctor init data? • Example: matrix class zeroing all data • Prefer array initialization = { … }
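As an illustration of the last two bullets, here is a minimal sketch of a matrix type with no user-written constructor, initialized with array-style braces only where defined values are needed. `Matrix4` and `MakeIdentity` are hypothetical names chosen for this example.

```cpp
#include <type_traits>

// Hypothetical plain-data matrix: no user-written constructor, so creating an
// array of these runs no per-element initialization code.
struct Matrix4 {
    float m[4][4];
};

static_assert(std::is_trivially_default_constructible<Matrix4>::value,
              "Matrix4 should not pay for a constructor");

// Array-style initialization, used only where defined values are required.
inline Matrix4 MakeIdentity() {
    Matrix4 r = {{ {1.0f, 0.0f, 0.0f, 0.0f},
                   {0.0f, 1.0f, 0.0f, 0.0f},
                   {0.0f, 0.0f, 1.0f, 0.0f},
                   {0.0f, 0.0f, 0.0f, 1.0f} }};
    return r;
}
```

A default-constructed `Matrix4` then holds indeterminate values until it is written, so callers that need zeros or identity must ask for them explicitly.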
Inlining • Careful inlining is in general a Good Thing • Plan to spend time ensuring the compiler is inlining the right stuff • Use Perf Tools such as VTune / Trace recorder • Try the “inline any suitable” option • Enable link-time code generation • Consider profile-guided optimization • Use __forceinline only where necessary
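For the few places where forced inlining is justified, a cross-compiler macro keeps the source identical on every target. This is a sketch only: the macro name `XFORCEINLINE` and the `Lerp` helper are assumptions made for illustration and do not come from the talk.

```cpp
// Hypothetical portability macro for forced inlining.
#if defined(_MSC_VER)
#define XFORCEINLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#define XFORCEINLINE inline __attribute__((always_inline))
#else
#define XFORCEINLINE inline
#endif

// Small helper of the kind that tends to sit in inner loops.
XFORCEINLINE float Lerp(float a, float b, float t) {
    return a + (b - a) * t;
}
```

Whether forcing helps is something to confirm in the profiler; letting the compiler decide is usually the better default.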
Consider Passing Native Types by Value • Xbox 360 has large registers • 64 bit Native PC does too • Pass and return these types by value • int, __int64, float • Consider these types if targeting SSE / VMX • __m128 / __vector4, XMVECTOR, XMMATRIX • Pass structs by pointer or reference • Help the compiler using __restrict
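The `__restrict` bullet can be sketched as follows. `ScaleArray` is a hypothetical function written for this example; `__restrict` is the spelling accepted by MSVC, GCC and Clang.

```cpp
#include <cstddef>

// __restrict promises the compiler that dst and src never overlap, so it can
// keep loaded values in registers and schedule loads ahead of stores.
// The scalar `scale` is a native type and is passed by value.
inline void ScaleArray(float* __restrict dst, const float* __restrict src,
                       float scale, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        dst[i] = src[i] * scale;
    }
}
```

The promise is the caller's responsibility: passing overlapping buffers to a function declared this way is undefined behavior.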
Math Library Header (Xbox 360)
#if defined( _XBOX )
#include <ppcintrinsics.h>
#include <vectorintrinsics.h>
typedef __vector4 XVECTOR;
typedef const XVECTOR XVECTOR_PARAM;     // Pass by value
typedef XVECTOR& XVECTOR_OUTPARAM;
#define XMATHAPI inline
#define VMX128_INTRINSICS
#endif
Math Library Header (Windows)
#if defined( _WIN32 )
#include <xmmintrin.h>
typedef __m128 XVECTOR;
typedef const XVECTOR& XVECTOR_PARAM;    // Pass by reference
typedef XVECTOR& XVECTOR_OUTPARAM;
#define XMATHAPI inline
#define SSE_INTRINSICS
#endif
Math Library Function
XVECTOR XMATHAPI XVectorAdd( XVECTOR_PARAM vA, XVECTOR_PARAM vB )
{
#if defined( VMX128_INTRINSICS )
    return __vaddfp( vA, vB );
#elif defined( SSE_INTRINSICS )
    return _mm_add_ps( vA, vB );
#endif
}
Threading • Why Multithread? • Necessary to take full advantage of modern CPUs • Attend the Multi-threading talk later today • Covers synchronization prims and lockless sync methods • See Also: • Talks from Intel and AMD (GDC2005 / GDC-E) • OpenMP – C, not C++, useful in limited circumstances • Concur – C++, see • http://microsoft.sitestream.com/PDC05/TLN/TLN309_files/Default.htm#nopreload=1&autostart=1
D3D Architectural Differences • D3D9 draw call cost is higher on Windows than on Xbox 360 • 360 is optimized for a Single GPU target • D3D10 improves draw call cost by design on Windows • Very important to carefully manage the number of batches submitted • This can have an impact on content creation • This work will help with 360 performance too
PC GPUs • Wide variety of available Direct3D9 H/W • CAPs and Shader Models abstract over feature differences • GPUs with approximately equivalent performance to the Xbox 360 GPU • ATI X1900 / NVIDIA 7800 GTX • Shader Model 3.0 support • Direct3D10 standardizes the feature set • H/W scales on performance instead
Xbox 360 Custom GPU • Direct3D 9.0+ compatible • High-Level Shader Language (HLSL) 3.0+ support • 10 MB Embedded DRAM • Frame Buffer with 256 GB/sec bandwidth • Hardware scaling for display resolution matching • 48 shader ALUs shared between pixel and vertex shading (unified shaders) • Up to 8 simultaneous contexts (threads) in-flight at once • Changing shaders or render state can be cheap, since a new context can be started up easily • Hardware tessellator • N-patches, triangular patches, and rectangular patches • For non-continuous / adaptive cases, trade memory for this feature on PC systems
Explicit Resolve Control • Copies surface data from EDRAM to a texture in system memory • Required for render-to-texture and presentation to the screen • Can perform MSAA sample averaging or resolve individual samples • Can perform format conversions and biasing • Cannot do rescaling or resampling of any kind • This can impact your Xbox 360 engine design, as it adds an extra step to common operations
Use Native File I/O Routines • Only native routines support key features: • Asynchronous I/O • Completion routines • Prefer CreateFile and ReadFile • Guaranteed as fast or faster than any other alternatives • Avoid fopen, fread, C++ iostreams
Use Asynchronous File I/O • File read/write operations block by default • Async operations allow the game to do other interesting work • CreateFile with FILE_FLAG_OVERLAPPED • Use FILE_FLAG_NO_BUFFERING, too • Guarantees no intermediate buffering • Use OVERLAPPED struct to determine when operation is complete • See CreateFile docs for details
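The overlapped read pattern can be sketched as below. This is an illustration under assumptions, not a production loader: `ReadWhileWorking` is a hypothetical helper, `FILE_FLAG_NO_BUFFERING` is deliberately left out because it requires sector-aligned buffers, offsets and sizes, and the non-Windows branch is a stub that exists only so the sketch compiles elsewhere.

```cpp
#include <cstddef>
#include <vector>

#if defined(_WIN32)
#include <windows.h>

// Starts an overlapped read of the first `size` bytes of `path`, runs the
// caller's work while the read is in flight, then waits for completion.
// Returns the number of bytes read, or -1 on failure.
template <typename Work>
long ReadWhileWorking(const char* path, std::vector<char>& buffer,
                      std::size_t size, Work doOtherWork) {
    HANDLE file = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                              OPEN_EXISTING, FILE_FLAG_OVERLAPPED, NULL);
    if (file == INVALID_HANDLE_VALUE) return -1;

    buffer.resize(size);
    OVERLAPPED ov = {};                                  // read from offset 0
    ov.hEvent = CreateEventA(NULL, TRUE, FALSE, NULL);   // manual-reset event
    if (ov.hEvent == NULL) { CloseHandle(file); return -1; }

    long result = -1;
    BOOL ok = ReadFile(file, buffer.data(), static_cast<DWORD>(size), NULL, &ov);
    if (ok || GetLastError() == ERROR_IO_PENDING) {
        doOtherWork();                                   // read proceeds meanwhile
        DWORD bytesRead = 0;
        if (GetOverlappedResult(file, &ov, &bytesRead, TRUE)) {  // TRUE = wait
            result = static_cast<long>(bytesRead);
        }
    }
    CloseHandle(ov.hEvent);
    CloseHandle(file);
    return result;
}
#else
// Stub for non-Windows builds: this sketch has no implementation there and
// always reports failure.
template <typename Work>
long ReadWhileWorking(const char*, std::vector<char>&, std::size_t, Work) {
    return -1;
}
#endif
```

A real loader would typically keep several reads in flight and poll or use completion routines rather than block in `GetOverlappedResult`.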
Memory Mapped File I/O • Fastest way to load data on Windows • However, the 32 bit address space is getting tight • This is a great 64 bit feature add! • Memory Mapped I/O not supported on 360 • No HDD backed Virtual Memory management system
Universal Gaming Controller • XInput is the same API for Xbox 360 and Windows • The Microsoft universal controller is a reference design which can be leveraged by other hardware manufacturers • XP Driver available from Windows Update • Support is built in to Xbox 360 and Windows Vista
Data Build System • Add a data build / processing phase to your production system • Compile, optimize and compress data according to multiple target platform requirements • Easier and faster to handle endian-ness and other format conversions offline • Data packing process can occur here too • Invest time in making the build fast • Artists need to rapidly iterate to make quality content • Incremental builds can really help reduce the build time • Try the XNA build tools • Copies of XNA build CTP are available NOW!
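As an example of an offline endian conversion, a build tool can write multi-byte values in the target's byte order so the runtime loads them without swapping. The helper names `SwapBytes32` and `WriteFloatBigEndian` are hypothetical; the sketch assumes 32-bit IEEE floats.

```cpp
#include <cstdint>
#include <cstring>

// Reverses the byte order of a 32-bit value.
inline std::uint32_t SwapBytes32(std::uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

// Writes a float into `out` (4 bytes) in big-endian order, regardless of the
// byte order of the machine running the build tool.
inline void WriteFloatBigEndian(float value, unsigned char* out) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);   // well-defined type pun
    out[0] = static_cast<unsigned char>((bits >> 24) & 0xFFu);
    out[1] = static_cast<unsigned char>((bits >> 16) & 0xFFu);
    out[2] = static_cast<unsigned char>((bits >> 8) & 0xFFu);
    out[3] = static_cast<unsigned char>(bits & 0xFFu);
}
```

Writing bytes by shifting, rather than swapping conditionally, keeps the tool correct whichever host it runs on.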
Geometry Compression • Offline Compression of Geometry • Provides wins across all platforms • Disk I/O wins as well as GPU wins • The compression approach is likely to be target specific • PC is usually a superset of the consoles in this area • D3D9 CAPs / limitations to consider • 16 bit Normals - D3DDECLTYPE_FLOAT16_2
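To make the 16-bit normal bullet concrete, here is a simplified float-to-half conversion of the kind a build step might apply before writing `D3DDECLTYPE_FLOAT16_2` data. `FloatToHalf` is a hypothetical helper and a sketch only: it truncates instead of rounding, flushes values too small for a normalized half to zero, and maps NaN to infinity.

```cpp
#include <cstdint>
#include <cstring>

// Simplified 32-bit float to 16-bit half conversion (truncating).
inline std::uint16_t FloatToHalf(float value) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);

    const std::uint32_t sign = (bits >> 16) & 0x8000u;
    // Re-bias the exponent from float (127) to half (15).
    const int exponent = static_cast<int>((bits >> 23) & 0xFFu) - 127 + 15;
    const std::uint32_t mantissa = bits & 0x007FFFFFu;

    if (exponent <= 0) {
        return static_cast<std::uint16_t>(sign);             // too small: zero
    }
    if (exponent >= 31) {
        return static_cast<std::uint16_t>(sign | 0x7C00u);   // too large: infinity
    }
    return static_cast<std::uint16_t>(
        sign | (static_cast<std::uint32_t>(exponent) << 10) | (mantissa >> 13));
}
```

Normal components lie in [-1, 1], so the overflow branch is not expected to trigger for this use.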
Compressing Textures • Wide variety of Texture Compression Tools • ATI Compressonator • DirectX SDK DDS tools • NVIDIA – Photoshop DDS Export • Compression tools for 360 (xgraphics.lib) • Supports endian swap of texture formats • Build your own too! • Make them fit your content.
Texture Formats • DXT* / DXGI_FORMAT_BC* • BC == Block Compressed • Standard DXT* formats across all platforms • DXN / DXGI_FORMAT_BC5 / BC5u • 2-component format with 8 bits of precision per component • Great for normal maps • DXT3A / DXT5A • Single component textures made from a DXT3/DXT5 alpha block • 4 bits of precision • Xbox 360 / D3D9 Only
Texture Arrays • Texture arrays • generalized version of cube maps • D3D9 emulate using a texture atlas • Xbox 360 • Up to 64 surfaces within a texture, optional MIPmaps for each surface • Surface is indexed with a [0..1] z coordinate in a 3D texture fetch • D3D10 supports this as a standard feature • Up to 512 surfaces within a texture • Bindable as rendertarget, with per-primitive array index selection
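The D3D9 atlas emulation amounts to remapping per-slice texture coordinates into one tile of a larger texture. A minimal sketch, assuming a regular grid atlas; `Float2` and `AtlasUV` are hypothetical names, and the same arithmetic could equally live in the shader.

```cpp
struct Float2 {
    float x;
    float y;
};

// Maps UVs in [0,1] for array slice `slice` to UVs inside a grid atlas that
// holds tilesX * tilesY equally sized tiles, laid out row by row.
inline Float2 AtlasUV(Float2 uv, int slice, int tilesX, int tilesY) {
    const int col = slice % tilesX;
    const int row = slice / tilesX;
    Float2 result;
    result.x = (static_cast<float>(col) + uv.x) / static_cast<float>(tilesX);
    result.y = (static_cast<float>(row) + uv.y) / static_cast<float>(tilesY);
    return result;
}
```

Unlike a true texture array, an atlas offers no per-slice wrap addressing, and filtering can bleed across tile borders unless the tiles are padded.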
Custom Vertex Fetch / Vertex Texture • D3D9 Vertex Texture implementations use intrinsics • tex2Dlod() • 360 supports explicit instructions for this • D3D10 supports this as a standard feature • Load() from buffer (VB, IB, etc.) at any stage • Sample() from texture at any stage
Effects • D3DX FX and FX Lite co-exist easily • #define around the texture sampler differences • Preshaders are not supported on FX Lite • We advise that these should be optimized to native code for D3D9 Effects
HLSL Development • Set up your engine and tools for rapid shader development and iteration • Compile shaders offline for performance, • maybe allow run-time recompilation during development • Be careful with shader generation tools • Perf needs to be considered • Schedule / Plan work for this
Cross-Platform HLSL Considerations • Texture access instruction considerations • Xbox 360 has native tfetch / getWeights features • Constant texel offsets (-8.0 to 7.5 in 0.5 increments) • Independent of texture size • Direct3D 10 supports integer texture offsets when fetching • Direct3D 10 supports GetDimensions() natively • Equivalent to getWeights • Direct3D 9 can emulate tfetch & getWeights behavior using a shader constant for texture dimensions
HLSL Example
float2 g_invTexSize = float2( 1/512.0f, 1/512.0f );

float2 getWeights2D( float2 texCoord )
{
    return frac( texCoord / g_invTexSize );
}

float4 tex2DOffset( sampler t, float2 texCoord, float2 offset )
{
    texCoord += offset * g_invTexSize;
    return tex2D( t, texCoord );
}
Shader management • Find a balance between übershaders and specialized shader libraries • Dynamic/static branching versus static compilation • Small shader libraries can be built and stored inside a single Effect file • One technique per shader configuration • Larger shader libraries • Hash table populated with configurations • Streaming code can load shader groups on demand • Profile-guided content generation • Avoid compiling shaders at run time • Compiled shaders compress very well
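The hash table of configurations can be sketched as a map from a feature bitmask to a precompiled shader entry. Everything named here (`ShaderFeature`, `ShaderLibrary`, the blob file names) is hypothetical; a real library would store compiled bytecode or a streaming handle rather than a string.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical feature flags that together identify one shader configuration.
enum ShaderFeature : std::uint32_t {
    kSkinning  = 1u << 0,
    kNormalMap = 1u << 1,
    kFog       = 1u << 2
};

class ShaderLibrary {
public:
    // Registers the precompiled blob that implements `config`.
    void Register(std::uint32_t config, const std::string& blobName) {
        table_[config] = blobName;
    }

    // Returns the blob name for `config`, or nullptr if that configuration
    // was not built offline (the caller decides how to fall back).
    const std::string* Find(std::uint32_t config) const {
        auto it = table_.find(config);
        return it == table_.end() ? nullptr : &it->second;
    }

private:
    std::unordered_map<std::uint32_t, std::string> table_;
};
```

A lookup miss is the signal that the offline build did not cover a configuration the content actually uses, which is where profile-guided content generation helps.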
Audio Considerations • XACT (Microsoft Cross-Platform Audio Creation Tool) • API and authoring tool parity: • author once, deploy to both platforms • Primary difference = wave compression • ADPCM on Windows vs. Xbox 360 native XMA support • XMA: controllable quality setting (varies, typically ~6-14:1) • ADPCM: Static ~3.5:1 compression • Likely need to trade memory for bit rate. • On Windows, can use hard disk streaming to balance lower compression rates if needed
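The memory trade-off can be estimated with simple arithmetic. The sketch below uses the approximate ratios quoted on the slide; `PcmBytes` and `CompressedBytes` are hypothetical helpers, and real encoders add headers and block padding that this ignores.

```cpp
#include <cstdint>

// Size in bytes of uncompressed PCM audio.
inline std::uint64_t PcmBytes(std::uint32_t sampleRate, std::uint32_t channels,
                              std::uint32_t bytesPerSample, std::uint32_t seconds) {
    return static_cast<std::uint64_t>(sampleRate) * channels * bytesPerSample * seconds;
}

// Approximate size after compression at `ratio`:1 (fraction truncated).
inline std::uint64_t CompressedBytes(std::uint64_t pcmBytes, double ratio) {
    return static_cast<std::uint64_t>(static_cast<double>(pcmBytes) / ratio);
}
```

For one minute of 48 kHz, 16-bit stereo (11,520,000 bytes of PCM), this gives about 3.3 MB at the ADPCM ratio of 3.5:1 and about 1.15 MB at an XMA ratio of 10:1, which is the gap that hard disk streaming on Windows is meant to offset.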
Call To Action! • Design your games, engines and production systems with cross platform development in mind • (PC / Xbox 360 / Other) • Invest in making your data build system fast • Take advantage of each platform's strengths • Target a D3D10 content design point and fall back to D3D9+, D3D9, … • Provide feedback on how we can make production easier • Attend the XACT, HLSL, SM4.0 and Performance Hands On Labs