Scalability
Advanced D3D Programming
Richard Huddy
RichardH@nvidia.com
Scalability - R Huddy

Basic Objectives
• To produce the best experience on every user's machine
• To exploit all of the resources available
• To cope with a broad spread of hardware
• To avoid ‘bottoming out’ during the shelf-life of the game / engine

What is a high-end PC?
• A 125+ mega-texel device
• A 125+ mega-pixel device
• A fast CPU (>= 350 MHz)
• AGP 2X/4X bus
• Lots of system RAM (>= 64 MB)
• Huge frame buffers (16 to 32 MB)
• Multi-texture at low cost

Power Trends
[Chart: CPU speed and fill rate trends, with a "?" marking the projection]
Appreciate the absolute values and the ratios.

So what’s the problem?
[Diagram: two timelines running from BeginScene() to EndScene(), each with a CPU row issuing A, B, C and a Graphics row drawing a, b, c. The first is labelled "Second generation hardware"; the second, "Third generation hardware", is annotated "Wow, 10% faster!"]

What can you do to help?
Scalability is the key:
• Run at higher screen resolutions
• Run at higher color depths
• Use more complex rendering techniques on good hardware
• Ship multiple geometry models
• Protect your CPU
• Unlock the frame rate

Higher Screen Resolutions
1) Include direct support for higher resolution modes (uses lots of disk space).
2) Store high resolution art and filter down to produce lower resolution art.
3) Store low resolution art and pixel double:
   • If you have art at 512x384, use it for 1024x768
   • If you have art at 640x480, use it on 1280x1024 (but only use a 1280x960 viewport)

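
The pixel-doubling arithmetic in option 3 can be sketched as below. This is only an illustration: the `Viewport` type and `pixel_doubled_viewport` function are hypothetical names, and centring the viewport in the display mode is an assumption (the slide only gives the viewport size).

```c
/* A viewport inside a display mode, in pixels (hypothetical type). */
typedef struct { int x, y, width, height; } Viewport;

/* Scale art by an integer factor and centre it in the display mode.
   Centring is an assumption; the slide only states the viewport size.
   Returns 1 on success, 0 if the scaled art does not fit the mode. */
int pixel_doubled_viewport(int art_w, int art_h, int scale,
                           int mode_w, int mode_h, Viewport *out)
{
    int w = art_w * scale;
    int h = art_h * scale;
    if (scale < 1 || w > mode_w || h > mode_h)
        return 0;
    out->x = (mode_w - w) / 2;   /* equal borders left and right */
    out->y = (mode_h - h) / 2;   /* equal borders top and bottom */
    out->width = w;
    out->height = h;
    return 1;
}
```

For 640x480 art doubled into a 1280x1024 mode this yields a 1280x960 viewport with a 32 pixel border above and below.
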
Higher Color Depths
• Runs at much the same speed but gives the user a much richer experience
• Uses frame buffer memory constructively
• You can re-use the previous 16-bit assets
• The main performance loss in true color is often due to texture management
But beware the Frame Buffer + Z Buffer depth constraint on Riva TNT

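
To see how much frame buffer memory a higher color depth actually uses, a rough calculation helps. The function below is a hypothetical helper, and it assumes tightly packed buffers with no driver padding or alignment, which real hardware may not honour.

```c
/* Bytes needed for the colour buffers plus one Z buffer.
   Assumes tightly packed buffers; real drivers may pad or align. */
unsigned long frame_memory_bytes(unsigned width, unsigned height,
                                 unsigned color_bits, unsigned z_bits,
                                 unsigned color_buffers)
{
    unsigned long pixels = (unsigned long)width * height;
    return pixels * (color_bits / 8) * color_buffers
         + pixels * (z_bits / 8);
}
```

Under these assumptions, 1024x768 double-buffered with a matching Z buffer needs 4,718,592 bytes (4.5 MB) at 16 bits and 9,437,184 bytes (9 MB) at 32 bits, which still leaves room for textures in a 16 MB frame buffer.
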
Complex Rendering Techniques - I
• Environment Mapping
  • Beware of spending too much CPU on this.
• Dual Texture Lighting
• Bump Mapping
• Use more alpha transparency
  • But see also “Alpha sort issues” later on…
Please try to use the extra fill rate!

Complex Rendering Techniques - II
• Trilinear mipmapping for almost everything
• Use detail textures
• Large textures for extra realism
• 32-bit textures, where it’s a quality win
• Compressed textures, as long as quality is not compromised

Protect your CPU
The big ones:
• __ftol and other ‘type conversion’ nightmares
• sqrt()
  • that’ll be seventy cycles please...
• Reciprocal square root
  • One hundred and nine cycles through the FPU…
• Transform and lighting (more on that later)

Removing __ftol
• Remember that the compiler doesn’t have a choice, but you can check the output
• Write your own inline assembler conversion routine if…
  • You can accept differing rounding rules
This doesn’t break the optimiser!

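
The slide recommends an inline assembler routine. As a portable stand-in (not the author's routine), the sketch below uses the well-known "magic number" trick: adding 1.5 * 2^23 pushes the integer part into the low mantissa bits, so no cast is generated. The name `fast_round_to_int` is hypothetical. It shows the "differing rounding rules" trade-off: it rounds to nearest (ties to even) where a C cast truncates. It assumes IEEE 754 single precision, the default rounding mode, and |f| < 2^22.

```c
#include <stdint.h>
#include <string.h>

/* Round-to-nearest float-to-int conversion without a C cast.
   Assumptions: IEEE 754 floats, default round-to-nearest mode,
   and |f| < 2^22. Outside that range the result is meaningless. */
int32_t fast_round_to_int(float f)
{
    const float magic = 12582912.0f;       /* 1.5 * 2^23, bits 0x4B400000 */
    volatile float biased = f + magic;     /* volatile forces a true float store */
    float stored = biased;
    int32_t bits;
    memcpy(&bits, &stored, sizeof bits);   /* well-defined type pun */
    return bits - 0x4B400000;              /* integer part sat in the mantissa */
}
```

Note the rounding difference: `(int)2.7f` is 2, while `fast_round_to_int(2.7f)` is 3. Code that relies on truncation needs an adjustment before switching.
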
Replacement for sqrt()
• sqrt() seems ‘natural’ if you are normalising vectors, calculating environment map coordinates or calculating distances, but it’s sloooow
• Sample code is available from the developer web site or from me directly, and will be in future versions of the SDK.

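
The sample code the slide refers to is not reproduced here. As an illustration of the general idea only, the sketch below uses a widely known bit-level approximation of the reciprocal square root followed by one Newton-Raphson step. The names `approx_rsqrt` and `normalize3` are hypothetical, IEEE 754 single precision is assumed, and the result is accurate to roughly 0.2%, which may or may not be acceptable for a given use.

```c
#include <stdint.h>
#include <string.h>

/* Approximate 1/sqrt(x) for x > 0. Assumes IEEE 754 floats.
   Relative error is roughly 0.2% after the single refinement step. */
float approx_rsqrt(float x)
{
    uint32_t bits;
    float y;
    memcpy(&bits, &x, sizeof bits);
    bits = 0x5f3759dfu - (bits >> 1);        /* initial guess from the exponent */
    memcpy(&y, &bits, sizeof y);
    return y * (1.5f - 0.5f * x * y * y);    /* one Newton-Raphson step */
}

/* Normalise a 3-vector in place without calling sqrt(). */
void normalize3(float v[3])
{
    float len_sq = v[0] * v[0] + v[1] * v[1] + v[2] * v[2];
    float inv;
    if (len_sq <= 0.0f)
        return;                              /* leave zero vectors untouched */
    inv = approx_rsqrt(len_sq);
    v[0] *= inv;
    v[1] *= inv;
    v[2] *= inv;
}
```

For pure distance comparisons, comparing squared lengths avoids the square root entirely.
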
Saturation Arithmetic (C)
Limiting a floating point number to lie in the range 0.0 to 1.0 inclusive (traditional method):
    if (f < 0.0) f = 0.0;
    else if (f > 1.0) f = 1.0;

Saturation Arithmetic (Pentium)
    if (*(long *)&f < 0) *(long *)&f = 0;
    else if (*(long *)&f > 0x3f800000) *(long *)&f = 0x3f800000;
• This is faster on a Pentium class processor since the FPU is “non-optimal” (i.e. slow) and the integer unit is much faster.

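
The pointer casts above rely on `long` being 32 bits and on the compiler tolerating the aliasing. A more portable sketch of the same integer-compare idea, assuming IEEE 754 floats, copies the bits with `memcpy` instead. The name `saturate_bits` is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Clamp f to [0.0, 1.0] using integer compares on its bit pattern.
   Assumes IEEE 754 single precision. */
float saturate_bits(float f)
{
    int32_t bits;
    memcpy(&bits, &f, sizeof bits);
    if (bits < 0)
        bits = 0;                  /* sign bit set: negative, so use 0.0f */
    else if (bits > 0x3f800000)
        bits = 0x3f800000;         /* bit pattern above 1.0f, so use 1.0f */
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

The trick works because non-negative IEEE 754 floats sort in the same order as their bit patterns read as integers.
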
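Saturation Arithmetic (Pentium II)
• Use the “cmov” instructions (cmov needs register operands, so load f first):
    mov   eax, [f]
    xor   ecx, ecx          ; bit pattern of 0.0
    mov   edx, 3f800000h    ; bit pattern of 1.0
    test  eax, eax
    cmovs eax, ecx          ; sign bit set: f < 0, use 0.0
    cmp   eax, edx
    cmovg eax, edx          ; f > 1.0, use 1.0
    mov   [f], eax
Faster, since unpredictable branches are the bottleneck here. Unavailable on a Pentium.
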
Unlock the Frame Rate
• It’s essential that your physics model can run at high refresh rates.
  • At least 100 fps
• 30 or 60 fps limits are not acceptable and lead to flat performance on high-end hardware

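
One common way to keep physics correct at any refresh rate is to scale each update by the elapsed frame time rather than assuming a fixed 30 or 60 fps tick. The sketch below is an assumption about how that might look, not something taken from the slides; `Body`, `step_body` and `simulate` are hypothetical names, and only constant-velocity motion is shown.

```c
/* Minimal body with one-dimensional motion (hypothetical type). */
typedef struct { double position, velocity; } Body;

/* Advance the body by the real time the last frame took. */
void step_body(Body *b, double dt_seconds)
{
    b->position += b->velocity * dt_seconds;
}

/* Run a number of frames at a given frame rate and return the
   final position, to compare results across frame rates. */
double simulate(double velocity, int fps, int frames)
{
    Body b = { 0.0, velocity };
    int i;
    for (i = 0; i < frames; ++i)
        step_body(&b, 1.0 / fps);
    return b.position;
}
```

One simulated second at 30 fps and at 100 fps ends at effectively the same position, so the frame rate can be left unlocked. Accelerating bodies need a more careful integrator than this.
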
The Value of Batching
Case specifics:
• The average number of ‘Polys Per Call’ (PPC) to DrawPrimitive was 2.6, producing 40 fps
• Removing state changes to raise the average PPC to ~50 produced 58 fps
• Most of the removed state changes were “reasonable”, i.e. not logically redundant
• The changes did not reduce visual quality at all
• A PPC of 200 is optimal

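
The effect of removing state changes on PPC can be sketched without any Direct3D calls. The code below assumes, for illustration only, that texture changes are the state changes that split draw calls; `DrawItem` and the function names are hypothetical.

```c
#include <stdlib.h>

/* One group of polygons sharing a texture (hypothetical type). */
typedef struct { int texture_id; int poly_count; } DrawItem;

static int by_texture(const void *a, const void *b)
{
    const DrawItem *x = a, *y = b;
    return (x->texture_id > y->texture_id) - (x->texture_id < y->texture_id);
}

/* Reorder items so that equal textures are adjacent. */
void sort_by_texture(DrawItem *items, int n)
{
    qsort(items, (size_t)n, sizeof *items, by_texture);
}

/* Draw calls needed if adjacent items with the same texture are merged. */
int count_batches(const DrawItem *items, int n)
{
    int i, batches = 0;
    for (i = 0; i < n; ++i)
        if (i == 0 || items[i].texture_id != items[i - 1].texture_id)
            ++batches;
    return batches;
}

/* Average polygons per draw call for the current ordering. */
double polys_per_call(const DrawItem *items, int n)
{
    int i, polys = 0, batches = count_batches(items, n);
    for (i = 0; i < n; ++i)
        polys += items[i].poly_count;
    return batches ? (double)polys / batches : 0.0;
}
```

Six interleaved items using two textures need six calls as submitted and two after sorting, tripling the PPC without touching the art.
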
Alpha Sort Issues
The “standard” solution is…
1) Draw all non-alpha polys (sort by texture)
2) Draw all alpha polys in back-to-front order with Z compare enabled and Z update disabled.
This copes with overlapping alpha polys, but you can’t sort by texture. (Intersection requires decimation.)

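
Step 2 of the standard solution amounts to a depth sort. A minimal sketch follows, assuming each alpha polygon carries a single representative view-space depth and that larger values are farther from the camera; both are assumptions, and `AlphaPoly` is a hypothetical type.

```c
#include <stdlib.h>

/* An alpha polygon with one representative depth (hypothetical type).
   Assumes larger view_z means farther from the camera. */
typedef struct { int id; float view_z; } AlphaPoly;

/* Order farther polygons first so nearer ones blend over them. */
static int farther_first(const void *a, const void *b)
{
    const AlphaPoly *x = a, *y = b;
    return (x->view_z < y->view_z) - (x->view_z > y->view_z);
}

void sort_back_to_front(AlphaPoly *polys, int n)
{
    qsort(polys, (size_t)n, sizeof *polys, farther_first);
}
```

A single depth per polygon cannot order intersecting polygons correctly, which is why the slide notes that intersection requires decimation.
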
Alpha Sort with Bounding Boxes
When you are ready to draw your alpha polys, draw non-overlapping sets using the sort-by-texture technique as before.
[Diagram: a viewport containing three bounding boxes, A, B and C]
Here, you can safely draw all of A before any of B or C… B and C need sorting.

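
Deciding which sets are non-overlapping comes down to a rectangle test. The sketch below assumes axis-aligned bounding boxes in screen space; `ScreenBox` and both functions are hypothetical names.

```c
/* Axis-aligned screen-space bounding box (hypothetical type). */
typedef struct { float min_x, min_y, max_x, max_y; } ScreenBox;

/* Non-zero if the two boxes share any area on screen. */
int boxes_overlap(const ScreenBox *a, const ScreenBox *b)
{
    return a->min_x < b->max_x && b->min_x < a->max_x &&
           a->min_y < b->max_y && b->min_y < a->max_y;
}

/* Non-zero if box `index` overlaps no other box, so its polys can be
   drawn as one texture-sorted set without depth sorting against the rest. */
int is_isolated(const ScreenBox *boxes, int n, int index)
{
    int i;
    for (i = 0; i < n; ++i)
        if (i != index && boxes_overlap(&boxes[index], &boxes[i]))
            return 0;
    return 1;
}
```

In the situation the slide describes, A would test as isolated while B and C would not, so only B and C pay for the back-to-front sort.
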
Geometry - Part 1
• Use the DX6 Transform and Clip engine: it’ll be nearly as fast as your best efforts
• It takes advantage of CPU-specific optimisations done by Intel, AMD etc.
• It uses the guard band clipping region to enhance performance
• Use the DX7 interface ASAP

Geometry - Part 2
• This gets you ready for hardware which can do the job much faster than the CPU
• Tell the chip designers if you need anything non-standard
• If you think DX is too slow, then use a run-time benchmark to select between DX and your own code

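
A run-time benchmark of this kind can be very small. In the sketch below the two code paths are passed in as function pointers; `TransformFn`, `GeometryPath`, `time_path` and `choose_path` are hypothetical names, and defaulting to DX on a tie is an assumption. `clock()` is coarse, so in practice enough iterations are needed for a measurable interval.

```c
#include <time.h>

/* A transform path to be timed (hypothetical signature). */
typedef void (*TransformFn)(void);

typedef enum { PATH_DX, PATH_OWN } GeometryPath;

/* Time `iterations` runs of one path, in seconds of CPU time. */
double time_path(TransformFn fn, int iterations)
{
    clock_t start = clock();
    int i;
    for (i = 0; i < iterations; ++i)
        fn();
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Pick the custom path only if it measured strictly faster.
   Defaulting to DX on a tie is an assumption of this sketch. */
GeometryPath choose_path(double dx_seconds, double own_seconds)
{
    return own_seconds < dx_seconds ? PATH_OWN : PATH_DX;
}
```

Running this once at start-up on a representative batch of vertices lets the same build pick the better path on each machine.
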
Geometry - Part 3
• Use the DX pipeline for geometry which may be rendered
• Use your own transform for bounding boxes, collisions, portals etc.
• Treat hardware T&L as
  • Write only
  • Not necessarily pixel identical to CPU T&L
DIPVB()

Geometry - Part 4
• Consider choosing between models at game start-up time
• More complex geometry should be several times more complex
• Introduce some LOD management
• Your artists are probably generating more complex models and then throwing them away

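
As one possible form of LOD management, the sketch below picks a model by distance from the viewer. It assumes models are ordered from most to least detailed and that switch distances are given in ascending order; `select_lod` is a hypothetical name, and distance is only one of several criteria that could drive the choice.

```c
/* Pick a level of detail by distance. Index 0 is the most detailed model.
   `switch_distances` holds lod_count - 1 ascending thresholds; at or
   beyond threshold i the next, coarser model is used. */
int select_lod(float distance, const float *switch_distances, int lod_count)
{
    int lod = 0;
    while (lod < lod_count - 1 && distance >= switch_distances[lod])
        ++lod;
    return lod;
}
```

The same table can be scaled at start-up, for instance pushing every threshold outwards on faster machines, so that high-end hardware sees the detailed models more often.
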
Lighting - Part 1
• If the DX lighting model is good enough, then there are people who want to help you
• Multi-texture shadow maps and light maps can be very fast now
  • Remember that (multi-pass != multi-texture)
• Tell the chip companies what you need

Lighting - Part 2
• Support more lights
• Use a richer set of light types
• Scale with available power
• If you have more complex geometry, you get better lighting quality

Summary
• Use the D3D pipeline as much as possible
• ‘Use’ the CPU carefully, ‘abuse’ the fill rate
• Get on board with DX7
• Offer the richest experience possible
• You may have to treat the PC as two distinct platforms, ‘High-end’ and ‘Low-end’

Questions?
Richard Huddy
RichardH@nvidia.com
www.nvidia.com