Firaxis LORE

Presentation Transcript


  1. Firaxis LORE And other uses of D3D11

  2. Low Overhead Rendering Engine • Or, how I learned to Render 15,000+ batches at 60 FPS

  3. Overview • Civ 5 is a big game, covering 6,000 years of history • The entire map can be populated/polluted with all sorts of things the user creates • Need to be able to render a huge number of objects of possibly disparate types

  4. Early Goals • Build a brand new engine for Civilization V • Like the game, we wanted the graphics engine to be able to ‘stand the test of time’ • Decided, while D3D11 was in alpha, to build the engine natively for the D3D11 architecture and map backwards to DX9

  5. Step 1: Cutting the overhead down • Shaders start in Firaxis Shading Language (FSL), a superset of HLSL • FSL compiles into a CPP and header file: all shader constants are mapped to structs, grouped into packages where all packages have the same bindings • Model code is templated: the FSL-generated header is bound to the template code • Result is a tiny amount of code that fills out the required shading and barely shows up in profiling • [Diagram labels: FSL Files, CPP / H, Template Code, Compile Time Glue Code]
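
The slides do not show what the FSL compiler emits, so the following is only a rough sketch of the idea: a generated struct per constant package, bound at compile time to hand-written template code. Every name here (`TerrainShaderConstants`, `TerrainShader`, `Model`, `writePayload`) is hypothetical and not taken from the engine.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical output of the FSL compiler: one plain struct per constant package.
struct TerrainShaderConstants {
    float worldViewProj[16];
    float tintColor[4];
};

// Hypothetical generated description that ties a shader to its constant package.
struct TerrainShader {
    using Constants = TerrainShaderConstants;
};

// Hand-written template code, bound to a generated header at compile time.
template <typename Shader>
class Model {
public:
    typename Shader::Constants constants{};

    // Append the whole constant package to a payload buffer with one memcpy;
    // no constants are looked up by name at run time.
    void writePayload(std::vector<unsigned char>& payload) const {
        const std::size_t offset = payload.size();
        payload.resize(offset + sizeof(constants));
        std::memcpy(payload.data() + offset, &constants, sizeof(constants));
    }
};
```

Because the binding is resolved by the compiler, the per-draw cost in this sketch is a single copy of a fixed-size struct, which is consistent with the slide's claim that the glue code barely shows up in profiling.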

  6. Step 2: Abstracting the Rendering • Still have to support DX9, might have to support consoles in future • Might have to write a ‘driver’ • Our solution: make DX9 ‘look like’ DX11 • Started with as restricted a design as possible, and expanded as we needed to

  7. Packetized Rendering • Stateless rendering, much simpler than D3D • Command based: all rendering is performed by self-contained commands • A command set may contain a list of surfaces to render, each with a shader constant payload • A surface is an immutable bundle of an IB, VB, textures, shader def, etc. • All state is bundled into packages: Alpha State, Z State, etc. Commands reference one of these state packages • Entire frame is queued up • Minimal per-frame allocation

  8. Only 5 Types of commands • COMMAND_RENDER_BATCHES: a list of surfaces to render into 1 or more render targets, with alpha and Z state bundles; surfaces have IB, VB, sampler and texture bundles, and all required state is specified • COMMAND_GENERATE_MIPS • COMMAND_RESOLVE_RENDERTEXTURE • COMMAND_COPY_RENDERTEXTURE • COMMAND_COPY_RESOURCE
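
As an illustration of the packetized design described on the two slides above, here is a minimal sketch of how the command and surface data might be laid out. Only the five command names come from the slides; the struct layouts, field names, and the `Handle` type are assumptions made for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// The five command types named on the slide.
enum class CommandType : std::uint8_t {
    RenderBatches,        // COMMAND_RENDER_BATCHES
    GenerateMips,         // COMMAND_GENERATE_MIPS
    ResolveRenderTexture, // COMMAND_RESOLVE_RENDERTEXTURE
    CopyRenderTexture,    // COMMAND_COPY_RENDERTEXTURE
    CopyResource          // COMMAND_COPY_RESOURCE
};

// Hypothetical opaque handle; a real engine would wrap GPU objects.
using Handle = std::uint32_t;

// Bundle of everything a draw needs; treated as immutable once created.
struct Surface {
    Handle indexBuffer, vertexBuffer, textureBundle, samplerBundle, shader;
};

// One batch: a surface plus the location of its shader constant payload.
struct Batch {
    const Surface* surface;
    std::uint32_t payloadOffset;
    std::uint32_t payloadSize;
};

// Self-contained command: no state is inherited from earlier commands.
struct Command {
    CommandType type;
    std::vector<Handle> renderTargets; // one or more targets
    Handle alphaState;                 // state package handles
    Handle zState;
    std::vector<Batch> batches;        // used by RenderBatches only
};

// A stream of commands queued for later submission.
struct CommandStream {
    std::vector<Command> commands;
    std::vector<unsigned char> payload; // constant data for all batches

    std::size_t batchCount() const {
        std::size_t n = 0;
        for (const Command& c : commands) n += c.batches.size();
        return n;
    }
};
```

Since every command names its full state, a backend can replay a stream in isolation, which is what makes the design stateless from the caller's point of view.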

  9. Packetized Rendering • [Diagram labels: Rendering System, Command Stream (x4), Rendering Engine, D3D/Driver]

  10. Step 3: Threading • [Diagram labels: Job Manager, Job (many), Command Stream (x4), Rendering System]
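
The threading slide is a diagram only, so this is a guess at the general shape rather than the engine's job manager: each job records into its own command stream, so jobs share no mutable state. `RecordedCommand`, `JobStream`, and `buildFrame` are hypothetical names, and plain `std::thread` stands in for a real job system.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Stand-in for a recorded command; a real stream would hold full commands.
struct RecordedCommand { int objectId; };
using JobStream = std::vector<RecordedCommand>;

// Each job records commands for a contiguous range of objects into its own
// stream, so no locks are needed while recording. Assumes jobCount > 0.
std::vector<JobStream> buildFrame(int objectCount, int jobCount) {
    std::vector<JobStream> streams(static_cast<std::size_t>(jobCount));
    std::vector<std::thread> workers;
    const int perJob = (objectCount + jobCount - 1) / jobCount;
    for (int j = 0; j < jobCount; ++j) {
        workers.emplace_back([&streams, j, perJob, objectCount] {
            const int begin = j * perJob;
            const int end = std::min(begin + perJob, objectCount);
            for (int i = begin; i < end; ++i)
                streams[static_cast<std::size_t>(j)].push_back(RecordedCommand{i});
        });
    }
    for (std::thread& w : workers) w.join();
    return streams; // submitted afterwards, in stream order
}
```

The only synchronization point in this sketch is the join before submission.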

  11. Why do we queue up the entire frame? • Would seem like additional overhead, but perf analysis shows it is a net win • Internal command setup is super cheap, just some mem copies • Engine cache coherency is vastly better • D3D driver cache coherency is much better with one giant dump • Very low % of total CPU time spent in submission • Allows us to filter redundant D3D calls; call overhead adds up • Fast even in DX9
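
To make the point about filtering redundant calls concrete, here is a small sketch of one common way to do it: remember the last state package bound and skip repeats. The slides do not describe the engine's actual filter; `FakeDevice` and `StateFilter` are hypothetical, and the device only counts calls instead of talking to a graphics API.

```cpp
#include <cstdint>

using Handle = std::uint32_t;

// Counts calls so the effect of filtering is visible; a real backend would
// forward these to the graphics API.
struct FakeDevice {
    int stateCalls = 0;
    void setAlphaState(Handle) { ++stateCalls; }
    void setZState(Handle) { ++stateCalls; }
};

// Remembers the last state packages bound and skips redundant binds.
class StateFilter {
public:
    explicit StateFilter(FakeDevice& device) : device_(device) {}

    void bind(Handle alphaState, Handle zState) {
        if (!valid_ || alphaState != alpha_) {
            device_.setAlphaState(alphaState);
            alpha_ = alphaState;
        }
        if (!valid_ || zState != z_) {
            device_.setZState(zState);
            z_ = zState;
        }
        valid_ = true;
    }

private:
    FakeDevice& device_;
    Handle alpha_ = 0;
    Handle z_ = 0;
    bool valid_ = false; // nothing has been bound yet
};
```

Binding (1, 2), then (1, 2) again, then (1, 3) issues three device calls in total instead of six.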

  12. Implementation advantages • Once the ‘stateless’ concept is grasped, code maintenance is easy • Next to no state leaking (flickering alpha, textures, etc.) • Because rendering is packetized, individual jobs need little or no communication with each other • NO THREADING BUGS

  13. Threaded D3D11 submission • Top issues: • Generally high driver overhead for batch submission • But: D3D11 has multithreaded submission • Command Streams do not necessarily map 1:1 to CommandLists • Civilization V can change how it submits via settings in the config files
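
In D3D11, multithreaded submission is done by recording on deferred contexts (`ID3D11Device::CreateDeferredContext`), closing each recording with `FinishCommandList`, and replaying on the immediate context with `ExecuteCommandList`. The slides do not say how Civilization V groups its command streams into command lists, so the sketch below shows just one possible policy: split the streams into a configurable number of contiguous groups. The function name and the idea that the list count comes from a config setting are assumptions.

```cpp
#include <cstddef>
#include <vector>

// Assign each of streamCount command streams to one of listCount command
// lists. Returns, for each list, the indices of the streams it would replay.
// Contiguous ranges keep the original submission order across lists.
std::vector<std::vector<std::size_t>> assignStreamsToLists(
        std::size_t streamCount, std::size_t listCount) {
    std::vector<std::vector<std::size_t>> lists(listCount);
    if (listCount == 0) return lists;
    for (std::size_t s = 0; s < streamCount; ++s) {
        // Integer scaling maps stream index s onto a list index in [0, listCount).
        lists[s * listCount / streamCount].push_back(s);
    }
    return lists;
}
```

With one list the whole frame is replayed from a single command list; with as many lists as streams the mapping becomes 1:1.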

  14. Step 4: Gloating over results • Wildly surpassed commonly held beliefs on the number of batches possible, especially with threading • [Results figure not preserved in transcript; its footnote reads: *Believed to be GPU limited]

  15. Conclusions • High-throughput rendering is possible IF: • Care is taken to reduce application overhead • Rendering is job based and payload based • Redundant state and calls are filtered • D3D11 command lists are used • Engine can peg 12 threads at 97% (sans driver)

  16. D3D11 Features: Tessellation • Major addition to D3D11 API [Screenshot]

  17. Terrain • Civ5 contains one of the most complex terrain systems ever made • Complete procedural process • Use GPU to raytrace and anti-alias shadows • Caching system to deal with cases where terrain is too big

  18. Tessellation • Terrain is very high detail, roughly 64x64 heightmap data per hex • Triangle count, when zoomed out, can be in the millions • Used tessellation as a ‘drop-in’

  19. Tessellation Cont • Simple Bicubic Beta Spline patches • Adjusted global tessellation as camera moved in and out • A strict performance increase: 10%-40% faster, on both AMD and Nvidia hardware • More adaptive techniques would work even better, but didn’t have time to implement them
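
For readers unfamiliar with spline patches, the sketch below evaluates a bicubic patch on the CPU and picks a single global tessellation factor from camera distance. It assumes the uniform cubic B-spline basis (a beta-spline with bias 1 and tension 0) and a linear falloff with distance; the slides give neither the spline parameters nor the falloff, and in the game this work would run in D3D11 hull and domain shaders, not in C++. The clamp range of 1 to 64 is the D3D11 tessellation factor limit.

```cpp
#include <algorithm>
#include <array>

// Uniform cubic B-spline basis weights for parameter t in [0, 1].
std::array<float, 4> bsplineBasis(float t) {
    const float s = 1.0f - t;
    const float t2 = t * t;
    const float t3 = t2 * t;
    return {
        s * s * s / 6.0f,
        (3.0f * t3 - 6.0f * t2 + 4.0f) / 6.0f,
        (-3.0f * t3 + 3.0f * t2 + 3.0f * t + 1.0f) / 6.0f,
        t3 / 6.0f
    };
}

// Evaluate patch height at (u, v) from a 4x4 grid of control heights.
float evalPatch(const float (&h)[4][4], float u, float v) {
    const std::array<float, 4> bu = bsplineBasis(u);
    const std::array<float, 4> bv = bsplineBasis(v);
    float height = 0.0f;
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            height += bv[i] * bu[j] * h[i][j];
    return height;
}

// Hypothetical global tessellation factor: 64 at nearDist or closer,
// 1 at farDist or beyond, linear in between. Assumes farDist > nearDist.
float globalTessFactor(float cameraDistance, float nearDist, float farDist) {
    float t = (cameraDistance - nearDist) / (farDist - nearDist);
    t = std::min(std::max(t, 0.0f), 1.0f);
    return 64.0f + t * (1.0f - 64.0f);
}
```

The four basis weights sum to 1 for any `t`, so a flat grid of control heights evaluates to that same height everywhere on the patch.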

  20. Leaders

  21. Leader Rendering • Largely done with DX10.1 rendering tech • New variable bit rate compression technology implemented for D3D11 • 2.5 GB of texture data reduced to 150 MB, can be decompressed on the GPU • Details forthcoming; research is in the publication submission process. Extensive use of UAVs

  22. Future Stuff, NO AO

  23. Future Stuff (CS), AO

  24. Q&A
