Vertex Shaders for Geometry Compression, by Kenneth Hurley. GDC San Francisco, March 5th, 2007.
Vertex Shaders for Geometry Compression • by Kenneth Hurley • GDC San Francisco • March 5th, 2007
You might be an engineer if… • The sales people at the local computer store can't answer any of your questions. • You can type 70 words per minute but can't read your own handwriting. • You can’t enjoy movies, because you are constantly analyzing the special effects. • Your wife hasn't the foggiest idea of what you do at work. • Your laptop computer costs more than your car.
Agenda • Introduction • Simple Compression • Quantization • Instancing with constants • Uncompressing • Questions
Introduction • Why do we need it? • Reduce AGP/PCI bus transfers • Optimal vertex data sizes are <8, 16, 32 bytes • AGP 1x = 266 MB/s • AGP 2x = 533 MB/s • AGP 4x = ~1 GB/s • AGP 8x = ~2.1 GB/s • PCIe provides 2.5 Gbps per lane, and PCIe 2.0 raised that to 5 Gbps per lane • Theoretical maximum is around 70 million triangles per second at 32 bytes per triangle for AGP 8x and PCIe
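The 70 million figure above is bus bandwidth divided by bytes per triangle. A minimal sketch of that arithmetic follows; the function name `TrianglesPerSecond` is made up for illustration, and the model assumes the bus does nothing but move triangle data. With 2.1 GB/s and 32 bytes per triangle it gives about 66 million triangles per second, which is close to the slide's rounded figure.

```cpp
// Upper bound on triangles per second when every triangle costs
// bytesPerTriangle bytes of bus traffic and nothing else shares the bus.
// (Illustrative helper, not from the talk.)
constexpr double TrianglesPerSecond(double busBytesPerSecond,
                                    double bytesPerTriangle) {
    return busBytesPerSecond / bytesPerTriangle;
}
```

Halving the per-triangle size to 16 bytes doubles the bound, which is the motivation for the compression techniques that follow.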
Introduction • Why do we need it? • Reduce AGP/PCI bus transfers (cont.) • 70 million, is that really true? • Probably not • Could be higher, could be lower • Drivers can store triangles in video card memory • But textures are there too • Uploads of textures (managed textures) go across the same bus • Even on consoles this is a win because of memory limitations and memory access
Introduction • Why do we need it? • Speeds rendering • Reduces video memory access • Vertex pipes are filled faster even if loading from video memory (less access) • Reduces memory consumption • Of course, it does • Win, Win, Win
Introduction • Why do we need it? • PS3/RSX/NV 7xxx architectures fetch 1 vertex attribute per clock (i.e. 1 float4 per clock) • Yet there are 8 vertex engines executing at an effective 8 instructions per clock • Fetching 1 position, normal, tex coords, binormal and tangent is 5 attributes (5 clocks) • The vertex shader can be 64 instructions long and still not be limited by instruction count!
Simple “Obvious” Compression • If you don’t need it, don’t include it • Don’t include unused components (Z, W) • Or pack something else in there • Remove component(s) • Normal, binormal, tangent • Cross product to reconstruct in vertex shader • Remove UV (ST) and calculate in vertex shader • Packed ARGB, not floats (D3DCOLOR)
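Two of the ideas above can be sketched on the CPU side. This is a sketch under stated assumptions, not the talk's own code: `PackARGB` packs four [0, 1] floats into one 32-bit value in the D3DCOLOR byte order (alpha in the top byte), and `ReconstructBinormal` shows the cross product a vertex shader would run when the binormal is dropped from the vertex. `Vec3`, both function names, and the `handedness` sign parameter are illustrative.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

struct Vec3 { float x, y, z; };

// Pack four [0, 1] floats into 32 bits, A in the top byte, then R, G, B.
// 4 bytes replace the 16 bytes of a float4 color.
inline std::uint32_t PackARGB(float a, float r, float g, float b) {
    auto toByte = [](float v) -> std::uint32_t {
        const float clamped = std::min(1.0f, std::max(0.0f, v));
        return static_cast<std::uint32_t>(std::lround(clamped * 255.0f));
    };
    return (toByte(a) << 24) | (toByte(r) << 16) | (toByte(g) << 8) | toByte(b);
}

inline Vec3 Cross(const Vec3& a, const Vec3& b) {
    return { a.y * b.z - a.z * b.y,
             a.z * b.x - a.x * b.z,
             a.x * b.y - a.y * b.x };
}

// Rebuild the binormal from normal and tangent instead of storing it.
// handedness is +1 or -1 (assumption: mirrored UVs need the sign flipped).
inline Vec3 ReconstructBinormal(const Vec3& normal, const Vec3& tangent,
                                float handedness) {
    const Vec3 c = Cross(normal, tangent);
    return { c.x * handedness, c.y * handedness, c.z * handedness };
}
```

Dropping a float3 binormal saves 12 bytes per vertex at the cost of a few shader instructions, which fits the fetch-bound argument made earlier.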
Quantization • Quantization is constraining something to a discrete set of values • In our case, reducing the #bits/#bytes used to represent floats or integers • Quantization is lossy • Trades #bits/bytes for precision • Find the acceptable error for your application • Distant LOD objects can have higher errors, which will be less noticeable
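Quantization Compression
#define NUMBITS 16 // number of bits to retain
#define fracScale (1 << NUMBITS)
// Signed version: clamp to [-fracScale, fracScale], then convert to an integer
int Quantize(float value)
{
    return Float2Int(clamp(value * (float)fracScale, -(float)fracScale, (float)fracScale));
}
// Unsigned version: clamp to [0, fracScale] (renamed, since C/C++ cannot overload on return type alone)
unsigned int QuantizeUnsigned(float value)
{
    return Float2Int(clamp(value * (float)fracScale, 0.0f, (float)fracScale));
}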
Quantization Decompression
#define NUMBITS 16 // number of bits to retain
#define fracScale (1 << NUMBITS)
// Inverse of Quantize: divide by the same scale that compression multiplied by
float Decompress(int value)
{
    return ((float)value / (float)fracScale);
}
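A runnable round trip makes the precision trade concrete. The sketch below is not the slide's code: it assumes inputs in [-1, 1] stored as signed 16-bit integers, and the names `kScale`, `Quantize16` and `Dequantize16` are illustrative.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Assumption: values live in [-1, 1] and are stored as signed 16-bit integers.
constexpr float kScale = 32767.0f;

// Clamp to the representable range, scale, and round to the nearest step.
inline std::int16_t Quantize16(float value) {
    const float clamped = std::min(1.0f, std::max(-1.0f, value));
    return static_cast<std::int16_t>(std::lround(clamped * kScale));
}

// Inverse mapping: divide by the same scale used for compression.
inline float Dequantize16(std::int16_t q) {
    return static_cast<float>(q) / kScale;
}
```

With rounding to nearest, the round-trip error for in-range values is at most half a step, about 0.5 / 32767. That is the number to compare against the acceptable error for the application.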
Scaled Offset • Separable components (scaled offset) • Minimum and maximum for static objects are used to pick the offset point and scale • For dynamic objects (animated, skinned, etc.), this minimum and maximum must include all dynamic changes
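Scaled Offset
Redistributes quantization by choosing a scale that covers the entire object
void CalculateScaleandOffset(Vertex &vertices, float &offset, float &scale)
{
    float LowerRange = maxfloat;  // start high so the first vertex lowers it
    float UpperRange = -maxfloat; // start low so the first vertex raises it
    for every vertex
    {
        LowerRange = min(LowerRange, Vertex);
        UpperRange = max(UpperRange, Vertex);
    }
    offset = LowerRange;            // smallest value seen
    scale = (UpperRange - offset);  // extent of the data
}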
Scaled Offset
void ScaleandOffsetVerts(Vertex &vertices, Vertex &newVerts, float &offset, float &scale)
{
    for every vertex
    {
        // (vertices - offset) / scale is normalized to [0, 1] before conversion
        newVerts = Float2Short((vertices - offset) / scale);
    }
}
Scaled Offset Decompression
Vertex ScaleandOffsetVerts(Vertex v, float &offset, float &scale)
{
    // Undo the compression step: rescale, then add the offset back
    return ((float)v * scale) + offset;
}
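The three scaled-offset steps can be put together for a single component. This is a minimal sketch assuming unsigned 16-bit storage; `ScaleOffset`, `ComputeScaleOffset`, `CompressValue` and `DecompressValue` are illustrative names, not the talk's API. In a real pipeline the decompression line would run in the vertex shader, with offset and scale passed as shader constants.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

struct ScaleOffset { float offset; float scale; };

// Find the minimum and extent of the data (assumes values is non-empty).
inline ScaleOffset ComputeScaleOffset(const std::vector<float>& values) {
    const auto mm = std::minmax_element(values.begin(), values.end());
    const float lo = *mm.first;
    const float range = *mm.second - lo;
    // Guard against a zero range so the division below stays valid.
    return { lo, range > 0.0f ? range : 1.0f };
}

// Map [offset, offset + scale] onto the full unsigned 16-bit range.
inline std::uint16_t CompressValue(float v, const ScaleOffset& so) {
    float t = (v - so.offset) / so.scale;
    t = std::min(1.0f, std::max(0.0f, t));
    return static_cast<std::uint16_t>(std::lround(t * 65535.0f));
}

// Inverse mapping: what the vertex shader would compute per component.
inline float DecompressValue(std::uint16_t q, const ScaleOffset& so) {
    return (static_cast<float>(q) / 65535.0f) * so.scale + so.offset;
}
```

Because all 65536 steps are spent on the object's actual extent, the worst-case error is about scale / 131070, regardless of where the object sits in the world.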
XVOX Demo • Geomorphing terrain data in 36 bytes of vertex data • Trilinear displacement mapping • UV (ST) texture coordinates can be the same as dU, dV with a scale stored in constant memory • Should probably use ambient occlusion or ambient aperture for lighting, or light in world space • For time of day, use color ramp textures to light the terrain
XVOX Demo • Displacement mapping • V’(u,v) = V(u,v) + d(u,v) * N(u,v) • Assuming the normal is always up (terrain): V’(u,v) = ( u, v, d(u,v) )
struct VS_INPUT
{
    float4 d1_d2;     // 4 mipmap displacement values
    float4 u1v1_u2v2; // UV displacements + UV+1
    float lod;        // R-LOD selection
};
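With V(u,v) = (u, v, 0) and N = (0, 0, 1), the general displacement formula collapses to writing the displacement into the height component. The sketch below illustrates that step plus a simple geomorph between two LOD displacement values. How the demo actually combines its four displacement values is not spelled out in the slides, so the linear blend, the parameter names (`dFine`, `dCoarse`, `lodBlend`) and the function names are assumptions.

```cpp
struct Vec3 { float x, y, z; };

// Linear interpolation: t = 0 gives a, t = 1 gives b.
inline float Lerp(float a, float b, float t) {
    return a + (b - a) * t;
}

// Terrain case of V'(u,v) = V(u,v) + d(u,v) * N(u,v) with N = (0, 0, 1):
// the displaced vertex is (u, v, d). The displacement is blended between
// two LOD levels so geometry morphs smoothly instead of popping.
inline Vec3 DisplaceTerrainVertex(float u, float v,
                                  float dFine, float dCoarse, float lodBlend) {
    const float d = Lerp(dFine, dCoarse, lodBlend);
    return { u, v, d };
}
```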
XVOX ideas • Vertex streams • The coordinates (u, v) are taken from a first vertex stream and the displacements d from a second vertex stream • This is done so the (u, v) coordinates can be reused for each displacement-mapped square, resulting in less memory used
Transform Compression • Compress by finding a dominant axis of the data • Given vertex data, set up a covariance matrix and extract the eigenvectors from it • See ShaderX for details, pp. 176-180 • Decompression is simply the inverse of the compression transform derived from those eigenvectors • Since we are already multiplying the position by a matrix for HCLIP space, the matrices can be rolled together • Achieves 50% compression without additional overhead
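The first step, building the covariance matrix of the vertex positions, can be sketched as follows. This is an illustrative helper (`Covariance`, `Vec3` and `Mat3` are assumed names), not the ShaderX code. Extracting the eigenvectors is left out; in practice that step would come from a numerical library.

```cpp
#include <array>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };
using Mat3 = std::array<std::array<double, 3>, 3>;

// Population covariance of a point set. The eigenvectors of this matrix
// give the axes along which the data is spread, largest eigenvalue first.
inline Mat3 Covariance(const std::vector<Vec3>& points) {
    Mat3 c{};  // zero-initialized
    if (points.empty()) return c;

    const double n = static_cast<double>(points.size());
    double mean[3] = {0.0, 0.0, 0.0};
    for (const Vec3& p : points) {
        mean[0] += p.x; mean[1] += p.y; mean[2] += p.z;
    }
    for (double& m : mean) m /= n;

    // Accumulate outer products of the mean-centered positions.
    for (const Vec3& p : points) {
        const double d[3] = { p.x - mean[0], p.y - mean[1], p.z - mean[2] };
        for (std::size_t i = 0; i < 3; ++i)
            for (std::size_t j = 0; j < 3; ++j)
                c[i][j] += d[i] * d[j];
    }
    for (auto& row : c)
        for (double& v : row) v /= n;
    return c;
}
```

For points that lie along a single axis, only the matching diagonal entry is non-zero, which is the "dominant axis" the slide refers to.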
Idea from Displacement Maps • Uses quadtree/octree structure • Object vertices are displaced from the quad/oct node corner or center • When rendering each entity/object set in a node of the tree, set a vertex constant to the quad corner or center
More ideas for less bus traffic • Put the vertex data in constant memory • Particles/billboards or low poly data can be stored there • Then pass in the index as a UBYTE component of D3DCOLOR • Other 3 bytes can be used for normal, etc. • 2 shorts for scale and offset • Offset from quad/octree node • UV coordinates are a good thing to store there • Example: low poly trees • Pass offset, scale and rotation around the up axis • 2 16-bit shorts and 1 float • 8 bytes per tree
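The 8-bytes-per-tree layout can be written down as a struct. The field assignment below (one short for the offset from the node, one short for the scale, one float for the rotation) is one reading of the slide, and `TreeInstance` and `DecodeShort` are illustrative names.

```cpp
#include <cstdint>

// Per-instance data for one low-poly tree: 2 + 2 + 4 = 8 bytes.
struct TreeInstance {
    std::int16_t offset;    // quantized offset from the quad/octree node
    std::int16_t scale;     // quantized uniform scale
    float        rotation;  // rotation around the up axis, in radians
};
static_assert(sizeof(TreeInstance) == 8, "expected 8 bytes per tree");

// Expand a signed 16-bit value back to [-range, range].
// In a real renderer this would run in the vertex shader.
inline float DecodeShort(std::int16_t q, float range) {
    return (static_cast<float>(q) / 32767.0f) * range;
}
```

The mesh itself sits in constant memory, so each tree costs only these 8 bytes of per-instance traffic.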
More ideas for less bus traffic • Similar compression can be used for writing pixels (other than display) • For example, deferred rendering storage • Depth buffers
Summary • Remove unused components • Or use them for something else • Quantize what you can • Separate components by regions • Displacement maps • Reduce memory and AGP/video memory traffic!
References • [Calver] Dean Calver, “Vertex Decompression in a Shader”, ShaderX, Wordware Publishing, pp. 172-187 • [Vlietinck] Jan Vlietinck, “Hardware trilinear displacement mapping without tessellator and vertex texturing”, online, http://users.belgacom.net/gc610902/technical.htm • [Forsyth] Tom Forsyth, “Practical Displacement Maps”, GDC, 2003, available from http://home.comcast.net/~tom_forsyth/papers/papers.html • [Wloka] Matthias Wloka, “Personal Communication”, March 2007
Shameless Plug • Our game engine will soon be available for free on the PC • Editor (XML format), terrain painting, object placement • Particle systems • RakNet networking system • Ageia physics • Lua scripting • A.I. through hierarchical state machines, events, scripting, triggers • Sound through OpenAL • Auto navmesh generation • GUI editor • Model viewer • DirectX 9, DirectX 10, HDR, shadow maps, radiosity lightmaps (WIP) • More…
Questions? More information available on website. http://www.signaturedevices.com & http://www.graffitientertainment.com (Publishing Subsidiary) submissions@graffitientertainment.com