Real-time Mesh Simplification Using the GPU

Presentation Transcript


  1. Real-time Mesh Simplification Using the GPU Christopher DeCoro Natasha Tatarchuk 3D Application Research Group

  2. Introduction • Implement Mesh Decimation in real-time • Utilizes new Geometry Shader stage of GPU • Achieves a 20x speedup over CPU

  3. Project Motivation • Massive increases in submitted geometry • Geometry rendered per shadow map (6x for cubemap!) • Not always needed at highest resolution • Geometry not always known at build-time • Dynamically-skinned objects only finalized at run-time • May be customized to the user's machine based on its capabilities, so would need to be adapted at program load time • Could be dynamically generated per level, so would need to be adapted at level load time • Simplification therefore needs to be fast (or even real-time) • Also, just as importantly: • We want applications that exercise & stress GS/GPU • Evaluate new capabilities of the GPU • Learn how to adapt previously CPU-bound algorithms • Develop GPU-centric methodologies • Identify future feature set for GS/GPU as a whole • Limitations still exist: which should be addressed?

  4. Contributions • Mapping of Decimation to GPU • 20x speedup vs. CPU • Enables load-time or real-time usage • Detail Preservation by Non-linear Warping • Also applicable to CPU out-of-core decimation • General-purpose GPU Octree • Adaptive decimation w/ constant memory • Applications not limited to simplification: collision detection, frustum culling, etc.

  5. Outline • Project Introduction and Motivation • Background • Decimation with Vertex Clustering • Geometry Shaders in Direct3D 10 • Geometry Shader-based Vertex Clustering • Adaptive Simplification w/ Non-linear Warps • Probabilistic Octrees on the GPU

  6. Vertex Clustering • Reduces mesh resolution • High-res mesh as input • Low-res as output • All implemented on the GPU • Ideal for processing streamed out data • Useful when rendering multiple times (e.g. shadows) • Can handle enormous models from scanned data • Based on “Out-of-Core Simplification of Large Polygonal Models,” P. Lindstrom, 2000 [Figure from Lindstrom 2000]

  7. Previous Rendering Pipeline • Vertex Shaders and Pixel Shaders • Limits 1 output per 1 input • No culling of triangles for decimation • Fixed destination for each stage • Result meshes cannot be (easily) saved and reused

  8. DirectX10 Rendering Pipeline • Geometry Shader in between VS & PS • Called for each primitive (usually triangle) • Able to access all vertices of a primitive • Can compute per-face quantities • Breaks 1:1 input-output limitation • Allows triangles to be culled from pipeline • Allows stream-out of processed geometry • Decimated meshes can easily be saved and reused

  9. Outline • Project Introduction and Motivation • Background • Geometry Shader-based Vertex Clustering • Overview • Quadric Generation • Optimal Position Computation • Final Clustering • Adaptive Simplification w/ Non-linear Warps • Probabilistic Octrees on the GPU

  10. Algorithm Overview • Start with the input mesh • Shown divided into clusters • Pass 1: Compute the quadric map from mesh • Use GS to compute quadric • Accumulate in cluster map, an RT used as large array • Pass 2: For each cluster, compute optimal position • Solves a linear system given by quadrics • Pass 3: Collapse each vertex to representative • 9x9x9 grid shown Model Courtesy of Stanford Graphics Lab
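
All three passes in the overview rely on mapping a vertex position to a grid cell. A minimal CPU sketch of that mapping in Python, assuming an axis-aligned bounding box and a row-major flat index. The name `cluster_id` echoes the `clusterId` helper called in the shader slides, whose real definition is not shown in the presentation, so the layout here is only an illustration:

```python
def cluster_id(pos, bbox_min, bbox_max, grid):
    """Return the flat index of the uniform-grid cell that contains pos.

    pos, bbox_min, bbox_max are (x, y, z) tuples; grid is (nx, ny, nz).
    The row-major index layout is an assumption made for this sketch.
    """
    cell = []
    for p, lo, hi, n in zip(pos, bbox_min, bbox_max, grid):
        t = (p - lo) / (hi - lo)              # normalise to [0, 1]
        i = int(t * n)                        # constant-time cell lookup
        cell.append(max(0, min(i, n - 1)))    # clamp points on or past the box faces
    ix, iy, iz = cell
    return ix + grid[0] * (iy + grid[1] * iz)


# The 9x9x9 grid from the slide has 729 cells, indexed 0..728.
grid = (9, 9, 9)
lo, hi = (0.0, 0.0, 0.0), (1.0, 1.0, 1.0)
first = cluster_id((0.0, 0.0, 0.0), lo, hi, grid)
last = cluster_id((1.0, 1.0, 1.0), lo, hi, grid)
middle = cluster_id((0.5, 0.5, 0.5), lo, hi, grid)
```

Because the lookup is a handful of arithmetic operations per vertex, every vertex can be processed independently, which is what makes the regular grid a good fit for the GPU.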

  11. Vertex Clustering Pipeline • Pass 1: Create Quadric Map • Input: Original Mesh • Computation: • Determine plane equation, face quadrics for triangle • Compute the cluster and address of each vertex • Pack quadric into RT at appropriate address • Output: Render Targets representing clusters with packed quadrics and average positions

  12. Quadric Map Implementation

      // Helpers such as clusterId, expand, outer and packQuadric are not shown on the slide.

      // Map a point to its location in the cluster map array
      float2 writeAddr( float3 vPos )
      {
          uint iX = clusterId(vPos) / iClusterMapSize.x;
          uint iY = clusterId(vPos) % iClusterMapSize.y;
          return expand( float2(iX,iY)/float(iClusterMapSize.x) ) + 1.0/iClusterMapSize.x;
      }

      [maxvertexcount(3)]
      void main( triangle ClipVertex input[3], inout PointStream<FragmentData> stream )
      {
          // For the current triangle, compute the area and normal
          float3 vNormal = cross( input[1].vWorldPos - input[0].vWorldPos,
                                  input[2].vWorldPos - input[0].vWorldPos );
          // |cross| is twice the triangle area, so /6 gives one third of the area
          float fArea = length(vNormal)/6;
          vNormal = normalize(vNormal);

          // Then compute the distance of plane to the origin along the normal
          float fDist = -dot(vNormal, input[0].vWorldPos);

          // Compute the components of the face quadrics using the plane coefficients
          float3x3 qA = fArea*outer(vNormal, vNormal);
          float3   qb = fArea*vNormal*fDist;
          float    qc = fArea*fDist*fDist;

          // Loop over each vertex in input triangle primitive
          for(int i=0; i<3; i++)
          {
              // Assign the output position in the quadric map
              FragmentData output;
              output.vPos = float4(writeAddr(input[i].vPos),0,1);

              // Write the quadric to be accumulated in the quadric map
              packQuadric( qA, qb, qc, output );
              stream.Append( output );
          }
      }
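
For readers who want to check the quadric arithmetic away from the GPU, here is a small CPU sketch in Python of the same computation. The function names are made up for this illustration, and `add_quadrics` stands in for the additive blending that accumulates quadrics in the render target:

```python
import math


def sub(u, v):
    return [a - b for a, b in zip(u, v)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


def face_quadric(p0, p1, p2):
    """Area-weighted quadric (A, b, c) of the plane through a triangle."""
    n = cross(sub(p1, p0), sub(p2, p0))
    length = math.sqrt(dot(n, n))
    if length == 0.0:                      # degenerate triangle contributes nothing
        return [[0.0] * 3 for _ in range(3)], [0.0] * 3, 0.0
    w = length / 6.0                       # one third of the triangle area
    n = [x / length for x in n]
    d = -dot(n, p0)                        # plane equation: n . x + d = 0
    A = [[w * n[i] * n[j] for j in range(3)] for i in range(3)]
    b = [w * d * n[i] for i in range(3)]
    c = w * d * d
    return A, b, c


def add_quadrics(q1, q2):
    """Component-wise sum, the CPU analogue of additive blending."""
    A = [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(q1[0], q2[0])]
    b = [x + y for x, y in zip(q1[1], q2[1])]
    return A, b, q1[2] + q2[2]


def quadric_error(q, x):
    """Evaluate x^T A x + 2 b^T x + c: the weighted squared plane distance."""
    A, b, c = q
    Ax = [dot(row, x) for row in A]
    return dot(x, Ax) + 2.0 * dot(b, x) + c


q = face_quadric((0, 0, 1), (1, 0, 1), (0, 1, 1))   # triangle in the plane z = 1
on_plane = quadric_error(q, (5, 5, 1))              # any point with z = 1
off_plane = quadric_error(q, (0, 0, 3))             # distance 2 from the plane
```

A point on the triangle's plane has zero error, and a point at distance 2 has error equal to the weight (1/6 here) times the squared distance. Because quadrics add component-wise, a cluster's total error is simply the sum over all triangles that touch it.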

  13. Vertex Clustering Pipeline • Pass 2: Find Optimal Positions • Input: Cluster Map Render Targets, Full-screen Quad • Computation: • Determine if we can solve for optimal position • If not, fall back to vertex average • Output: Render Targets representing clusters with optimal position of representative vtx.

  14. Optimal Positions • For each cell, need representative • Naïve solution: Use averages • Looks very blocky • Does not consider the original faces, only vertices • Implemented solution: Use quadrics • Quadrics are a measure of surface • We can solve for optimal position [Figure panels: Original Mesh; Simplified w/ Averages; Simplified w/ Quadrics]

  15. Optimal Positions Implementation

      float3 optimalPosition(float2 vTexcoord)
      {
          float3 vPos = float3(0,0,0);
          float4 dataWorld, dataA0, dataB, dataA1;

          // Read the vertex average from the cluster map
          dataWorld = tClusterMap0.SampleLevel( sClusterMap0, vTexcoord, 0 );
          int iCount = dataWorld.w;

          // Only compute optimal position if there are vertices in this cluster
          if( iCount != 0 )
          {
              // Read all the data from the clustermap to reconstruct the quadric
              dataA0 = tClusterMap1.SampleLevel( sClusterMap1, vTexcoord, 0 );
              dataA1 = tClusterMap2.SampleLevel( sClusterMap2, vTexcoord, 0 );
              dataB  = tClusterMap3.SampleLevel( sClusterMap3, vTexcoord, 0 );

              // Then reassemble the quadric (A is symmetric, so six values suffice)
              float3x3 qA = { dataA0.x, dataA0.y, dataA0.z,
                              dataA0.y, dataA0.w, dataA1.x,
                              dataA0.z, dataA1.x, dataA1.y };
              float3 qB = dataB.xyz;
              float  qC = dataA1.z;

              // Determine if inverting A is stable, if so, compute optimal position
              // If not, default to using the average position
              // (inverse is a helper not shown on the slide)
              const float SINGULAR_THRESHOLD = 1e-11;
              if( determinant(qA) > SINGULAR_THRESHOLD )
                  vPos = -mul( inverse(qA), qB );
              else
                  vPos = dataWorld.xyz / dataWorld.w;
          }
          return vPos;
      }
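
The solve-or-fall-back logic of Pass 2 can be tried on the CPU with a few lines of Python. This is a sketch, not the shader: it uses Cramer's rule where the slide calls an `inverse` helper, and the function names are hypothetical. The `1e-11` threshold is the one given on the slide:

```python
def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(A, rhs):
    """Solve A x = rhs with Cramer's rule; the caller checks that det(A) != 0."""
    d = det3(A)
    x = []
    for col in range(3):
        m = [row[:] for row in A]
        for r in range(3):
            m[r][col] = rhs[r]             # replace one column with the rhs
        x.append(det3(m) / d)
    return x


def optimal_position(A, b, pos_sum, count, threshold=1e-11):
    """Minimise x^T A x + 2 b^T x + c, or fall back to the vertex average."""
    if count == 0:
        return None                        # empty cluster: nothing to place
    if det3(A) > threshold:
        return solve3(A, [-v for v in b])  # gradient is zero where A x = -b
    return [s / count for s in pos_sum]


# Three unit-weight planes x = 1, y = 2, z = 3 meet at a single point.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
corner = optimal_position(identity, [-1.0, -2.0, -3.0], [0.0, 0.0, 0.0], 3)

# A single plane gives a singular A, so the average position is used instead.
flat = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
average = optimal_position(flat, [0.0, 0.0, -1.0], [2.0, 4.0, 6.0], 2)
```

The fallback matters in practice: clusters that cover a flat or nearly flat region have a near-singular A, and solving anyway would place the representative far from the surface.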

  16. Vertex Clustering Pipeline • Pass 3: Decimate Mesh • Input: Cluster Map Render Targets, Input Mesh • Computation: • Find clusters, Remap vertices to representative • Determine if triangle becomes degenerate • If not, stream output new triangle at new positions • Output: Low-resolution Mesh

  17. Final Clustering Implementation

      [maxvertexcount(3)]
      void main( triangle ClipVertex input[3], inout TriangleStream<StreamoutVertex> stream )
      {
          // Only emit a triangle if all three vertices are in diff. clusters
          if( all_different( clusterId(input[0].vPos),
                             clusterId(input[1].vPos),
                             clusterId(input[2].vPos) ) )
          {
              for(int i=0; i<3; i++)
              {
                  // Lookup optimal position of this vertex's cluster in the RT computed in Pass 2
                  // (assumes StreamoutVertex has a vPos member; the slide does not show the struct)
                  StreamoutVertex output;
                  output.vPos = tClusterMap3.SampleLevel( sClusterMap3, readAddr(input[i].vPos), 0 );

                  // Output vertex to stream out
                  stream.Append( output );
              }
          }
          return;
      }
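
The same culling rule is easy to state on the CPU. A minimal Python sketch, assuming an indexed triangle list plus two lookup tables that the earlier passes would have produced (the names `vertex_cluster` and `representative` are hypothetical):

```python
def decimate(triangles, vertex_cluster, representative):
    """Keep only triangles whose three vertices fall in three distinct clusters.

    triangles:       list of (i0, i1, i2) vertex indices
    vertex_cluster:  cluster id for each vertex index
    representative:  mapping from cluster id to its representative position
    """
    out = []
    for i0, i1, i2 in triangles:
        c0, c1, c2 = vertex_cluster[i0], vertex_cluster[i1], vertex_cluster[i2]
        if c0 != c1 and c1 != c2 and c0 != c2:
            # Each corner moves to the representative of its own cluster.
            out.append((representative[c0], representative[c1], representative[c2]))
        # Otherwise the triangle collapses to an edge or a point and is dropped.
    return out


vertex_cluster = [0, 0, 1, 2]
representative = {0: (0.0, 0.0, 0.0), 1: (1.0, 0.0, 0.0), 2: (0.0, 1.0, 0.0)}
triangles = [(0, 1, 2),    # vertices 0 and 1 share cluster 0: degenerate
             (1, 2, 3)]    # clusters 0, 1, 2: survives
result = decimate(triangles, vertex_cluster, representative)
```

Note that the output is a triangle soup: every surviving triangle carries its own three positions, which matches the stream-out behaviour and the wish for indexed stream out on the final slides.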

  18. Vertex Clustering Pipeline • Alternate Pass 2: Downsample RTs • Input and Output as before • Computation: • Collapse 8 adjacent cells by adding cluster quadrics • Compute optimal position for 2x larger cell • Create multiple lower levels of detail without repeatedly incurring Pass 1 overhead (~75%) • Pass 3 can use previous streamed-out mesh • Lower levels of detail almost free
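
Collapsing 8 adjacent cells works because quadrics and position sums are additive. A small Python sketch of one downsampling step, assuming the cluster map is held as a dictionary from integer cell coordinates to a flat list of accumulated values (a stand-in for the render targets, chosen only for this illustration):

```python
def downsample(cells):
    """Merge each 2x2x2 block of cells into one parent cell by summation.

    cells maps (ix, iy, iz) to a list of accumulated values, for example
    the packed quadric terms followed by the position sum and vertex count.
    """
    parents = {}
    for (ix, iy, iz), data in cells.items():
        key = (ix // 2, iy // 2, iz // 2)      # parent cell is 2x larger per axis
        acc = parents.get(key)
        if acc is None:
            parents[key] = list(data)
        else:
            parents[key] = [a + d for a, d in zip(acc, data)]
    return parents


fine = {(0, 0, 0): [1.0, 2.0],
        (1, 1, 1): [3.0, 4.0],     # same 2x2x2 block as (0, 0, 0)
        (2, 0, 0): [5.0, 6.0]}     # belongs to the neighbouring block
coarse = downsample(fine)
```

The optimal position for each coarse cell can then be solved from the summed quadric exactly as in Pass 2, without touching the input mesh again.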

  19. Timing Results • Recorded Time Spent in Decimation • GPU: AMD/ATI XXX • CPU: 3 GHz Intel P4 • Significant Improvement over CPU • Averages ~20x speedup on large models • Scales linearly

  20. More Results • Models shown at varying resolutions Buddha, 45x130x45 grid Bunny, 90x90x90 grid Dragon, 100x60x20 grid Models Courtesy of Stanford Graphics Lab

  21. More Results • Models shown at varying resolutions Buddha, 20x70x20 grid Bunny, 60x60x60 grid Dragon, 50x25x10 grid

  22. More Results • Models shown at varying resolutions Buddha, 10x40x10 grid Bunny, 20x20x20 grid Dragon, 30x15x6 grid

  23. Outline • Project Introduction and Motivation • Background • Geometry Shader-based Vertex Clustering • Adaptive Simplification w/ Non-linear Warps • View-dependent Simplification • Region-of-interest Simplification • Probabilistic Octrees on the GPU

  24. View-dependent Simplification • Standard simplification does not consider view • Preserves uniform amount of detail all over • Simplify in post-projection space to use view • Preserves more detail closer to viewer (left) [Figure label: View Direction]
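
To see why clustering in post-projection space keeps more detail near the viewer, consider a one-dimensional Python sketch. It assumes a simplified pinhole camera at the origin looking down +z, with a plain divide by depth standing in for the full projection matrix a renderer would use:

```python
def projected_cell(x, z, cells=10):
    """Cell index along x after a simplified perspective divide.

    Assumes a pinhole camera at the origin looking down +z, with the
    visible range x/z in [-1, 1]. This is an illustration only.
    """
    ndc = x / z                                  # post-projection coordinate
    i = int((ndc + 1.0) / 2.0 * cells)
    return max(0, min(i, cells - 1))


# Two vertex pairs with the same world-space spacing of 0.3 units.
near_pair = (projected_cell(0.0, 1.0), projected_cell(0.3, 1.0))
far_pair = (projected_cell(0.0, 10.0), projected_cell(0.3, 10.0))
```

The near pair lands in two different cells and stays distinct, while the far pair lands in one cell and is merged, even though both pairs are equally far apart in world space.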

  25. Arbitrary Warping Functions • View Transform special case of nonlinear warp • Can use arbitrary warp for adaptive simplification • Regular grids allow data-independence, parallelism • Constant time mapping from position to grid cell • Maps well onto GPU render targets • Forces uniform resolution throughout output mesh • Irregular geometry grids allow non-uniform output • Cells can be larger/smaller in certain regions • Corresponds to lower/greater output triangle density • We lose constant-time mapping of position to cell • Solution: apply inverse warp to vertices • Equivalent to applying forward warp to grid cells • Clustering still performed in uniform grid • Flexibility of irregular geometry w/ speed of regular • One proposal: Gaussian weighting functions
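
The equivalence between warping the vertices and warping the grid can be checked numerically. In this one-dimensional Python sketch the warp is an arbitrary example (a square root on [0, 1], not the function used in the talk), and which direction is labelled forward or inverse is a matter of convention:

```python
import bisect
import math


def warp(x):
    """Example warp on [0, 1]; sqrt stretches the region near 0."""
    return math.sqrt(x)


def unwarp(u):
    """Inverse of warp."""
    return u * u


def cell_fast(x, n):
    """Warp the vertex, then index a uniform grid in constant time."""
    return min(int(warp(x) * n), n - 1)


def cell_slow(x, n):
    """Search the equivalent irregular grid, whose boundaries are unwarped."""
    boundaries = [unwarp(k / n) for k in range(n + 1)]
    return min(bisect.bisect_right(boundaries, x) - 1, n - 1)


n = 4
samples = [0.05, 0.2, 0.5, 0.9]
fast = [cell_fast(x, n) for x in samples]
slow = [cell_slow(x, n) for x in samples]
irregular_boundaries = [unwarp(k / n) for k in range(n + 1)]
```

Both routes assign every sample to the same cell, but the warped route never has to search. The irregular boundaries `[0, 0.0625, 0.25, 0.5625, 1]` show the effect: cells are narrow near 0, so that region keeps more output triangles.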

  26. Region-of-Interest Specification • Importance specified w/ biased Gaussian • Highest preservation at mean • Width of region given by sigma • Bias prevents falloff to zero • Integrate to produce corresponding warp function (Derivation given in paper)
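
The slide defers the derivation to the paper, so the following is only one plausible reading, sketched in Python for a single axis on [0, 1]: treat the biased Gaussian as a density, integrate it in closed form with the error function, and normalise so that the warp maps 0 to 0 and 1 to 1. The function name and the parameter values are assumptions made for this example:

```python
import math


def make_warp(mu, sigma, bias):
    """Build w(x) = integral over [0, x] of (Gaussian + bias), normalised to w(1) = 1.

    The importance density is exp(-(t - mu)^2 / (2 sigma^2)) + bias.
    """
    scale = sigma * math.sqrt(math.pi / 2.0)
    root2 = math.sqrt(2.0)

    def integral(x):
        gauss = scale * (math.erf((x - mu) / (sigma * root2))
                         - math.erf((0.0 - mu) / (sigma * root2)))
        return gauss + bias * x

    total = integral(1.0)
    return lambda x: integral(x) / total


warp = make_warp(mu=0.5, sigma=0.1, bias=0.2)

# Width in warped space of two equally wide input intervals.
at_center = warp(0.55) - warp(0.45)
at_edge = warp(0.10) - warp(0.00)
```

The interval around the mean is stretched far more than the one at the edge, so a uniform grid applied after the warp spends more cells on the region of interest. The bias keeps the density above zero everywhere, so the warp stays strictly increasing and no region collapses to zero width.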

  27. Region-of-Interest Specification • Warping allows non-uniform/adaptive level of detail • Head has most semantic importance • Detail lost in uniform simplification • We can warp first to expand center • Equivalent to grid density increasing • Adaptive simplification preserves head detail

  28. Outline • Project Introduction and Motivation • Background • Geometry Shader-based Vertex Clustering • Adaptive Simplification w/ Non-linear Warps • Probabilistic Octrees on the GPU • Motivation • Probabilistic Storage • Adaptive Simplification • Randomized Construction • Results

  29. Octrees - Motivation • Basic grid • regular geometry, regular topology • Limitations as we discussed • Warped grid • irregular geometry, regular topology • Much improved; however, we can do better • May be difficult to know required detail a priori • CPU Solution: Multi-resolution grid (i.e. octree) • Irregular topology (irregular geometry w/ warping) • Store grid at many levels of detail • Measure error at each level, use coarse as possible • Efficiency requires dynamic memory, storage O(L^3) • Requires O(L) writes to produce correct tree

  30. GPU Solution – Probabilistic Octrees • Proposal • Successful storage not guaranteed, w/ Prob. <= 1 • However, storage failure detected on read • Assumptions allow much flexibility • We can have unlimited depth tree (but lim P=0) • Sparse storage of data • Require conservative algorithms for task • Vertex clustering (conveniently!) is such an example • So is collision detection and frustum culling • Only studied in brief in this paper, we would like to analyze more for future work

  31. Implementation Details • Storage: Spatial Hashes • Map (position,level) to cell, cell hashed to index • Additive blending for quadric accumulation (app-specific) • Max blending to store (key,-key) with data (i.e. min_key,max_key) • Retrieval: • Again map (position, level) to index • Retrieve key value from data, collision iff min_key != max_key • Use parent level, which will have higher storage probability • Usage for Adaptive Simplification • For each vertex, find maximum error level below some threshold • Use this as the representative vertex • Can perform binary search along path • Conservative, because we can maintain validity even when using parent of optimal node (just adds some error)
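
The collision-detection trick with max blending can be mimicked on the CPU. In this Python sketch each slot keeps the running maximum of `key` and of `-key`, so the smallest and largest key written to the slot can both be recovered. The class name, the integer keys and the trivial modulo hash are assumptions for illustration, and `read` also rejects a slot that holds a different key, which is slightly stricter than the rule on the slide:

```python
class ProbabilisticTable:
    """Fixed-size hash table where writes may collide but reads detect it."""

    def __init__(self, size):
        self.size = size
        self.max_key = [float("-inf")] * size       # max blending of key
        self.neg_min_key = [float("-inf")] * size   # max blending of -key
        self.data = [0.0] * size                    # additive blending of data

    def _slot(self, key):
        return key % self.size                      # stand-in for a spatial hash

    def write(self, key, value):
        i = self._slot(key)
        self.max_key[i] = max(self.max_key[i], key)
        self.neg_min_key[i] = max(self.neg_min_key[i], -key)
        self.data[i] += value

    def read(self, key):
        """Return the accumulated value, or None if the slot is empty or collided."""
        i = self._slot(key)
        min_key = -self.neg_min_key[i]
        if self.max_key[i] != key or min_key != key:
            return None
        return self.data[i]


def read_with_fallback(table, path_keys):
    """Try keys from the finest level to the coarsest; return the first valid hit."""
    for key in path_keys:
        value = table.read(key)
        if value is not None:
            return key, value
    return None


table = ProbabilisticTable(4)
table.write(1, 2.0)
table.write(1, 3.0)      # same key: values accumulate
table.write(2, 1.0)
table.write(6, 1.0)      # 6 % 4 == 2, so this collides with key 2
hit = read_with_fallback(table, [2, 1])   # key 2 collided, so key 1 is used
```

Falling back to a coarser key is safe only because the consuming algorithm is conservative: for vertex clustering, using a parent cell adds some error but never produces an invalid mesh.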

  32. Probabilistic Octree Results • Adaptive simplification shown on bunny (~4K tris) • Preserves detail around leg, eyes and ears • Simplifies significantly on large, flat regions • Using 8% of storage of total tree, we have < 10% collisions • Only ~20% performance hit vs. standard grids

  33. Conclusions • GS is a powerful tool for interactive graphics • Amplification and decimation are important applications of GS

  34. Geometry Shaders and Other Feature Wish-List • Bring back the Point fill mode • Important for scatter in GPGPU applications • Data amplification improvements with indexed stream out • Avoiding triangle soups very non-trivial • Efficient indexable temps

  35. Thanks a lot! • Various people here…

  36. Questions?
