Enhancing and Optimizing the Render Cache. Bruce Walter, Cornell Program of Computer Graphics; George Drettakis, REVES/INRIA Sophia-Antipolis; Donald P. Greenberg, Cornell Program of Computer Graphics
Background • Render Cache • “Interactive Rendering using the Render Cache”, Rendering Workshop 1999 • Goal • Interactive Rendering • Exploit frame-to-frame coherence • Decouple renderer from display framerate • Reuse “expensive” rendering results
Background • Goal: Interactive rendering [Figure panels: Ray tracing, Path tracing]
Background • Modified Visual Feedback Loop • Asynchronous interface [Diagram labels: user, application, renderer, image, display]
Background • Reproject rendered points [Figure panels: Original view, New view]
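The reprojection step can be illustrated with a minimal sketch. It assumes a simple pinhole camera described by an eye position, three orthonormal axes, and a focal length in pixels. The function name `reproject` and all of its parameters are hypothetical and are not taken from the Render Cache code.

```python
import math

def reproject(point, eye, right, up, forward, focal_px, width, height):
    """Project a cached 3D point into a camera.

    Returns (px, py, depth), or None if the point is not visible.
    Assumes a pinhole camera; this model is an illustration only.
    """
    # Vector from the camera position to the point.
    d = [point[i] - eye[i] for i in range(3)]
    # Coordinates of the point in the camera's frame.
    x = sum(d[i] * right[i] for i in range(3))
    y = sum(d[i] * up[i] for i in range(3))
    z = sum(d[i] * forward[i] for i in range(3))
    if z <= 0:
        return None  # point is behind the camera
    # Perspective divide, then shift to pixel coordinates (y grows downward).
    px = math.floor(width / 2 + focal_px * x / z)
    py = math.floor(height / 2 - focal_px * y / z)
    if not (0 <= px < width and 0 <= py < height):
        return None  # point falls outside the image
    return px, py, z

# The same cached point seen from the original view and from a moved camera.
AXES = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
original_view = reproject((0, 0, 5), (0, 0, 0), *AXES, 256, 512, 512)
new_view = reproject((0, 0, 5), (1, 0, 0), *AXES, 256, 512, 512)
```

The point keeps its cached color and only its pixel position changes, which is what lets earlier rendering results be reused for the new view.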
Background • Display process: renderer → Update Points → Project/Z-Buffer → Depth Cull → Interpolate → image • Sampling sends requests back to the renderer
Background • Results after each stage [Figure panels: Projection, Depth cull, Interpolation]
Background • Sampling [Figure panels: Displayed image, Priority image, Requested pixels]
Related Work • Faster ray engines • Optimize and parallelize • E.g., Wald et al. • Hardware-based display • Mesh-based • E.g., Tapestry, Holodeck, Tole et al. • Texture-based • E.g., Corrective textures
Motivation • Render Cache works well • Can enable interactive use of higher quality ray-based renderers. • … but needs improvement • Images too small (256x256) • Gaps often visible during camera motion • Not fast enough in tracking shading changes
Enhancements • Tiled Z-Buffer • Better scalability and memory coherence • Larger Interpolation Prefilter • Can fill larger gaps between points • Predictive Sampling • Improved quality during camera motion • Point Eviction • Faster update of shading changes
Enhancements • Code Optimization • Use of SIMD (MMX/SSE/SSE2) • Data layout, branch conversions, etc. • Publicly Available • For evaluation, comparison, or use • Non-commercial binary release • URL is in the paper
Memory Coherence • Change from R10K to Pentium 4 • Cache reduced from 4MB to 256KB • Clock increased from 195MHz to 1.7GHz • Cache misses much more expensive • Change from 256x256 to 512x512 • Point data ~ 5MB, Image data ~ 3MB • Much bigger than cache • Projection and Z-Buffer problematic
Projection and Z-Buffer • Random order memory access • Read/modify/write operation is memory latency limited [Diagram: Point Cloud (5MB), Image (3MB)]
Tiled Projection and Z-Buffer • Divide image into tiles • Tiles sized to fit in cache [Diagram: Point Cloud (5MB), Tile Buckets (4MB), Image (3MB)]
Tiled Projection and Z-Buffer • Project and bucket sort by tile
Tiled Projection and Z-Buffer • Z-Buffer each tile separately
Tiled Projection and Z-Buffer • Uses more memory and instructions • But it is faster (25ms instead of 42ms)
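The two passes described on the slides above (bucket sort by tile, then Z-buffer each tile separately) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `tiled_zbuffer`, the 64-pixel tile size, and the tuple layout are assumptions. Plain Python will not show the cache benefit; the sketch only shows the access pattern that keeps each tile's read/modify/write traffic within a small region of the image.

```python
def tiled_zbuffer(points, width, height, tile=64):
    """points: iterable of (px, py, depth, point_id), already projected.

    Returns (depth, owner) buffers in row-major order. The tile size of 64
    is an arbitrary choice for this sketch.
    """
    tiles_x = (width + tile - 1) // tile
    tiles_y = (height + tile - 1) // tile
    buckets = [[] for _ in range(tiles_x * tiles_y)]

    # Pass 1: stream through the points once and bucket them by tile.
    for p in points:
        px, py = p[0], p[1]
        if 0 <= px < width and 0 <= py < height:
            buckets[(py // tile) * tiles_x + (px // tile)].append(p)

    depth = [float("inf")] * (width * height)
    owner = [-1] * (width * height)  # -1 marks a pixel with no point

    # Pass 2: Z-buffer one tile at a time, so all writes for a bucket
    # land inside the same small block of the image.
    for bucket in buckets:
        for px, py, z, pid in bucket:
            i = py * width + px
            if z < depth[i]:
                depth[i] = z
                owner[i] = pid
    return depth, owner

pts = [(1, 1, 5.0, 0), (1, 1, 2.0, 1), (70, 3, 1.0, 2), (-1, 0, 1.0, 3)]
depth, owner = tiled_zbuffer(pts, width=128, height=64)
```

The extra bucket storage corresponds to the "uses more memory and instructions" trade-off noted on the slide.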
Interpolation Filters • Larger filters • Fill larger gaps in point data • Generally more expensive • Result in more blurring of the image • The previous Render Cache • Used a 3x3 weighted filter • Can only fill very small gaps • Introduces only a small amount of blurring
Prefilter • Add a larger “backup” filter • Results used only when 3x3 filter fails • Uses a uniform 7x7 filter • Can be computed cheaply • Can fill in much larger gaps • Does not affect sampling priorities • Actually executed first then overwritten • Hence the name “prefilter”
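The "executed first, then overwritten" ordering can be sketched as below. The sketch makes several assumptions that the slides do not state: the image is a single channel with `None` marking empty pixels, the 3x3 filter uses 1-2-1 style weights, and the 3x3 filter "fails" when its neighborhood contains no points. The function names are hypothetical.

```python
def box_fill(img, radius):
    """Uniform average of the valid pixels in a (2*radius+1)^2 window."""
    h, w = len(img), len(img[0])
    out = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    if img[yy][xx] is not None:
                        total += img[yy][xx]
                        count += 1
            if count:
                out[y][x] = total / count
    return out

# Assumed 3x3 weights; the actual Render Cache weights are not given here.
W3 = ((1, 2, 1), (2, 4, 2), (1, 2, 1))

def weighted_3x3(img):
    """Weighted average over the 3x3 neighborhood; None where it is empty."""
    h, w = len(img), len(img[0])
    out = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, weight = 0.0, 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w and img[yy][xx] is not None:
                        total += W3[dy + 1][dx + 1] * img[yy][xx]
                        weight += W3[dy + 1][dx + 1]
            if weight:
                out[y][x] = total / weight
    return out

def interpolate(img):
    result = box_fill(img, 3)       # 7x7 prefilter runs first
    fine = weighted_3x3(img)        # 3x3 filter runs second
    for y in range(len(img)):
        for x in range(len(img[0])):
            if fine[y][x] is not None:
                result[y][x] = fine[y][x]  # overwrite wherever 3x3 succeeded
    return result

sparse = [[None] * 9 for _ in range(9)]
sparse[4][4] = 10.0
filled = interpolate(sparse)
```

The brute-force window loop is for clarity only; a uniform filter can be computed much more cheaply with running sums, which is consistent with the slide's remark that the prefilter is cheap.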
Prefilter [Figure panels: 3x3 filter only, 7x7 prefilter only, Both filters]
Predictive Sampling • Sampling is purely reactive • Helps to guide sparse sampling • Samples returned in later frame • Problem when large new regions become visible • Predict large gaps ahead of time • Project using a predicted camera • Request samples before they are needed
Predictive Sampling • Projection is expensive • 47% of original render cache cost • Use simplified projection • No Z-Buffer • Only need to find regions with no points • Reduced resolution • 1/4 width and height (1/16 the number of pixels) • Store only 1 byte per pixel • Occupancy image fits easily in cache
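The simplified projection can be sketched as an occupancy image with one byte per cell and no depth test. In this minimal sketch the input is a list of pixel positions that are assumed to have already been projected with the predicted camera; how that camera is predicted is not covered. The names `occupancy_image` and `empty_cells` are hypothetical.

```python
def occupancy_image(projected, width, height, scale=4):
    """Mark which low-resolution cells receive at least one point.

    projected: iterable of (px, py) full-resolution pixel positions.
    scale=4 gives 1/4 width and height, as on the slide.
    """
    ow = (width + scale - 1) // scale
    oh = (height + scale - 1) // scale
    occ = bytearray(ow * oh)  # one byte per cell, initially 0 (empty)
    for px, py in projected:
        if 0 <= px < width and 0 <= py < height:
            # No Z-buffer: any point landing in the cell marks it occupied.
            occ[(py // scale) * ow + (px // scale)] = 1
    return occ, ow, oh

def empty_cells(occ, ow, oh):
    """Cells with no points: candidates for requesting samples early."""
    return [(i % ow, i // ow) for i, v in enumerate(occ) if v == 0]

occ, ow, oh = occupancy_image([(0, 0), (5, 1), (15, 7), (20, 3)], 16, 8)
gaps = empty_cells(occ, ow, oh)
```

For a 512x512 image this occupancy buffer is 128x128 bytes, or 16KB, which is why it fits easily in a 256KB cache.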
Predictive Sampling • Example during rapid camera rotation [Figure panels: No Prediction, With Prediction]
Algorithm Overview • Display process stages: Update Points, Prediction, Project/Sort, Z-Buffer, Depth Cull, Prefilter, Interpolate, Sampling • The renderer supplies new points and receives sample requests • Output: image
Point Eviction • Stale data can be worse than no data • Points may live a long time at high ratios • Not enough new samples to overwrite old • Color change detection already exists • Enhances sampling in regions of change • Works by aging nearby points • Evict points beyond an age limit • Speeds image convergence
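Aging and eviction can be sketched as follows. Every number in this sketch (neighborhood radius, aging penalty, age limit) is a hypothetical placeholder, and the list of changed pixels is assumed to come from the existing color change detection; none of these details are specified on the slide.

```python
from dataclasses import dataclass

@dataclass
class Point:
    x: int
    y: int
    age: int = 0

def age_and_evict(points, changed_pixels, radius=2, penalty=8, age_limit=16):
    """Age every point by one frame, age points near a detected color
    change by an extra penalty, and drop points older than the limit."""
    survivors = []
    for p in points:
        p.age += 1
        near_change = any(
            abs(p.x - cx) <= radius and abs(p.y - cy) <= radius
            for cx, cy in changed_pixels
        )
        if near_change:
            p.age += penalty  # stale data near a change expires sooner
        if p.age <= age_limit:
            survivors.append(p)
    return survivors

cache = [Point(0, 0, age=0), Point(10, 10, age=10), Point(50, 50, age=16)]
cache = age_and_evict(cache, changed_pixels=[(11, 10)])
```

Evicted points leave gaps that the sampling stage then fills with fresh samples, which is how eviction speeds convergence after a shading change.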
SIMD Optimizations • Utilize MMX/SSE/SSE2 instructions • Project four points at once • Process R,G,B channels simultaneously • Add memory prefetches • Automatic prefetch works well for linear access • Convert branches to data dependencies • Compare instructions produce masks of all zeroes or all ones • Use boolean operations instead of branches • Roughly a factor of two total speedup
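The branch conversion idea can be illustrated with ordinary integers standing in for 32-bit SIMD lanes. This is only a sketch of the masking technique: the real code would use SSE compare and bitwise instructions on four lanes at once, and the helper names here are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # one 32-bit lane

def cmp_lt_mask(a, b):
    """Comparison that yields all ones if a < b, else all zeroes."""
    return -(a < b) & MASK32

def select(mask, x, y):
    """Pick x where mask is all ones and y where it is all zeroes,
    using only AND, NOT, and OR (no branch)."""
    return (mask & x) | (~mask & MASK32 & y)

def branchless_min(a, b):
    return select(cmp_lt_mask(a, b), a, b)

# A Z-buffer style update written without an if statement:
old_depth, old_id = 700, 3
new_depth, new_id = 250, 9
m = cmp_lt_mask(new_depth, old_depth)
kept_depth = select(m, new_depth, old_depth)
kept_id = select(m, new_id, old_id)
```

Because the result depends on data rather than on a conditional jump, the processor has no branch to mispredict, and the same sequence of operations applies uniformly to every lane of a SIMD register.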
Results • Single 1.7GHz processor, rotating camera • Ray trace only: 1.8 fps • Render Cache: 9 fps
Results • Timing: 62.1 ms (up to 16 fps) • 512x512 image, render cache only • 1.7GHz Pentium 4 processor [Chart: timing breakdown by stage: Sampling, Update Points, Filter/Smooth, Prediction, Prefilter, Depth Cull, Project, Z-Buffer]
Scalability with Image Size [Plot: Frame Size (Pixels) versus Frame Time (ms), with 512x512 and 1200x1200 marked]
Results • Try it for yourself • Download publicly available binary • Includes Render Cache and simple Ray Tracer • Requires a Pentium 4 and Java Web Start • Free for evaluation and internal use • http://www.graphics.cornell.edu/research/interactive/rendercache • Demo