Cost-based Workload Balancing for Ray Tracing on a Heterogeneous Platform
Mario Rincón-Nigro
PhD Showcase, Feb 17th, 2012
Background
• Heterogeneous computing platforms
  • Widely available at all scales
• Ray tracing
  • Most popular technique for photorealism
  • Base of many rendering algorithms
  • Computationally intensive
  • Embarrassingly parallel
Background: BVH Ray Traversal
[Figure: two rays traversing a BVH. One requires 7 box tests + 15 triangle tests; the other requires 5 box tests + 9 triangle tests.]
• The workload of ray tracing is irregular
  • Per-ray BVH traversals are highly variable
  • Cost is hard to know beforehand
  • Dynamic workload balancing
• What if we could predict the traversal costs?
  • How could we use them to improve balancing efficiency and reduce rendering times?
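The irregularity described above can be made concrete by counting tests per ray. The following is a minimal sketch, not the presenter's implementation: it assumes a tiny hand-built 2D BVH, a standard slab test for ray/box intersection, and leaves that only store a triangle count (every triangle in a hit leaf is counted as one test). The names `Node`, `hits_box`, and `count_tests` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    lo: Tuple[float, float]            # minimum corner of the bounding box
    hi: Tuple[float, float]            # maximum corner of the bounding box
    children: List["Node"] = field(default_factory=list)
    num_triangles: int = 0             # only meaningful for leaves


def hits_box(origin, direction, lo, hi) -> bool:
    """Slab test of a 2D ray (t >= 0) against an axis-aligned box."""
    t_near, t_far = 0.0, float("inf")
    for o, d, l, h in zip(origin, direction, lo, hi):
        if d == 0.0:
            # Ray is parallel to this slab: it must start inside it.
            if o < l or o > h:
                return False
            continue
        t0, t1 = (l - o) / d, (h - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True


def count_tests(root, origin, direction):
    """Return (box_tests, triangle_tests) performed while traversing the BVH."""
    box_tests = tri_tests = 0
    stack = [root]
    while stack:
        node = stack.pop()
        box_tests += 1
        if not hits_box(origin, direction, node.lo, node.hi):
            continue
        if node.children:
            stack.extend(node.children)
        else:
            tri_tests += node.num_triangles
    return box_tests, tri_tests


# Illustrative scene: a root box with two disjoint leaves.
left = Node((0.0, 0.0), (1.0, 4.0), num_triangles=3)
right = Node((3.0, 0.0), (4.0, 4.0), num_triangles=5)
root = Node((0.0, 0.0), (4.0, 4.0), children=[left, right])

print(count_tests(root, (-1.0, 2.0), (1.0, 0.0)))    # (3, 8): crosses both leaves
print(count_tests(root, (0.5, -1.0), (0.0, 1.0)))    # (3, 3): crosses the left leaf only
print(count_tests(root, (-1.0, -1.0), (-1.0, 0.0)))  # (1, 0): misses the root box
```

Even in this three-node scene the three rays do very different amounts of work, which is the variability that makes per-ray cost hard to know beforehand.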
Overview of the Approach
Offline
• Build a BVH for the scene
• Compute the expected number of primitive intersections
Online
• Predict the costs of a batch of rays
• Initialize the workload balancer based on the predicted costs
• Heterogeneous launch of the ray tracer
• Repeat for generations of secondary rays
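The slides do not say how the balancer is initialized from the predicted costs. One plausible way, shown here purely as an assumption, is a greedy longest-processing-time assignment: tasks are sorted by predicted cost and each is given to the device that would finish it earliest. The function name `assign_by_cost` and the relative device speeds are hypothetical.

```python
def assign_by_cost(task_costs, speeds):
    """Greedy cost-based initial assignment (assumed scheme, not from the slides).

    task_costs: predicted cost of each task.
    speeds: relative throughput of each device (e.g. CPU = 1.0, GPU = 2.0).
    Returns (queues, finish), where queues[d] lists the task indices given to
    device d and finish[d] is its predicted finish time.
    """
    queues = [[] for _ in speeds]
    finish = [0.0] * len(speeds)
    # Most expensive tasks first, so small tasks can fill the gaps later.
    order = sorted(range(len(task_costs)), key=lambda i: -task_costs[i])
    for i in order:
        # Choose the device that would complete this task earliest.
        d = min(range(len(speeds)),
                key=lambda k: finish[k] + task_costs[i] / speeds[k])
        queues[d].append(i)
        finish[d] += task_costs[i] / speeds[d]
    return queues, finish


queues, finish = assign_by_cost([8, 4, 4, 2, 2], speeds=[1.0, 2.0])
print(queues)  # [[1, 3], [0, 2, 4]]
print(finish)  # [6.0, 7.0]
```

In this example the total work is 20 units on devices with a combined speed of 3, so the ideal finish time is about 6.67; the cost-aware assignment finishes at 7.0 without any run-time rebalancing.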
Ray Traversal Cost Estimation (4)
[Figure: example BVH whose nodes are annotated with E_B and E_T values (for instance E_B = 4.2, E_T = 9.65 and E_B = 3.33, E_T = 4.89 at inner nodes, and E_B = 0 at leaves), with a boundary line marking where the estimation traversal stops.]
• In this example, C(r) = 7 K_B + 15 K_T
• We traverse the BVH to 60% of its depth and sample 10% of the rays within each task
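The cost expression and the 10% sampling can be sketched as follows. This is an illustration under stated assumptions: K_B and K_T are read here as per-test cost constants for box and triangle tests (the slide does not define them), their numeric values are made up, and the sample is a fixed stride rather than whatever sampling scheme the original work uses. `ray_cost` and `estimate_task_cost` are hypothetical names.

```python
def ray_cost(box_tests, tri_tests, k_box=1.0, k_tri=2.0):
    """C(r) = box_tests * K_B + tri_tests * K_T, with assumed constants."""
    return box_tests * k_box + tri_tests * k_tri


def estimate_task_cost(predict, rays, fraction=0.1):
    """Estimate a task's total cost from a strided sample of its rays.

    predict: function mapping a ray to its predicted cost.
    rays: the fixed-size group of rays that makes up the task.
    fraction: share of rays to sample (0.1 mirrors the 10% on the slide).
    """
    if not rays:
        return 0.0
    stride = max(1, round(1 / fraction))
    sample = rays[::stride]
    mean_cost = sum(predict(r) for r in sample) / len(sample)
    # Scale the sampled mean up to the full task size.
    return mean_cost * len(rays)


# The slide's example ray: 7 box tests and 15 triangle tests.
print(ray_cost(7, 15))  # 37.0 with the assumed K_B = 1, K_T = 2

# Toy task of 100 "rays" whose predicted cost equals their index.
print(estimate_task_cost(lambda r: float(r), list(range(100))))  # 4500.0
```

Sampling trades accuracy for overhead: only one ray in ten is evaluated, so the estimate is cheap but can be off when rays within a task are incoherent.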
Workload Balancing
• Tasks for ray tracing are fixed-size groups of rays
• Two-level workload balancing
  • Inter-processor: work must be split between "big" processing units (CPUs, GPUs)
  • Intra-processor: work must be split between "small" processing units (SPs within a GPU)
• A variation of one of these strategies is commonly used:
  • Centralized queue
  • Distributed static assignment
  • Distributed dynamic assignment with task stealing
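To illustrate the difference between static assignment and task stealing, here is a small sequential simulation. It is a sketch under simplifying assumptions, not the system from the talk: stealing is free, an idle worker always steals one task from the back of the longest queue, and the function names are hypothetical.

```python
from collections import deque


def static_makespan(queues, speeds):
    """Finish time when every worker only processes its own queue."""
    return max(sum(q) / s for q, s in zip(queues, speeds))


def simulate_stealing(queues, speeds):
    """Finish time when idle workers steal from the longest remaining queue."""
    queues = [deque(q) for q in queues]
    clock = [0.0] * len(queues)      # time at which each worker becomes free
    while any(queues):
        # The worker that becomes free first acts next.
        w = min(range(len(queues)), key=lambda k: clock[k])
        if queues[w]:
            cost = queues[w].popleft()   # own work: take from the front
        else:
            victim = max(range(len(queues)), key=lambda k: len(queues[k]))
            cost = queues[victim].pop()  # steal from the back of the victim
        clock[w] += cost / speeds[w]
    return max(clock)


# Two equal-speed workers with a badly unbalanced initial assignment.
queues = [[4, 4, 4, 4], [1, 1]]
print(static_makespan(queues, [1.0, 1.0]))    # 16.0
print(simulate_stealing(queues, [1.0, 1.0]))  # 10.0
```

With 18 units of work on two equal workers the ideal finish time is 9.0; stealing recovers most of the gap left by the poor initial split, while a better initial split would leave less work to steal.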
Balancing Strategies
[Figure: diagrams of the balancing strategies, with the following labels]
• Static balancing: distributed queues
• Dynamic balancing: distributed queues with task stealing; centralized queue
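A centralized queue can be modelled in a few lines: whichever worker becomes free first takes the next task from the shared queue. The sketch below is an idealized simulation (no queue contention or transfer cost is modelled), and `simulate_central_queue` is a hypothetical name.

```python
import heapq


def simulate_central_queue(task_costs, speeds):
    """Finish time when workers pull tasks in order from one shared queue."""
    # Heap of (time the worker becomes free, worker index).
    free_at = [(0.0, w) for w in range(len(speeds))]
    heapq.heapify(free_at)
    for cost in task_costs:
        t, w = heapq.heappop(free_at)           # earliest-free worker
        heapq.heappush(free_at, (t + cost / speeds[w], w))
    return max(t for t, _ in free_at)


print(simulate_central_queue([4, 4, 4, 4, 1, 1], [1.0, 1.0]))  # 9.0
```

In this toy case the shared queue reaches the ideal finish time of 9.0 for 18 units of work on two equal workers. In a real system every dequeue is a synchronization point, which is one reason distributed queues are also used.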
Experiments
• Test platform
  • AMAX machine with an Intel Xeon E5600 processor, 16 GB of RAM, and 3 NVIDIA Tesla C1060 GPUs
• We compared regular and cost-initialized versions of the workload balancing policies over a number of test scenes
  • For tasks of varying size
  • For rays exhibiting different degrees of spatial coherence (high to medium)
• In general, the cost-initialized versions outperform the regular versions for large task sizes
• Results are not sensitive to the degree of spatial coherence of the tested rays
Conclusions and Future Work
• Cost-based initialization of the task system can improve the balancing efficiency of the strategies
  • The static strategy benefits the most (it becomes comparable to dynamic balancing)
  • Dynamic strategies also showed improved balancing efficiency
• The cost-based approach is particularly attractive for coarse-grained task systems
  • Best results were achieved for large tasks
• The work is limited by the degree of variability that rays can have. Rays with low spatial coherence pose a challenge because of the estimation overhead they impose
  • The approach cannot be used in its current state for some rendering algorithms
  • We believe that fast ray reordering can help in this regard
• We have not yet considered using the costs directly for GPU workload balancing
  • An implementation of distributed queues on GPUs might also benefit from the estimated costs