Efficient Parallel Refinement for Hierarchical Radiosity on a DSM computer
François X. Sillion, Jean-Marc Hasenfratz
iMAGIS
Hierarchical Radiosity
• Hierarchical representation (mesh)
• Interactions computed at the appropriate level
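Computing each interaction "at the appropriate level" is usually done by recursive link refinement: estimate the interaction, and if it is too large, subdivide the larger patch and refine its children. Below is a minimal sketch of that idea, not the authors' code. Everything here is an assumption made for illustration: the square-patch geometry, the crude unoccluded form-factor estimate, and the hypothetical names `Patch`, `estimate_ff`, `refine`, `EPSILON`, and `MIN_SIZE`.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Patch:
    """Axis-aligned square patch (hypothetical geometry): center and side length."""
    center: tuple
    size: float
    children: list = field(default_factory=list)

    @property
    def area(self):
        return self.size * self.size

    def subdivide(self):
        # Split into four quadrants on first use; later calls reuse the same children.
        if not self.children:
            q = self.size / 4.0
            cx, cy, cz = self.center
            self.children = [Patch((cx + dx, cy + dy, cz), self.size / 2.0)
                             for dx in (-q, q) for dy in (-q, q)]
        return self.children


EPSILON = 0.05    # hypothetical form-factor threshold
MIN_SIZE = 0.125  # hypothetical smallest patch side


def estimate_ff(receiver, source):
    # Crude unoccluded point-to-point estimate A_s / (pi r^2); ignores orientation.
    r2 = sum((a - b) ** 2 for a, b in zip(receiver.center, source.center))
    return source.area / (math.pi * max(r2, 1e-12))


def refine(receiver, source, links):
    """Append (receiver, source, ff) links, each placed at an appropriate level."""
    ff = estimate_ff(receiver, source)
    if ff < EPSILON or max(receiver.size, source.size) <= MIN_SIZE:
        links.append((receiver, source, ff))  # accurate enough: link at this level
        return
    # Too coarse: subdivide the larger patch and refine each child interaction.
    if receiver.size >= source.size:
        for child in receiver.subdivide():
            refine(child, source, links)
    else:
        for child in source.subdivide():
            refine(receiver, child, links)
```

Distant patches end up with a single coarse link, while nearby ones are linked at finer levels. Every pair of points on the two root patches is covered by exactly one link, so the products of linked areas sum to the product of the root areas.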
Strategies for Hierarchical Radiosity
• Gathering
  • Memory-consuming (links are stored)
  • Easier dynamic modifications
• Shooting
  • Memory-efficient
  • Requires a heuristic to decide the shooting level
  • Links recomputed as needed
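With gathering, the stored links are reused at every iteration. Each node gathers over its own links, and a push-pull pass then makes radiosity consistent across levels. The following is a minimal sketch of one such Jacobi-style iteration. It assumes the hypothetical `Node` class below, in which children's areas sum to the parent's area, a link is stored as a `(source_node, form_factor)` pair, and emission and reflectance are read only at leaves.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hierarchy node; emission and reflectance are read at leaves only (hypothetical)."""
    area: float
    emission: float = 0.0
    reflectance: float = 0.5
    radiosity: float = 0.0
    gathered: float = 0.0                       # irradiance gathered at this level
    children: list = field(default_factory=list)
    links: list = field(default_factory=list)   # stored (source_node, form_factor) pairs


def gather(node):
    """Gather irradiance over the links stored at this node, then recurse."""
    node.gathered = sum(ff * src.radiosity for src, ff in node.links)
    for child in node.children:
        gather(child)


def push_pull(node, inherited=0.0):
    """Push irradiance down to the leaves, pull area-weighted radiosity back up."""
    irradiance = inherited + node.gathered
    if not node.children:
        node.radiosity = node.emission + node.reflectance * irradiance
    else:
        total = sum(push_pull(c, irradiance) * c.area for c in node.children)
        node.radiosity = total / node.area
    return node.radiosity


def iterate(roots, steps):
    """Jacobi-style iterations: gather everywhere first, then push-pull every tree."""
    for _ in range(steps):
        for root in roots:
            gather(root)
        for root in roots:
            push_pull(root)
```

Keeping the links is what makes gathering memory-consuming. In exchange, an iteration is a cheap traversal of existing links, and a local scene change only requires revisiting the affected links.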
Parallel Approaches
• Two approaches:
  • Data exchange via message-passing algorithms
  • Shared memory
• Partial solutions are possible if a “natural” partitioning exists (e.g. inside buildings) [Fun96,FY97]
• Virtual interfaces are harder to handle [RAPP97]
• Load-balancing problem [Cav99]
Scheduler
• Force all link refinement operations through a scheduler object
• Natural place for:
  • Parallel synchronization
  • Orientation and steering of the calculation
• Advantages of using a scheduler:
  • Global view of all pending tasks at any given time
  • Tasks can be extracted according to various selection criteria
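Routing every refinement operation through one object gives a single place to synchronize refiners and to inspect pending work. Here is a minimal thread-safe sketch of such a scheduler. The class and method names (`Scheduler`, `submit`, `next_task`, `pending`) are hypothetical, and the FIFO order is only a placeholder for whatever selection criterion is plugged in.

```python
import threading
from collections import deque


class Scheduler:
    """Single entry point for link refinement tasks (hypothetical interface)."""

    def __init__(self):
        self._tasks = deque()
        self._lock = threading.Lock()  # all synchronization lives here

    def submit(self, task):
        # Refiners hand new refinement tasks back here instead of recursing directly.
        with self._lock:
            self._tasks.append(task)

    def next_task(self):
        # Returns None when no task is pending.
        with self._lock:
            return self._tasks.popleft() if self._tasks else None

    def pending(self):
        # Global view: a snapshot of every task not yet handed out.
        with self._lock:
            return list(self._tasks)
```

Because the scheduler sees every pending task, changing the extraction order (to steer the computation, for example) only touches `next_task`, not the refiners.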
Example (sequential) schedulers
• Stack scheduler (depth-first refinement)
• Priority scheduler
  • Uses a simple structure (heap)
  • Hierarchical level (breadth-first)
  • Size, energy, error
• Interactive user control
• Random scheduler...
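These sequential schedulers differ only in which pending task they extract next. Below is a minimal sketch of the stack, priority, and random policies. The class names are hypothetical, and so is the convention that a task carries fields such as `level` or `error`. Because Python's `heapq` is a min-heap, a "largest error first" policy negates its key.

```python
import heapq
import itertools
import random


class StackScheduler:
    """Last in, first out: depth-first refinement."""
    def __init__(self):
        self._stack = []
    def push(self, task):
        self._stack.append(task)
    def pop(self):
        return self._stack.pop()
    def __len__(self):
        return len(self._stack)


class PriorityScheduler:
    """Heap ordered by a key function (level, size, energy, error, ...)."""
    def __init__(self, key):
        self._heap = []
        self._key = key
        self._count = itertools.count()  # tie-breaker: equal keys come out FIFO
    def push(self, task):
        heapq.heappush(self._heap, (self._key(task), next(self._count), task))
    def pop(self):
        return heapq.heappop(self._heap)[2]
    def __len__(self):
        return len(self._heap)


class RandomScheduler:
    """Extracts any pending task uniformly at random."""
    def __init__(self, seed=None):
        self._tasks = []
        self._rng = random.Random(seed)
    def push(self, task):
        self._tasks.append(task)
    def pop(self):
        i = self._rng.randrange(len(self._tasks))
        # Swap the chosen task to the end so removal is O(1).
        self._tasks[i], self._tasks[-1] = self._tasks[-1], self._tasks[i]
        return self._tasks.pop()
    def __len__(self):
        return len(self._tasks)
```

For example, `PriorityScheduler(key=lambda t: t["level"])` refines breadth-first, and `PriorityScheduler(key=lambda t: -t["error"])` refines the largest estimated error first.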
Architecture
[Figure: architecture diagram showing Main / GUI, Solver, Scheduler, and many Refiner instances (Refiner, Refiner, …)]
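The diagram suggests a pool of refiners that all draw work from one shared scheduler. Below is a minimal sketch of that structure using Python threads, with `queue.Queue` standing in for the scheduler. `refine_link` is a hypothetical stand-in that splits a link into four children until a depth limit and then keeps it. Python threads only illustrate the control flow here; they would not deliver the parallel speed-up of native threads on a ccNUMA machine.

```python
import queue
import threading


def refine_link(link, max_depth=3):
    """Hypothetical refinement: link (depth, id) splits into 4 children until max_depth."""
    depth, ident = link
    if depth >= max_depth:
        return []  # keep this link, no children
    return [(depth + 1, ident * 4 + k) for k in range(4)]


def run_solver(initial_links, num_refiners=4):
    scheduler = queue.Queue()
    kept, kept_lock = [], threading.Lock()
    for link in initial_links:
        scheduler.put(link)

    def refiner():
        while True:
            link = scheduler.get()
            if link is None:  # sentinel: the solver asks this refiner to stop
                scheduler.task_done()
                return
            children = refine_link(link)
            for child in children:
                scheduler.put(child)  # new work goes back through the scheduler
            if not children:
                with kept_lock:
                    kept.append(link)
            scheduler.task_done()  # after the children are queued, so join() cannot return early

    threads = [threading.Thread(target=refiner) for _ in range(num_refiners)]
    for t in threads:
        t.start()
    scheduler.join()  # returns once every submitted link has been processed
    for _ in threads:
        scheduler.put(None)
    for t in threads:
        t.join()
    return kept
```

Starting from one link at depth 0, this produces the 4³ = 64 leaf links at depth 3, regardless of how the refiners interleave.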
Synchronization
• Scheduler
  • A single object talks to all refiners => potential bottleneck!
  • Use simple blocks of refinement jobs
• Hierarchical data structure
  • Consistency of the hierarchical scene structure
• Interactions
  • Links or energy representations
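Because a single scheduler serves every refiner, handing out one job per request makes its lock a hot spot. Handing out a block of jobs per lock acquisition amortizes that cost. The sketch below shows block extraction, together with one possible way (an assumption, not necessarily the authors' scheme) to keep the shared hierarchy consistent: a per-node lock so that two refiners never subdivide the same node twice. `BlockScheduler`, `next_block`, `block_size`, and `HierNode` are hypothetical names.

```python
import threading
from collections import deque


class BlockScheduler:
    """Hands out up to `block_size` jobs per lock acquisition (hypothetical)."""
    def __init__(self, block_size):
        self.block_size = block_size
        self._jobs = deque()
        self._lock = threading.Lock()
        self.acquisitions = 0  # number of next_block calls that took the lock

    def submit_many(self, jobs):
        with self._lock:
            self._jobs.extend(jobs)

    def next_block(self):
        # Returns an empty list when no job is pending.
        with self._lock:
            self.acquisitions += 1
            n = min(self.block_size, len(self._jobs))
            return [self._jobs.popleft() for _ in range(n)]


class HierNode:
    """Scene node whose subdivision is guarded by its own lock."""
    def __init__(self):
        self.children = []
        self._lock = threading.Lock()

    def subdivide(self):
        # The first caller creates the children; concurrent callers get the same list.
        with self._lock:
            if not self.children:
                self.children = [HierNode() for _ in range(4)]
            return self.children
```

Larger blocks mean fewer lock acquisitions but coarser load balancing near the end of a pass, which is one reason the block size is worth measuring.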
Test scenes
• Aircraft: 184,456 polygons
• VRLab: 51,182 polygons
• Office: 5,285 polygons
Measurements
[Figure: node architecture (Node 0, Node 1, …, Node 511, each with Proc A, Proc B, Mem & Dir, Hub chip, IO Xbar, IO Ctrls) with routers (R) on a scalable interconnect network]
• Hardware architecture:
  • ccNUMA SGI Origin 2000 computer with 64 microprocessors
  • Limited to 40 R10000 microprocessors at 195 MHz
Measurements
• Time measurements:
  • Refinement: the times system call, which returns clock ticks
  • Memory access, cache access, …: the perfex software tool, which uses the 31 hardware counters of the R10000
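The POSIX `times` call reports user and system CPU time in clock ticks. In C, those are converted to seconds by dividing by `sysconf(_SC_CLK_TCK)`. Python's `os.times()` wraps the same facility and returns seconds directly. Below is a minimal sketch of timing a refinement phase this way. `cpu_time_of` and `busy_refinement` are hypothetical names, and the loop is only a placeholder for real refinement work.

```python
import os


def cpu_time_of(work, *args):
    """Return (result, user_seconds, system_seconds) for one call of work(*args)."""
    before = os.times()
    result = work(*args)
    after = os.times()
    return result, after.user - before.user, after.system - before.system


def busy_refinement(n):
    # Hypothetical stand-in for a refinement phase: pure CPU work.
    total = 0
    for i in range(n):
        total += i * i
    return total
```

Measuring CPU time rather than wall-clock time separates the cost of the refinement itself from time spent waiting. For hardware-level events such as cache misses, a counter-based tool is still needed.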
Results
[Figure: CPU refinement time]
Results
[Figure: Speed-up]
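Speed-up is conventionally defined as S(p) = T(1) / T(p) for p processors, with parallel efficiency E(p) = S(p) / p. The slide does not state which reference time it uses, so this helper simply assumes the one-processor time of the same program. `speedup_table` is a hypothetical name.

```python
def speedup_table(times_by_procs):
    """Map processor count -> (speed-up, efficiency), relative to the 1-processor time."""
    t1 = times_by_procs[1]
    return {p: (t1 / t, t1 / t / p) for p, t in sorted(times_by_procs.items())}
```

An efficiency close to 1 indicates near-linear speed-up. A falling efficiency as p grows usually points to contention or load imbalance.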
Results
[Figure: Influence of the size of link blocks on overall CPU time]
Results
[Figure: Memory used before and during the iterations]
Conclusions
• Very simple atomic tasks
• Easily managed with a single scheduler structure
• Easily implemented on top of an existing radiosity simulation code:
  • Thread setup
  • New link creation upon refinement decision
Future work
• Understanding the peculiar behaviour observed for the aircraft scene
• Dealing with graphics resources for “optimized” calculations using graphics hardware
Acknowledgements
• Peter Kipfer contributed to the design and early implementation of this work.
• Thanks to Centre Charles Hermite for providing access to its computational resources.
• Laurent Alonso provided useful advice on performance questions.
• This work was supported in part by the European Union’s ESPRIT project #24944, ARCADE (“Making Radiosity Usable”).