A Hardware Processing Unit For Point Sets

A hardware architecture for processing point sets: a kd-tree-based neighbor search module, a coherent neighbor cache, and a reconfigurable processing module, prototyped on FPGAs. The target workloads are the processing, manipulation, and rendering of point-based graphics.

Presentation Transcript


  1. A Hardware Processing Unit For Point Sets • S. Heinzle, G. Guennebaud, M. Botsch, M. Gross • Graphics Hardware 2008

  2. Motivation • Point-based graphics established • Powerful algorithms • Representation • Processing • Manipulation • Rendering • Decomposition • Get neighborhood • Operate on neighbors

  3. Motivation • GPUs not suited for getting neighborhood • SIMD • Incoherent branching • Dynamic data structures slow • Recursive calls not supported • CPUs • Small number of FPUs • Inflexible memory caches (images courtesy of NVIDIA and Intel)

  4. Contributions • Hardware architecture for point sets • Neighbor search module • Novel advanced caching mechanism • Reconfigurable processing module • Programmability using FPGA compiler • FPGA prototype and measurements • Small & lean → integration into multi-core CPU/GPU possible

  5. Outline • Related Work • Spatial Searching and Caching • Architecture and Prototype • Results • Conclusion

  6. Related Work • Kd-tree [Bentley 75] • kNN on GPUs [Ma and McCool 02] • Kd-tree on GPUs [Popov et al. 07] • Kd-tree hardware [Woop et al. 05] [Woop et al. 06]

  7. Related Work • Adaptive SPH fluid simulation [Adams et al. 07] • Algebraic moving least squares [Guennebaud and Gross 07] • Linear moving least squares [Adamson and Alexa 04]

  8. Linear Moving Least Squares • Implicit surface defined by a set of points

  9. Linear Moving Least Squares • Implicit surface defined by a set of points (figure: query point x)

  10. Linear Moving Least Squares (figure: query point x, sample points p_i with normals n_i)

  11. Linear Moving Least Squares • Iterative projections onto plane (figure: x)

  12. Linear Moving Least Squares • Iterative projections onto plane (figure: x')

  13. Linear Moving Least Squares • Iterative projections onto plane (figure: x'')

  14. Linear Moving Least Squares • Iterative projections onto plane (figure: x''')

  15. Linear Moving Least Squares • Surface defined by points projecting onto themselves (figure: x)
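
The projection loop on slides 8 to 15 can be sketched in a few lines. This is a minimal sketch, not the authors' implementation: it assumes a Gaussian weight with a hypothetical support parameter `h`, takes the plane through the weighted average of the points p_i with the normalized weighted average of the normals n_i as its normal, and weights all points instead of a kNN neighborhood. The names `mls_project`, `h`, and `iterations` are illustrative only.

```python
import math

def mls_project(x, points, normals, h=0.5, iterations=5):
    """Move x toward the surface implied by oriented points (p_i, n_i)."""
    for _ in range(iterations):
        # Gaussian weights centred on the current position (assumed kernel).
        w = [math.exp(-sum((xc - pc) ** 2 for xc, pc in zip(x, p)) / (h * h))
             for p in points]
        total = sum(w)
        # Weighted average position a(x) and weighted average normal n(x).
        a = [sum(wi * p[c] for wi, p in zip(w, points)) / total for c in range(3)]
        n = [sum(wi * nrm[c] for wi, nrm in zip(w, normals)) for c in range(3)]
        length = math.sqrt(sum(c * c for c in n))
        n = [c / length for c in n]
        # Signed distance from x to the plane through a(x) with normal n(x).
        f = sum(nc * (xc - ac) for nc, xc, ac in zip(n, x, a))
        # Step onto that plane; on the surface this step has zero length.
        x = [xc - f * nc for xc, nc in zip(x, n)]
    return x

# Example input: samples of the plane z = 0 with upward normals.
grid = [(0.25 * i, 0.25 * j, 0.0) for i in range(-4, 5) for j in range(-4, 5)]
up = [(0.0, 0.0, 1.0)] * len(grid)
projected = mls_project((0.3, 0.2, 0.5), grid, up)
```

A point that already lies on the surface is left where it is by the step, which is the fixed-point definition on slide 15.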

  16. Outline • Related Work • Spatial Searching and Caching • Architecture & Prototype • Results • Conclusion

  17. Spatial Search • Spatial search: kNN (k nearest neighbors) and eNN (all neighbors within a given radius) • Common in most point operations • Based on kd-tree • Example eNN: (figure)

  18. Spatial Search • kNN search similar to eNN search: • Start with infinite radius • Sort leaf points into priority queue • Shrink radius with every point sorted
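
The three kNN steps on slide 18 can be sketched in software as follows. This is an illustrative sketch under assumptions, not the hardware design: the tree layout (tuples, buckets of up to four points per leaf), the recursive traversal, and the names `build`, `sqdist`, and `knn` are all hypothetical, and `k` is assumed to be at least 1.

```python
import heapq

def build(points, depth=0):
    """Build a kd-tree; leaves hold small buckets of points."""
    if len(points) <= 4:
        return ("leaf", points)
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return ("node", axis, pts[mid][axis],
            build(pts[:mid], depth + 1), build(pts[mid:], depth + 1))

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def knn(tree, q, k):
    """Return the k nearest points to q, nearest first."""
    heap = []  # max-heap via negated squared distances, at most k entries

    def radius2():
        # Infinite until k candidates exist, then the worst kept candidate.
        return -heap[0][0] if len(heap) == k else float("inf")

    def visit(node):
        if node[0] == "leaf":
            for p in node[1]:
                d = sqdist(p, q)
                if d < radius2():
                    if len(heap) == k:
                        heapq.heapreplace(heap, (-d, p))  # radius shrinks
                    else:
                        heapq.heappush(heap, (-d, p))
            return
        _, axis, split, left, right = node
        near, far = (left, right) if q[axis] < split else (right, left)
        visit(near)
        # The far side matters only if the split plane is inside the radius.
        if (q[axis] - split) ** 2 < radius2():
            visit(far)

    visit(tree)
    return [p for _, p in sorted(heap, reverse=True)]
```

The bounded heap plays the role of the priority queue: once it holds k entries, every accepted point evicts the current worst one, so the search radius can only shrink.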

  19. Coherent Neighbor Cache (eNN) • Find neighbors in slightly bigger radius • Re-use result for spatially close query • Re-use if (condition given as a formula on the slide; not preserved in the transcript)
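
The re-use formula on this slide did not survive, so the following is a sketch built on a condition that follows from the triangle inequality, which may or may not be written the same way on the slide: if the cached search around a center c used an enlarged radius R > r, then every point within r of a new query q is already in the cache whenever ||q - c|| <= R - r. The class name `EpsCache`, the enlargement factor `alpha`, and the brute-force search standing in for the kd-tree are assumptions.

```python
import math

class EpsCache:
    """Caches one enlarged range query and re-uses it for nearby queries."""

    def __init__(self, points, r, alpha=0.5):
        self.points = points
        self.r = r                      # radius the caller actually wants
        self.big_r = (1.0 + alpha) * r  # slightly bigger radius that is searched
        self.center = None
        self.cached = []
        self.hits = 0

    def query(self, q):
        if (self.center is not None
                and math.dist(q, self.center) <= self.big_r - self.r):
            # Ball(q, r) lies inside Ball(center, big_r): cache is a superset.
            self.hits += 1
        else:
            # Miss: brute-force stand-in for the kd-tree range search.
            self.center = q
            self.cached = [p for p in self.points
                           if math.dist(p, q) <= self.big_r]
        # Filter the cached superset down to the requested radius.
        return [p for p in self.cached if math.dist(p, q) <= self.r]
```

A larger `alpha` lets more queries hit the cache but makes each cached neighborhood bigger, so every hit has more candidates to filter.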

  20. Coherent Neighbor Cache (kNN, exact) • Find (k+1) neighbors • Re-use result for spatially close query • Re-use if (condition given as a formula on the slide; not preserved in the transcript)
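
Why k+1 neighbors are enough to decide on exact re-use can be shown with the triangle inequality. Let r_k and r_{k+1} be the distances from the cached query to its k-th and (k+1)-th neighbor, and let d be the distance between the cached and the new query. Each cached neighbor is at most r_k + d from the new query, and every other point is at least r_{k+1} - d away, so the cached set is still the exact kNN set when 2d <= r_{k+1} - r_k. The sketch below encodes that sufficient condition; the slide's own formula was not preserved, and the names `k_plus_one` and `can_reuse` are hypothetical. It assumes at least k+1 points.

```python
import math

def k_plus_one(points, q, k):
    """Brute-force stand-in for the kd-tree search: k+1 nearest and distances."""
    ranked = sorted(points, key=lambda p: math.dist(p, q))[:k + 1]
    return ranked, [math.dist(p, q) for p in ranked]

def can_reuse(q_cached, dists, q_new, k):
    """True if the k cached neighbors are provably the k nearest of q_new."""
    r_k, r_k1 = dists[k - 1], dists[k]
    return 2.0 * math.dist(q_new, q_cached) <= r_k1 - r_k

# Example: two close points and one far point, k = 2.
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
neighbors, dists = k_plus_one(points, (0.5, 0.0), 2)
reuse_near = can_reuse((0.5, 0.0), dists, (2.0, 0.0), 2)  # gap is large enough
reuse_far = can_reuse((0.5, 0.0), dists, (6.0, 0.0), 2)   # query moved too far
```

The condition depends on the gap between the k-th and (k+1)-th distances, which is often small in dense point sets, so exact re-use triggers only for very close queries.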

  21. Coherent Neighbor Cache (kNN, approximation) • Approximation error e • Enlarge radius • Re-use if (condition given as a formula on the slide; not preserved in the transcript)
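
Allowing an approximation error relaxes the re-use test. One way to derive a bound, stated here as an assumption because the slide's formula is missing: call a result acceptable if none of its points is farther from the query than (1 + e) times the true k-th neighbor distance. With r_k, r_{k+1}, and d defined as for the exact case, the cached neighbors are at most r_k + d away, and if the true kNN set differs, its radius is at least r_{k+1} - d. Requiring r_k + d <= (1 + e)(r_{k+1} - d) gives d <= ((1 + e) r_{k+1} - r_k) / (2 + e). The names `reuse_bound` and `can_reuse_approx` are hypothetical, and `eps` stands for the slide's e.

```python
import math

def reuse_bound(r_k, r_k1, eps):
    """Largest query displacement that keeps cached neighbors (1+eps)-accurate."""
    return ((1.0 + eps) * r_k1 - r_k) / (2.0 + eps)

def can_reuse_approx(q_cached, q_new, r_k, r_k1, eps):
    """True if re-using the cached k neighbors stays within the error bound."""
    return math.dist(q_new, q_cached) <= reuse_bound(r_k, r_k1, eps)
```

With `eps = 0` the bound reduces to the exact condition 2d <= r_{k+1} - r_k. With `eps > 0` it stays positive even when r_k equals r_{k+1}, which is consistent with the later observation that small approximations already raise the hit rate.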

  22. Outline • Related Work • Spatial Searching and Caching • Architecture & Prototype • Results • Conclusion

  23. The Architecture (figure: block diagram, including the host)

  24. Coherent Neighbor Cache • Eight cached neighborhoods • Problem: parallel queries in kd-tree module → interleave spatially similar queries (figure labels: 0, 1, ..., n)

  25. Kd-Tree Traversal (figure)

  26. NodeRecurse • Kd-tree structure on chip • 16 threads • Pipelining and multi-threading

  27. Stacks • 16 stacks • Parallel read/write • Bounded in depth • 6 bytes per thread per recursion
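
A bounded per-thread stack can replace recursion during traversal, as the following software sketch of an eNN range search shows. It is an illustration only: the depth bound `MAX_DEPTH` is a hypothetical value (the slide gives none), reading "6 bytes per thread per recursion" as the size of one stack entry is an assumption, and the tree layout and the names `build` and `range_search` are illustrative.

```python
MAX_DEPTH = 32        # hypothetical bound on the stack depth
THREADS = 16          # one stack per traversal thread (from the slide)
BYTES_PER_ENTRY = 6   # per thread per recursion (from the slide)
STACK_BYTES = THREADS * MAX_DEPTH * BYTES_PER_ENTRY  # total stack storage

def build(points, depth=0):
    """Build a kd-tree; leaves hold small buckets of points."""
    if len(points) <= 4:
        return ("leaf", points)
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return ("node", axis, pts[mid][axis],
            build(pts[:mid], depth + 1), build(pts[mid:], depth + 1))

def range_search(tree, q, r):
    """Collect all points within distance r of q using an explicit stack."""
    found, stack = [], [tree]
    while stack:
        assert len(stack) <= MAX_DEPTH, "stack bound exceeded"
        node = stack.pop()
        if node[0] == "leaf":
            found.extend(p for p in node[1]
                         if sum((a - b) ** 2 for a, b in zip(p, q)) <= r * r)
            continue
        _, axis, split, left, right = node
        near, far = (left, right) if q[axis] < split else (right, left)
        if abs(q[axis] - split) <= r:
            stack.append(far)   # deferred subtree, like a saved recursion frame
        stack.append(near)
    return found
```

Each visited inner node pops one entry and pushes at most two, so the stack never holds more than the tree depth plus one entries, which is what makes a fixed depth bound workable for a balanced tree.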

  28. Leaf • 16 parallel priority queues (1-cycle ops) • Queues store pointers and distances • Bandwidth bottleneck

  29. Processing Module • Multithreaded quad-port bank of 16 registers • 128 threads • Programmability using FPGA technology

  30. Further Data • Implemented on two FPGAs • 64-bit DDR DRAM • Interconnection: no overhead • Resource usage (registers and LUTs) • Virtex 2 Pro 100 (kNN): 26% registers, 38% LUTs • Virtex 2 Pro 70 (MLS): 47% registers, 52% LUTs • Clock frequency: 75 MHz

  31. Outline • Related Work • Spatial Searching and Caching • Architecture & Prototype • Results • Conclusion

  32. Applications • Tested on various applications • PCI interface of prototype slow • [Weyrich et al. 04] • [Adams et al. 07]

  33. Results kNN (plot: number of queries vs. number of neighbors; clock rates shown: 75 MHz, 2200 MHz, 1200 MHz; labels relative to FPGA x1: CPU x1.1, x1.4, x1.5; CUDA x1.6, x2.4, x4; CUDA w/o sort x3.1, x4.0; ASIC estimate at 500 MHz x6.6)

  34. Results kNN • Small hardware footprint • FPGA slightly slower • Realistic clock frequency → prototype faster than CPU/GPU (same plot as slide 33)

  35. Results MLS • FPGA faster than CPU • kNN bottleneck • FPGA • GPU (plot: number of queries vs. number of neighbors; clock rates shown: 75 MHz, 2200 MHz, 1200 MHz; labels: FPGA x1, MLS CPU x0.4, MLS CUDA x3.8)

  36. Coherent Neighbor Cache (plot: number of queries vs. level of coherence; curves: CPU, e=0.1; FPGA, e=0.1; FPGA, exact)

  37. Results: Approximation Error (MLS projection) (plot: MLS error vs. approximation e, with a "no approx." reference)

  38. Results: Approximation Error (MLS projection) (plot: cache hits vs. approximation e)

  39. Approximation Error (visual) (figure)

  40. Approximation Error (visual) • Coherent Neighbor Cache: • Not optimal for exact queries • Approximate queries • Can be tolerated in most cases • Greatly increases performance • Even for small approximations

  41. Outline • Related Work • Spatial Searching and Caching • Architecture & Prototype • Results • Conclusion

  42. Conclusion • Novel hardware architecture for • Nearest-neighbor searches • Generic meshless processing operators • Cache exploiting spatial coherence • Good performance considering resources • Possible GPU integration

  43. Future Work • Programmable data structure • Support different data structures • Programmability in data structure • Construction on-chip • 'Real' programmability in point processing module

  44. A Hardware Processing Unit For Point Sets • S. Heinzle, G. Guennebaud, M. Botsch, M. Gross • Graphics Hardware 2008