
Mesh Layouts for Block-Based Caches

Sung-Eui Yoon and Peter Lindstrom, Lawrence Livermore National Laboratory. Goal: provide cache-coherent layouts of meshes and graphs, and derive metrics that measure the cache-coherence of layouts, with generality, simplicity, efficiency, and accuracy.


Presentation Transcript


  1. Mesh Layouts for Block-Based Caches Sung-Eui Yoon Peter Lindstrom Lawrence Livermore National Laboratory

  2. Goal • Provide cache-coherent layouts of meshes and graphs • Derive metrics that measure cache-coherence of layouts • Generality • Simplicity • Efficiency • Accuracy

  3. Cache-Coherent Metrics • Measure the expected number of cache misses of a layout given block-based caches • Should correlate well with the observed number of cache misses • Cache-aware metrics • Measure cache-coherence given known cache parameters (e.g., block size) • Cache-oblivious metrics • Consider all possible cache parameters

  4. Motivation • Lower growth rate of data access speed • [Chart: accumulated growth rate during 1993–2004 (log scale); labeled values: 130X, 46X, 20X, and 1.5X (during 99–04)] Courtesy: Anselmo Lastra, http://www.hcibook.com/e3/online/moores-law/

  5. Memory Hierarchies and Block-Based Caches • [Diagram: CPU, fast memory or cache, slow memory, and disk, connected by block transfers; access times shown: 10^-2 sec, 10^-8 sec, 10^-7 sec]

  6. Main Contributions • Propose novel and practical cache-aware and cache-oblivious metrics • Derive metrics given block-based caches • Propose efficient cache-coherent layout constructions • Apply to different applications

  7. Related Work • Computation reordering • Data layout optimization

  8. Computational Reordering • Cache-aware [Coleman and McKinley 95, Vitter 01, Sen et al. 02] • Cache-oblivious [Frigo et al. 99, Arge et al. 04] Focus on specific problems such as sorting and linear algebra computations

  9. Data Layout Optimization • Graph and matrix layout [Diaz et al. 02] • Minimum linear arrangement (MLA), bandwidth, and wavefront, etc. • Space-filling curves • [Sagan 94, Pascucci and Frank 01, Lindstrom and Pascucci 01, Gopi and Eppstein 04] • Rendering and processing sequences • [Deering 95, Hoppe 99, Bogomjakov and Gotsman 02, Isenburg and Lindstrom 05] • Cache-oblivious mesh layout • [Yoon et al. 05]

  10. Outline • Computation models • Cache-aware and cache-oblivious metrics • Results

  11. Outline • Computation models • Cache-aware and cache-oblivious metrics • Results

  12. General Framework of Layout Computation • Input directed graph, G = (N, A), with nodes na, nb, nc, nd • Layout algorithm, φ, driven by a cache-coherent metric • Output: 1D layout, φ(N), e.g., nd nb na nc ...

  13. Two-Level I/O Model [Aggarwal and Vitter 88] • Input directed graph with nodes na, nb, nc, nd • Layout algorithm produces a 1D layout, e.g., nd nb na nc ..., shown with block size = 3 • Cache holds M cache blocks, whose size is B
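
The two-level model above can be exercised with a small simulator. This is a minimal sketch, not the authors' code: the function name `count_block_misses`, the example layout and walk, and the LRU replacement policy are all assumptions made for illustration (the slide does not state a replacement policy).

```python
from collections import OrderedDict

def count_block_misses(accesses, position, block_size, num_blocks):
    """Count cache misses for a sequence of node accesses.

    position maps node -> index in the 1D layout. The cache holds
    num_blocks (M) blocks of block_size (B) nodes each.
    LRU replacement is an assumption of this sketch.
    """
    cache = OrderedDict()  # block index -> None, oldest entry first
    misses = 0
    for node in accesses:
        block = position[node] // block_size
        if block in cache:
            cache.move_to_end(block)  # mark block as most recently used
        else:
            misses += 1
            cache[block] = None
            if len(cache) > num_blocks:
                cache.popitem(last=False)  # evict least recently used block
    return misses

# Hypothetical 1D layout and access sequence.
layout = ["nd", "nb", "na", "nc", "ne", "nf"]
position = {node: i for i, node in enumerate(layout)}
walk = ["na", "nb", "nc", "nd", "ne", "na"]

misses_m1 = count_block_misses(walk, position, block_size=3, num_blocks=1)
misses_m2 = count_block_misses(walk, position, block_size=3, num_blocks=2)
print(misses_m1, misses_m2)
```

With B = 3 the six nodes occupy two blocks, so a cache with M = 2 only misses on the first touch of each block, while M = 1 misses every time the walk crosses the block boundary.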

  14. Graph Representation • Directed graph, G = (N, A) • Represent access patterns between nodes • Nodes, N • Data element • (e.g., mesh vertex or mesh triangle) • Directed arcs, A • Connects two nodes if they are accessed sequentially • [Diagram: example graph with nodes na, nb, nc, nd]

  15. Weights of Nodes and Arcs • Indicate probabilities that each element will be accessed • Computed in an equilibrium state during infinite random walks • Assume that applications infinitely access the data according to the input graph • Correspond to the dominant eigenvector (eigenvalue 1) of the probability transition matrix
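
As a rough illustration of the equilibrium weights described above, the following sketch approximates them by power iteration. The function name `stationary_weights`, the example graph, and the choice of a uniform pick among outgoing arcs are assumptions for this example only; the "lazy" half-step is a standard trick added here so the iteration also converges on periodic graphs.

```python
def stationary_weights(arcs, num_iters=2000):
    """Approximate equilibrium node weights of a random walk.

    arcs is a list of (source, target) pairs. From each node the walk
    follows one outgoing arc chosen uniformly at random (assumption).
    Every node is assumed to have at least one outgoing arc.
    """
    nodes = sorted({n for arc in arcs for n in arc})
    out = {n: [t for s, t in arcs if s == n] for n in nodes}
    weight = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(num_iters):
        # Lazy step: stay put with probability 1/2. This leaves the
        # equilibrium unchanged but avoids oscillation on periodic graphs.
        nxt = {n: 0.5 * weight[n] for n in nodes}
        for n in nodes:
            for t in out[n]:
                nxt[t] += 0.5 * weight[n] / len(out[n])
        weight = nxt
    return weight

# Hypothetical undirected graph: a triangle na-nb-nc plus a pendant node nd.
edges = [("na", "nb"), ("nb", "nc"), ("nc", "na"), ("nc", "nd")]
arcs = edges + [(v, u) for u, v in edges]

node_w = stationary_weights(arcs)
# Arc weight = weight of the source node times the chance of picking that arc.
arc_w = {(s, t): node_w[s] / sum(1 for a in arcs if a[0] == s) for s, t in arcs}
print(node_w)
```

In this example the node weights come out proportional to node degree, and all arc weights are equal.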

  16. Cache-Coherence of a Layout given Block-Based Caches • Expected number of cache misses of a layout, determined by two factors: • Probability of accessing a node from another node by traversing an arc • Conditional probability that we will have a cache miss given the above access pattern • [Diagram: example graph with nodes na, nb, nc, nd]
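
One way to write out how the two factors on this slide combine is as a sum over arcs. The notation below (ECM, Pr(i → j), and the "miss" event) is assumed for this sketch and is not taken from the slide, whose equation did not survive in the transcript.

```latex
\mathrm{ECM}(\varphi) \;=\; \sum_{(i,j) \in A}
  \Pr(i \to j)\;
  \Pr\bigl(\text{miss} \,\bigm|\, i \to j,\ \varphi\bigr)
```

Here the first factor is the equilibrium probability of traversing arc (i, j), and the second is the probability that this traversal causes a cache miss under the layout φ.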

  17. Specialization to Meshes • Expected number of cache misses of a layout • Probability of accessing a node from another node by traversing an arc = constant (under the two assumptions below) • Conditional probability that we will have a cache miss given the above access pattern • Graph implicitly derived from an input mesh: 1. Two opposite directed arcs per mesh edge 2. Uniform distribution to access adjacent nodes given a node
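
The derivation of a graph from a mesh can be sketched as follows. The helper `mesh_to_arcs` and the two-triangle mesh are hypothetical. The arc probabilities use the standard fact that, for a random walk on an undirected graph, the equilibrium weight of a node is proportional to its degree.

```python
def mesh_to_arcs(triangles):
    """Turn each mesh edge into two opposite directed arcs."""
    arcs = set()
    for a, b, c in triangles:
        for u, v in ((a, b), (b, c), (c, a)):
            arcs.add((u, v))
            arcs.add((v, u))
    return sorted(arcs)

# Hypothetical mesh: two triangles sharing the edge (1, 2).
triangles = [(0, 1, 2), (1, 3, 2)]
arcs = mesh_to_arcs(triangles)

# Out-degree of each vertex in the derived graph.
degree = {}
for u, _ in arcs:
    degree[u] = degree.get(u, 0) + 1

# Equilibrium node weight is degree / total; each outgoing arc is then
# chosen uniformly, so every arc ends up with the same probability.
total = sum(degree.values())
arc_prob = {(u, v): (degree[u] / total) * (1.0 / degree[u]) for u, v in arcs}
print(len(arcs), set(round(p, 12) for p in arc_prob.values()))
```

Because every arc carries the same probability, only the conditional miss probability differs between layouts of the same mesh.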

  18. Outline • Computation models • Cache-aware and cache-oblivious metrics • Results

  19. Four Different Cases • Cache-aware case: single cache block, M=1 • Cache-oblivious case: single cache block, M=1 • Cache-aware case: multiple cache blocks, M>1 • Cache-oblivious case: multiple cache blocks, M>1

  20. Cache-Aware: Single Cache Block, M=1 • Input directed graph with nodes na, nb, nc, nd • 1D layout nd nb na nc ... with block size = 3 • Cache, whose block size is B • Straddling arcs: arcs whose two nodes fall in different blocks

  21. Cache-Aware: Multiple Cache Blocks, M>1 • Input directed graph with nodes na, nb, nc, nd • 1D layout nd nb na nc ... with block boundary • Cache • Straddling arcs

  22. Final Cache-Aware Metric • Counts the number of straddling arcs of the layout given a block size B • Defined in terms of the block index containing each node i • Uses the unit step function of x: 1 if x > 0, 0 otherwise
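
A minimal sketch of counting straddling arcs follows. It assumes that the block index of a node is its layout position divided by B, rounded down, and that an arc straddles when the block indices of its endpoints differ. The function name and the example arcs are hypothetical, and any normalization used in the original formula is not reproduced here.

```python
def count_straddling_arcs(arcs, position, block_size):
    """Count arcs whose endpoints lie in different blocks of the layout."""
    def unit_step(x):
        return 1 if x > 0 else 0

    def block_index(node):
        return position[node] // block_size

    return sum(unit_step(abs(block_index(i) - block_index(j))) for i, j in arcs)

# Layout from the slides, with a hypothetical set of arcs.
layout = ["nd", "nb", "na", "nc"]
position = {node: k for k, node in enumerate(layout)}
arcs = [("na", "nb"), ("nb", "nc"), ("nc", "nd"), ("nd", "na")]

straddling_b3 = count_straddling_arcs(arcs, position, block_size=3)
straddling_b2 = count_straddling_arcs(arcs, position, block_size=2)
print(straddling_b3, straddling_b2)
```

With B = 3 only the two arcs touching nc cross the block boundary; with B = 2 all four arcs do.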

  23. High Accuracy of Cache-Aware Metric • Tested block size = 4KB • Z-curve on a uniform grid • Tested layouts: Z-curve, Hilbert curve, H-order, minimum linear arrangement layout, βΩ-layout, geometric CO layout, (bi or uni) row-by-row, (bi or uni) diagonal layouts

  24. Four Different Cases • Cache-aware case: single cache block, M=1 • Cache-oblivious case: single cache block, M=1 • Cache-aware case: multiple cache blocks, M>1 • Cache-oblivious case: multiple cache blocks, M>1

  25. Cache-Oblivious: Single Cache Block, M=1 • Does not assume a particular block size • Then, what are good representatives for block sizes?

  26. Two Possible Block Size Progressions • Arithmetic progression • 1, 2, 3, 4, … • Geometric progression • 2^0, 2^1, 2^2, 2^3, … • Well reflects current caching architectures • E.g., L1: 32B, L2: 64B, Page: 4KB, etc.

  27. Probability that an Arc is a Straddling Arc • Is an arc straddling given a block size? • Computed as a probability as a function of arc length, l • [Diagram: 1D layout nd nb na nc ... with an arc of length l = 2]
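
To make the dependence on arc length concrete, the sketch below enumerates every alignment of an arc relative to a block and counts how often it crosses a boundary. It assumes that each alignment is equally likely, which is an assumption of this example rather than something stated on the slide. The function name is hypothetical.

```python
from fractions import Fraction

def straddle_probability(arc_length, block_size):
    """Fraction of alignments at which an arc crosses a block boundary."""
    straddling = 0
    for start in range(block_size):  # every offset of the arc within a block
        end = start + arc_length
        if start // block_size != end // block_size:
            straddling += 1
    return Fraction(straddling, block_size)

for block_size in (3, 8):
    print(block_size, straddle_probability(2, block_size))
```

Under this assumption the result equals min(l / B, 1): short arcs rarely straddle large blocks, and any arc at least as long as a block always straddles.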

  28. Two Cache-Oblivious Metrics • Arithmetic cache-oblivious metric: arithmetic mean of the arc lengths of arcs (i, j), i.e., the MLA metric • Geometric cache-oblivious metric: geometric mean of arc lengths
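
Both metrics can be computed in a few lines. This sketch takes the arc length of (i, j) to be the distance between the positions of i and j in the 1D layout, and assumes no arc connects a node to itself (a zero length would break the logarithm). The function names and the example arcs are hypothetical.

```python
import math

def arc_lengths(arcs, position):
    """Distance between the endpoints of each arc in the 1D layout."""
    return [abs(position[i] - position[j]) for i, j in arcs]

def arithmetic_co_metric(arcs, position):
    """Arithmetic mean of arc lengths (the MLA-style metric)."""
    lengths = arc_lengths(arcs, position)
    return sum(lengths) / len(lengths)

def geometric_co_metric(arcs, position):
    """Geometric mean of arc lengths, computed via the mean of logarithms."""
    lengths = arc_lengths(arcs, position)
    return math.exp(sum(math.log(l) for l in lengths) / len(lengths))

layout = ["nd", "nb", "na", "nc"]
position = {node: k for k, node in enumerate(layout)}
arcs = [("na", "nb"), ("nb", "nc"), ("nc", "nd"), ("nd", "na")]

print(arithmetic_co_metric(arcs, position))
print(geometric_co_metric(arcs, position))
```

Averaging logarithms instead of multiplying lengths directly avoids overflow on graphs with many arcs.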

  29. Validation for Cache-Oblivious (CO) Metrics • [Plot: number of cache misses when M = 1 (log scale), comparing the geometric CO layout and the arithmetic CO layout; annotations: 73% of tested power-of-two block sizes, 97% of tested block sizes] • Geometric cache-oblivious metric • Practical and useful

  30. Correlations between Metrics and Observed Number of Cache Misses Tested block size = 4KB Tested with 10 different layouts on a uniform grid

  31. Efficient Layout Computation for Our Metrics • Cache-aware layouts • Optimized with cache-aware metric given a block size B • Computed from the graph partitioning • Geometric cache-oblivious metric • Very efficient • Can be used in different layout methods

  32. Layout Computation with Geometric Cache-Oblivious Metric • Multi-level construction method • 1. Partition an input mesh into k different sets • 2. Lay out the partitions based on our metric • Generalized layout method for unstructured meshes
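
The "partition, then lay out" idea can be illustrated with a toy version of a single level. This is not the authors' multi-level method: it assumes the partitions are already given, keeps the node order inside each partition fixed, and simply tries every ordering of the partitions, which is only feasible for small k. All names below are hypothetical.

```python
import itertools
import math

def geometric_mean_arc_length(arcs, position):
    """Geometric mean of layout distances between arc endpoints."""
    logs = [math.log(abs(position[i] - position[j])) for i, j in arcs]
    return math.exp(sum(logs) / len(logs))

def order_partitions(partitions, arcs):
    """Try every ordering of the partitions and keep the cheapest layout."""
    best_layout, best_cost = None, math.inf
    for perm in itertools.permutations(partitions):
        layout = [node for part in perm for node in part]
        position = {node: k for k, node in enumerate(layout)}
        cost = geometric_mean_arc_length(arcs, position)
        if cost < best_cost:
            best_layout, best_cost = layout, cost
    return best_layout, best_cost

# Hypothetical input: a path graph 0-1-2-3-4-5 split into three partitions.
arcs = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
partitions = [[0, 1], [4, 5], [2, 3]]

best_layout, best_cost = order_partitions(partitions, arcs)
print(best_layout, best_cost)
```

A practical implementation would apply the same step recursively inside each partition and replace the exhaustive search with something cheaper.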

  33. Outline • Computation models • Cache-aware and cache-oblivious metrics • Results

  34. Applications • Isosurface extraction • View-dependent rendering

  35. Iso-Surface Extraction • Uses contour tree [van Kreveld et al. 97] • Runtime is dominated by the traversal of the iso-surface • Layout graph • Use an input tetrahedral mesh • Test model: Spx (140K vertices)

  36. High Correlation with Number of Cache Misses Tested block size = 4KB Tested with 8 different layouts: our geometric CO, our cache-aware, breadth-first (and depth-first) layouts, spectral [Juvan and Mohar 92], cache-oblivious mesh [Yoon et al. 05], Z-curve [Sagan 94], X-axis sorted layouts

  37. High Correlation with Runtime Performance • Case 1: memory access time is the major bottleneck • Case 2: disk I/O time is the major bottleneck

  38. Comparison with Other Layouts • [Chart: the first iso-surface extraction time (sec)] • 8%–77% improvement and very close to the cache-aware performance

  39. View-Dependent Rendering • Lay out vertices and triangles of progressive meshes • Used in an efficient VDR system [Yoon et al. 04] • Reduce misses in GPU vertex cache

  40. Cache Miss Ratio on Bunny Model • [Plot: GPU vertex cache miss ratio vs. vertex cache size; curves: universal rendering seq. [Bogomjakov and Gotsman 02], Hoppe [Hoppe 99], theoretical lower bound [Bar-Yehuda and Gotsman 96], geometric CO layout]

  41. Cache Miss Ratio on Power Plant Model • [Plot: GPU vertex cache miss ratio vs. vertex cache size; curves: Z-curve, COML [Yoon et al. 05], Hoppe's rendering seq. [Hoppe 99], theoretical lower bound [Bar-Yehuda and Gotsman 96], geometric CO layout]
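
A vertex cache miss ratio of the kind plotted on these slides can be estimated with a short simulation. This sketch assumes a FIFO vertex cache and reports misses per triangle. Both the replacement policy and the normalization are assumptions of this example, since the slides do not specify them, and the triangle sequence is hypothetical.

```python
from collections import deque

def vertex_cache_miss_ratio(triangles, cache_size):
    """Misses per triangle for a FIFO vertex cache (assumed policy)."""
    cache = deque(maxlen=cache_size)  # oldest vertex is dropped when full
    misses = 0
    for tri in triangles:
        for v in tri:
            if v not in cache:
                misses += 1
                cache.append(v)
    return misses / len(triangles)

# Hypothetical rendering sequence: a short triangle strip.
strip = [(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 5)]

ratio_large = vertex_cache_miss_ratio(strip, cache_size=4)
ratio_tiny = vertex_cache_miss_ratio(strip, cache_size=1)
print(ratio_large, ratio_tiny)
```

Reordering the triangles changes the ratio for a given cache size, which is the quantity a rendering-sequence layout tries to reduce.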

  42. Conclusion • Novel cache-aware and cache-oblivious metrics to evaluate layouts • Derived metrics based on two-level I/O model • Improved the performance of applications without modifying application code • OpenCCL, open source library: http://gamma.cs.unc.edu/COL/OpenCCL

  43. Ongoing and Future Work • Derive a lower bound on our geometric cache-oblivious metric • Employ mesh compression to further reduce disk I/O accesses • Investigate efficient layout method for deforming models • Apply to non-graphics applications • e.g., shortest path or other graph computations

  44. Cache-Efficient Layouts of Bounding Volume Hierarchies • Yoon and Manocha, Eurographics 06 • Applications shown: collision detection, ray tracing

  45. Acknowledgements • Ajith Mascarenhas • Martin Isenburg • Dinesh Manocha • Fabio Bernardon, Joao Comba, and Claudio Silva • For their unstructured tetrahedra rendering program • Members of data analysis group in LLNL • Anonymous reviewers

  46. UCRL-PRES-225448 This work was performed under the auspices of the U.S. Department of Energy by University of California Lawrence Livermore National Laboratory under contract No. W-7405-ENG-48.

  47. Additional slides

  48. Cache-Coherence of Layouts • Well known heuristics for cache-coherent layouts • Space-filling curves [Sagan 94] • How can we compute better layouts? • Requires metrics measuring cache-coherence of layouts

  49. Main Results • Define cache-coherence of a layout as: • Expected number of cache misses during random walks of a graph given block-based caches • Then, the expected number of cache misses is: • Number of straddling arcs in a cache-aware case • Geometric mean of arc lengths in a cache-oblivious case

  50. Data Layout Optimization • Rendering sequences • Triangle strips • [Deering 95, Hoppe 99, Bogomjakov and Gotsman 02] • Processing sequences • [Isenburg and Gumhold 03, Isenburg and Lindstrom 05] Assume that access pattern globally follows the layout order
