
Data Locality & Its Optimization Techniques

Presented by Preethi Rajaram. CSS 548 Introduction to Compilers, Professor Carol Zander, Fall 2012.


Presentation Transcript


  1. Data Locality & Its Optimization Techniques Presented by Preethi Rajaram, CSS 548 Introduction to Compilers, Professor Carol Zander, Fall 2012

  2. Why? • Processor Speed - increasing at a faster rate than the memory speed • Computer Architectures - more levels of cache memory • Cache - takes advantage of data locality • Good Data Locality - good application performance • Poor Data Locality - reduces the effectiveness of the cache

  3. Data Locality • The property that references to the same memory location, or to adjacent locations, are reused within a short period of time • Temporal locality - the same location is reused • Spatial locality - nearby locations (the same cache line) are reused Fig: Program to find the squares of the differences (a) without loop fusion (b) with loop fusion [Image from: The Dragon book, 2nd edition]

  4. Matrix Multiplication - Example Fig: Basic Matrix Multiplication Algorithm [Image from: The Dragon book, 2nd edition] • Poor data locality • N² multiply-add operations separate reuses of the same data element in matrix Y • N operations separate reuses of the same cache line in Y • Solutions • Changing the layout of the data structures • Blocking

  5. Matrix Multiplication – Example Contd… • Changing the data structure layout • Store Y in column-major order • Improves reuse of cache lines of matrix Y • Limited applicability • Blocking • Changes the execution order of instructions • Divide the matrix into submatrices, or blocks • Order the operations so that an entire block is used over a short period of time • Choose B such that one block from each of the matrices fits into the cache [Image from: The Dragon book, 2nd edition]

  6. Data Reuse • Locality Optimization • Identify sets of iterations that access the same data or the same cache line • Static access - an instruction in a program, e.g. x = z[i,j] • Dynamic access - an execution of that instruction, possibly many times, as in a loop nest • Types of Reuse • Self - iterations using the same data come from the same static access • Group - iterations using the same data come from different static accesses • Temporal - the exact same location is referenced • Spatial - the same cache line is referenced

  7. Self Temporal Reuse • Exploiting self reuse can save a substantial number of memory accesses • Data of dimension k in a loop nest of depth d is reused n^(d−k) times, e.g. if a 3-deep loop nest accesses only one column of an array, there is a potential saving of n² accesses • Dimensionality of an access - the rank of its access matrix • Iterations referring to the same location - the null space of the matrix • Rank of a matrix - the number of linearly independent rows (or columns) • A reference in a d-deep loop nest whose access matrix has rank r accesses O(n^r) data elements in O(n^d) iterations, so on average O(n^(d−r)) iterations refer to the same array element • Fig: Example access matrix - loop depth = 3, rank = dimensionality = 2 (2nd row = 1st + 3rd, 4th row = 3rd − 2×1st), nullity = 3 − 2 = 1 [Image from: The Dragon book, 2nd edition]

  8. Self Spatial Reuse • Depends on the data layout of the matrix, e.g. row-major order • In a d-dimensional array, elements share a cache line only if they differ in just the last dimension, e.g. two elements of a 2-D array can share a cache line only if they lie in the same row • The truncated matrix is obtained by dropping the last row from the access matrix • If the truncated matrix has a rank r that is less than the loop depth d, spatial reuse is assured • Fig: Truncated matrix with r = 1, d = 2; r < d assures spatial reuse

  9. Group Reuse • Group reuse exists only among accesses in a loop that share the same coefficient matrix Fig: 2-deep loop nest [Image from: The Dragon book, 2nd edition] • z[i,j] and z[i-1,j] access almost the same set of array elements • The data read by the access z[i-1,j] is the same as the data written by z[i,j], except for i = 1 • For these accesses: rank = 2, so no self temporal reuse; truncated matrix rank = 1, so self spatial reuse

  10. Locality Optimization • Temporal locality of data - use results as soon as they are generated Fig: Code excerpt for a multigrid algorithm (a) before partition (b) after partition [Image from: The Dragon book, 2nd edition]

  11. Locality Optimization Contd… • Array Contraction - reduce the dimension of the array, and with it the number of memory locations accessed Fig: Code excerpt for a multigrid algorithm, after partition and array contraction [Image from: The Dragon book, 2nd edition]

  12. Locality Optimization Contd… • Instead of executing the partitions one after the other, we interleave a number of them so that reuses among partitions occur close together • Interleaving inner loops in a parallel loop • Interleaving statements in a parallel loop Fig: Interleaving four instances of the inner loop [Image from: The Dragon book, 2nd edition] Fig: The statement interleaving transformation [Image from: The Dragon book, 2nd edition]

  13. References • Wolf, Michael E., and Monica S. Lam. "A Data Locality Optimizing Algorithm." ACM SIGPLAN Notices 26.6 (1991): 30-44. • McKinley, Kathryn S., Steve Carr, and Chau-Wen Tseng. "Improving Data Locality with Loop Transformations." ACM Transactions on Programming Languages and Systems (TOPLAS) 18.4 (1996): 424-453. • Bodin, François, et al. "A Quantitative Algorithm for Data Locality Optimization." Code Generation: Concepts, Tools, Techniques (1992): 119-145. • Kennedy, Ken, and Kathryn S. McKinley. "Optimizing for Parallelism and Data Locality." Proceedings of the 6th International Conference on Supercomputing. ACM, 1992. • Aho, Alfred V., Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques, and Tools. 2nd ed. Addison-Wesley, 2006.

  14. Thank You! Questions?
