
Extensions to Structure Layout Optimizations in the Open64 Compiler

Extensions to Structure Layout Optimizations in the Open64 Compiler. Michael Lai, AMD. Related Work: structure splitting, structure peeling, structure field reordering (Hagog & Tice; Hundt, Mannarswamy & Chakrabarti). The above are implemented in the Open64 Compiler (Chakrabarti & Chow).


Presentation Transcript


  1. Extensions to Structure Layout Optimizations in the Open64 Compiler
     Michael Lai, AMD

  2. Related Work
     • Structure splitting, structure peeling, structure field reordering (Hagog & Tice; Hundt, Mannarswamy & Chakrabarti)
     • Above implemented in the Open64 Compiler (Chakrabarti & Chow)
     • Structure instance interleaving (Truong, Bodin & Seznec)
     • Data splitting (Curial, Zhao & Amaral)
     • Array reshaping (Zhao, Cui, Gao, Silvera & Amaral)

  3. Current Framework
     [Diagram: each of three source files passes through frontend → WHIRL → ipl → .o; the .o files feed ipa_link, which is followed by WHIRL outputs.]

  4. Instance Interleaving
     Original layout:
       a[0].field_1  a[0].field_2  a[0].field_3   (an “instance” of the structure)
       a[1].field_1  a[1].field_2  a[1].field_3   (another “instance” of the structure)
       ...

  5. Instance Interleaving
     Interleaved layout:
       a[0].field_1  a[1].field_1  …   (field_1 of all the instances are interleaved together)
       a[0].field_2  a[1].field_2  …   (field_2 of all the instances are interleaved together)
       a[0].field_3  a[1].field_3  …   (field_3 of all the instances are interleaved together)
       ...

  6. Instance Interleaving
     Before (one row per instance):
       array[0].field_1    array[0].field_2    array[0].field_3    …  array[0].field_m
       array[1].field_1    array[1].field_2    array[1].field_3    …  array[1].field_m
       …
       array[n-1].field_1  array[n-1].field_2  array[n-1].field_3  …  array[n-1].field_m
     After (one row per field):
       array[0].field_1  array[1].field_1  array[2].field_1  …  array[n-1].field_1
       array[0].field_2  array[1].field_2  array[2].field_2  …  array[n-1].field_2
       …
       array[0].field_m  array[1].field_m  array[2].field_m  …  array[n-1].field_m
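
To make the two layouts concrete, here is a minimal C sketch. It assumes a hypothetical three-field structure `rec` and a fixed instance count `N`; neither name comes from the slides, and this is not how Open64 represents the transformed type. It only compares where `field_2` of instance `i` lands in each layout.

```c
#include <assert.h>
#include <stddef.h>

enum { N = 4 };  /* hypothetical number of instances */

/* Original layout: the fields of one instance are adjacent in memory. */
struct rec {
    int field_1;
    int field_2;
    int field_3;
};

/* Interleaved layout: field_j of all N instances are adjacent in memory. */
struct rec_interleaved {
    int field_1[N];
    int field_2[N];
    int field_3[N];
};

/* Byte offset of array[i].field_2 from the start of a struct rec array. */
size_t offset_original_field_2(size_t i) {
    assert(i < N);
    return i * sizeof(struct rec) + offsetof(struct rec, field_2);
}

/* Byte offset of the same logical element in the interleaved layout. */
size_t offset_interleaved_field_2(size_t i) {
    assert(i < N);
    return offsetof(struct rec_interleaved, field_2) + i * sizeof(int);
}
```

In the original layout, consecutive `field_2` values are `sizeof(struct rec)` bytes apart; in the interleaved layout they are `sizeof(int)` bytes apart, so a loop that reads only `field_2` walks contiguous memory.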

  7. Implementation
     • Profitability analysis (done in ipl)
       • During ipl compilation of each source file, access patterns of structure fields are analyzed and their usage statistics recorded
       • After all the functions have been compiled by ipl, the “most likely to benefit” structure (if any) is marked and passed to ipo
       • (By way of illustration, the ideal structure is one with many fields, each of which appears in its own hot loop)
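
As an illustration of the "ideal structure" described above, the sketch below uses a hypothetical `particle` structure whose fields are each read in a separate loop. The structure, field names, and functions are invented for this example; the slides do not say which statistics ipl records or how it ranks candidates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical structure: four fields, each used by a different loop. */
struct particle {
    double x;
    double y;
    double z;
    double mass;
};

/* Hot loop 1: touches only the mass field of every instance. */
double total_mass(const struct particle *p, size_t n) {
    assert(p != NULL || n == 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += p[i].mass;
    return sum;
}

/* Hot loop 2: touches only the x field of every instance. */
void shift_x(struct particle *p, size_t n, double dx) {
    assert(p != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        p[i].x += dx;
}
```

On a platform where `double` is 8 bytes and the structure has no padding, each instance occupies 32 bytes, so each of these loops uses only 8 of every 32 bytes it brings into the cache. That is the kind of access pattern for which interleaving the instances would be expected to help.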

  8. Implementation
     • Legality analysis (done in ipo)
       • Usual checking for address taken, escaped types, etc.
     • Code transformation (done in ipo)
       • Create internal pointers ptr_1, ptr_2, …, ptr_m to keep track of the m locations array[0].field_1, array[0].field_2, …, array[0].field_m
       • Rewrite array[i].field_j to ptr_j[i], if “i” is known; otherwise, incur additional overhead to compute “i”
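
The rewrite described above can be sketched by hand in C. This is a minimal sketch under stated assumptions, not the compiler's actual output: it assumes a two-field structure whose fields are both `double`, one contiguous allocation holding all instances, and a transformed "pointer to an instance" represented as a pointer to that instance's `field_1` slot. The names `interleaved`, `interleaved_alloc`, `index_of`, and the accessors are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Internal pointers standing in for ptr_1 and ptr_2 on the slide. */
struct interleaved {
    double *ptr_1;  /* location of array[0].field_1 */
    double *ptr_2;  /* location of array[0].field_2 */
    size_t n;       /* number of instances */
};

/* Allocate one block: n field_1 values followed by n field_2 values. */
int interleaved_alloc(struct interleaved *a, size_t n) {
    double *block = malloc(2 * n * sizeof *block);
    if (block == NULL)
        return -1;
    a->ptr_1 = block;
    a->ptr_2 = block + n;
    a->n = n;
    return 0;
}

void interleaved_free(struct interleaved *a) {
    free(a->ptr_1);  /* ptr_1 is the start of the single block */
}

/* "array[i].field_2 = v" becomes "ptr_2[i] = v" when i is known. */
void set_field_2(struct interleaved *a, size_t i, double v) {
    assert(i < a->n);
    a->ptr_2[i] = v;
}

/* When only a pointer is available, i must be recovered first
   (assumption: the pointer addresses the instance's field_1 slot). */
size_t index_of(const struct interleaved *a, const double *p) {
    assert(p >= a->ptr_1 && p < a->ptr_1 + a->n);
    return (size_t)(p - a->ptr_1);
}

/* "p->field_2" becomes "ptr_2[index_of(p)]": one extra subtraction. */
double get_field_2_via_ptr(const struct interleaved *a, const double *p) {
    return a->ptr_2[index_of(a, p)];
}
```

The extra pointer subtraction in `index_of` is one plausible form of the "additional overhead to compute i" mentioned on the slide; the slides do not state how the compiler actually recovers the index.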

  9. Instance Interleaving
     Before (one row per instance):
       array[0].field_1    array[0].field_2    array[0].field_3    …  array[0].field_m
       array[1].field_1    array[1].field_2    array[1].field_3    …  array[1].field_m
       …
       array[n-1].field_1  array[n-1].field_2  array[n-1].field_3  …  array[n-1].field_m
     After (one row per field, each row starting at its internal pointer):
       ptr_1 →  array[0].field_1  array[1].field_1  array[2].field_1  …  array[n-1].field_1
       ptr_2 →  array[0].field_2  array[1].field_2  array[2].field_2  …  array[n-1].field_2
       …
       ptr_m →  array[0].field_m  array[1].field_m  array[2].field_m  …  array[n-1].field_m
     array[i].field_j becomes ptr_j[i]

  10. Array Remapping
      Before (n groups of m elements; iteration k touches group k, whose elements act as field_1 field_2 field_3 … field_m):
        iteration 0:    a[0]       a[1]         a[2]         …  a[m-1]
        iteration 1:    a[m]       a[m+1]       a[m+2]       …  a[2m-1]
        …
        iteration n-1:  a[(n-1)m]  a[(n-1)m+1]  a[(n-1)m+2]  …  a[nm-1]
      After (m groups of n elements; group j holds field_j for iterations 0, 1, 2, …, n-1):
        field_1:  a[0]       a[1]         a[2]         …  a[n-1]
        field_2:  a[n]       a[n+1]       a[n+2]       …  a[2n-1]
        …
        field_m:  a[(m-1)n]  a[(m-1)n+1]  a[(m-1)n+2]  …  a[mn-1]

  11. Implementation
      • Profitability analysis (done in ipl)
        • During ipl compilation of each source file, discover if there are arrays that behave like structures and suffer poor data cache utilization at the same time
        • After all the functions have been compiled by ipl, the “most likely to benefit” arrays (if any) are marked and passed to ipo
        • For each of these arrays, record the stride, group size, and array size associated with it
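
The sketch below shows one access pattern that fits the description "arrays that behave like structures": a flat array treated as groups of `m` elements, with a loop that reads only element `k` of each group. The `remap_candidate` record and the meaning given to each of its members are assumptions made for this example; the slides name the three recorded quantities but do not define them or say how ipl detects the pattern.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of the three quantities named on the slide. */
struct remap_candidate {
    size_t stride;      /* distance between consecutive accesses in the loop */
    size_t group_size;  /* m: elements that belong together, like fields */
    size_t array_size;  /* total number of elements, m * n */
};

/* Sums "field" k of every group: a[k], a[k + stride], a[k + 2*stride], ... */
double sum_field(const double *a, const struct remap_candidate *c, size_t k) {
    assert(c->stride > 0 && k < c->group_size);
    double sum = 0.0;
    for (size_t i = k; i < c->array_size; i += c->stride)
        sum += a[i];
    return sum;
}
```

With `stride == group_size == m`, the loop skips `m - 1` elements between consecutive accesses, which is the same cache behavior as a loop over one field of an array of structures.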

  12. Implementation
      • Legality analysis (done in ipo)
        • Check for array aliasing, address taken, argument passing, etc.
      • Code transformation (done in ipo)
        • Construct the array remapping permutation alpha(i) = (i % m) * n + (i / m), where m is the group size and n is the number of such groups
        • Rewrite a[i] to a[alpha(i)]
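
The permutation above is small enough to write out directly. The following C sketch implements alpha(i) exactly as given on the slide; the helper `remap_copy`, which moves existing data into the remapped order, is a hypothetical addition for illustration and is not something the slides describe.

```c
#include <assert.h>
#include <stddef.h>

/* alpha(i) = (i % m) * n + (i / m), with m the group size and n the
   number of groups, so valid indices are 0 .. m*n - 1. */
size_t alpha(size_t i, size_t m, size_t n) {
    assert(m > 0 && i < m * n);
    return (i % m) * n + (i / m);
}

/* Hypothetical helper: place each in[i] at its remapped position. */
void remap_copy(const int *in, int *out, size_t m, size_t n) {
    for (size_t i = 0; i < m * n; i++)
        out[alpha(i, m, n)] = in[i];
}
```

For m = 3 and n = 2, indices 0, 1, 2, 3, 4, 5 map to 0, 2, 4, 1, 3, 5: element k of group g, originally at index g*m + k, moves to index k*n + g, so all elements with the same k become contiguous.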

  13. Array Remapping
      Before (n groups of m elements; iteration k touches group k, whose elements act as field_1 field_2 field_3 … field_m):
        iteration 0:    a[0]       a[1]         a[2]         …  a[m-1]
        iteration 1:    a[m]       a[m+1]       a[m+2]       …  a[2m-1]
        …
        iteration n-1:  a[(n-1)m]  a[(n-1)m+1]  a[(n-1)m+2]  …  a[nm-1]
      After (m groups of n elements; group j holds field_j for iterations 0, 1, 2, …, n-1):
        field_1:  a[0]       a[1]         a[2]         …  a[n-1]
        field_2:  a[n]       a[n+1]       a[n+2]       …  a[2n-1]
        …
        field_m:  a[(m-1)n]  a[(m-1)n+1]  a[(m-1)n+2]  …  a[mn-1]
      a[i] becomes a[(i%m)*n+(i/m)]

  14. Performance Results

  15. Future Work
      • Integrate existing structure layout optimizations with the new structure instance interleaving work
      • Combine profitability heuristics of all structure layout optimizations
      • Extend structure instance interleaving optimization to more than one structure
      • Extend array remapping optimization to multi-dimensional arrays

  16. References
      • G. Chakrabarti and F. Chow. “Structure Layout Optimizations in the Open64 Compiler.” Proceedings of the Open64 Workshop, Boston, 2008.
      • M. Hagog and C. Tice. “Cache Aware Data Layout Reorganization Optimization in gcc.” Proceedings of the gcc Developers Summit, 2005.
      • R. Hundt, S. Mannarswamy, and D.R. Chakrabarti. “Practical Structure Layout Optimization and Advice.” Proceedings of the International Symposium on Code Generation and Optimization, New York, 2006.
      • D.N. Truong, F. Bodin, and A. Seznec. “Improving Cache Behavior of Dynamically Allocated Data Structures.” Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Washington D.C., 1998.
