Extensions to Structure Layout Optimizations in the Open64 Compiler

Extensions to Structure Layout Optimizations in the Open64 Compiler
Michael Lai, AMD

Related Work
• Structure splitting, structure peeling, structure field reordering (Hagog & Tice; Hundt, Mannarswamy & Chakrabarti)
• The above as implemented in the Open64 Compiler (Chakrabarti & Chow)
• Structure instance interleaving (Truong, Bodin & Seznec)
• Data splitting (Curial, Zhao & Amaral)
• Array reshaping (Zhao, Cui, Gao, Silvera & Amaral)

Current Framework
[Diagram: each source file is processed by the frontend into WHIRL and then by ipl into a .o file (three such pipelines are shown); the .o files feed ipa_link, which outputs WHIRL.]

Instance Interleaving
Original layout:
a[0].field_1, a[0].field_2, a[0].field_3: an “instance” of the structure
a[1].field_1, a[1].field_2, a[1].field_3: another “instance” of the structure
...

Instance Interleaving
Interleaved layout:
a[0].field_1, a[1].field_1, …: field_1 of all the instances are interleaved together
a[0].field_2, a[1].field_2, …: field_2 of all the instances are interleaved together
a[0].field_3, a[1].field_3, …: field_3 of all the instances are interleaved together
...

Instance Interleaving
Original layout (n instances, m fields each):
array[0].field_1 array[0].field_2 array[0].field_3 … array[0].field_m
array[1].field_1 array[1].field_2 array[1].field_3 … array[1].field_m
…
array[n-1].field_1 array[n-1].field_2 array[n-1].field_3 … array[n-1].field_m
Interleaved layout:
array[0].field_1 array[1].field_1 array[2].field_1 … array[n-1].field_1
array[0].field_2 array[1].field_2 array[2].field_2 … array[n-1].field_2
…
array[0].field_m array[1].field_m array[2].field_m … array[n-1].field_m

Implementation
• Profitability analysis (done in ipl)
  • During ipl compilation of each source file, access patterns of structure fields are analyzed and their usage statistics recorded
  • After all the functions have been compiled by ipl, the “most likely to benefit” structure (if any) is marked and passed to ipo
  • (By way of illustration, the ideal structure is one with many fields, each of which appears in its own hot loop)
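To make the “ideal structure” concrete, here is a minimal sketch of the kind of code the profitability analysis would presumably favor. The structure `particle`, its fields, and the three functions are hypothetical names chosen for illustration; they do not come from the slides.

```c
#include <stddef.h>

/* Hypothetical structure with m = 3 fields. */
struct particle {
    double x;
    double y;
    double z;
};

/* Hot loop over field x only. With the default layout, consecutive
   x values are sizeof(struct particle) bytes apart, so most of each
   cache line loaded by this loop holds y and z, which it never reads. */
double sum_x(const struct particle *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i].x;
    return s;
}

/* A separate hot loop over field y only. */
double sum_y(const struct particle *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i].y;
    return s;
}

/* A separate hot loop over field z only. */
double sum_z(const struct particle *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i].z;
    return s;
}
```

Because each loop reads a single field, grouping all values of that field contiguously would let every loaded cache line carry only useful data.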

Implementation
• Legality analysis (done in ipo)
  • Usual checking for address taken, escaped types, etc.
• Code transformation (done in ipo)
  • Create internal pointers ptr_1, ptr_2, …, ptr_m to keep track of the m locations array[0].field_1, array[0].field_2, …, array[0].field_m
  • Rewrite array[i].field_j to ptr_j[i], if “i” is known; otherwise, incur additional overhead to compute “i”
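The rewrite of array[i].field_j to ptr_j[i] can be sketched by hand in C. This is only an illustration of the resulting layout, not the compiler's actual transformation: the type names `rec` and `interleaved`, the helper functions, and the restriction to three fields of the same type are all assumptions made to keep the example short.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical structure with m = 3 fields of identical type. */
struct rec {
    double field_1;
    double field_2;
    double field_3;
};

/* One pointer per field; ptr_j points at the slot holding
   array[0].field_j in the interleaved storage. */
struct interleaved {
    double *ptr_1;   /* also the start of the single allocation */
    double *ptr_2;
    double *ptr_3;
};

/* Copy n instances into interleaved storage.
   Returns 0 on success, -1 if the allocation fails. */
int interleave(struct interleaved *t, const struct rec *array, size_t n)
{
    double *base = malloc(3 * n * sizeof *base);
    if (base == NULL)
        return -1;
    t->ptr_1 = base;           /* field_1 of instances 0..n-1 */
    t->ptr_2 = base + n;       /* field_2 of instances 0..n-1 */
    t->ptr_3 = base + 2 * n;   /* field_3 of instances 0..n-1 */
    for (size_t i = 0; i < n; i++) {
        t->ptr_1[i] = array[i].field_1;
        t->ptr_2[i] = array[i].field_2;
        t->ptr_3[i] = array[i].field_3;
    }
    return 0;
}

/* Release the single allocation made by interleave(). */
void interleaved_free(struct interleaved *t)
{
    free(t->ptr_1);
}
```

After `interleave(&t, array, n)` succeeds, a reference such as `array[i].field_2` corresponds to `t.ptr_2[i]`. When the code holds only a pointer into the array rather than the index, the index has to be recovered first, which is presumably the additional overhead the slide mentions.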

Instance Interleaving
Original layout:
array[0].field_1 array[0].field_2 array[0].field_3 … array[0].field_m
array[1].field_1 array[1].field_2 array[1].field_3 … array[1].field_m
…
array[n-1].field_1 array[n-1].field_2 array[n-1].field_3 … array[n-1].field_m
Interleaved layout, with the internal pointers marking the start of each row:
ptr_1 = array[0].field_1 array[1].field_1 array[2].field_1 … array[n-1].field_1
ptr_2 = array[0].field_2 array[1].field_2 array[2].field_2 … array[n-1].field_2
…
ptr_m = array[0].field_m array[1].field_m array[2].field_m … array[n-1].field_m
array[i].field_j becomes ptr_j[i]

Array Remapping
Original layout (each iteration accesses one group of m elements):
iteration 0: a[0] a[1] a[2] … a[m-1] hold field_1 field_2 field_3 … field_m
iteration 1: a[m] a[m+1] a[m+2] … a[2m-1] hold field_1 field_2 field_3 … field_m
…
iteration n-1: a[(n-1)m] a[(n-1)m+1] a[(n-1)m+2] … a[nm-1] hold field_1 field_2 field_3 … field_m
Remapped layout:
a[0] a[1] a[2] … a[n-1] hold field_1 for iterations 0, 1, 2, …, n-1
a[n] a[n+1] a[n+2] … a[2n-1] hold field_2 for iterations 0, 1, 2, …, n-1
…
a[(m-1)n] a[(m-1)n+1] a[(m-1)n+2] … a[mn-1] hold field_m for iterations 0, 1, 2, …, n-1

Implementation
• Profitability analysis (done in ipl)
  • During ipl compilation of each source file, discover if there are arrays that behave like structures and suffer poor data cache utilization at the same time
  • After all the functions have been compiled by ipl, the “most likely to benefit” arrays (if any) are marked and passed to ipo
  • For each of these arrays, record the stride, group size, and array size associated with it
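A rough way to see why a strided array “suffers poor data cache utilization” is to count the cache lines a loop touches. The sketch below is a back-of-the-envelope model, not the analysis performed in ipl: the function name `lines_touched` is hypothetical, and it assumes the array starts on a cache-line boundary and that the element size divides the line size.

```c
#include <stddef.h>

/* Count the distinct cache lines touched by a loop that reads
   `count` elements, each `elem_size` bytes, stepping `stride`
   elements per iteration from the start of the array.
   Assumes a line-aligned array and elem_size dividing line_size. */
size_t lines_touched(size_t count, size_t stride,
                     size_t elem_size, size_t line_size)
{
    size_t lines = 0;
    size_t last_line = (size_t)-1;   /* sentinel: no line seen yet */
    for (size_t k = 0; k < count; k++) {
        size_t line = (k * stride * elem_size) / line_size;
        if (line != last_line) {     /* byte offsets only grow, so a
                                        new line never repeats */
            lines++;
            last_line = line;
        }
    }
    return lines;
}
```

For example, under the assumption of 64-byte lines and 8-byte elements, reading 1024 elements with a stride of 8 touches 1024 lines, while reading the same number of elements contiguously touches 128.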

Implementation
• Legality analysis (done in ipo)
  • Check for array aliasing, address taken, argument passing, etc.
• Code transformation (done in ipo)
  • Construct the array remapping permutation alpha(i) = (i % m) * n + (i / m), where m is the group size and n is the number of such groups
  • Rewrite a[i] to a[alpha(i)]
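The permutation alpha can be written directly in C, where `/` on unsigned integers is the truncating division the formula relies on. The helper `remap`, which physically moves the data, is a hypothetical addition for illustration; in the scheme described on the slide the compiler rewrites the subscripts instead.

```c
#include <stddef.h>

/* alpha(i) = (i % m) * n + (i / m), where m is the group size and
   n is the number of groups. Element i is field (i % m) of group
   (i / m); it moves to row (i % m), column (i / m). */
static inline size_t alpha(size_t i, size_t m, size_t n)
{
    return (i % m) * n + (i / m);
}

/* Illustration only: copy src into dst so that dst[alpha(i)] == src[i]
   for every i in [0, m*n). dst and src must not overlap. */
void remap(double *dst, const double *src, size_t m, size_t n)
{
    for (size_t i = 0; i < m * n; i++)
        dst[alpha(i, m, n)] = src[i];
}
```

With m = 3 and n = 4, for instance, index 1 (field 2 of group 0) maps to 4 and index 3 (field 1 of group 1) maps to 1, so all field-1 elements end up in a[0] through a[3].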

Array Remapping
Original layout (each iteration accesses one group of m elements):
iteration 0: a[0] a[1] a[2] … a[m-1] hold field_1 field_2 field_3 … field_m
iteration 1: a[m] a[m+1] a[m+2] … a[2m-1] hold field_1 field_2 field_3 … field_m
…
iteration n-1: a[(n-1)m] a[(n-1)m+1] a[(n-1)m+2] … a[nm-1] hold field_1 field_2 field_3 … field_m
Remapped layout:
a[0] a[1] a[2] … a[n-1] hold field_1 for iterations 0, 1, 2, …, n-1
a[n] a[n+1] a[n+2] … a[2n-1] hold field_2 for iterations 0, 1, 2, …, n-1
…
a[(m-1)n] a[(m-1)n+1] a[(m-1)n+2] … a[mn-1] hold field_m for iterations 0, 1, 2, …, n-1
a[i] becomes a[(i%m)*n+(i/m)]

Performance Results

Future Work
• Integrate existing structure layout optimizations with the new structure instance interleaving work
• Combine profitability heuristics of all structure layout optimizations
• Extend structure instance interleaving optimization to more than one structure
• Extend array remapping optimization to multi-dimensional arrays

References
• G. Chakrabarti and F. Chow. “Structure Layout Optimizations in the Open64 Compiler.” Proceedings of the Open64 Workshop, Boston, 2008.
• M. Hagog and C. Tice. “Cache Aware Data Layout Reorganization Optimization in gcc.” Proceedings of the gcc Developers Summit, 2005.
• R. Hundt, S. Mannarswamy, and D.R. Chakrabarti. “Practical Structure Layout Optimization and Advice.” Proceedings of the International Symposium on Code Generation and Optimization, New York, 2006.
• D.N. Truong, F. Bodin, and A. Seznec. “Improving Cache Behavior of Dynamically Allocated Data Structures.” Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Washington D.C., 1998.
