160 likes | 345 Views
Extensions to Structure Layout Optimizations in the Open64 Compiler. Michael Lai AMD. Related Work. Structure splitting, structure peeling, structure field reordering ( Hagog & Tice, Hundt , Mannarswamy & Chakrabarti ) Above implemented in the Open64 Compiler ( Chakrabarti & Chow)
E N D
Extensions to Structure Layout Optimizations in the Open64 Compiler Michael Lai AMD
Related Work • Structure splitting, structure peeling, structure field reordering (Hagog & Tice, Hundt, Mannarswamy & Chakrabarti) • Above implemented in the Open64 Compiler (Chakrabarti & Chow) • Structure instance interleaving (Truong, Bodin & Seznec) • Data splitting (Curial, Zhao & Amaral) • Array reshaping (Zhao, Cui, Gao, Silvera & Amaral)
Current Framework source source source frontend frontend frontend WHIRL WHIRL WHIRL ipl ipl ipl .o .o .o ipa_link WHIRL WHIRL
Instance Interleaving a[0].field_1 a[0].field_2 an “instance” of the structure a[0].field_3 a[1].field_1 a[1].field_2 another “instance” of the structure a[1].field_3 ...
Instance Interleaving a[0].field_1 field_1 of all the instances are a[1].field_1 interleaved together … a[0].field_2 field_2 of all the instances are a[1].field_2 interleaved together … a[0].field_3 field_3 of all the instances are a[1].field_3 interleaved together ...
Instance Interleaving array[0].field_1 array[0].field_2 array[0].field_3 … array[0].field_m array[1].field_1 array[1].field_2 array[1].field_3 … array[1].field_m … array[n-1].field_1 array[n-1].field_2 array[n-1].field_3 … array[n-1].field_m array[0].field_1 array[1].field_1 array[2].field_1 … array[n-1].field_1 array[0].field_2 array[1].field_2 array[2].field_2 … array[n-1].field_2 … array[0].field_m array[1].field_m array[2].field_m … array[n-1].field_m
Implementation • Profitability analysis (done in ipl) • During ipl compilation of each source file, access patterns of structure fields are analyzed and their usage statistics recorded • After all the functions have been compiled by ipl, the “most likely to benefit” structure (if any) is marked and passed to ipo • (By way of illustration, the ideal structure is one with many fields, each of which appearing in its own hot loop)
Implementation • Legality analysis (done in ipo) • Usual checking for address taken, escaped types, etc. • Code transformation (done in ipo) • Create internal pointers ptr_1, ptr_2, …, ptr_m to keep track of the m locations array[0].field_1, array[0].field_2, …, array[0].field_m • Rewrite array[i].field_j to ptr_j[i], if “i” is known; otherwise, incur additional overhead to compute “i”
Instance Interleaving array[0].field_1 array[0].field_2 array[0].field_3 … array[0].field_m array[1].field_1 array[1].field_2 array[1].field_3 … array[1].field_m … array[n-1].field_1 array[n-1].field_2 array[n-1].field_3 … array[n-1].field_m array[0].field_1 array[1].field_1 array[2].field_1 … array[n-1].field_1 array[0].field_2 array[1].field_2 array[2].field_2 … array[n-1].field_2 … array[0].field_m array[1].field_m array[2].field_m … array[n-1].field_m = ptr_1 = ptr_2 = ptr_m array[i].field_j becomes ptr_j[i]
Array Remapping field_1 field_2 field_3 … field_m field_1 field_2 field_3 … field_m … field_1 field_2 field_3 … field_m a[0] a[1] a[2] … a[m-1] a[m] a[m+1] a[m+2] … a[2m-1] … a[(n-1)m] a[(n-1)m+1] a[(n-1)m+2] … a[nm-1] a[0] a[1] a[2] … a[n-1] a[n] a[n+1] a[n+2] … a[2n-1] … a[(m-1)n] a[(m-1)n+1] a[(m-1)n+2] … a[mn-1] field_1 field_1 field_1 … field_1 field_2 field_2 field_2 … field_2 … field_m field_m field_m … field_m iteration 0 iteration 1 iteration 2 iteration 0 iteration n-1 iteration 0 iteration 1 iteration 1 iteration 2 iteration n-1 iteration 0 iteration 1 iteration 2 iteration n-1 iteration n-1
Implementation • Profitability analysis (done in ipl) • During ipl compilation of each source file, discover if there are arrays that behave like structures and suffer poor data cache utilization at the same time • After all the functions have been compiled by ipl, the “most likely to benefit” arrays (if any) are marked and passed to ipo • For each of these arrays, record the stride, group size, and array size associated with it
Implementation • Legality analysis (done in ipo) • Check for array aliasing, address taken, argument passing, etc. • Code transformation (done in ipo) • Construct the array remapping permutation alpha(i) = (i % m) * n + (i / m), where m is the group size and n is the number of such groups • Rewrite a[i] to a[alpha(i)]
Array Remapping field_1 field_2 field_3 … field_m field_1 field_2 field_3 … field_m … field_1 field_2 field_3 … field_m a[0] a[1] a[2] … a[m-1] a[m] a[m+1] a[m+2] … a[2m-1] … a[(n-1)m] a[(n-1)m+1] a[(n-1)m+2] … a[nm-1] a[0] a[1] a[2] … a[n-1] a[n] a[n+1] a[n+2] … a[2n-1] … a[(m-1)n] a[(m-1)n+1] a[(m-1)n+2] … a[mn-1] field_1 field_1 field_1 … field_1 field_2 field_2 field_2 … field_2 … field_m field_m field_m … field_m iteration 0 iteration 1 iteration 2 iteration 0 iteration n-1 iteration 0 iteration 1 iteration 1 iteration 2 iteration n-1 iteration 0 iteration 1 iteration 2 iteration n-1 iteration n-1 a[i] becomes a[(i%m)*n+(i/m)]
Future Work • Integrate existing structure layout optimizations with the new structure instance interleaving work • Combine profitability heuristics of all structure layout optimizations • Extend structure instance interleaving optimization to more than one structure • Extend array remapping optimization to multi-dimensional arrays
References • G. Chakrabarti and F. Chow. “Structure Layout Optimizations in the Open64 Compiler.” Proceedings of the Open64 Workshop, Boston, 2008. • M. Hagog and C. Tice. “Cache Aware Data Layout Reorganization Optimization in gcc.” Proceedings of the gcc Developers Summit, 2005. • R. Hundt, S. Mannarswamy, and D.R. Chakrabarti. “Practical Structure Layout Optimization and Advice.” Proceedings of the International Symposium on Code Generation and Optimization, New York, 2006. • D.N. Truong, F. Bodin, and A. Seznec. “Improving Cache Behavior of Dynamically Allocated Data Structures.” Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Washington D.C., 1998.