Explore the design, implementation, and results of structure layout optimizations in the Open64 Compiler. Learn about criteria, IPA analysis, implementation details, and future work for cache-friendly applications.
Structure Layout Optimizations in the Open64 Compiler: Design, Implementation and Measurements
Gautam Chakrabarti and Fred Chow
PathScale, LLC.

Outline
• Motivation
• Types of structure layout optimizations
• Criteria for structure layout optimizations
• Implementation details
• Performance results
• Future work
• Conclusion
Open64 Workshop 2008

Motivation
• Poor data locality in many applications
• High data cache miss rates
• Growing gap between processor and memory speeds
Our Aim
• Make applications more cache-friendly
Our Approach
• Change layout of data structures
• Requires whole-program optimization
• Use Inter-Procedural Analysis and Optimizations (IPA)

IPA
• Summarization
• Analysis
• Optimization

Types of Structure Layout Optimizations
• Structure splitting
• Structure peeling

    struct struct_A {
        double d1;
        double d2;
        int i;
        float f;
        long long l;
        char c;
        struct struct_A * next;
    };

    struct struct_A {
        double d1;
        double d2;
        int i;
        float f;
        long long l;
        char c;
    };

Structure Splitting Example

    struct struct_A {
        double d1;
        double d2;
        int i;
        float f;
        long long l;
        char c;
        struct struct_A * next;
    };

    struct new_struct_A {
        double d1;
        int i;
        long long l;
        struct new_struct_A * next;
        struct cold_sub_struct_A * p;
    };

    struct cold_sub_struct_A {
        double d2;
        float f;
        char c;
    };

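The effect of splitting on client code can be sketched as follows. The two struct definitions are taken from the slide; the functions `sum_hot` and `read_cold_f` are hypothetical and only illustrate how a hot traversal avoids the cold fields while a cold access pays one extra indirection through `p`.

```c
#include <stddef.h>

/* Cold fields moved out of the hot node, as on the slide. */
struct cold_sub_struct_A {
    double d2;
    float f;
    char c;
};

/* Hot node: frequently used fields plus a link to the cold part. */
struct new_struct_A {
    double d1;
    int i;
    long long l;
    struct new_struct_A *next;
    struct cold_sub_struct_A *p;
};

/* Hypothetical hot loop: touches only d1, i, l and next,
 * so the cold fields never enter the cache during the walk. */
double sum_hot(const struct new_struct_A *head)
{
    double total = 0.0;
    for (const struct new_struct_A *n = head; n != NULL; n = n->next)
        total += n->d1 + (double)n->i + (double)n->l;
    return total;
}

/* Hypothetical cold access: what an original n->f turns into after splitting. */
float read_cold_f(const struct new_struct_A *n)
{
    return n->p->f;
}
```

The extra pointer `p` grows the hot node slightly, so a split of this kind only pays off when the cold fields it removes are larger than the pointer it adds or are accessed rarely enough.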
Structure Peeling Example

    struct struct_A {
        double d1;
        double d2;
        int i;
        float f;
        long long l;
        char c;
    };

    struct new_struct_A {
        double d1;
        int i;
        long long l;
    };

    struct cold_sub_struct_A {
        double d2;
        float f;
        char c;
    };

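A minimal sketch of how peeled data might be used, assuming the original type was held in an array. The two struct definitions come from the slide; the container `peeled_A` and the functions `sum_d1` and `cold_c_at` are hypothetical names introduced for illustration. Element k of the original array corresponds to `hot[k]` and `cold[k]`, so no link pointer between the two parts is needed.

```c
#include <stddef.h>

struct new_struct_A {
    double d1;
    int i;
    long long l;
};

struct cold_sub_struct_A {
    double d2;
    float f;
    char c;
};

/* Hypothetical container: one array of struct_A becomes two parallel arrays. */
struct peeled_A {
    struct new_struct_A *hot;
    struct cold_sub_struct_A *cold;
    size_t count;
};

/* Hot loop: streams through the dense hot array only. */
double sum_d1(const struct peeled_A *a)
{
    double total = 0.0;
    for (size_t k = 0; k < a->count; k++)
        total += a->hot[k].d1;
    return total;
}

/* Cold access: the same index selects the matching cold element. */
char cold_c_at(const struct peeled_A *a, size_t k)
{
    return a->cold[k].c;
}
```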
Criteria for structure layout optimizations
• Legality Analysis
    • Type cast
    • Address of a field is taken
    • Escaped types
    • Parameter types
    • Full visibility to IPA
    • Alignment restrictions
• Profitability Analysis
    • Hotness
    • Affinity
    • Field accesses at loop level
    • Size

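Two of the legality criteria above can be illustrated with small C patterns. The type `rec` and both functions are hypothetical, and whether Open64 rejects these exact forms is an assumption; the sketch only shows why such code depends on the original field layout.

```c
#include <stddef.h>
#include <string.h>

struct rec {
    double hot;
    double cold;
    int key;
};

/* Type cast: the struct is viewed as raw bytes, so the byte offset of
 * 'key' in the original layout is baked into the code. */
int key_via_cast(const struct rec *r)
{
    int key;
    memcpy(&key, (const char *)r + offsetof(struct rec, key), sizeof key);
    return key;
}

/* Address of a field is taken: a caller holding this pointer could do
 * pointer arithmetic that assumes the neighbouring fields stay adjacent. */
double *addr_of_cold(struct rec *r)
{
    return &r->cold;
}
```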
Implementation Details
Step 1: Type information summarization (IPL)
Step 2: Symbol table merging (IPA)
Step 3: Legality and profitability analysis (IPA analysis)
Step 4: Transforming the program (IPA optimization)

Implementation Details: Type information summarization
• Information summarization in IPL
• Framework for computing static profiles using heuristics
• New TY flag TY_NO_SPLIT
• SUMMARY_TY_INFO
• SUMMARY_LOOP
    • For each DO_LOOP, WHILE_DO, DO_WHILE
    • Bit-vector to track field accesses of up to N structures for each loop
    • Considers field accesses immediately inside loop
    • These fields are considered affine to each other
    • Execution count of statements immediately inside loop
    • From statically estimated profiles or from runtime feedback

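The per-loop bit-vector idea can be sketched as below. This is not the real SUMMARY_LOOP layout: the record `loop_summary`, its members, and both functions are hypothetical, and the sketch tracks a single structure with at most 32 fields, whereas the slide mentions up to N structures per loop.

```c
#include <stdint.h>

/* Hypothetical per-loop record; not the actual SUMMARY_LOOP definition. */
struct loop_summary {
    uint32_t field_bits;  /* bit j set => field j accessed immediately inside the loop */
    uint64_t exec_count;  /* estimated or measured execution count of the loop body */
};

/* Mark field 'field_id' as accessed in this loop (ids >= 32 are ignored). */
void record_field_access(struct loop_summary *s, unsigned field_id)
{
    if (field_id < 32)
        s->field_bits |= (uint32_t)1 << field_id;
}

/* Two fields are treated as affine if the same loop accesses both. */
int fields_are_affine(const struct loop_summary *s, unsigned a, unsigned b)
{
    uint32_t mask;
    if (a >= 32 || b >= 32)
        return 0;
    mask = ((uint32_t)1 << a) | ((uint32_t)1 << b);
    return (s->field_bits & mask) == mask;
}
```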
Implementation Details: IPA Analysis
• Inter-procedurally update statically estimated execution count of PUs
• Update statically estimated loop frequencies in SUMMARY_LOOP
• Consider SUMMARY_LOOP from the hottest P PUs
• Determine candidates for structure-layout transformation
• Determine new layout of structures

Implementation Details: IPA Analysis Example
[Figure legend: Li = loops, Fj = fields in a struct, AGk = affinity groups]

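The slides do not spell out how loops and fields are mapped to affinity groups, so the following is only one plausible greedy heuristic, not the Open64 algorithm. It visits loops from hottest to coldest and places the not-yet-assigned fields of each loop into a fresh group. The record `loop_summary`, the constant `MAX_FIELDS`, and the function `build_affinity_groups` are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_FIELDS 32

/* Hypothetical per-loop record: fields accessed and loop hotness. */
struct loop_summary {
    uint32_t field_bits;
    uint64_t exec_count;
};

/* Greedy grouping (illustrative only). group_of[j] receives the group index
 * of field j, or -1 if no loop touches it. Returns the number of groups,
 * or -1 if the inputs exceed the sketch's fixed limits. */
int build_affinity_groups(const struct loop_summary *loops, size_t nloops,
                          unsigned nfields, int group_of[MAX_FIELDS])
{
    uint32_t assigned = 0;  /* fields already placed in a group */
    uint64_t visited = 0;   /* loops already processed */
    int ngroups = 0;

    if (nfields > MAX_FIELDS || nloops > 64)
        return -1;
    for (unsigned j = 0; j < MAX_FIELDS; j++)
        group_of[j] = -1;

    for (size_t round = 0; round < nloops; round++) {
        /* Pick the hottest loop not yet visited. */
        size_t best = nloops;
        for (size_t k = 0; k < nloops; k++) {
            if (visited & ((uint64_t)1 << k))
                continue;
            if (best == nloops || loops[k].exec_count > loops[best].exec_count)
                best = k;
        }
        visited |= (uint64_t)1 << best;

        /* Fields of this loop that no hotter loop has claimed. */
        uint32_t fresh = loops[best].field_bits & ~assigned;
        if (nfields < 32)
            fresh &= ((uint32_t)1 << nfields) - 1;
        if (fresh == 0)
            continue;

        for (unsigned j = 0; j < nfields; j++)
            if (fresh & ((uint32_t)1 << j))
                group_of[j] = ngroups;
        assigned |= fresh;
        ngroups++;
    }
    return ngroups;
}
```

With the six fields of `struct_A` numbered d1=0, d2=1, i=2, f=3, l=4, c=5, a hot loop touching d1, i and l would claim those three fields first, and colder loops would group whatever remains.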
Implementation Details: Transforming the program
• New type definitions
• Field table update
• Field access statements
• New symbols
• Assignment statements
Example:

    struct S {
        // N fields
        struct T * p;
        // M fields
    };

    struct T {
        // AG1 fields
        // AG2 fields
    };

    // peel T

    struct S {
        // N fields
        struct T1 * p1;
        struct T2 * p2;
        // M fields
    };

    struct T1 {
        // AG1 fields
    };

    struct T2 {
        // AG2 fields
    };

Implementation Details: Transforming the program (continued)
Function calls to memory management routines
Example:

    p = (T *) malloc (N * sizeof (T));
    if (p == NULL) exit (1);

• Detect memory management routine calls involving transformed type T
• Replicate call, assignment statements
• Update size of memory being allocated
• Handle comparisons involving pointer p

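The allocation rewrite described above might look roughly like this once T has been peeled into T1 and T2. The field contents of `T1` and `T2` are stand-ins for the AG1 and AG2 fields, and the functions `alloc_peeled` and `free_peeled` are hypothetical wrappers. The compiler rewrites the statements in place rather than introducing helper functions, and this sketch returns an error code where the slide's example calls `exit`.

```c
#include <stdlib.h>

struct T1 { double d1; int i; long long l; };  /* stand-in for the AG1 fields */
struct T2 { double d2; float f; char c; };     /* stand-in for the AG2 fields */

/* Replicated allocation: one malloc per peeled type, each sized with the
 * new sizeof, and the NULL comparison extended to both pointers.
 * Returns 0 on success, -1 on failure. */
int alloc_peeled(size_t n, struct T1 **p1, struct T2 **p2)
{
    *p1 = (struct T1 *) malloc(n * sizeof(struct T1));
    *p2 = (struct T2 *) malloc(n * sizeof(struct T2));
    if (*p1 == NULL || *p2 == NULL) {
        free(*p1);  /* free(NULL) is a no-op */
        free(*p2);
        *p1 = NULL;
        *p2 = NULL;
        return -1;
    }
    return 0;
}

/* A call to free(p) on the original type is replicated in the same way. */
void free_peeled(struct T1 *p1, struct T2 *p2)
{
    free(p1);
    free(p2);
}
```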
Performance Results
Compilation options: -Ofast at 32-bit ABI
[Chart: speedup due to structure layout optimizations]

Performance Results (continued)
Compilation options: -Ofast at 64-bit ABI
[Chart: speedup due to structure layout optimizations]

Performance Results (continued)
Compilation options: -Ofast at 64-bit ABI
Multiple copies of 462.libquantum running on a multi-core chip
Platform: Quad-core AMD Barcelona (2.0 GHz, 8GB, 512KB, 2MB)
3rd level cache shared among 4 cores
[Chart: speedup from structure layout optimizations]

Future Work
• Tune static profile estimation
• Fewer restrictions
• Integrate with field-reordering

Conclusion
• A framework for performing structure layout transformations is now available in the Open64 compiler.
• The superior infrastructure in the Open64 compiler helped us implement the optimizations cleanly and with relatively little effort.
• Substantial speedups are possible on some of the SPEC CPU2000 and CPU2006 benchmarks.
• Structure layout optimization is a required feature for a compiler to remain competitive.