Structure Layout Optimizations in the Open64 Compiler: Design, Implementation and Measurements

Explore the design, implementation, and results of structure layout optimizations in the Open64 Compiler. Learn about criteria, IPA analysis, implementation details, and future work for cache-friendly applications.

Presentation Transcript


  1. Structure Layout Optimizations in the Open64 Compiler: Design, Implementation and Measurements
     Gautam Chakrabarti and Fred Chow
     PathScale, LLC.

  2. Outline
     • Motivation
     • Types of structure layout optimizations
     • Criteria for structure layout optimizations
     • Implementation details
     • Performance results
     • Future work
     • Conclusion
     Open64 Workshop 2008

  3. Motivation
     • Poor data locality in many applications
     • High data cache miss rates
     • Growing gap between processor and memory speeds
     Our Aim
     • Make applications more cache-friendly
     Our Approach
     • Change layout of data structures
     • Requires whole-program optimization
     • Use Inter-Procedural Analysis and Optimizations (IPA)

  4. IPA
     • Summarization
     • Analysis
     • Optimization

  5. Types of Structure Layout Optimizations
     • Structure splitting
     • Structure peeling
     Example structure for splitting:
         struct struct_A {
             double d1;
             double d2;
             int i;
             float f;
             long long l;
             char c;
             struct struct_A * next;
         };
     Example structure for peeling:
         struct struct_A {
             double d1;
             double d2;
             int i;
             float f;
             long long l;
             char c;
         };

  6. Structure Splitting Example
     Original structure:
         struct struct_A {
             double d1;
             double d2;
             int i;
             float f;
             long long l;
             char c;
             struct struct_A * next;
         };
     After splitting, the hot fields plus a link to the cold part:
         struct new_struct_A {
             double d1;
             int i;
             long long l;
             struct new_struct_A * next;
             struct cold_sub_struct_A * p;
         };
     After splitting, the cold fields:
         struct cold_sub_struct_A {
             double d2;
             float f;
             char c;
         };
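To make the effect of splitting concrete, the sketch below shows how a hot list traversal and an occasional cold-field access might look once the program uses the split layout. The two struct definitions are taken from the slide; the functions `sum_hot` and `read_cold_f` are hypothetical illustrations and are not code produced by Open64.

```c
#include <assert.h>
#include <stddef.h>

/* Cold fields moved out of the hot structure (layout from the slide). */
struct cold_sub_struct_A {
    double d2;
    float f;
    char c;
};

/* Hot fields stay together; p links each node to its cold part. */
struct new_struct_A {
    double d1;
    int i;
    long long l;
    struct new_struct_A *next;
    struct cold_sub_struct_A *p;
};

/* Hypothetical hot loop: walks the list touching only hot fields,
   so the cold fields are never pulled into the cache here. */
double sum_hot(const struct new_struct_A *head)
{
    double sum = 0.0;
    for (const struct new_struct_A *n = head; n != NULL; n = n->next)
        sum += n->d1 + (double)n->i + (double)n->l;
    return sum;
}

/* Hypothetical cold access: what was node->f in the original program
   now needs one extra indirection through the link pointer. */
float read_cold_f(const struct new_struct_A *node)
{
    assert(node != NULL && node->p != NULL);
    return node->p->f;
}
```

The trade-off is visible in the two functions: the hot loop works on smaller nodes, while every cold access pays for an additional pointer dereference.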

  7. Structure Peeling Example
     Original structure:
         struct struct_A {
             double d1;
             double d2;
             int i;
             float f;
             long long l;
             char c;
         };
     After peeling, the hot fields:
         struct new_struct_A {
             double d1;
             int i;
             long long l;
         };
     After peeling, the cold fields:
         struct cold_sub_struct_A {
             double d2;
             float f;
             char c;
         };
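A short sketch of how array accesses might look after peeling. The peeled structures on the slide carry no link pointer, so this example assumes that element `k` of the hot array and element `k` of the cold array together stand for element `k` of the original array. The function names `hot_dot` and `read_cold_c` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Peeled layouts from the slide. */
struct new_struct_A {
    double d1;
    int i;
    long long l;
};

struct cold_sub_struct_A {
    double d2;
    float f;
    char c;
};

/* Original form: sum += a[k].d1 * a[k].i over an array of struct_A.
   After peeling, the loop streams through the hot array only. */
double hot_dot(const struct new_struct_A *hot, size_t n)
{
    double sum = 0.0;
    for (size_t k = 0; k < n; k++)
        sum += hot[k].d1 * (double)hot[k].i;
    return sum;
}

/* Original form: a[k].c. After peeling, the same index selects
   the matching element of the cold array. */
char read_cold_c(const struct cold_sub_struct_A *cold, size_t k)
{
    assert(cold != NULL);
    return cold[k].c;
}
```

Unlike splitting, no extra indirection is added here: both parts are reached with the same index, which is why the sketch needs no pointer between them.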

  8. Criteria for structure layout optimizations
     • Legality Analysis
       • Type cast
       • Address of a field is taken
       • Escaped types
       • Parameter types
       • Full visibility to IPA
       • Alignment restrictions
     • Profitability Analysis
       • Hotness
       • Affinity
       • Field accesses at loop level
       • Size
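The legality bullets name source patterns that depend on the original layout. The sketch below gives one plausible C instance of three of them. It is an interpretation of the bullet names only: the slides do not state the exact rules Open64 applies, and `struct rec` and all three functions are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical candidate type. */
struct rec {
    double hot;
    int count;
    char tag;
};

/* Type cast: reading the object through another type bakes in the
   original byte layout, which a new layout would silently change. */
unsigned char first_byte(const struct rec *r)
{
    return ((const unsigned char *)r)[0];
}

/* Address of a field taken: the returned pointer can be used for
   pointer arithmetic that assumes neighbouring fields stay in place. */
int *count_ptr(struct rec *r)
{
    return &r->count;
}

/* Escaped type: the whole object is handed to a routine outside the
   program (here memcpy), which copies it using the original size. */
void copy_out(void *dst, const struct rec *r)
{
    assert(dst != NULL && r != NULL);
    memcpy(dst, r, sizeof *r);
}
```

A type used in any of these ways would be a natural candidate for exclusion, which is presumably what the `TY_NO_SPLIT` flag mentioned later in the slides records.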

  9. Implementation Details
     Step 1: Type information summarization (IPL)
     Step 2: Symbol table merging (IPA)
     Step 3: Legality and profitability analysis (IPA analysis)
     Step 4: Transforming the program (IPA optimization)

  10. Implementation Details: Type information summarization
      • Information summarization in IPL
      • Framework for computing static profiles using heuristics
      • New TY flag TY_NO_SPLIT
      • SUMMARY_TY_INFO
      • SUMMARY_LOOP
        • For each DO_LOOP, WHILE_DO, DO_WHILE
        • Bit-vector to track field accesses of up to N structures for each loop
        • Considers field accesses immediately inside loop
        • These fields are considered affine to each other
        • Execution count of statements immediately inside loop
        • From statically estimated profiles or from runtime feedback
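The per-loop bit-vector can be pictured with a small sketch. The record below is a hypothetical stand-in for one structure's entry in a loop summary; it is not the actual SUMMARY_LOOP layout, and the 32-field limit, the names `loop_summary`, `record_field_access` and `fields_share_loop` are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_FIELDS 32  /* assumption: one 32-bit word per structure */

/* Hypothetical per-loop summary for a single structure type. */
struct loop_summary {
    uint32_t field_mask;  /* bit j set: field j is accessed immediately inside the loop */
    double exec_count;    /* estimated or measured execution count of the loop body */
};

/* Called for each field access found directly in the loop body. */
void record_field_access(struct loop_summary *ls, unsigned field_id)
{
    assert(field_id < MAX_FIELDS);
    ls->field_mask |= (uint32_t)1 << field_id;
}

/* Two fields are treated as affine when the same loop accesses both. */
int fields_share_loop(const struct loop_summary *ls, unsigned a, unsigned b)
{
    assert(a < MAX_FIELDS && b < MAX_FIELDS);
    uint32_t both = ((uint32_t)1 << a) | ((uint32_t)1 << b);
    return (ls->field_mask & both) == both;
}
```

A bit-vector keeps the summary small and makes the affinity test a single mask comparison, which matters when summaries for every loop in the program are carried into IPA.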

  11. Implementation Details: IPA Analysis
      • Inter-procedurally update statically estimated execution count of PUs
      • Update statically estimated loop frequencies in SUMMARY_LOOP
      • Consider SUMMARY_LOOP from the hottest P PUs
      • Determine candidates for structure-layout transformation
      • Determine new layout of structures

  12. Implementation Details: IPA Analysis Example
      Figure legend: Li = loops, Fj = fields in a struct, AGk = affinity groups
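One simple way to derive affinity groups from per-loop field masks is sketched below. It places two fields in the same group when exactly the same set of loops accesses them. This grouping rule is a stand-in chosen for illustration and ignores loop frequencies; the slides do not spell out the heuristic Open64 uses. `NUM_FIELDS`, `MAX_LOOPS` and `build_affinity_groups` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_FIELDS 7   /* the seven fields of struct_A from the earlier slides */
#define MAX_LOOPS 32   /* assumption: loop signatures fit in one 32-bit word */

/* loop_masks[i] has bit j set when loop Li accesses field Fj.
   On return, group_of[j] holds the affinity-group index of field j,
   or -1 if no loop accesses it. Returns the number of groups. */
int build_affinity_groups(const uint32_t *loop_masks, int num_loops,
                          int group_of[NUM_FIELDS])
{
    uint32_t sig[NUM_FIELDS] = {0};
    int num_groups = 0;

    assert(num_loops >= 0 && num_loops <= MAX_LOOPS);

    /* sig[j] records which loops touch field j. */
    for (int i = 0; i < num_loops; i++)
        for (int j = 0; j < NUM_FIELDS; j++)
            if (loop_masks[i] & ((uint32_t)1 << j))
                sig[j] |= (uint32_t)1 << i;

    /* Fields with identical non-empty signatures share a group. */
    for (int j = 0; j < NUM_FIELDS; j++) {
        group_of[j] = -1;
        if (sig[j] == 0)
            continue;
        for (int k = 0; k < j; k++) {
            if (sig[k] == sig[j]) {
                group_of[j] = group_of[k];
                break;
            }
        }
        if (group_of[j] < 0)
            group_of[j] = num_groups++;
    }
    return num_groups;
}
```

For example, if two loops access fields 0, 2 and 4 and a third loop accesses fields 1 and 3, the function reports two groups and leaves fields 5 and 6 ungrouped. A production heuristic would presumably also weight loops by their estimated frequency before deciding which groups are hot.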

  13. Implementation Details: Transforming the program
      • New type definitions
      • Field table update
      • Field access statements
      • New symbols
      • Assignment statements
      Example:
          struct S {
              // N fields
              struct T * p;
              // M fields
          };
          struct T {
              // AG1 fields
              // AG2 fields
          };

          // peel T

          struct S {
              // N fields
              struct T1 * p1;
              struct T2 * p2;
              // M fields
          };
          struct T1 {
              // AG1 fields
          };
          struct T2 {
              // AG2 fields
          };

  14. Implementation Details: Transforming the program (continued)
      Function calls to memory management routines
      Example:
          p = (T *) malloc (N * sizeof (T));
          if (p == NULL) exit (1);
      • Detect memory management routine calls involving transformed type T
      • Replicate call, assignment statements
      • Update size of memory being allocated
      • Handle comparisons involving pointer p
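The four bullets can be read against the malloc example as follows. This is a hand-written sketch of what the rewritten allocation might look like after T is peeled into T1 and T2; it is not output from the compiler. The field contents of `T1` and `T2`, and the helpers `alloc_peeled` and `free_peeled`, are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-ins for the peeled types; the real AG1/AG2 fields are not given. */
struct T1 { double d1; int i; };
struct T2 { double d2; char c; };

/* Original:  p = (T *) malloc (N * sizeof (T));
              if (p == NULL) exit (1);
   Returns 0 on success and -1 on failure instead of calling exit. */
int alloc_peeled(size_t n, struct T1 **p1, struct T2 **p2)
{
    assert(p1 != NULL && p2 != NULL);

    /* The call and the assignment are replicated once per new type,
       and each size is updated to the size of that type. */
    *p1 = malloc(n * sizeof **p1);
    *p2 = malloc(n * sizeof **p2);

    /* The comparison on p now has to cover both new pointers. */
    if (*p1 == NULL || *p2 == NULL) {
        free(*p1);
        free(*p2);
        *p1 = NULL;
        *p2 = NULL;
        return -1;
    }
    return 0;
}

/* A matching free(p) would be replicated in the same way. */
void free_peeled(struct T1 *p1, struct T2 *p2)
{
    free(p1);
    free(p2);
}
```

Returning an error code rather than exiting is a choice made here so the sketch can be called from other code; the original example terminates the program on failure.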

  15. Performance Results
      Compilation options: -Ofast at 32-bit ABI
      Chart: speedup due to structure layout optimizations

  16. Performance Results (continued)
      Compilation options: -Ofast at 64-bit ABI
      Chart: speedup due to structure layout optimizations

  17. Performance Results (continued)
      Compilation options: -Ofast at 64-bit ABI
      Multiple copies of 462.libquantum running on a multi-core chip
      Platform: Quad-core AMD Barcelona (2.0 GHz, 8GB, 512KB, 2MB); 3rd level cache shared among 4 cores
      Chart: speedup from structure layout optimizations

  18. Future Work
      • Tune static profile estimation
      • Fewer restrictions
      • Integrate with field-reordering

  19. Conclusion
      • A framework for performing structure layout transformations is now available in the Open64 compiler.
      • The superior infrastructure in the Open64 compiler helped us implement the optimizations cleanly and with relatively little effort.
      • Substantial speedups are possible on some of the SPEC CPU2000 and CPU2006 benchmarks.
      • Structure layout optimization is a required feature for a compiler to remain competitive.
