
Pro64™: Performance Compilers For IA-64™



Presentation Transcript


  1. Pro64™: Performance Compilers For IA-64™ Jim Dehnert Principal Engineer 5 June 2000

  2. Outline • IA-64™ Features • Organization and infrastructure • Components and technology • Where we are going • Opportunities for cooperation

  3. IA-64 Features • It is all about parallelism • at the process/thread level for the programmer • at the instruction level for the compiler • Explicit parallel instruction semantics • Predication and control/data speculation • Massive resources (registers, memory) • Register stack and its engine • Software pipelining support • Memory hierarchy management support
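
As a rough illustration of why predication matters (a sketch added to this transcript, not from the slides), consider a loop body with a data-dependent branch: on IA-64 the compare can set a predicate register and the guarded store can execute under that predicate, so the branch disappears and the loop schedules cleanly.

    /* Hypothetical C source: the if inside the loop is a natural candidate
       for if-conversion into predicated IA-64 code. */
    #include <stddef.h>

    void clamp(long *a, size_t n, long hi)
    {
        for (size_t i = 0; i < n; i++) {
            if (a[i] > hi)      /* compare -> predicate register  */
                a[i] = hi;      /* store executes under predicate */
        }
    }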

  4. Structure • Logical compilation model • Base compilation model • IPA compilation model • DSO structure

  5. Logical Compilation Model

  6. Base Compilation Model

  7. IPA Compilation Model

  8. DSO Structure

  9. Intermediate Representation • The IR is called WHIRL • Common interface between components • Multiple languages and multiple targets • Same IR, 5 levels of representation • Continuous lowering as compilation progresses • Optimization strategy tied to level

  10. Components • Front ends • Interprocedural analysis and optimization • Loop nest optimization and parallelization • Global optimization • Code generation

  11. Front ends • C front end based on gcc • C++ front end based on g++ • Fortran 90/95 front end

  12. IPA • Two-stage implementation • Local: gather local information at the end of front-end processing • Main: analysis and optimization

  13. IPA Main Stage • Two phases in the main stage • Analysis: PIC symbol analysis, constant global identification, scalar mod/ref, array sections, code layout for locality • Optimization: inlining, intrinsic function library inlining, cloning for constants and locality, dead function and variable elimination, constant propagation
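
A small, invented two-file example of the kind of opportunity the main stage exploits: the global debug_level is only ever defined as 0, so cross-module constant propagation folds the test in log_msg, inlining removes the call overhead, and dead function elimination can then drop log_msg entirely if no calls remain.

    /* config.c -- hypothetical module */
    int debug_level = 0;               /* never assigned anywhere else */

    /* log.c -- hypothetical module */
    #include <stdio.h>
    extern int debug_level;

    void log_msg(const char *msg)
    {
        if (debug_level > 0)           /* folds to false under IPA constant propagation */
            fprintf(stderr, "%s\n", msg);
    }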

  14. IPA Engineering • User transparent: additional command line option (-ipa); object files (*.o) contain WHIRL; IPA in ld invokes the back end • Integrated into the compiler: provides information to the loop nest optimizer, global optimizer, and code generator; not disabled by a normal .o or DSO object; can analyze DSO objects

  15. Loop Nest Optimizer/Parallelizer • All languages • Loop level dependence analysis • Uniprocessor loop level transformations • OpenMP • Automatic parallelization

  16. Loop Level Transformations • Based on a unified cost model • Heuristics integrated with software pipelining • Fission • Fusion • Unroll and jam • Loop interchange • Peeling • Tiling • Vector data prefetching
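
A sketch (not from the slides) of one transformation from this list, loop interchange: C stores arrays row-major, so walking the column index in the inner loop strides through memory, and interchanging the loops restores unit-stride access that tiling and prefetching can then build on.

    #define N 1024

    /* Before: inner loop strides by N doubles per iteration. */
    void scale_cols(double a[N][N], double s)
    {
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                a[i][j] *= s;
    }

    /* After interchange: inner loop is unit stride and cache friendly. */
    void scale_rows(double a[N][N], double s)
    {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                a[i][j] *= s;
    }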

  17. Parallelization • Automatic: array privatization, doacross parallelization, array section analysis • Directive based: OpenMP, integrated with the automatic methods
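
A small illustration (invented for this transcript) of array privatization: the scratch array t carries no values between iterations of the i loop, so the parallelizer can give each thread its own copy and run the iterations in parallel; the equivalent explicit OpenMP clause is shown.

    #define N 256

    void row_histograms(const int a[N][N], int out[N][16])
    {
        int t[16];                       /* privatizable scratch array */
        #pragma omp parallel for private(t)
        for (int i = 0; i < N; i++) {
            for (int k = 0; k < 16; k++) t[k] = 0;
            for (int j = 0; j < N; j++)  t[a[i][j] & 15]++;
            for (int k = 0; k < 16; k++) out[i][k] = t[k];
        }
    }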

  18. Global optimization • Static Single Assignment (SSA) is the unifying technology • Industrial-strength SSA • All traditional optimizations implemented • SSA-preserving transformations • Deals with aliasing and calls • Uniformly handles indirect loads/stores • Benefits over bit-vector techniques • More efficient setup and use • More natural algorithms => robustness • Allows selective transformation
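
A tiny worked example (added here, not in the slides) of what SSA form looks like: each assignment creates a new version of x, and the join after the if gets a phi, which is exactly the built-in use-def information the optimizer relies on.

    int select_min(int a, int b)
    {
        int x;
        if (a < b)               /* SSA:  if (a1 < b1)          */
            x = a;               /*         x1 = a1             */
        else
            x = b;               /*         x2 = b1             */
        return x + 1;            /*       x3 = phi(x1, x2)      */
                                 /*       return x3 + 1         */
    }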

  19. Code Generation • Inner loops: if conversion, software pipelining, recurrence breaking, predication and rotating registers • Elsewhere: hyperblock formation, frequency-based block reordering, global code motion, peephole optimization
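
One sketch (not from the slides) of recurrence breaking in an inner loop: a single accumulator forces each add to wait for the previous one, so the code generator can split the sum into independent partial sums that schedule in parallel and combine afterwards (exact for integers; for floating point it needs relaxed reassociation rules).

    long sum(const long *a, int n)
    {
        long s0 = 0, s1 = 0;             /* recurrence split into two chains */
        int i;
        for (i = 0; i + 1 < n; i += 2) {
            s0 += a[i];
            s1 += a[i + 1];
        }
        if (i < n)                       /* odd leftover element */
            s0 += a[i];
        return s0 + s1;
    }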

  20. Technology • Target description tables (targ_info) • Feedback • Parallelization • Static Single Assignment • Software pipelining • Global code motion

  21. Target description tables • Isolate machine attributes from the compiler code • Resources: functional units, busses • Literals: sizes, ranges, excluded bits • Registers: classes, supported types • Instructions: opcodes, operands, attributes, scheduling, assembly, object code • Scheduling: resources, latencies
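
To make the idea concrete, here is a hypothetical sketch of what one row of such a table might hold; the field names, latencies, and layout are invented and do not reflect the actual targ_info format.

    /* Hypothetical per-opcode record: the machine-specific facts live in
       the table, not in the compiler code that consumes it. */
    typedef struct {
        const char *mnemonic;       /* assembly name                    */
        int         operands;       /* operand count                    */
        int         unit;           /* functional-unit class (invented) */
        int         latency;        /* result latency in cycles         */
        unsigned    attrs;          /* flag bits: load, store, branch   */
    } op_desc;

    enum { UNIT_ALU, UNIT_MEM, UNIT_BR };
    enum { ATTR_LOAD = 1, ATTR_STORE = 2, ATTR_BRANCH = 4 };

    static const op_desc op_table[] = {
        { "add",     3, UNIT_ALU, 1, 0 },
        { "ld8",     2, UNIT_MEM, 2, ATTR_LOAD },
        { "br.cond", 1, UNIT_BR,  0, ATTR_BRANCH },
    };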

  22. Feedback • Used throughout the compiler • Instrumentation can be added at any stage • Explicit instrumentation data incorporated where it is inserted • Instrumentation data maintained and checked for consistency through program transformations

  23. SSA Advantages • Built-in use-def edges • Sparse representation of data flow information • Sparse data flow propagation based on SSA graph • Linear or near-linear algorithms • Every optimization is global • Transform one construct at a time, customize to context • Handle second order effects

  24. SSA as IR for optimizer • SSA constructed only once at set-up time • Use-def info inherently part of SSA • Use only optimization algorithms that preserve SSA form: • Transformations do not invalidate SSA info • Full set of SSA-preserving algorithms • No SSA construction overhead between phases: • Can arbitrarily repeat a phase for newly exposed optimization opportunities • Extended to uniformly handle indirect memory references

  25. Software Pipelining • Technology evolved from Cydra compilers • Powerful preliminary loop processing • Effective minimization of loop overhead code • Highly efficient backtracking for scheduling • Integrated register allocation, interface with CG • Integrated with LNO loop nest transformations
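
A sketch of the effect (not from the slides): each iteration of the kernel below is roughly load, multiply-add, store; a software-pipelined schedule overlaps the stages so that, in steady state, the store of iteration i runs alongside the multiply-add of i+1 and the loads of i+2, with IA-64's rotating registers renaming the in-flight values.

    /* Hypothetical kernel: a good software-pipelining candidate because
       iterations are independent and each has a short load/fma/store chain. */
    void daxpy(double *x, const double *y, double a, int n)
    {
        for (int i = 0; i < n; i++)
            x[i] = a * y[i] + x[i];      /* ld y[i], ld x[i], fma, st x[i] */
    }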

  26. Global Code Motion • Moves instructions between basic blocks • Purpose: balance resources, improve critical paths • Uses program structure to guide motion • Uses feedback or estimated frequency to prioritize motion • No artificial barriers, no exclusively-optimized paths

  27. Where are we going? • Open source compiler suite • Target description for IA-64 • Available via the usual Linux distributions and www.oss.sgi.com • Beta version in June • MR version when Intel ships systems • OpenMP for C/C++ (later) • OpenMP extensions for NUMA (later)

  28. Areas for collaboration • Target descriptions for other ISAs • real or prototype • Additional optimizations • Generate information for performance analysis tools • Extensions to OpenMP • Surprise me

  29. The solution is in sight.
