Pro64™: Performance Compilers For IA-64™ Jim Dehnert Principal Engineer 5 June 2000
Outline • IA-64™ Features • Organization and infrastructure • Components and technology • Where we are going • Opportunities for cooperation
IA-64 Features • It is all about parallelism • at the process/thread level for programmer • at the instruction level for compiler • Explicit parallel instruction semantics • Predication and Control/Data Speculation • Massive Resources (registers, memory) • Register stack and its engine • Software pipelining support • Memory hierarchy management support
Structure • Logical compilation model • Base compilation model • IPA compilation model • DSO structure
Intermediate Representation IR is called WHIRL • Common interface between components • Multiple languages and multiple targets • Same IR, 5 levels of representation • Continuous lowering as compilation progresses • Optimization strategy tied to level
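The "continuous lowering" idea can be sketched as a rewrite over an expression tree: a high-level array access is expanded into explicit address arithmetic, as WHIRL does between its higher and lower levels. This is a minimal illustration, assuming an invented tuple IR; the node names (ARRAY, ILOAD, ...) are stand-ins, not real WHIRL opcodes.

```python
# Hypothetical sketch of continuous lowering: a high-level array access
# node is rewritten into explicit address arithmetic. Node names are
# illustrative, not actual WHIRL opcodes.

def lower(node):
    """Recursively rewrite high-level nodes into lower-level ones."""
    kind, *args = node
    if kind == "ARRAY":                      # ARRAY(base, index, elem_size)
        base, index, size = args
        # a[i]  ->  load from (base + i * size)
        offset = ("MPY", lower(index), ("INTCONST", size))
        return ("ILOAD", ("ADD", lower(base), offset))
    if kind in ("ADD", "MPY"):
        return (kind, *[lower(a) for a in args])
    return node                              # leaves: LDID, INTCONST, ...

high = ("ARRAY", ("LDID", "a"), ("LDID", "i"), 8)
low = lower(high)
# low == ("ILOAD", ("ADD", ("LDID","a"), ("MPY", ("LDID","i"), ("INTCONST",8))))
```

Because each optimization phase is tied to an IR level, a pass like this runs once, at the point where its level of abstraction is about to disappear.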
Components • Front ends • Interprocedural analysis and optimization • Loop nest optimization and parallelization • Global optimization • Code generation
Front ends • C front end based on gcc • C++ front end based on g++ • Fortran 90/95 front end
IPA • Two stage implementation • Local: gather local information at end of front end process • Main: analysis and optimization
IPA Main Stage • Two phases in the main stage • Analysis: PIC symbol analysis, constant global identification, scalar mod/ref, array section analysis, code layout for locality • Optimization: inlining, intrinsic function library inlining, cloning for constants and locality, dead function and variable elimination, constant propagation
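One analysis feeding the "cloning for constants" optimization can be sketched as follows: if every call site passes the same constant for a formal parameter, the callee can be specialized for that value. A minimal sketch, assuming an invented call-site representation; function and parameter names are hypothetical.

```python
# Hypothetical sketch of interprocedural constant discovery: formals that
# receive the same integer constant at every call site are candidates for
# propagation or constant-specialized cloning.

from collections import defaultdict

def find_constant_formals(call_sites):
    """call_sites: list of (callee, {formal: actual}) pairs.
    Returns {callee: {formal: const}} for formals bound to one constant."""
    seen = defaultdict(dict)                  # callee -> formal -> set of actuals
    for callee, actuals in call_sites:
        for formal, actual in actuals.items():
            seen[callee].setdefault(formal, set()).add(actual)
    return {
        callee: {f: next(iter(vals))
                 for f, vals in formals.items()
                 if len(vals) == 1 and isinstance(next(iter(vals)), int)}
        for callee, formals in seen.items()
    }

calls = [("f", {"n": 4, "x": "arg0"}),
         ("f", {"n": 4, "x": "arg1"}),
         ("g", {"m": 2}), ("g", {"m": 3})]
# find_constant_formals(calls) -> {"f": {"n": 4}, "g": {}}
```

The two-stage split above matters here: the local stage records each file's call sites cheaply, and only the main stage sees the whole graph needed for this decision.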
IPA Engineering • User transparent: additional command line option (-ipa); object files (*.o) contain WHIRL; IPA in ld invokes the backend • Integrated into compiler: provides information to the loop nest optimizer, global optimizer, and code generator; not disabled by normal .o or DSO objects; can analyze DSO objects
Loop Nest Optimizer/Parallelizer • All languages • Loop level dependence analysis • Uniprocessor loop level transformations • OpenMP • Automatic parallelization
Loop Level Transformations Based on a unified cost model, with heuristics integrated with software pipelining • Fission • Fusion • Unroll and jam • Loop interchange • Peeling • Tiling • Vector data prefetching
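Tiling, one of the transformations listed above, can be shown concretely: the iteration space is restructured into cache-sized blocks without changing which iterations execute. A sketch only; the real optimizer transforms the IR, not Python loops, and the tile size here is arbitrary.

```python
# Tiling (blocking) sketch: same iteration set, reordered into blocks so
# that data touched by a tile stays in cache while the tile executes.

def iterations(n):
    """Original row-major i/j traversal."""
    return [(i, j) for i in range(n) for j in range(n)]

def tiled_iterations(n, tile):
    """Blocked traversal: outer loops walk tiles, inner loops walk within."""
    out = []
    for ii in range(0, n, tile):
        for jj in range(0, n, tile):
            for i in range(ii, min(ii + tile, n)):
                for j in range(jj, min(jj + tile, n)):
                    out.append((i, j))
    return out

# The transformation is legal when dependences permit the reordering:
assert sorted(iterations(6)) == sorted(tiled_iterations(6, 4))
```

The unified cost model decides when such a reordering pays off, trading cache behavior against the software pipeliner's resource needs rather than applying each transformation in isolation.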
Parallelization • Automatic: array privatization, doacross parallelization, array section analysis • Directive-based: OpenMP, integrated with the automatic methods
Global optimization • Static Single Assignment is the unifying technology • Industrial strength SSA • All traditional optimizations implemented • SSA-preserving transformations • Deals with aliasing and calls • Uniformly handles indirect loads/stores • Benefits over bit vector techniques • More efficient: setup and use • More natural algorithms => robustness • Allows selective transformation
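The core of SSA construction can be sketched for straight-line code, where no phi nodes are needed: every assignment creates a fresh version of its variable, and every use names the current version, making use-def chains explicit in the names themselves. A minimal sketch, assuming an invented statement-tuple form.

```python
# Minimal SSA-renaming sketch for straight-line code (no control flow,
# hence no phi nodes): assignments get fresh versions, uses refer to the
# latest version, so each value has exactly one definition.

def to_ssa(stmts):
    """stmts: list of (dest, (op, src1, src2)) with string operands."""
    version, out = {}, []
    def use(v):
        return f"{v}{version[v]}" if v in version else v   # unassigned names pass through
    for dest, (op, a, b) in stmts:
        rhs = (op, use(a), use(b))              # rename uses BEFORE redefining dest
        version[dest] = version.get(dest, 0) + 1
        out.append((f"{dest}{version[dest]}", rhs))
    return out

code = [("x", ("+", "a", "b")),
        ("x", ("*", "x", "x")),
        ("y", ("+", "x", "a"))]
# to_ssa(code) ->
# [("x1", ("+","a","b")), ("x2", ("*","x1","x1")), ("y1", ("+","x2","a"))]
```

The single-definition property is what the "built-in use-def edges" claim below rests on: once renamed, a use points at exactly one definition with no separate data-flow pass.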
Code Generation • Inner loops: IF conversion, software pipelining, recurrence breaking, predication and rotating registers • Elsewhere: hyperblock formation, frequency-based block reordering, global code motion, peephole optimization
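IF conversion, the first inner-loop step above, can be illustrated as flattening a branch diamond into one straight-line block of predicated instructions, which is what lets loops containing conditionals be software pipelined. The instruction tuples below are an invented notation, not IA-64 encodings.

```python
# IF-conversion sketch: a compare sets a predicate pair; both arms of the
# branch become predicated instructions in one block, eliminating the branch.

def if_convert(cond, then_insts, else_insts):
    """Return a single straight-line block; each instruction carries the
    predicate under which it commits."""
    p, not_p = f"p_{cond}", f"!p_{cond}"
    block = [("cmp", cond, p, not_p)]            # sets both predicates
    block += [(p,) + inst for inst in then_insts]
    block += [(not_p,) + inst for inst in else_insts]
    return block

flat = if_convert("x<0", [("neg", "y", "x")], [("mov", "y", "x")])
# flat == [("cmp","x<0","p_x<0","!p_x<0"),
#          ("p_x<0","neg","y","x"), ("!p_x<0","mov","y","x")]
```

On IA-64 the predicate registers make this profitable even for unbalanced arms, since squashed instructions cost issue slots but no branch misprediction.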
Technology • Target description tables (targ_info) • Feedback • Parallelization • Static Single Assignment • Software pipelining • Global code motion
Target description tables Isolate machine attributes from compiler code • Resources: functional units, buses • Literals: sizes, ranges, excluded bits • Registers: classes, supported types • Instructions: opcodes, operands, attributes, scheduling, assembly, object code • Scheduling: resources, latencies
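The table-driven idea can be sketched directly: machine attributes live in data tables the compiler queries, so retargeting means replacing tables rather than editing passes. The opcodes, units, and latencies below are illustrative stand-ins, not real targ_info entries.

```python
# Table-driven target description sketch: the scheduler asks the tables,
# never hard-codes the machine. Entries here are invented examples.

TARGET = {
    "add": {"unit": "ALU", "latency": 1},
    "ld8": {"unit": "MEM", "latency": 2},
    "fma": {"unit": "FPU", "latency": 4},
}

def latency(op):
    """Cycles before the result of `op` is available."""
    return TARGET[op]["latency"]

def schedulable(op, free_units):
    """Can `op` issue this cycle, given the units still free?"""
    return TARGET[op]["unit"] in free_units

assert latency("fma") == 4
assert schedulable("ld8", {"MEM", "ALU"})
```

A prototype ISA (one of the collaboration areas later in the deck) would then amount to a new set of tables plus whatever lowering its features require.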
Feedback Used throughout the compiler • Instrumentation can be added at any stage • Explicit instrumentation data incorporated where inserted • Instrumentation data maintained and checked for consistency through program transformations
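The "maintained and checked for consistency" point can be sketched with edge-profile data: block frequencies derive from edge counts, and a flow-conservation check catches profiles that a transformation failed to update. The graph representation is invented for illustration.

```python
# Feedback-data sketch: frequencies come from instrumented edge counts,
# and conservation of flow is an invariant transformations must maintain.

def block_freq(edges, block):
    """edges: {(src, dst): count}. A block's frequency is its incoming flow."""
    return sum(c for (s, d), c in edges.items() if d == block)

def flow_consistent(edges, block):
    """For an interior block, incoming flow must equal outgoing flow."""
    inflow = sum(c for (s, d), c in edges.items() if d == block)
    outflow = sum(c for (s, d), c in edges.items() if s == block)
    return inflow == outflow

edges = {("entry", "B1"): 10, ("B1", "B2"): 7, ("B1", "B3"): 3,
         ("B2", "exit"): 7, ("B3", "exit"): 3}
assert block_freq(edges, "B1") == 10
assert flow_consistent(edges, "B1")
```

A pass that splits or merges blocks must redistribute the counts so this invariant still holds, which is what lets later phases (block reordering, global code motion) trust the data.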
SSA Advantages • Built-in use-def edges • Sparse representation of data flow information • Sparse data flow propagation based on the SSA graph • Linear or near-linear algorithms • Every optimization is global • Transform one construct at a time, customized to context • Handles second-order effects
SSA as IR for optimizer • SSA constructed only once at set-up time • Use-def info inherently part of SSA • Use only optimization algorithms that preserve SSA form: • Transformations do not invalidate SSA info • Full set of SSA-preserving algorithms • No SSA construction overhead between phases: • Can arbitrarily repeat a phase for newly exposed optimization opportunities • Extended to uniformly handle indirect memory references
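Sparse propagation over the SSA graph, mentioned above as a benefit over bit-vector techniques, can be sketched with constant propagation: because every value has a single definition, constants flow along use-def edges directly instead of being re-derived per basic block. A simplified fixpoint loop stands in for a real worklist; the SSA names are hypothetical.

```python
# Sparse constant propagation sketch over SSA definitions: fold any
# definition whose operands resolve to constants, iterating to a fixpoint.

import operator

def sparse_const_prop(defs):
    """defs: {ssa_name: int | (op, a, b)}; operands are names or ints."""
    ops = {"+": operator.add, "*": operator.mul}
    const = {n: v for n, v in defs.items() if isinstance(v, int)}
    changed = True
    while changed:                   # a real pass would use a worklist
        changed = False
        for n, v in defs.items():
            if n in const or isinstance(v, int):
                continue
            op, a, b = v
            av = const.get(a, a if isinstance(a, int) else None)
            bv = const.get(b, b if isinstance(b, int) else None)
            if av is not None and bv is not None:
                const[n] = ops[op](av, bv)
                changed = True
    return const

ssa = {"a1": 2, "b1": 3, "c1": ("+", "a1", "b1"), "d1": ("*", "c1", "c1")}
# sparse_const_prop(ssa) -> {"a1": 2, "b1": 3, "c1": 5, "d1": 25}
```

Only definitions reachable from a newly discovered constant are revisited, which is where the linear or near-linear behavior claimed above comes from.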
Software Pipelining • Technology evolved from Cydra compilers • Powerful preliminary loop processing • Effective minimization of loop overhead code • Highly efficient backtracking for scheduling • Integrated register allocation, interface with CG • Integrated with LNO loop nest transformations
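The scheduler's starting point can be made concrete: the minimum initiation interval (MII) is the larger of the resource-constrained bound (ResMII) and the recurrence-constrained bound (RecMII), and the backtracking scheduler tries II = MII first, increasing it only on failure. A standard modulo-scheduling calculation, sketched with invented resource numbers.

```python
# Minimum initiation interval sketch for modulo scheduling:
# MII = max(ResMII, RecMII).

from math import ceil

def res_mii(uses, counts):
    """uses: {unit: instructions per iteration}; counts: {unit: copies}."""
    return max(ceil(n / counts[u]) for u, n in uses.items())

def rec_mii(cycles):
    """cycles: (total latency, total iteration distance) per dependence
    cycle; each cycle requires II >= latency / distance."""
    return max(ceil(lat / dist) for lat, dist in cycles)

uses = {"MEM": 3, "ALU": 4}
counts = {"MEM": 2, "ALU": 2}
cycles = [(6, 2), (3, 1)]      # e.g. a 6-cycle recurrence spanning 2 iterations
mii = max(res_mii(uses, counts), rec_mii(cycles))
# res_mii = max(ceil(3/2), ceil(4/2)) = 2 ; rec_mii = 3 ; mii = 3
```

The "recurrence breaking" preprocessing listed above attacks RecMII directly, while IA-64's rotating registers remove the register-copy overhead that modulo schedules otherwise incur.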
Global Code Motion • Moves instructions between basic blocks • Purpose: balance resources, improve critical paths • Uses program structure to guide motion • Uses feedback or estimated frequency to prioritize motion • No artificial barriers, no exclusively-optimized paths
Where are we going? • Open source compiler suite • Target description for IA-64 • Available via usual Linux distributions and www.oss.sgi.com • Beta version in June • MR version when Intel ships systems • OpenMP for C/C++ (later) • OpenMP extensions for NUMA (later)
Areas for collaboration • Target descriptions for other ISAs • real or prototype • Additional optimizations • Generate information for performance analysis tools • Extensions to OpenMP • Surprise me
The solution is in sight.