Pro64™: Performance Compilers For IA-64™ Jim Dehnert Principal Engineer 5 June 2000
Outline • IA-64™ Features • Organization and infrastructure • Components and technology • Where we are going • Opportunities for cooperation
IA-64 Features • It is all about parallelism • at the process/thread level for programmer • at the instruction level for compiler • Explicit parallel instruction semantics • Predication and Control/Data Speculation • Massive Resources (registers, memory) • Register stack and its engine • Software pipelining support • Memory hierarchy management support
Structure • Logical compilation model • Base compilation model • IPA compilation model • DSO structure
Intermediate Representation IR is called WHIRL • Common interface between components • Multiple languages and multiple targets • Same IR, 5 levels of representation • Continuous lowering as compilation progresses • Optimization strategy tied to level
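The "continuous lowering" idea can be sketched as a rewrite over an expression tree: a high-level array access is expanded into explicit address arithmetic, as WHIRL does between its higher and lower levels. This is a minimal illustration, assuming an invented tuple IR; the node names (ARRAY, ILOAD, ...) are stand-ins, not real WHIRL opcodes.

```python
# Hypothetical sketch of continuous lowering: a high-level array access
# node is rewritten into explicit address arithmetic. Node names are
# illustrative, not actual WHIRL opcodes.

def lower(node):
    """Recursively rewrite high-level nodes into lower-level ones."""
    kind, *args = node
    if kind == "ARRAY":                      # ARRAY(base, index, elem_size)
        base, index, size = args
        # a[i]  ->  load from (base + i * size)
        offset = ("MPY", lower(index), ("INTCONST", size))
        return ("ILOAD", ("ADD", lower(base), offset))
    if kind in ("ADD", "MPY"):
        return (kind, *[lower(a) for a in args])
    return node                              # leaves: LDID, INTCONST, ...

high = ("ARRAY", ("LDID", "a"), ("LDID", "i"), 8)
low = lower(high)
# low == ("ILOAD", ("ADD", ("LDID","a"), ("MPY", ("LDID","i"), ("INTCONST",8))))
```

Because each optimization phase is tied to an IR level, a pass like this runs once, at the point where its level of abstraction is about to disappear.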
Components • Front ends • Interprocedural analysis and optimization • Loop nest optimization and parallelization • Global optimization • Code generation
Front ends • C front end based on gcc • C++ front end based on g++ • Fortran 90/95 front end
IPA • Two stage implementation • Local: gather local information at end of front end process • Main: analysis and optimization
IPA Main Stage • Two phases in the main stage • Analysis: PIC symbol analysis, constant global identification, scalar mod/ref, array section analysis, code layout for locality • Optimization: inlining, intrinsic function library inlining, cloning for constants and locality, dead function and variable elimination, constant propagation
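One analysis feeding the "cloning for constants" optimization can be sketched as follows: if every call site passes the same constant for a formal parameter, the callee can be specialized for that value. A minimal sketch, assuming an invented call-site representation; function and parameter names are hypothetical.

```python
# Hypothetical sketch of interprocedural constant discovery: formals that
# receive the same integer constant at every call site are candidates for
# propagation or constant-specialized cloning.

from collections import defaultdict

def find_constant_formals(call_sites):
    """call_sites: list of (callee, {formal: actual}) pairs.
    Returns {callee: {formal: const}} for formals bound to one constant."""
    seen = defaultdict(dict)                  # callee -> formal -> set of actuals
    for callee, actuals in call_sites:
        for formal, actual in actuals.items():
            seen[callee].setdefault(formal, set()).add(actual)
    return {
        callee: {f: next(iter(vals))
                 for f, vals in formals.items()
                 if len(vals) == 1 and isinstance(next(iter(vals)), int)}
        for callee, formals in seen.items()
    }

calls = [("f", {"n": 4, "x": "arg0"}),
         ("f", {"n": 4, "x": "arg1"}),
         ("g", {"m": 2}), ("g", {"m": 3})]
# find_constant_formals(calls) -> {"f": {"n": 4}, "g": {}}
```

The two-stage split above matters here: the local stage records each file's call sites cheaply, and only the main stage sees the whole graph needed for this decision.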
IPA Engineering • User transparent: additional command line option (-ipa); object files (*.o) contain WHIRL; IPA in ld invokes the backend • Integrated into compiler: provides information to the loop nest optimizer, global optimizer, and code generator; not disabled by normal .o or DSO objects; can analyze DSO objects
Loop Nest Optimizer/Parallelizer • All languages • Loop level dependence analysis • Uniprocessor loop level transformations • OpenMP • Automatic parallelization
Loop Level Transformations Based on a unified cost model, with heuristics integrated with software pipelining • Fission • Fusion • Unroll and jam • Loop interchange • Peeling • Tiling • Vector data prefetching
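Tiling, one of the transformations listed above, can be shown concretely: the iteration space is restructured into cache-sized blocks without changing which iterations execute. A sketch only; the real optimizer transforms the IR, not Python loops, and the tile size here is arbitrary.

```python
# Tiling (blocking) sketch: same iteration set, reordered into blocks so
# that data touched by a tile stays in cache while the tile executes.

def iterations(n):
    """Original row-major i/j traversal."""
    return [(i, j) for i in range(n) for j in range(n)]

def tiled_iterations(n, tile):
    """Blocked traversal: outer loops walk tiles, inner loops walk within."""
    out = []
    for ii in range(0, n, tile):
        for jj in range(0, n, tile):
            for i in range(ii, min(ii + tile, n)):
                for j in range(jj, min(jj + tile, n)):
                    out.append((i, j))
    return out

# The transformation is legal when dependences permit the reordering:
assert sorted(iterations(6)) == sorted(tiled_iterations(6, 4))
```

The unified cost model decides when such a reordering pays off, trading cache behavior against the software pipeliner's resource needs rather than applying each transformation in isolation.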
Parallelization • Automatic: array privatization, doacross parallelization, array section analysis • Directive-based: OpenMP, integrated with the automatic methods
Global optimization • Static Single Assignment is the unifying technology • Industrial strength SSA • All traditional optimizations implemented • SSA-preserving transformations • Deals with aliasing and calls • Uniformly handles indirect loads/stores • Benefits over bit vector techniques • More efficient: setup and use • More natural algorithms => robustness • Allows selective transformation
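The core of SSA construction can be sketched for straight-line code, where no phi nodes are needed: every assignment creates a fresh version of its variable, and every use names the current version, making use-def chains explicit in the names themselves. A minimal sketch, assuming an invented statement-tuple form.

```python
# Minimal SSA-renaming sketch for straight-line code (no control flow,
# hence no phi nodes): assignments get fresh versions, uses refer to the
# latest version, so each value has exactly one definition.

def to_ssa(stmts):
    """stmts: list of (dest, (op, src1, src2)) with string operands."""
    version, out = {}, []
    def use(v):
        return f"{v}{version[v]}" if v in version else v   # unassigned names pass through
    for dest, (op, a, b) in stmts:
        rhs = (op, use(a), use(b))              # rename uses BEFORE redefining dest
        version[dest] = version.get(dest, 0) + 1
        out.append((f"{dest}{version[dest]}", rhs))
    return out

code = [("x", ("+", "a", "b")),
        ("x", ("*", "x", "x")),
        ("y", ("+", "x", "a"))]
# to_ssa(code) ->
# [("x1", ("+","a","b")), ("x2", ("*","x1","x1")), ("y1", ("+","x2","a"))]
```

The single-definition property is what the "built-in use-def edges" claim below rests on: once renamed, a use points at exactly one definition with no separate data-flow pass.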
Code Generation • Inner loops: IF conversion, software pipelining, recurrence breaking, predication and rotating registers • Elsewhere: hyperblock formation, frequency-based block reordering, global code motion, peephole optimization
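IF conversion, the first inner-loop step above, can be illustrated as flattening a branch diamond into one straight-line block of predicated instructions, which is what lets loops containing conditionals be software pipelined. The instruction tuples below are an invented notation, not IA-64 encodings.

```python
# IF-conversion sketch: a compare sets a predicate pair; both arms of the
# branch become predicated instructions in one block, eliminating the branch.

def if_convert(cond, then_insts, else_insts):
    """Return a single straight-line block; each instruction carries the
    predicate under which it commits."""
    p, not_p = f"p_{cond}", f"!p_{cond}"
    block = [("cmp", cond, p, not_p)]            # sets both predicates
    block += [(p,) + inst for inst in then_insts]
    block += [(not_p,) + inst for inst in else_insts]
    return block

flat = if_convert("x<0", [("neg", "y", "x")], [("mov", "y", "x")])
# flat == [("cmp","x<0","p_x<0","!p_x<0"),
#          ("p_x<0","neg","y","x"), ("!p_x<0","mov","y","x")]
```

On IA-64 the predicate registers make this profitable even for unbalanced arms, since squashed instructions cost issue slots but no branch misprediction.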
Technology • Target description tables (targ_info) • Feedback • Parallelization • Static Single Assignment • Software pipelining • Global code motion
Target description tables Isolate machine attributes from compiler code • Resources: functional units, buses • Literals: sizes, ranges, excluded bits • Registers: classes, supported types • Instructions: opcodes, operands, attributes, scheduling, assembly, object code • Scheduling: resources, latencies
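The table-driven idea can be sketched directly: machine attributes live in data tables the compiler queries, so retargeting means replacing tables rather than editing passes. The opcodes, units, and latencies below are illustrative stand-ins, not real targ_info entries.

```python
# Table-driven target description sketch: the scheduler asks the tables,
# never hard-codes the machine. Entries here are invented examples.

TARGET = {
    "add": {"unit": "ALU", "latency": 1},
    "ld8": {"unit": "MEM", "latency": 2},
    "fma": {"unit": "FPU", "latency": 4},
}

def latency(op):
    """Cycles before the result of `op` is available."""
    return TARGET[op]["latency"]

def schedulable(op, free_units):
    """Can `op` issue this cycle, given the units still free?"""
    return TARGET[op]["unit"] in free_units

assert latency("fma") == 4
assert schedulable("ld8", {"MEM", "ALU"})
```

A prototype ISA (one of the collaboration areas later in the deck) would then amount to a new set of tables plus whatever lowering its features require.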
Feedback Used throughout the compiler • Instrumentation can be added at any stage • Explicit instrumentation data incorporated where inserted • Instrumentation data maintained and checked for consistency through program transformations
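The "maintained and checked for consistency" point can be sketched with edge-profile data: block frequencies derive from edge counts, and a flow-conservation check catches profiles that a transformation failed to update. The graph representation is invented for illustration.

```python
# Feedback-data sketch: frequencies come from instrumented edge counts,
# and conservation of flow is an invariant transformations must maintain.

def block_freq(edges, block):
    """edges: {(src, dst): count}. A block's frequency is its incoming flow."""
    return sum(c for (s, d), c in edges.items() if d == block)

def flow_consistent(edges, block):
    """For an interior block, incoming flow must equal outgoing flow."""
    inflow = sum(c for (s, d), c in edges.items() if d == block)
    outflow = sum(c for (s, d), c in edges.items() if s == block)
    return inflow == outflow

edges = {("entry", "B1"): 10, ("B1", "B2"): 7, ("B1", "B3"): 3,
         ("B2", "exit"): 7, ("B3", "exit"): 3}
assert block_freq(edges, "B1") == 10
assert flow_consistent(edges, "B1")
```

A pass that splits or merges blocks must redistribute the counts so this invariant still holds, which is what lets later phases (block reordering, global code motion) trust the data.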
SSA Advantages • Built-in use-def edges • Sparse representation of data flow information • Sparse data flow propagation based on the SSA graph • Linear or near-linear algorithms • Every optimization is global • Transform one construct at a time, customized to context • Handles second-order effects
SSA as IR for optimizer • SSA constructed only once at set-up time • Use-def info inherently part of SSA • Use only optimization algorithms that preserve SSA form: • Transformations do not invalidate SSA info • Full set of SSA-preserving algorithms • No SSA construction overhead between phases: • Can arbitrarily repeat a phase for newly exposed optimization opportunities • Extended to uniformly handle indirect memory references
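Sparse propagation over the SSA graph, mentioned above as a benefit over bit-vector techniques, can be sketched with constant propagation: because every value has a single definition, constants flow along use-def edges directly instead of being re-derived per basic block. A simplified fixpoint loop stands in for a real worklist; the SSA names are hypothetical.

```python
# Sparse constant propagation sketch over SSA definitions: fold any
# definition whose operands resolve to constants, iterating to a fixpoint.

import operator

def sparse_const_prop(defs):
    """defs: {ssa_name: int | (op, a, b)}; operands are names or ints."""
    ops = {"+": operator.add, "*": operator.mul}
    const = {n: v for n, v in defs.items() if isinstance(v, int)}
    changed = True
    while changed:                   # a real pass would use a worklist
        changed = False
        for n, v in defs.items():
            if n in const or isinstance(v, int):
                continue
            op, a, b = v
            av = const.get(a, a if isinstance(a, int) else None)
            bv = const.get(b, b if isinstance(b, int) else None)
            if av is not None and bv is not None:
                const[n] = ops[op](av, bv)
                changed = True
    return const

ssa = {"a1": 2, "b1": 3, "c1": ("+", "a1", "b1"), "d1": ("*", "c1", "c1")}
# sparse_const_prop(ssa) -> {"a1": 2, "b1": 3, "c1": 5, "d1": 25}
```

Only definitions reachable from a newly discovered constant are revisited, which is where the linear or near-linear behavior claimed above comes from.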
Software Pipelining • Technology evolved from Cydra compilers • Powerful preliminary loop processing • Effective minimization of loop overhead code • Highly efficient backtracking for scheduling • Integrated register allocation, interface with CG • Integrated with LNO loop nest transformations
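The scheduler's starting point can be made concrete: the minimum initiation interval (MII) is the larger of the resource-constrained bound (ResMII) and the recurrence-constrained bound (RecMII), and the backtracking scheduler tries II = MII first, increasing it only on failure. A standard modulo-scheduling calculation, sketched with invented resource numbers.

```python
# Minimum initiation interval sketch for modulo scheduling:
# MII = max(ResMII, RecMII).

from math import ceil

def res_mii(uses, counts):
    """uses: {unit: instructions per iteration}; counts: {unit: copies}."""
    return max(ceil(n / counts[u]) for u, n in uses.items())

def rec_mii(cycles):
    """cycles: (total latency, total iteration distance) per dependence
    cycle; each cycle requires II >= latency / distance."""
    return max(ceil(lat / dist) for lat, dist in cycles)

uses = {"MEM": 3, "ALU": 4}
counts = {"MEM": 2, "ALU": 2}
cycles = [(6, 2), (3, 1)]      # e.g. a 6-cycle recurrence spanning 2 iterations
mii = max(res_mii(uses, counts), rec_mii(cycles))
# res_mii = max(ceil(3/2), ceil(4/2)) = 2 ; rec_mii = 3 ; mii = 3
```

The "recurrence breaking" preprocessing listed above attacks RecMII directly, while IA-64's rotating registers remove the register-copy overhead that modulo schedules otherwise incur.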
Global Code Motion • Moves instructions between basic blocks • Purpose: balance resources, improve critical paths • Uses program structure to guide motion • Uses feedback or estimated frequency to prioritize motion • No artificial barriers, no exclusively-optimized paths
Where are we going? • Open source compiler suite • Target description for IA-64 • Available via usual Linux distributions and www.oss.sgi.com • Beta version in June • MR version when Intel ships systems • OpenMP for C/C++ (later) • OpenMP extensions for NUMA (later)
Areas for collaboration • Target descriptions for other ISAs • real or prototype • Additional optimizations • Generate information for performance analysis tools • Extensions to OpenMP • Surprise me
The solution is in sight.