An Introduction to Open64 Compiler
Guang R. Gao (Capsl, Udel)
Xiaomi An (Capsl, Udel)
Courtesy of Fred Chow
Outline
• Background and Motivation
• Part I: An overview of the Open64 compiler infrastructure and design principles
• Part II: Using Open64 in compiler research & development
Open64 Tutorial - An Introduction
What Is the Original Open64 (Pro64)?
• A suite of optimizing compiler tools for Intel IA-64 and x86-64 on Linux systems
• C, C++ and Fortran90/95 compilers
• Conforms to the IA-64 and x86-64 Linux ABI and API standards
• Open to all researchers and developers in the community
Historical Perspectives
[Figure: timeline of the compiler lineage leading to Pro64/Open64. Courtesy of Fred Chow.]
• Compilers shown: Stanford RISC compiler research; Cydrome Cydra5 Compiler; MIPS Ucode Compiler (R2000); MIPS Ucode Compiler (R4000); SGI Ragnarok Compiler (R8000); SGI MIPSpro Compiler (R10000); Pro64/Open64 Compiler (Itanium)
• Contributions labeled: software pipelining; global optimization under -O2; floating-point performance; loop optimization under -O3; Stanford SUIF; Rice IPA
• Years on the timeline: 1980-83, 1987, 1989, 1991, 1994, 1997, 2000
Who Might Want to Use Open64?
• Researchers: test new compiler analysis and optimization algorithms
• Developers: retarget to another architecture/system
• Educators: a compiler teaching platform
Who Is Using Open64? From the ACM CGO 2008 Open64 Workshop attendee list (April 6, 2008, Boston)
• 12 companies: Google, nVidia, IBM, HP, Qualcomm, AMD, Tilera, PathScale, SimpLight, Absoft, Coherent Logix, STMicro
• 9 educational institutes: USC, Rice, U. of Delaware, U. Houston, U. of Illinois, Tsinghua, Fudan, ENS Lyon, NRCC
Vision and Status of Open64 Today
• People should view it as GCC with an alternative backend that has great potential to reclaim the title of best compiler in the world
• The technology incorporates all the top compiler optimization research of the 1990s
• It has regained momentum in the last three years thanks to PathScale's and HP's investment in robustness and performance
• Targets x86 and Itanium in the public repository; ARM, MIPS, PowerPC, and several signal-processing CPUs in private branches
Overview of Open64 Infrastructure
• Logical compilation model and component flow
• WHIRL Intermediate Representation
• Very High Optimizer (VHO)
• Inter-Procedural Analysis (IPA)
• Loop Nest Optimizer (LNO) and Parallelization
• Global optimization (WOPT)
• Code Generation (CG)
[Figure: component flow built around a good IR]
• Front end
• Middle-End: Very High Optimizer; Interprocedural Analysis and Optimization; Loop Nest Optimization and Parallelization; Global (Scalar) Optimization
• Backend: Code Generation
Front Ends
• C front end based on gcc
• C++ front end based on g++
• Fortran90/95 front end from MIPSpro
Semantic Level of IR
At higher level (High: source program):
• More kinds of constructs
• Shorter code sequence
• More program info present
• Hierarchical constructs
• Cannot perform many optimizations
At lower level (Low: machine instruction):
• Less program info
• Fewer kinds of constructs
• Longer code sequence
• Flat constructs
• All optimizations can be performed
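The contrast between hierarchical and flat constructs can be sketched with a toy lowering step. This is a minimal illustration in Python and not WHIRL itself: the tuple-based mini-IR, the opcode names, and `lower_do_loop` are all hypothetical.

```python
# Hypothetical mini-IR (not WHIRL): one hierarchical loop node is lowered
# into a longer, flat sequence of labels, branches and gotos.

def lower_do_loop(var, start, stop, body):
    """Lower a loop 'for var in [start, stop)' into flat instructions."""
    return (
        [("STORE", var, start),                  # var = start
         ("LABEL", "L_top"),
         ("BRANCH_IF_GE", var, stop, "L_exit")]  # exit test
        + list(body)                             # loop body, already flat
        + [("ADD", var, var, 1),                 # var = var + 1
           ("GOTO", "L_top"),
           ("LABEL", "L_exit")]
    )

# High level: a single node that still says "this is a counted loop".
high = ("DO_LOOP", "i", 0, 4, [("CALL", "work", "i")])

# Low level: seven instructions; the loop structure is now implicit.
low = lower_do_loop(*high[1:])
```

The high-level node keeps the trip count and induction variable visible, which loop-level phases need; after lowering, that information has to be rediscovered from the branches.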
Compilation Flow
Very High WHIRL Optimizer
Lowers to High WHIRL while performing optimizations. The first part deals with common language constructs:
• Bit-field optimizations
• Short-circuit boolean expressions
• Switch statement optimization
• Simple if-conversion
• Assignments of small structs: lower struct copy to assignments of individual fields
• Convert patterns of code sequences to intrinsics:
  • Saturated subtract, abs()
• Other pattern-based optimizations:
  • max, min
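The struct-copy lowering listed above can be sketched as follows. This is an illustration only: the tuple IR, the `MEMCPY`/`ASSIGN` names, and the `max_fields` threshold are assumptions, not the VHO's actual representation or heuristic.

```python
def lower_struct_copy(dst, src, fields, max_fields=4):
    """Lower 'dst = src' for a struct with the given field names.

    Small structs become one assignment per field, which lets later
    scalar optimizations see each field; larger structs keep a single
    block copy. The max_fields cutoff is a hypothetical heuristic.
    """
    if len(fields) > max_fields:
        return [("MEMCPY", dst, src)]
    return [("ASSIGN", f"{dst}.{f}", f"{src}.{f}") for f in fields]


small = lower_struct_copy("p", "q", ["x", "y"])
large = lower_struct_copy("a", "b", ["f0", "f1", "f2", "f3", "f4"])
```

Here `small` holds two field assignments (`p.x = q.x`, `p.y = q.y`), while `large` stays a single block copy.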
Roles of IPA
The only optimization component operating at program scope
• Analysis: collects information from the entire program
• Optimization: performs optimizations across procedure boundaries
• Depends on later phases for full optimization effects
• Supplies cross-file information for later optimization phases
IPA Flow
IPA Main Stage
• Analysis
  • alias analysis
  • array section
  • code layout
• Optimization
  • inlining
  • cloning
  • dead function and variable elimination
  • constant propagation
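Dead function elimination needs whole-program scope because a function is dead only if no call path from any entry point reaches it. A minimal sketch, assuming a call graph given as a plain dictionary (the function names and graph below are made up for illustration):

```python
def reachable_functions(call_graph, roots):
    """Return every function reachable from the root set via calls."""
    seen = set()
    stack = list(roots)
    while stack:
        func = stack.pop()
        if func in seen:
            continue
        seen.add(func)
        stack.extend(call_graph.get(func, ()))
    return seen


def dead_functions(call_graph, roots):
    """Functions defined in the program but never reachable from roots."""
    return set(call_graph) - reachable_functions(call_graph, roots)


# Hypothetical program: debug_dump is defined but never called.
call_graph = {
    "main": ["parse", "run"],
    "parse": [],
    "run": ["helper"],
    "helper": [],
    "debug_dump": ["helper"],
}
dead = dead_functions(call_graph, roots=["main"])
```

A real implementation also has to treat address-taken and externally visible functions as roots; this sketch ignores both.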
Loop Nest Optimizations
• Transformations for Data Cache
• Transformations that help other optimizations
• Vectorization and Parallelization
LNO Transformations for Data Cache
• Cache blocking
  • Transform loop to work on sub-matrices that fit in cache
• Loop interchange
• Array padding
  • Reduce cache conflicts
• Prefetch generation
  • Hide the long latency of cache miss references
• Loop fusion
• Loop fission
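Cache blocking can be illustrated with matrix multiplication. The sketch below shows the shape of the transformation rather than any cache benefit (Python lists do not model a data cache); the `tile` size is an arbitrary assumption, whereas a compiler would derive it from cache parameters.

```python
def matmul_naive(a, b, n):
    """Reference triple loop: c = a * b for n x n matrices."""
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_blocked(a, b, n, tile=2):
    """Same computation, restructured to work on tile x tile sub-matrices."""
    c = [[0] * n for _ in range(n)]
    for ii in range(0, n, tile):          # outer loops step over tiles
        for jj in range(0, n, tile):
            for kk in range(0, n, tile):
                # inner loops stay inside one tile; min() handles edge tiles
                for i in range(ii, min(ii + tile, n)):
                    for j in range(jj, min(jj + tile, n)):
                        for k in range(kk, min(kk + tile, n)):
                            c[i][j] += a[i][k] * b[k][j]
    return c


n = 5
a = [[i + j for j in range(n)] for i in range(n)]
b = [[i * j + 1 for j in range(n)] for i in range(n)]
```

Both functions perform the same multiply-adds, only in a different order, so with integer inputs the results are identical.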
LNO Transformations that Help Other Optimizations
• Scalar expansion / array expansion
  • Reduce inter-loop dependencies, enable parallelization
• Scalar variable renaming
  • Fewer constraints for register allocation
• Array scalarization
  • Improves register allocation
• Hoist messy loop bounds
• Outer loop unrolling
• Array substitution (forward and backward)
• Loop unswitching
• Hoist IF
• Inter-iteration CSE
LNO Parallelization
• SIMD code generation
  • Highly dependent on the SIMD instructions in target
• Generate vector intrinsics
  • Based on the library functions available
• Automatic parallelization
  • Leverage OpenMP support in rest of backend
Global Optimization Phase
• SSA is the unifying technology
• Open64 extensions to SSA technology:
  • Representing aliases and indirect memory operations (Chow et al, CC 96)
  • Integrated partial redundancy elimination (Chow et al, PLDI 97; Kennedy et al, CC 98, TOPLAS 99)
  • Support for speculative code motion
  • Register promotion via load and store placement (Lo et al, PLDI 98)
Overview
• Works at function scope
• Builds control flow graph
• Performs alias analysis
• Represents program in SSA form
• SSA-based optimization algorithms
• Cooperation among multiple phases to achieve final effects
• Phase order designed to maximize effectiveness
• Separated into Preopt and Mainopt
  • Preopt serves as a pre-optimizing front end for LNO and IPA (in High WHIRL)
  • Provides use-def info to LNO and IPA
  • Provides alias info to CG
Optimizations Performed
Pre-optimizer
• Goto conversion
• Loop normalization
• Induction variable canonicalization
• Dead store elimination
• Copy propagation
• Dead code elimination
• Alias analysis (flow-free and flow-sensitive)
• Compute def-use chains for LNO and IPA
• Pass alias info to CG
Main optimizer
• Partial redundancy elimination based on SSAPRE framework
• Global common subexpression
• Loop invariant code motion
• Strength reduction
• Linear function test replacement
• Value-number-based full redundancy elimination
• Induction variable elimination
• Register promotion
• Bitwise dead store elimination
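Strength reduction, one of the main-optimizer items above, replaces a multiplication by the loop index with a running addition. The before/after pair below is a hand-written illustration of the effect, not output of WOPT; the function names are hypothetical.

```python
def addresses_before(base, stride, n):
    """Before: each iteration recomputes base + i * stride (one multiply)."""
    return [base + i * stride for i in range(n)]


def addresses_after(base, stride, n):
    """After: the multiply is replaced by an add on a new induction variable."""
    out = []
    addr = base              # addr tracks base + i * stride
    for _ in range(n):
        out.append(addr)
        addr += stride       # cheap update instead of a multiply
    return out
```

Once `addr` exists, linear function test replacement can rewrite the loop exit test in terms of `addr`, which may leave the original index `i` dead and removable by induction variable elimination.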
Feedback
• Used throughout the compiler
• Instrumentation can be added at any stage
  • VHO, LNO, WOPT, CG
• Explicit instrumentation data incorporated where inserted
• Instrumentation data maintained and checked for consistency through program transformations
Smooth Info Flow into Backend in Pro64
[Figure: backend phase flow. Box labels, in extraction order:]
• WHIRL
• WHIRL-to-TOP
• CGIR
• Hyperblock Formation, Critical Path Reduction
• Extended basic block optimization
• Inner Loop Opt, Software Pipelining
• Control Flow optimization
• IGLS
• GRA/LRA
• Code Emission
• Executable
Side input: information from front end (alias, structure, etc.)
Software Pipelining vs. Normal Scheduling
[Figure: decision flow]
• Is the loop a SWP-amenable candidate?
• No: IGLS, then GRA/LRA, then IGLS, then Code Emission
• Yes: inner loop processing and software pipelining
  • Success: Code Emission
  • Failure / not profitable: fall back to the normal scheduling path
Code Generation Intermediate Representation (CGIR)
• TOPs (Target Operations) are “quads”
• Operands/results are TNs
• Basic block nodes in control flow graph
• Load/store architecture
• Supports predication
• Flags on TOPs (copy ops, integer add, load, etc.)
• Flags on operands (TNs)
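The quad-style shape described above can be sketched with two small record types. The class names, fields, and flag strings here are assumptions chosen to mirror the bullets; they are not Open64's actual CGIR data structures.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TN:
    """Hypothetical operand/result name with a per-operand flag."""
    name: str
    is_dedicated: bool = False      # e.g. bound to a fixed register


@dataclass
class TOP:
    """Hypothetical quad: opcode, results, operands, plus flags."""
    opcode: str
    results: Tuple[TN, ...]
    operands: Tuple[TN, ...]
    flags: frozenset = frozenset()  # e.g. {"load"}, {"copy"}
    predicate: Optional[TN] = None  # guarding predicate, if any


def loads_in(block):
    """Select the operations flagged as loads in a basic block."""
    return [op for op in block if "load" in op.flags]


r1, r2, r3, p1 = TN("r1"), TN("r2"), TN("r3"), TN("p1")
block = [
    TOP("ld", (r1,), (r2,), frozenset({"load"})),
    TOP("add", (r3,), (r1, r2), frozenset({"integer_add"}), predicate=p1),
]
```

Keeping memory access in explicit load operations reflects the load/store architecture bullet, and the optional `predicate` field reflects predication support.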
From WHIRL to CGIR (Cont’d)
• Information passed
  • alias information
  • loop information
  • symbol table and maps
The Target Information Table (TARG_INFO)
• Objective:
  • Parameterized description of a target machine and system architecture
  • Separates architecture details from the compiler’s algorithms
  • Minimizes compiler changes when targeting a new architecture
WHIRL SSA: A New Optimization Infrastructure for Open64
Parallel Processing Institute, Fudan University, Shanghai, China
Global Delivery China Center, Hewlett-Packard, Shanghai, China
Goal
• A better “DU manager”
  • Factored UD chain
  • Reduced traversing overhead
  • Keeping alias information
  • Handle both direct and indirect access
  • Eliminate ‘incomplete DU/UD chain’
  • Easy to use
  • STL-style iterator to traverse the DU/UD chain
• A flexible infrastructure
  • Available from H WHIRL to L WHIRL
  • Lightweight, demand-driven
  • Precise and updatable
PHI Placement
• For SCF, φ nodes are mapped on the root WN
• For GOTO-LABEL, φ nodes are placed on the LABEL
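A factored use-def chain routes all definitions that meet at a join through one φ node, so a use points at a single definition instead of at every reaching one. The sketch below walks such a chain on demand; the dictionary encoding and the version names are hypothetical and unrelated to the WHIRL SSA API.

```python
def reaching_defs(version, defs):
    """Follow a factored UD chain from one SSA version to its real defs.

    defs maps a version name to ("def", statement) or ("phi", [operands]).
    Phi nodes are expanded; the seen set guards against loop-carried cycles.
    """
    found, stack, seen = [], [version], set()
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        kind, payload = defs[v]
        if kind == "phi":
            stack.extend(payload)   # fan out to the phi operands
        else:
            found.append(v)         # a real definition
    return sorted(found)


# Hypothetical if/else: x is assigned on both arms, then used after the join.
defs = {
    "x1": ("def", "x = 1"),          # then-arm
    "x2": ("def", "x = 2"),          # else-arm
    "x3": ("phi", ["x1", "x2"]),     # phi at the join point
}
```

The use after the join refers only to `x3`; the two real definitions are recovered when a client asks for them, which is the demand-driven behavior the goals call for.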
Thank you!