The SGI Pro64 Compiler Infrastructure

The SGI Pro64 Compiler Infrastructure: A Tutorial. Guang R. Gao (U of Delaware), J. Dehnert (SGI), J. N. Amaral (U of Alberta), R. Towle (SGI). Acknowledgement: The SGI Compiler Development Teams (the MIPSpro/Pro64 Development Team) and University of Delaware (CAPSL Compiler Team).

Presentation Transcript


  1. The SGI Pro64 Compiler Infrastructure - A Tutorial Guang R. Gao (U of Delaware), J. Dehnert (SGI), J. N. Amaral (U of Alberta), R. Towle (SGI)

  2. Acknowledgement The SGI Compiler Development Teams • The MIPSpro/Pro64 Development Team University of Delaware • CAPSL Compiler Team These individuals contributed directly to this tutorial: A. Douillet (Udel), F. Chow (Equator), S. Chan (Intel), W. Ho (Routefree), Z. Hu (Udel), K. Lesniak (SGI), S. Liu (HP), R. Lo (Routefree), S. Mantripragada (SGI), C. Murthy (SGI), M. Murphy (SGI), G. Pirocanac (SGI), D. Stephenson (SGI), D. Whitney (SGI), H. Yang (Udel)

  3. What is Pro64? • A suite of optimizing compiler tools for Linux/Intel IA-64 systems • C, C++ and Fortran90/95 compilers • Conforming to the IA-64 Linux ABI and API standards • Open to all researchers/developers in the community • Compatible with HP Native User Environment

  4. Who Might Want to Use Pro64? • Researchers: test new compiler analysis and optimization algorithms • Developers: retarget to another architecture/system • Educators: a compiler teaching platform

  5. Outline • Background and Motivation • Part I: An overview of the SGI Pro64 compiler infrastructure • Part II: The Pro64 code generator design • Part III: Using Pro64 in compiler research & development • SGI Pro64 support • Summary

  6. PART I: Overview of the Pro64 Compiler

  7. Outline • Logical compilation model and component flow • WHIRL Intermediate Representation • Inter-Procedural Analysis (IPA) • Loop Nest Optimizer (LNO) and Parallelization • Global optimization (WOPT) • Feedback • Design for debuggability and testability

  8. Logical Compilation Model • The driver (sgicc/sgif90/sgiCC) forks and execs each phase: front end + IPA (gfec/gfecc/mfef90), back end (be, as), linker (ld) • Data path: Src (.c/.C/.f) → WHIRL (.B/.I) → obj (.o) → a.out/.so

  9. Components of Pro64 • Front end • Interprocedural Analysis and Optimization • Loop Nest Optimization and Parallelization • Global Optimization • Code Generation

  10. Data Flow Relationship Between Modules [Diagram not reproduced; labels as extracted] • Front ends: gfec, gfecc, f90 • Very high WHIRL (.B) • Local IPA, Main IPA (.I) or Inliner (take either path; -IPA) • Lower I/O (only for f90) • Lower to High WHIRL • High WHIRL: LNO (-O3) • WHIRL C (.w2c.c, .w2c.h) and WHIRL fortran (.w2f.f) • Lower to Mid WHIRL • Mid WHIRL: Main opt (-O2/O3; -phase: w=off) • Low WHIRL: CG • -O0: lower all, then CG

  11. Front Ends • C front end based on gcc • C++ front end based on g++ • Fortran90/95 front end from MIPSpro

  12. Intermediate Representation • IR is called WHIRL • Tree structured, with references to symbol table • Maps used for local or sparse annotation • Common interface between components • Multiple languages, multiple targets • Same IR, 5 levels of representation • Continuous lowering during compilation • Optimization strategy tied to level

  13. IPA Main Stage • Analysis: alias analysis; array section; code layout • Optimization (fully integrated): inlining; cloning; dead function and variable elimination; constant propagation
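
Dead function elimination, one of the IPA optimizations listed above, can be illustrated with a reachability pass over a whole-program call graph. The sketch below is a minimal model and not Pro64 code: the call graph, the function names in it, and the helpers `reachable_functions` and `dead_functions` are all assumptions made for illustration.

```python
from collections import deque


def reachable_functions(call_graph, roots):
    """Return every function reachable from the root functions."""
    seen = set(roots)
    work = deque(roots)
    while work:
        fn = work.popleft()
        for callee in call_graph.get(fn, ()):
            if callee not in seen:
                seen.add(callee)
                work.append(callee)
    return seen


def dead_functions(call_graph, roots):
    """Return functions that appear in the graph but are unreachable."""
    defined = set(call_graph)
    for callees in call_graph.values():
        defined.update(callees)
    return defined - reachable_functions(call_graph, roots)


# Hypothetical call graph: caller -> list of callees.
call_graph = {
    "main": ["parse", "run"],
    "parse": ["lex"],
    "run": [],
    "lex": [],
    "debug_dump": ["lex"],  # never called from main
}
dead = dead_functions(call_graph, ["main"])
```

A real interprocedural pass would also have to treat address-taken functions and symbols exported from DSOs as roots, which this sketch ignores.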

  14. IPA Design Features • User transparent • No makefile changes • Handles DSOs, unanalyzed objects • Provide info (e.g. alias analysis, procedure properties) smoothly to: • loop nest optimizer • main optimizer • code generator

  15. Loop Nest Optimizer/Parallelizer • All languages (including OpenMP) • Loop level dependence analysis • Uniprocessor loop level transformations • Automatic parallelization

  16. Loop Level Transformations • Loop Fission • Loop Fusion • Loop Unroll and Jam • Loop Interchange • Loop Peeling • Loop Tiling • Vector Data Prefetching • Based on unified cost model • Heuristics integrated with software pipelining • Loop vector dependency info passed to CG
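
Two of the listed transformations, loop interchange and loop tiling, can be shown on a matrix multiply. This is a minimal sketch written for illustration, not output of the Loop Nest Optimizer: the function names and the tile size are assumptions, and a real optimizer would pick the tile size from its cost model.

```python
def matmul_naive(a, b, n):
    """Reference i-j-k loop nest."""
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_tiled(a, b, n, tile):
    """Same computation after interchange (i-k-j order) and tiling."""
    c = [[0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                # Work on one tile; min() handles a partial last tile.
                for i in range(ii, min(ii + tile, n)):
                    for k in range(kk, min(kk + tile, n)):
                        aik = a[i][k]
                        for j in range(jj, min(jj + tile, n)):
                            c[i][j] += aik * b[k][j]
    return c


n = 5
a = [[i + j for j in range(n)] for i in range(n)]
b = [[i * j + 1 for j in range(n)] for i in range(n)]
same = matmul_naive(a, b, n) == matmul_tiled(a, b, n, tile=2)
```

Both versions perform the same additions into each `c[i][j]`, only in a different order, so with integer inputs the results are identical.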

  17. Parallelization • Automatic: array privatization; doacross parallelization; array section analysis • Directive based: OpenMP; integrated with automatic methods

  18. Global Optimization Phase • SSA is unifying technology • Use only SSA as program representation • All traditional global optimizations implemented • Every optimization preserves SSA form • Can reapply each optimization as needed
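
The defining property of SSA form, that every name is assigned exactly once, can be seen on straight-line code. The sketch below is a simplified illustration and not how WOPT builds SSA: it handles a single basic block only, so no phi functions are needed, and the tuple encoding and the `to_ssa` helper are assumptions.

```python
def to_ssa(stmts):
    """Rename a list of (dest, op, args) statements into SSA form.

    Version 0 of a variable stands for its value on entry to the block.
    """
    version = {}  # variable -> latest version number
    out = []

    def use(x):
        if not isinstance(x, str):
            return x  # literal constant, left untouched
        return f"{x}_{version.get(x, 0)}"

    for dest, op, args in stmts:
        new_args = tuple(use(arg) for arg in args)  # rename uses first
        version[dest] = version.get(dest, 0) + 1    # then the definition
        out.append((f"{dest}_{version[dest]}", op, new_args))
    return out


code = [
    ("x", "add", ("a", "b")),
    ("y", "mul", ("x", 2)),
    ("x", "add", ("x", "y")),
]
ssa = to_ssa(code)
```

Here the two assignments to `x` become `x_1` and `x_2`, so each use refers to exactly one definition.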

  19. Pro64 Extensions to SSA • Representing aliases and indirect memory operations (Chow et al, CC 96) • Integrated partial redundancy elimination (Chow et al, PLDI 97; Kennedy et al, CC 98, TOPLAS 99) • Support for speculative code motion • Register promotion via load and store placement (Lo et al, PLDI 98)

  20. Feedback • Used throughout the compiler • Instrumentation can be added at any stage • Explicit instrumentation data incorporated where inserted • Instrumentation data maintained and checked for consistency through program transformations

  21. Design for Debuggability (DFD) and Testability (DFT) • DFD and DFT built-in from start • Can build with extra validity checks • Simple option specification used to: • Substitute components known to be good • Enable/disable full components or specific optimizations • Invoke alternative heuristics • Trace individual phases

  22. Where to Obtain Pro64 Compiler and its Support • SGI Source download http://oss.sgi.com/projects/Pro64/ • University of Delaware Pro64 Support Group http://www.capsl.udel.edu/~pro64 pro64@capsl.udel.edu

  23. PART II: Overview of the Pro64 Code Generator

  24. Outline • Code generator flow diagram • WHIRL/CGIR and TARG-INFO • Hyperblock formation and predication (HBF) • Predicate Query System (PQS) • Loop preparation (CGPREP) and software pipelining • Global and local instruction scheduling (IGLS) • Global and local register allocation (GRA, LRA)

  25. Flowchart of Code Generator • WHIRL → WHIRL-to-TOP Lowering → CGIR: Quad Op List → Control Flow Opt I, EBO → Hyperblock Formation, Critical-Path Reduction → Process Inner Loops: unrolling, EBO; Loop prep, software pipelining → IGLS: pre-pass; GRA, LRA, EBO; IGLS: post-pass; Control Flow Opt II → Code Emission • EBO: Extended basic block optimization (peephole, etc.) • PQS: Predicate Query System

  26. From WHIRL to CGIR: An Example (a) Source: int *a; int i; int aa; aa = a[i]; (b) WHIRL: ST aa ( LD ( + ( a, * ( CVTL32 ( i ), 4 ) ) ) ) (c) CGIR: T1 = sp + &a; T2 = ld T1; T3 = sp + &i; T4 = ld T3; T5 = sxt T4; T6 = T5 << 2; T7 = T6; T8 = T2 + T7; T9 = ld T8; T10 = sp + &aa; := st T10 T9
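
The lowering in this example, from an expression tree to a flat list of quads over temporaries, can be sketched as a recursive walk. This is a toy model and not Pro64 code: the tuple node kinds `"VAR"` and `"CONST"` and the helper names are assumptions made for illustration. The sketch also omits the `T7 = T6` copy shown on the slide, so its temporaries run from T1 to T9.

```python
import itertools


def lower_store(tree):
    """Lower ("ST", symbol, value_tree) into a list of quad strings."""
    quads = []
    counter = itertools.count(1)

    def emit(text):
        temp = f"T{next(counter)}"
        quads.append(f"{temp} = {text}")
        return temp

    def address_of(sym):
        return emit(f"sp + &{sym}")  # stack-relative address of a local

    def expr(node):
        kind = node[0]
        if kind == "VAR":  # load the value of a local variable
            addr = address_of(node[1])
            return emit(f"ld {addr}")
        if kind == "CVTL32":  # sign-extend a 32-bit value
            src = expr(node[1])
            return emit(f"sxt {src}")
        if kind == "*":  # multiply by a power of two becomes a shift
            scale = node[2]
            if (scale[0] != "CONST" or scale[1] <= 0
                    or (scale[1] & (scale[1] - 1)) != 0):
                raise NotImplementedError("sketch handles power-of-two scales only")
            src = expr(node[1])
            return emit(f"{src} << {scale[1].bit_length() - 1}")
        if kind == "+":
            left = expr(node[1])
            right = expr(node[2])
            return emit(f"{left} + {right}")
        if kind == "LD":  # indirect load through a computed address
            addr = expr(node[1])
            return emit(f"ld {addr}")
        raise ValueError(f"unknown node kind: {kind}")

    kind, sym, value = tree
    if kind != "ST":
        raise ValueError("expected a store at the root")
    val = expr(value)
    addr = address_of(sym)
    quads.append(f":= st {addr} {val}")
    return quads


# aa = a[i], encoded after the WHIRL tree on the slide.
tree = ("ST", "aa",
        ("LD", ("+", ("VAR", "a"),
                ("*", ("CVTL32", ("VAR", "i")), ("CONST", 4)))))
quads = lower_store(tree)
```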

  27. Code Generation Intermediate Representation (CGIR) • TOPs (Target Operations) are “quads” • Operands/results are TNs • Basic block nodes in control flow graph • Load/store architecture • Supports predication • Flags on TOPs (copy ops, integer add, load, etc.) • Flags on operands (TNs)

  28. From WHIRL to CGIR Cont’d • Information passed • alias information • loop information • symbol table and maps

  29. The Target Information Table (TARG_INFO) Objective: • Parameterized description of a target machine and system architecture • Separates architecture details from the compiler’s algorithms • Minimizes compiler changes when targeting a new architecture
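
The idea of separating architecture details from the compiler's algorithms can be illustrated with a table-driven query. The sketch below is not the TARG_INFO format: the two targets, their latencies and issue widths, and the helper names are all hypothetical values chosen for illustration.

```python
# Hypothetical machine descriptions; every number here is made up.
TARGETS = {
    "toy_a": {"latency": {"ld": 2, "add": 1, "mul": 3}, "issue_width": 2},
    "toy_b": {"latency": {"ld": 4, "add": 1, "mul": 2}, "issue_width": 1},
}


def chain_latency(target, ops):
    """Cycles needed by a chain of dependent ops on the given target."""
    table = TARGETS[target]["latency"]
    return sum(table[op] for op in ops)


def min_issue_cycles(target, n_ops):
    """Lower bound on cycles needed just to issue n_ops operations."""
    width = TARGETS[target]["issue_width"]
    return -(-n_ops // width)  # ceiling division


chain = ["ld", "mul", "add"]
cycles_a = chain_latency("toy_a", chain)
cycles_b = chain_latency("toy_b", chain)
```

The two query functions never mention a specific machine, so supporting another target in this model means adding a table entry rather than changing the algorithms.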

  30. The Target Information Table (TARG_INFO) Cont’d • Based on an extension of Cydra tables, with major improvements • Architectures already targeted: • Whole MIPS family • IA-64 • IA-32 • SGI graphics processors (earlier version)

  31. Flowchart of Code Generator (diagram repeated from slide 25)

  32. Hyperblock Formation and Predicated Execution • Hyperblock: a single-entry, multiple-exit control-flow region (loop body, hammock region, etc.) • Hyperblock formation algorithm • Based on Scott Mahlke’s method [Mahlke96] • But with less aggressive tail duplication

  33. Hyperblock Formation Algorithm • Region Identification: hammock regions; innermost loops; general regions (path based) • Block Selection: paths sorted by priorities (freq., size, length, etc.); inclusion of a path is guided by its impact on resources, scheduling height, and priority level; objective: keep the scheduling height close to that of the highest priority path • Tail Duplication • If Conversion: internal branches are removed via predication; predicate reuse
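
The if-conversion step, where internal branches are removed via predication, can be sketched on a single if-then-else. This is a minimal model and not the Pro64 implementation: the op encoding `(guard, dest, opcode, srcs)`, the opcode names, and the tiny interpreter `run` are assumptions made for illustration.

```python
def if_convert(cond, then_ops, else_ops, pt="p1", pf="p2"):
    """Turn an if-then-else into one straight-line list of guarded ops.

    A compare op defines two complementary predicates; ops from the
    'then' side are guarded by pt and ops from the 'else' side by pf.
    """
    out = [(None, (pt, pf), "cmp.ne", (cond, 0))]
    out += [(pt, dest, opcode, srcs) for dest, opcode, srcs in then_ops]
    out += [(pf, dest, opcode, srcs) for dest, opcode, srcs in else_ops]
    return out


def run(ops, env):
    """Interpret guarded ops; an op whose guard is false is skipped."""
    env = dict(env)
    for guard, dest, opcode, srcs in ops:
        if guard is not None and not env[guard]:
            continue
        vals = [env[s] if isinstance(s, str) else s for s in srcs]
        if opcode == "cmp.ne":
            taken = vals[0] != vals[1]
            env[dest[0]], env[dest[1]] = taken, not taken
        elif opcode == "add":
            env[dest] = vals[0] + vals[1]
        elif opcode == "sub":
            env[dest] = vals[0] - vals[1]
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return env


# if (c) x = a + b; else x = a - b;
ops = if_convert("c", [("x", "add", ("a", "b"))], [("x", "sub", ("a", "b"))])
x_when_true = run(ops, {"c": 1, "a": 5, "b": 3})["x"]
x_when_false = run(ops, {"c": 0, "a": 5, "b": 3})["x"]
```

The converted sequence contains no branch: both sides are issued, and the predicates decide which result takes effect.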

  34. Hyperblock Formation - An Example (a) Source: aa = a[i]; bb = b[i]; switch (aa) { case 1: if (aa < tabsiz) aa = tab[aa]; case 2: if (bb < tabsiz) bb = tab[bb]; default: ans = aa + bb; } (b) CFG with blocks 1, 2, 4, 5, 6, 7, 8 (c) Hyperblock formation with aggressive tail duplication: hyperblocks H1 and H2, with duplicated blocks 6’, 7’, 8’ [diagrams not reproduced]

  35. Hyperblock Formation - An Example Cont’d (a) CFG (b) Hyperblock formation with aggressive tail duplication (c) Pro64 hyperblock formation [CFG diagrams with hyperblocks H1 and H2 not reproduced]

  36. Features of the Pro64 Hyperblock Formation (HBF) Algorithm • Form “good” vs. “maximal” hyperblocks • Avoid unnecessary duplication • No reverse if-conversion • Hyperblocks are not a barrier to global code motion later in IGLS

  37. Predicate Query System (PQS) • Purpose: gather information and provide interfaces allowing other phases to make queries regarding the relationships among predicate values • PQS functions (examples): BOOL PQSCG_is_disjoint (PQS_TN tn1, PQS_TN tn2); BOOL PQSCG_is_subset (PQS_TN_SET& tns1, PQS_TN_SET& tns2)
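
The kind of question such queries answer can be modeled by representing each predicate as the set of condition outcomes under which it is true. The sketch below is a toy model and not the PQS implementation: the predicates `p1` to `p4`, the conditions `c1` and `c2`, and the Python functions `is_disjoint` and `is_subset` are hypothetical, and the subset semantics (the union of the first set is covered by the union of the second) is an assumption.

```python
from itertools import product

CONDS = ("c1", "c2")
# Every combination of condition outcomes is one possible "world".
WORLDS = [dict(zip(CONDS, bits))
          for bits in product([False, True], repeat=len(CONDS))]


def truth_set(fn):
    """Indices of the worlds in which a predicate is true."""
    return frozenset(i for i, world in enumerate(WORLDS) if fn(world))


# p1/p2 come from a compare on c1; p3/p4 from a compare on c2 guarded by p1.
PRED = {
    "p1": truth_set(lambda w: w["c1"]),
    "p2": truth_set(lambda w: not w["c1"]),
    "p3": truth_set(lambda w: w["c1"] and w["c2"]),
    "p4": truth_set(lambda w: w["c1"] and not w["c2"]),
}


def is_disjoint(tn1, tn2):
    """True if the two predicates can never both be true."""
    return not (PRED[tn1] & PRED[tn2])


def is_subset(tns1, tns2):
    """True if whenever some predicate in tns1 holds, one in tns2 holds."""
    cover1 = frozenset().union(*(PRED[t] for t in tns1))
    cover2 = frozenset().union(*(PRED[t] for t in tns2))
    return cover1 <= cover2
```

A scheduler or register allocator could use a disjointness answer to conclude that two ops guarded by `p3` and `p4` never execute together, so their results may share a register.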

  38. Flowchart of Code Generator (diagram repeated from slide 25)

  39. Loop Preparation and Optimization for Software Pipelining • Loop canonicalization for SWP • Read/Write removal (register aware) • Loop unrolling (resource aware) • Recurrence removal or extension • Prefetch • Forced if-conversion

  40. Pro64 Software Pipelining Method Overview • Test for SWP-amenable loops • Extensive loop preparation and optimization before application [DeTo93] • Use lifetime sensitive SWP algorithm [Huff93] • Register allocation after scheduling based on Cydra 5 [RLTS92, DeTo93] • Handle both while and do loops • Smooth switching to normal scheduling if not successful.

  41. Pro64 Lifetime-Sensitive Modulo Scheduling for Software Pipelining • Features: • Try to place an op ASAP or ALAP to minimize register pressure • Slack scheduling • Limited backtracking • Operation-driven scheduling framework • Flowchart: Compute Estart/Lstart for all unplaced ops → Choose a good op to place into the current partial schedule within its Estart/Lstart range → Succeed? (no: eject conflicting ops; yes: continue until done) → Register allocate
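
The Estart/Lstart computation and the notion of slack can be sketched on a small dependence graph. This is a simplified illustration and not the Pro64 scheduler: it ignores the initiation interval, loop-carried dependences, resource conflicts, and backtracking, and the op names and latencies are hypothetical.

```python
def estart_lstart(ops, deps, latency, length):
    """Earliest and latest start cycles for each op.

    ops must be in topological order; deps is a list of (pred, succ);
    length is the schedule length the ops have to fit into.
    """
    preds = {op: [] for op in ops}
    succs = {op: [] for op in ops}
    for p, s in deps:
        preds[s].append(p)
        succs[p].append(s)

    estart = {}
    for op in ops:  # forward pass: wait for all predecessors
        estart[op] = max((estart[p] + latency[p] for p in preds[op]), default=0)

    lstart = {}
    for op in reversed(ops):  # backward pass: do not delay any successor
        lstart[op] = min((lstart[s] - latency[op] for s in succs[op]),
                         default=length - latency[op])
    return estart, lstart


ops = ["ld1", "ld2", "mul", "add", "st"]
deps = [("ld1", "mul"), ("ld2", "add"), ("mul", "add"), ("add", "st")]
latency = {"ld1": 2, "ld2": 2, "mul": 3, "add": 1, "st": 1}

estart, lstart = estart_lstart(ops, deps, latency, length=7)
slack = {op: lstart[op] - estart[op] for op in ops}
# Ops with the least slack have the least placement freedom.
placement_order = sorted(ops, key=lambda op: (slack[op], estart[op]))
```

In this example only `ld2` has any freedom, so a lifetime-sensitive scheduler could place it late, close to its use in `add`, to shorten the register lifetime of its result.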

  42. Flowchart of Code Generator (diagram repeated from slide 25)

  43. Integrated Global Local Scheduling (IGLS) Method • The basic IGLS framework integrates global code motion (GCM) with local scheduling [MaJD98] • IGLS extended to hyperblock scheduling • Performs profitable code motion between hyperblock regions and normal regions

  44. IGLS Phase Flow Diagram [labels as extracted] • Hyperblock Scheduling (HBS) • Block Priority Selection • Motion Selection • Target Selection • Global Code Motion (GCM) • Local Code Scheduling (LCS)

  45. Advantages of the Extended IGLS Method - The Example Revisited • Advantages: • No rigid boundaries between hyperblocks and non-hyperblocks • GCM moves code into and out of a hyperblock according to profitability (a) Pro64 hyperblock (b) Profitable duplication [CFG diagrams with hyperblocks H1, H2, H3 not reproduced]

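  46. Software Pipelining vs Normal Scheduling • Decision: is the loop a SWP-amenable candidate? • Yes: inner loop processing and software pipelining; on success, proceed to code emission; on failure or if not profitable, fall back to the normal path • No (normal path): IGLS → GRA/LRA → IGLS → Code Emission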

  47. Flowchart of Code Generator (diagram repeated from slide 25)

  48. Global and Local Register Allocation (GRA/LRA) • LRA-RQ provides an estimate of local register requirements • Allocates global variables using a priority-based register allocator [ChowHennessy90, Chow83, Briggs92] • Incorporates IA-64 specific extensions, e.g. register stack usage • Flow: from prepass IGLS → LRA-RQ (LRA register request) → GRA (priority-based register allocation with IA-64 extensions) → LRA → to postpass IGLS
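
The core idea of priority-based allocation, giving registers to the most profitable live ranges first, can be sketched as follows. This is a deliberately simplified model and not the GRA implementation: it omits live range splitting and the IA-64 extensions, and the live ranges, priorities, and interference graph are hypothetical.

```python
def priority_color(priority, interference, num_regs):
    """Assign registers to live ranges in decreasing priority order.

    Returns (assignment, spilled). A live range is spilled when every
    register is already taken by an interfering neighbor.
    """
    assignment = {}
    spilled = []
    for lr in sorted(priority, key=lambda r: (-priority[r], r)):
        used = {assignment[n] for n in interference.get(lr, ())
                if n in assignment}
        free = [reg for reg in range(num_regs) if reg not in used]
        if free:
            assignment[lr] = free[0]
        else:
            spilled.append(lr)
    return assignment, spilled


# Hypothetical live ranges: priority models the benefit of a register.
priority = {"a": 10, "b": 8, "c": 5, "d": 1}
interference = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
assignment, spilled = priority_color(priority, interference, num_regs=2)
```

With two registers, `a` and `b` are colored first, `c` conflicts with both and is spilled, and `d` can then reuse register 0 because its only neighbor holds no register.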

  49. Local Register Allocation (LRA) • Assign_Registers using reverse linear scan • Reordering: depth-first ordering on the DDG • Flow: Assign_Registers → on success, done; on failure, Fix_LRA: instruction reordering the first time, otherwise spill global or spill local
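
A reverse linear scan over one basic block can be sketched as a single bottom-up walk: a value gets a register at its last use, which is seen first when walking backward, and gives it back at its definition. This is a minimal model and not the Pro64 Assign_Registers routine: the op encoding and temporary names are hypothetical, nothing is live out of the block, and failure is simply reported instead of triggering reordering or spilling.

```python
def reverse_linear_scan(ops, num_regs):
    """Assign registers to block-local temporaries, walking bottom-up.

    ops is a list of (dest, srcs) in program order; dest may be None.
    Returns a dict temp -> register, or None if registers run out.
    """
    free = list(range(num_regs))
    live = {}    # temp -> register, for temps live at the current point
    result = {}
    for dest, srcs in reversed(ops):
        if dest in live:
            # Going backward, the definition ends the live range.
            free.append(live.pop(dest))
        elif dest is not None:
            # Result is never used: borrow a register for this op only.
            if not free:
                return None
            result[dest] = free[-1]
        for src in srcs:
            if src not in live:  # last use of src starts its live range
                if not free:
                    return None
                live[src] = free.pop()
                result[src] = live[src]
    return result


ops = [
    ("t1", ()),             # t1 = ...
    ("t2", ()),             # t2 = ...
    ("t3", ("t1", "t2")),   # t3 = t1 op t2
    ("t4", ("t3", "t1")),   # t4 = t3 op t1
    (None, ("t4",)),        # store t4
]
regs = reverse_linear_scan(ops, num_regs=2)
```

Because the destination's register is released before the sources are visited, a result may reuse the register of a source whose last use is the same op.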

  50. Future Research Topics for Pro64 Code Generator • Hyperblock formation • Predicate query system • Enhanced speculation support
