NCI Report: Zephyr

NCI Report: Zephyr PLDI NCI Tutorial University of Virginia Princeton University

Zephyr Goals
• Goal
  • Deliver high-quality, language-neutral tools for rapidly constructing compilers for experimental computing systems research
• How
  • Provide specification languages and processors to automatically generate key compiler components
  • Don’t write code, write specifications!

Zephyr Compilers

Zephyr Building Blocks
• ASDL: Abstract Syntax Description Language
• VPO: Very Portable Optimizer
• CSDL: Computer System Description Language

ASDL: Abstract Syntax Description Language

ASDL
• ASDL makes it easy to communicate complex recursive data structures
• ASDL and its tools provide
  • Concise descriptions of tree-like structures, including ASTs and compiler intermediate representations (IRs)
  • Automatic generation of data structure implementations and pickling functions for C, C++, Java, Standard ML, and Haskell
  • Graphical browsing and editing of data structures on disk
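
To make "data structure implementations and pickling functions" concrete, here is a minimal Python sketch of the kind of code such a generator might emit for a two-constructor expression type. The description in the comment, the class names, the tag values, and the flat integer encoding are all assumptions chosen for illustration; this is not the actual ASDL pickle format or the output of the ASDL tools.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

# Hypothetical ASDL-style description mirrored by this sketch:
#   expr = Num(int value) | Add(expr left, expr right)


@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Add]

# Assumed constructor tags; a real pickle format defines its own encoding.
NUM_TAG, ADD_TAG = 1, 2


def pickle_expr(e: Expr) -> List[int]:
    """Flatten a tree into a prefix-ordered list: tag first, then fields."""
    if isinstance(e, Num):
        return [NUM_TAG, e.value]
    return [ADD_TAG] + pickle_expr(e.left) + pickle_expr(e.right)


def unpickle_expr(words: List[int], pos: int = 0) -> Tuple[Expr, int]:
    """Rebuild a tree starting at words[pos]; return it and the next position."""
    tag = words[pos]
    if tag == NUM_TAG:
        return Num(words[pos + 1]), pos + 2
    if tag == ADD_TAG:
        left, pos = unpickle_expr(words, pos + 1)
        right, pos = unpickle_expr(words, pos)
        return Add(left, right), pos
    raise ValueError(f"unknown constructor tag {tag}")


tree = Add(Num(1), Add(Num(2), Num(3)))
flat = pickle_expr(tree)
restored, consumed = unpickle_expr(flat)
```

Because the encoding is driven entirely by the type description, a writer in one language and a reader in another can exchange the same tree as long as both are generated from the same description.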

ASDL
• For more information about ASDL see:
  • Give reference here
  • Give URL here

VPO: Very Portable Optimizer
• VPO is a retargetable optimizer that operates on a low-level, machine-independent representation called RTLs (register transfer lists)
• VPO is retargeted by providing a machine description (MD) of the target machine, and revising a few machine-dependent routines
• VPO is small, easily extended, and extremely effective

History Lesson
• PO developed in 1981
  • Pioneered use of RTLs
  • Demonstrated ability to do optimizations on low-level representation
• Development split in 1982
  • gcc development
    • Richard Stallman and Len Tower
  • VPO development
    • Many people at UVa and a few industrial labs

Register Transfer Lists
• Based on Bell and Newell's ISP notation
• Machine-independent representation of a machine-dependent operation
• Algorithms that manipulate RTLs are machine-independent

Register Transfer Lists
• While assembly language notations may vary, RTLs are very similar across architectures
• Example

  Instruction          Machine
  add %o1,%o2,%o2      SPARC
  addu $10,$10,$9      MIPS
  ar 10,9              IBM

• In RTL, each operation would be represented as
  r[10] = r[10] + r[9];

RTLs
• The form of RTLs is fixed
  • dst = src ; dst = src ; dst = src …
  • The individual register transfers are performed in parallel
• Example
  • r[1] = r[1] + r[2] ; NZ = r[1] + r[2] ? 0
• VPO provides machine-independent primitives for operating on and manipulating RTLs
  • Obtain the sources and destinations
  • Obtain the memory locations read and written
  • Obtain the type of instruction (arithmetic, branch, control transfer, etc.)
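
The parallel semantics can be illustrated with a small Python sketch: every source expression is evaluated against the old machine state, and only then are all destinations updated. The tuple layout, the helper names, and the -1/0/1 encoding of the `?` comparison are assumptions made for this sketch; they are not VPO's internal representation or its primitives.

```python
from typing import Callable, Dict, List, Set, Tuple

State = Dict[str, int]
# One register transfer: (destination, locations read, function of the old state).
Transfer = Tuple[str, Tuple[str, ...], Callable[[State], int]]


def compare(a: int, b: int) -> int:
    """Assumed model of the '?' operator: -1, 0, or 1."""
    return (a > b) - (a < b)


def execute(rtl: List[Transfer], state: State) -> State:
    """Apply all transfers of one RTL in parallel."""
    # Evaluate every source against the OLD state first ...
    new_values = {dst: fn(state) for dst, _, fn in rtl}
    # ... then commit all destinations at once.
    result = dict(state)
    result.update(new_values)
    return result


def destinations(rtl: List[Transfer]) -> Set[str]:
    """Locations written by the RTL."""
    return {dst for dst, _, _ in rtl}


def sources(rtl: List[Transfer]) -> Set[str]:
    """Locations read by the RTL."""
    return {src for _, srcs, _ in rtl for src in srcs}


# r[1] = r[1] + r[2] ; NZ = r[1] + r[2] ? 0
add_cc: List[Transfer] = [
    ("r[1]", ("r[1]", "r[2]"), lambda s: s["r[1]"] + s["r[2]"]),
    ("NZ", ("r[1]", "r[2]"), lambda s: compare(s["r[1]"] + s["r[2]"], 0)),
]

after = execute(add_cc, {"r[1]": 3, "r[2]": -3, "NZ": 1})
```

With r[1] = 3 and r[2] = -3, both transfers see the old r[1], so NZ reflects the sum 0. A sequential reading would instead compare the already-updated r[1] plus r[2] against 0 and get a different condition code.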

RTLs
• Think of RTL as a machine-independent assembly language
• For a machine X, each RTLx describes an instruction in X’s instruction set (may be a synthetic instruction)
• RTLx should specify
  • the instruction’s inputs and outputs
  • the transformation the instruction makes on the machine state
• VPO uses this information to compute a dataflow graph

Compilation with VPO
You supply the front end and a simple code generator; we supply an optimizing back end.

Generating RTLx
• Translate IL ops to semantically equivalent sequences of instructions for the target machine
• Generate RTL representation of instructions, not assembly language
• Do not worry about code quality
  • Perform naïve, straightforward translation
  • Expose all computations (even effective address computations) to VPO
  • Use virtual or pseudo registers for temporaries
• VPO handles activation record and data placement

Generating RTLx
The C code K = I + 1;

  IL               SPARC RTL
  ADDR int K   →   r[33]=r[14]+K.;
  ADDR int I   →   r[34]=r[14]+I.;
  @ int        →   r[35]=M[r[34]];       r[34]
  CON int 1    →   r[36]=1;
  + int        →   r[37]=r[35]+r[36];    r[35]:r[36]
  = int        →   M[r[33]]=r[37];       r[33]:r[37]
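
A naive expander of this kind can be sketched in a few lines of Python. The sketch assumes the IL is a stack-oriented sequence, drops the `int` type annotations, hard-codes the base register r[14] and the first pseudo register r[33] to match the slide, and omits the register lists that trail some RTLs in the table. The function and tuple encoding are illustrative, not the Zephyr code expander.

```python
from typing import List, Tuple

BASE = "r[14]"        # base register used in the slide's address computations
FIRST_PSEUDO = 33     # the slide's pseudo registers start at r[33]


def expand(il: List[Tuple[str, ...]]) -> List[str]:
    """Translate stack-style IL ops into one RTL each, with no optimization."""
    rtls: List[str] = []
    stack: List[str] = []          # pseudo registers holding pending operands
    next_reg = FIRST_PSEUDO

    def fresh() -> str:
        nonlocal next_reg
        reg = f"r[{next_reg}]"
        next_reg += 1
        return reg

    for op, *args in il:
        if op == "ADDR":           # push the address of a variable
            dst = fresh()
            rtls.append(f"{dst}={BASE}+{args[0]}.;")
            stack.append(dst)
        elif op == "CON":          # push a constant
            dst = fresh()
            rtls.append(f"{dst}={args[0]};")
            stack.append(dst)
        elif op == "@":            # replace an address by the value stored there
            addr = stack.pop()
            dst = fresh()
            rtls.append(f"{dst}=M[{addr}];")
            stack.append(dst)
        elif op == "+":            # add the two topmost operands
            right = stack.pop()
            left = stack.pop()
            dst = fresh()
            rtls.append(f"{dst}={left}+{right};")
            stack.append(dst)
        elif op == "=":            # store the value through the address below it
            value = stack.pop()
            addr = stack.pop()
            rtls.append(f"M[{addr}]={value};")
        else:
            raise ValueError(f"unsupported IL op {op!r}")
    return rtls


# K = I + 1;
il = [("ADDR", "K"), ("ADDR", "I"), ("@",), ("CON", "1"), ("+",), ("=",)]
rtls = expand(il)
```

Every computation, including the two address calculations, gets its own RTL and its own pseudo register. That is deliberately wasteful: exposing each step is what gives the optimizer something to combine and eliminate.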

VPO design rationale
• All "traditional" optimizations performed at the machine-level on a single representation: RTL
  • most optimizations are machine-dependent
  • better code is produced
  • instruction selection can be performed on demand
  • avoids phase ordering problems
  • simplifies implementation of optimizations
  • easier to accommodate emerging architectures
  • "plug and play" structure

RTLs in VPO
• VPO optimization algorithm
  • repeat
      apply code-improving transformation
    until fixed-point reached or registers exhausted
• Maintaining two invariants
  • Semantic invariant (S)
    • Observable behavior of program unchanged (according to RTL semantics)
  • Machine invariant (M)
    • Every RTL equivalent to one machine instruction

VPO code improvements
• Each code-improving transformation is
  • machine-level, but
  • machine-independent
• Any semantics-preserving transformation is OK
• Preserve machine invariant (M) using machine description:
  • for each new RTL produced, ask MD if OK
  • if any is not a target machine instruction, roll back transformation
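
The repeat-until-fixed-point loop with a machine-description check and rollback can be sketched as follows. Everything here is a toy built for illustration: RTLs are plain strings, the single transformation forwards a constant into its first later use and assumes the defining register is dead afterwards, and the "machine description" knows only one rule, that an add immediate must fit in SPARC's 13-bit signed field. None of these names come from VPO.

```python
import re
from typing import Callable, List, Optional

Rtl = str
Transformation = Callable[[List[Rtl]], Optional[List[Rtl]]]

CONST_DEF = re.compile(r"^(r\[\d+\])=(-?\d+);$")
ADD_IMMEDIATE = re.compile(r"\+(-?\d+);$")


def propagate_constant(code: List[Rtl]) -> Optional[List[Rtl]]:
    """Toy transformation: fold 'r[n]=c;' into the first later RTL that reads r[n].

    Assumes r[n] is dead after that use, so the definition can be deleted.
    Returns None when nothing applies.
    """
    for i, rtl in enumerate(code):
        m = CONST_DEF.match(rtl)
        if m is None:
            continue
        reg, value = m.groups()
        for j in range(i + 1, len(code)):
            lhs, rhs = code[j].split("=", 1)
            if reg in rhs:
                rewritten = lhs + "=" + rhs.replace(reg, value)
                return code[:i] + code[i + 1:j] + [rewritten] + code[j + 1:]
    return None


def is_target_instruction(rtl: Rtl) -> bool:
    """Toy machine description: add immediates must fit in 13 signed bits."""
    m = ADD_IMMEDIATE.search(rtl)
    return m is None or -4096 <= int(m.group(1)) <= 4095


def optimize(code: List[Rtl],
             transformations: List[Transformation],
             is_legal: Callable[[Rtl], bool]) -> List[Rtl]:
    """Apply transformations until none makes an accepted change."""
    changed = True
    while changed:
        changed = False
        for transform in transformations:
            candidate = transform(code)
            if candidate is None or candidate == code:
                continue
            new_rtls = [r for r in candidate if r not in code]
            if all(is_legal(r) for r in new_rtls):
                code = candidate       # keep: invariant (M) still holds
                changed = True
            # otherwise roll back by discarding the candidate
    return code


small = optimize(["r[36]=1;", "r[37]=r[35]+r[36];"],
                 [propagate_constant], is_target_instruction)
large = optimize(["r[36]=100000;", "r[37]=r[35]+r[36];"],
                 [propagate_constant], is_target_instruction)
```

The transformation itself never mentions the target. The constant 1 is folded because the resulting RTL is a legal instruction, while folding 100000 is proposed and then rolled back because the machine description rejects the new RTL.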

Code improvement catalog
• Register assignment and allocation
• Common subexpression elimination
• Induction variable elimination
• Code motion
• Constant propagation
• Copy propagation
• Memory access coalescing
• Recurrence detection
• Instruction scheduling
• Dead code elimination
• Constant folding
• Loop unrolling
• Branch minimization
• Evaluation order determination

VPO Optimizations
• Common subexpression elimination
  • Davidson, J. W. and Fraser, C. W., ‘Eliminating Redundant Object Code’, in Conference Record of the Ninth Annual ACM Symposium on Principles of Programming Languages, January 1982, pp. 128–132.
• Evaluation order determination
  • Davidson, J. W., ‘A Retargetable Instruction Reorganizer’, in Proceedings of the SIGPLAN '86 Symposium on Compiler Construction, 21(7), June 1986, pp. 234–241.

VPO Optimizations
• Link-time optimization
  • Benitez, M. E. and Davidson, J. W., ‘A Portable Global Optimizer and Linker’, in Proceedings of the SIGPLAN '88 Symposium on Programming Language Design and Implementation, June 1988, pp. 329–338.
• Memory access coalescing
  • Davidson, J. W. and Jinturkar, S., ‘Memory Access Coalescing: A Technique for Eliminating Redundant Memory Accesses’, in Proceedings of the SIGPLAN '94 Symposium on Programming Language Design and Implementation, Orlando, FL, June 1994, pp. 186–195.

VPO Optimizations
• Code motion
  • Benitez, M. E. and Davidson, J. W., ‘The Advantages of Machine-Dependent Global Optimization’, in Proceedings of the 1994 Conference on Programming Languages and Systems Architectures, Zurich, Switzerland, March 1994, pp. 105–124.
• Loop unrolling
  • Jinturkar, S. and Davidson, J. W., ‘Improving Instruction-level Parallelism by Loop Unrolling and Dynamic Memory Disambiguation’, in Proceedings of the 28th Annual IEEE/ACM International Symposium on Microarchitecture, Ann Arbor, MI, November 1995, pp. 125–132.

VPO Optimizations
• Branch minimization
  • Mueller, F. and Whalley, D. B., ‘Avoiding Conditional Branches by Code Replication’, in Proceedings of the SIGPLAN '95 Conference on Programming Language Design and Implementation, June 1995, pp. 56–66.
  • Yang, M., Uh, G., and Whalley, D., ‘Improving Performance by Branch Reordering’, in Proceedings of the SIGPLAN '98 Conference on Programming Language Design and Implementation, June 1998, pp. 130–141.

VPO Optimizations
• Recurrence detection and optimization
  • Benitez, M. E. and Davidson, J. W., ‘Code Generation for Streaming: an Access/Execute Mechanism’, in Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, April 1991, pp. 132–141.

Building VPO

CSDL: Computing System Description Language
• Computing System Description Language
• Modular system of components
  • Allows applications to customize a description
  • Easily extensible for adding new details
  • Reusable/application independent

CSDL

Zephyr Compilers
• EDGSUIF-to-VPO Compiler
  • Five targets (SPARC, Pentium, Alpha, MIPS, SimpleScalar)

Zephyr Compilers
• EDG-to-VPO C++ compiler
  • Funded by Edison Design Group
  • Targeted to SPARC only
  • Compiles all benchmark suites (SPEC, PGI, lcc)
  • Code generator (translator from EDG intermediate representation to RTLs) provided as a literate program

Zephyr Compilers
• lcc-to-VPO C compiler
  • Targeted to SPARC, x86, MIPS, Alpha, and SimpleScalar
  • Code generators (translators from Lira to target-machine RTLs) provided as literate programs
  • Currently producing good code; some optimizations are not fully implemented/debugged

SPEC results for SPARC

Acknowledgements
• This work has been funded by:
  • Defense Advanced Research Projects Agency
  • National Science Foundation
  • Panasonic AVC Labs
  • Edison Design Group

Afternoon Schedule

Using Zephyr for Programming Language Research Kevin Scott University of Virginia

Overview
• Zephyr organization and philosophy
• VPO code generation interfaces
• Adding a new front-end to Zephyr:
  • Using the Lira intermediate representation
  • With a custom code expander using the VPO code generation interfaces
• Language-related issues in retargeting Zephyr
• Q & A

What is Zephyr?
• Set of tools for generating and optimizing RTL programs
• VPO (Very Portable Optimizer)
  • SPARC, Alpha, x86, MIPS, SimpleScalar (PISA)
• Code Expanders
  • Turn a front-end’s IR into RTLs
• Glue for hooking front-ends up to VPO
  • VPO code generation interfaces
  • Lira IR
• Debugging tools
  • VET: interface for controlling and visualizing VPO transformations
  • vpoiso: isolates optimizer bugs

National Compiler Infrastructure

Why use Zephyr?
• You’re a language researcher
• Easy to hook a front-end up to VPO
• Relatively little effort required to get multiple targets
• VPO is a very good optimizer
  • Wide range of existing operations
  • Leverage work of others contributing new optimizations to VPO
• Lets you concentrate on front-end issues
• Less work than writing a VPO-quality optimizer yourself

Zephyr Organization
[Figure: front ends (VPCC, EDG, lcc, SUIF); CVM, Lira, and EDG code expanders with SPARC, MIPS, Alpha, and x86 targets; VPOi and VPOasm; VPO]

Four Front Ends
• VPCC: a K&R C compiler
  • IR is code for a C virtual machine (CVM)
  • Deprecated in favor of lcc front-end
• EDG: Edison Design Group C/C++
  • Very flexible IR
• lcc: retargetable C compiler
  • Simple backend emits Lira, an IR based on lcc trees
• SUIF 2.1
  • High-level optimizations and analyses
  • suif2lira pass transforms SUIF IR into Lira

Code Expanders
• CVM Code Expanders
  • SPARC, x86, MIPS
  • Generate encoded RTL files directly; don’t use VPOi or VPOasm
• EDG Code Expanders
  • SPARC
  • First expander to use VPOi and VPOasm interfaces

Lira Code Expanders
• Targets
  • SPARC
  • x86
  • Alpha
  • MIPS32
  • MIPS64 and SimpleScalar (PISA)
• Input Lira code specialized for target
• Output encoded RTLs for VPO
• All use the VPOi and VPOasm interfaces

VPOi
• VPOi provides a C interface for:
  • Creating RTLs
  • Sending RTLs to VPO for optimization
• Abstracts away specifics of:
  • RTL representation
  • How RTLs are sent to VPO
• RTL creation routines can be semi-automatically generated from a machine specification

VPOasm
• VPOasm provides a C interface for sending assembly language statements to VPO.
• Allows a code expander to:
  • Change segments
  • Define symbols
  • Initialize storage locations
  • Specify alignments for code or data

More on VPOi and VPOasm
• Why use these interfaces?
  • Simpler than writing out VPO encoded RTL files manually.
  • Can get some of the implementation for free if doing a new target architecture.
  • Allows us to change RTL and assembly language representations w/o fouling you up. Much.
• Reference manual for VPOi and VPOasm:
  • http://www.cs.virginia.edu/zephyr/vpoi

VPOi and VPOasm caveats
• Interfaces are written in C.
  • Bad if you’re writing a code expander in a language with no mechanism for calling C functions.
• Interfaces are relatively rigid.
  • Suppose you want to communicate something to the optimizer that doesn’t look like an RTL or assembly language.
• Interfaces have only been tested on C/C++ front ends.
  • Might have to change to accommodate new language features…

Lira
• Simple IR based on lcc trees
• Targets a stack-oriented virtual machine
• Two types of entities in a Lira file:
  • Instructions
  • Directives
