1 / 23

Compiling for VIRAM

Compiling for VIRAM. Dave Judd Kathy Yelick Rich Fromm David Martin Computer Science Division UC Berkeley. VIRAM/Cray Compiler vcc. VIRAM/Cray vectorizing compiler Production compiler Used on the T90, C90, as well as the T3D and T3E Being ported (by SGI/Cray) to the SV2 architecture

keelia
Download Presentation

Compiling for VIRAM

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Compiling for VIRAM Dave Judd Kathy Yelick Rich Fromm David Martin Computer Science Division UC Berkeley

  2. VIRAM/Cray Compiler vcc • VIRAM/Cray vectorizing compiler • Production compiler • Used on the T90, C90, as well as the T3D and T3E • Being ported (by SGI/Cray) to the SV2 architecture • Has C, C++, and Fortran front-ends (focus on C) • Extensive vectorization capability • VIRAM code generator based on new SV2 code generator • SV2 code gen being developed in parallel w/ VIRAM • SV2 vector architecture similar to VIRAM • SV2 scalar similar to T90 with more registers

  3. VIRAM/Cray Compiler Status Frontends Optimizer Code Generators • MIPS backend developed by last retreat • Compiled code for a commercial test suite • VIRAM vector support developed since last retreat • Compiles & executes a commercial test suite • Compiles and executes a Cray vector test suite • Remaining issues with scheduler and memory consistency model C T3D/T3E C++ PDGCS C90/T90 Fortran95 SV2/VIRAM

  4. Codegen/optimizer issues for VIRAM • Variable virtual processor width (VPW) • Variable vector register length (MVL) • Vector flag registers treated as 1 bit wide vector register • Multiple base, incr, stride regs. + autoincrement • Fixed point arithmetic (saturating add, etc.) • Memory Consistency

  5. Virtual Processors (vl) VP0 VP1 VPvl-1 vr0 vr1 Data Registers vr31 vpw Vector Architectural State • Number of VPs given by the Vector Length register vl • Width of each VP given by the register vpw • vpw is one of {8b,16b,32b,64b} • Maximum vector length is given by a read-only register mvl • mvl depends on implementation and vpw: {NA,128,64,32} in VIRAM-1

  6. Generating Code for Variable VPW • Strategy: vectorizer determines minimum correct vpw for each loop nest • Vectorizer assumes vpw=64 initially • At end of vectorization, discard vectorized copy of loop if greatest width encountered is less than 64 and start vectorization over with new vpw. • Code gen checks vpw for each loop nest. • Limitation: a single loop nest will run at the speed of the widest type. • Reason: simplicity & performance of the common case • No attempt to split/combine loops based on vpw

  7. Generating Code for Variable MVL • Maximum vector length is not specified in IRAM ISA. • However, compiler assumes mvl at compile time • mvl based on vpw • mvl assumption dependent on VIRAM-1 hardware implementation • Recompiling required for future hardware versions if mvl changes • MVL knowledge useful for code gen and vectorizer: • register spilling • short loop vectorization • length-dependent vectorization ( and may eliminate safe vector length computation at run time) for (i = 0; i < n; i=++) a[i] = a[i+32]

  8. Vector Flag Registers • Vector flag (mask) register treated as vector register • Bit width of 1 • Flag registyer under control of vector length • Can spill/reload directly to memory • Optimizer and code gen issues to handle correctly

  9. Multiple Base, Incr, Stride Registers • Dedicated registers for vector memory references • 16 vbase, 8 vinc and 8 vstride registers • optional automatic increment of base register • Vectorizer/Codegen strategy: • Changed from computing base address each time thru loop to incrementing base address by vl *stride*multiplier • Define compiler temporary for each base address • Teach codegen to assign vbase, vinc and vstride registers as needed. • Trick code gen into handling multiple results for single vector load/store instruction. • Results in very clean vector loops with only vector instructions in inner loop + vl computation.

  10. C Compiler Testing • “vector” regression test suite (CRAY) • Specifically tests for vectorization • Compares vector and scalar results • Easy to isolate problems • Status: • 56 of 62 tests pass • Some minor numerical differences • 3 failures w/ wrong answers • 1 failure causes assembler abort on bad instruction (caused by vinc autoincrement feature)

  11. C Compiler Testing (cont.) • C regression test suite (industry standard suite) • Scalar emphasis, C conformance • All tests pass except: • errors with functions returning a structure larger than 16 bytes • errors with long double constants (128 bit floating point)

  12. What Essential Features Remain • Finish instruction scheduler • Implement sync strategy • Support -n32 ABI • Double / long double

  13. Instruction Scheduler • Instruction scheduler working, but needs: • Functional unit layout for VIRAM • Instruction latency and busy time for VIRAM • Support for chaining of vector instructions, including mask registers • Scheduler responsible for sync processing • vector - vector sync analysis is working • vector - scalar & scalar - vector analysis needed • synch instr. currently comments in assembler output

  14. Chaining vadd.s $vr1,$vr3,$vr4 vabs.s $vr2,$vr1 With chaining: 0 1 2 3 4 5 6 7 . . . V1[0] R X X X X W V1[1] R X X X X W V1[2] R X X X X W V1[3] R X X X X W V2[0] R X X W V2[1] R X X W V2[2] R X X W V2[3] R X X W

  15. Convert from -64 to -n32 ABI • Pointer and long type revert to 32 bits from 64 • VIRAM tools were n32 originally • Switched to -64 to accommodate compiler, to match sv2 • Revert to n32 to match vector addressing hardware • Change will allow some gather/scatter loops to execute faster (vpw=32 instead of vpw=64)

  16. C++ • All components being generated now • “Include” files and libC library differences? SGI / Cray • C++ testing on SV2 simulator now • Testing/ problem isotation needed

  17. Fortran • Fortran 95 frontend • FCD (fortran character descriptor) code gen support needed for VIRAM • Differences between IRIX and UNICOS libraries for I/O and array intrinsics • Testing needed

  18. Other Future Compiler Features ? • VIRAM machine “target” • Support for speculative execution • Support Cray additions: • peephole optimizer • vector loop unrolling/ tiling • Compiler extensions for fixed point hardware

  19. Vector functions • Define calling sequence conventions • Must be coded in assembler • Take advantage of C versions of Cray routines • Needed for some key benchmarks?

  20. Memory consistency • Sync instructions: SaV VaS VaV vp RaW WaR WaW

  21. VIRAM Tools • vas: assembler • vdis: disassembler • vsim-isa: simulator • vsim-db: debugger • vsim-p: performance simulator • vsim-sync:memory consistency simulator

  22. vsim-sync • Intended for debugging and optimizing sync’s • Tells you when there is a data hazard (sync needed) • Tells you when a sync executed that didn’t prevent a hazard; • sync may not be needed • according to dynamic execution • sync may be needed on some other execution path

  23. Summary • vcc is a reasonably robust compiler for VIRAM • All of the basic compiler elements are present now • Need to prioritize remaining work • Finish and tune scheduler • Implement sync strategy • C++ • Fortran • IRAM target

More Related