
NVIDIA’s Experience with Open64

Presentation Transcript


  1. NVIDIA’s Experience with Open64 • Mike Murphy, NVIDIA

  2. Outline • Why Open64 • How we use Open64 • What we did to Open64 • Future work in Open64

  3. Compiling CUDA for GPUs • C/C++ CUDA application → NVCC → CPU code + GPU code → executable

  7. Why Open64 • We had a low-level code generator for graphics code, but CUDA needed high-level optimization for C/C++ code. • Options considered: • Write our own: would take too long • gcc: good long-term support • Open64: best performance (kudos to PathScale)

  8. NVCC processing of GPU code • cudafe → C code for GPU → nvopencc (Open64) → PTX → OCG → object code

  9. Changes: Rehosting Open64 • Our compiler has to run on 32- and 64-bit Linux, 32- and 64-bit Windows, and Mac OS. • The main Open64 source tree targets only Linux. • This is an area where sharing our changes can help grow the user base by making Open64 easier to port. • For Windows we build using Cygwin’s MinGW.

  10. Changes: Memory and registers • We don’t have a stack or fast memory • Therefore want to keep data in registers • Inline everything and optimize as much as possible • Try to keep small structs in registers by expanding struct copies into field copies (versus taking address and generating loop to do byte copy)
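
The struct-copy point on this slide can be illustrated with a minimal C++ sketch. The `Float3` type and both function names are hypothetical and are not taken from the Open64 sources; the sketch only contrasts the two shapes of code the slide mentions, not how nvopencc performs the transformation internally.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical small struct, used only for illustration.
struct Float3 {
    float x, y, z;
};

// Byte-copy form: takes the address of both structs and loops over bytes.
// Taking the address tends to force the data out of registers into memory.
inline void copy_bytes(Float3* dst, const Float3* src) {
    assert(dst != nullptr && src != nullptr);
    const unsigned char* s = reinterpret_cast<const unsigned char*>(src);
    unsigned char* d = reinterpret_cast<unsigned char*>(dst);
    for (std::size_t i = 0; i < sizeof(Float3); ++i) {
        d[i] = s[i];
    }
}

// Field-copy form: one scalar move per field, with no loop over memory,
// so after inlining each field can stay in its own register.
inline Float3 copy_fields(const Float3& src) {
    Float3 dst;
    dst.x = src.x;
    dst.y = src.y;
    dst.z = src.z;
    return dst;
}
```

Both forms produce the same value; the difference is only in what the optimizer has to see through. On a target without a stack, the field-copy form avoids needing an addressable home for the struct.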

  11. Changes: Vector loads and stores • Coalesce adjacent loads and stores for performance • Do this in CG: • Iterate through ops, trying to add to vectors • Check for intervening kills • Change alignment and use dummy regs for padding if it helps to create a wider vector (e.g. may use a 4-word vector for a 3-word struct).
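
The steps on this slide can be sketched as a single pass over a list of ops. This is a simplified illustration under stated assumptions, not the actual CG pass: the `Op` and `VecLoad` types are hypothetical, only loads are coalesced, any store is conservatively treated as a kill, supported vector widths are assumed to be 1, 2, and 4 words, and alignment handling is left out entirely.

```cpp
#include <cassert>
#include <vector>

enum class Kind { Load, Store, Other };

// Hypothetical op record: one memory access or an unrelated instruction.
struct Op {
    Kind kind;
    int base;    // id of the base address register
    int offset;  // word offset from the base
};

// Hypothetical result record: one (possibly padded) vector load.
struct VecLoad {
    int base;
    int offset;  // word offset of the first element
    int words;   // words actually needed
    int width;   // emitted width in words; extra lanes go to dummy registers
};

// Assumed vector widths: 1, 2, or 4 words. A 3-word group is padded to 4.
inline int padded_width(int words) {
    assert(words >= 1 && words <= 4);
    return words <= 1 ? 1 : (words == 2 ? 2 : 4);
}

inline std::vector<VecLoad> coalesce_loads(const std::vector<Op>& ops) {
    std::vector<VecLoad> out;
    bool open = false;  // true while the most recent vector may still grow
    for (const Op& op : ops) {
        if (op.kind == Kind::Load) {
            if (open) {
                VecLoad& v = out.back();
                bool adjacent = op.base == v.base && op.offset == v.offset + v.words;
                if (adjacent && v.words < 4) {
                    ++v.words;  // add this load to the current vector
                    continue;
                }
            }
            out.push_back({op.base, op.offset, 1, 1});  // start a new vector
            open = true;
        } else if (op.kind == Kind::Store) {
            open = false;  // intervening kill: stop growing the current vector
        }
        // Kind::Other does not touch memory and is skipped.
    }
    for (VecLoad& v : out) {
        v.width = padded_width(v.words);
    }
    return out;
}
```

For three adjacent loads followed by a store and a fourth load, this sketch yields one 3-word group emitted as a 4-word vector and a separate scalar load. A real pass would use alias information rather than treating every store as a kill.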

  12. Changes: 16-bit optimization • Cheaper to use 16-bit registers and operations • But C promotes short operands to int. • So add a pass in CG that converts back to 16-bit: • Mark 16-bit loads, stores, and converts • Propagate 16-bit-ness forwards and backwards • Unmark 16-bit-ness if it cannot be 16-bit • Change remaining registers and instructions to be 16-bit.
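
The mark, propagate, and unmark steps on this slide can be sketched as a small fixpoint computation over virtual registers. Everything here is an assumption made for illustration: the `Need` and `Inst` types are hypothetical, each instruction lists only the registers whose width it constrains, and width-agnostic instructions (such as a move or an add whose result is later truncated) are assumed to require all of their listed registers to share one width. The actual CG pass is not described in enough detail on the slide to reproduce.

```cpp
#include <cassert>
#include <vector>

// How an instruction constrains the registers it lists (hypothetical model).
enum class Need {
    Narrow,  // 16-bit load, store, or convert: registers are 16-bit candidates
    Wide,    // registers must stay 32-bit
    Same     // width-agnostic: all listed registers must share one width
};

struct Inst {
    Need need;
    std::vector<int> regs;  // virtual register ids constrained by this instruction
};

// Returns, per register, whether it can be changed to 16-bit.
inline std::vector<bool> find_16bit_regs(int num_regs, const std::vector<Inst>& insts) {
    std::vector<bool> narrow(num_regs, false);
    std::vector<bool> wide(num_regs, false);

    // Mark: seed from instructions with a fixed width.
    for (const Inst& in : insts) {
        for (int r : in.regs) {
            assert(r >= 0 && r < num_regs);
            if (in.need == Need::Narrow) narrow[r] = true;
            if (in.need == Need::Wide) wide[r] = true;
        }
    }

    // Spread a flag across Same instructions until nothing changes. Because
    // sources and destinations are both listed, this moves information both
    // forwards and backwards along the data flow.
    auto spread = [&](std::vector<bool>& flag) {
        bool changed = true;
        while (changed) {
            changed = false;
            for (const Inst& in : insts) {
                if (in.need != Need::Same) continue;
                bool any = false;
                for (int r : in.regs) any = any || flag[r];
                if (!any) continue;
                for (int r : in.regs) {
                    if (!flag[r]) {
                        flag[r] = true;
                        changed = true;
                    }
                }
            }
        }
    };

    spread(narrow);  // propagate 16-bit-ness
    spread(wide);    // propagate the reasons a register cannot be 16-bit

    // Unmark: a register that must be wide cannot be 16-bit.
    for (int r = 0; r < num_regs; ++r) {
        if (wide[r]) narrow[r] = false;
    }
    return narrow;
}
```

Registers still marked after the unmark step are the ones a later rewrite step would change to 16-bit. Where a narrow and a wide constraint meet, this sketch simply gives up on narrowing; a real pass could instead insert a convert at the boundary.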

  13. Future work • 1 person -> 4 people working with Open64 • New application TBA • Merging changes into trunk • Thanks to Sun Chan and Shin! • Investigating register pressure in WOPT • Want better control of register pressure during optimization • Investigating using other features (LNO, IPA, etc.)

  14. Questions? mmurphy@nvidia.com http://www.nvidia.com/CUDA
