GPU in HPC

Presentation Transcript


  1. GPU in HPC
     Scott A. Friedman (friedman@ats.ucla.edu)
     ATS Research Computing Technologies

  2. First of all…
     • Double precision is coming!
     • GPU: late 07 or early 08 (NVIDIA)
       • Will be half speed – word on the street
       • At G80 speed, that equals 175 Gflop
     • Cell HPC: summer 08
       • First appears in LANL Roadrunner
       • 5x increase to 100 Gflop
     IDRE GPU Lunch
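The estimates on this slide follow from simple arithmetic, worked through in the sketch below. It assumes the 350 Gflop single-precision G80 peak quoted on the Hardware slide; the "implied current Cell" number is derived from the slide's "5x" claim and is not stated in the talk. Variable names are illustrative.

```python
# Back-of-the-envelope check of the slide's double-precision (DP) estimates.
# Inputs are the talk's own figures; nothing here is a measured result.

g80_sp_gflops = 350.0   # G80 single-precision peak (Hardware slide)
dp_speed_ratio = 0.5    # rumoured "half speed" for DP on the GPU

gpu_dp_gflops = g80_sp_gflops * dp_speed_ratio

cell_dp_target_gflops = 100.0   # Cell HPC DP figure quoted for summer 08
cell_dp_speedup = 5.0           # "5x increase"

# DP rate of the current Cell implied by the 5x claim (derived, not quoted).
cell_dp_current_gflops = cell_dp_target_gflops / cell_dp_speedup

print(f"GPU DP estimate:         {gpu_dp_gflops:.0f} Gflop")
print(f"Implied current Cell DP: {cell_dp_current_gflops:.0f} Gflop")
```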

  3. Hardware
     • Remember!
       • GPUs are for graphics (graphics processing unit)
       • Think data parallelism!
       • Must hide memory latency
       • Lots of computation – 'arithmetic intensity'
       • Low-latency memory is a precious resource
     • Limitations
       • Registers zeroed, minimal shared/static data, no read-modify-write buffers
       • Varying latencies: dependent on the type of memory accessed
       • Designed for independent operations (legacy of graphics)
       • Lots of gotchas that will kill performance
       • Hardware constantly changing
     • Current generation
       • Proprietary architectures
       • NVIDIA G80, 128 ALUs, 350 Gflop SP
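"Arithmetic intensity" can be made concrete as floating-point operations per byte of memory traffic. The sketch below is a rough model, not a profiler: it compares two hypothetical kernels against the 350 Gflop figure on this slide. The memory bandwidth value (86.4 GB/s) is an assumption that does not come from the talk, and the helper names are made up for illustration.

```python
# Rough model of arithmetic intensity (flops per byte moved).
# A kernel whose intensity is below the machine's flop/byte balance
# cannot keep the ALUs busy: it is limited by memory instead.

def arithmetic_intensity(flops_per_element, bytes_per_element):
    """Floating-point operations performed per byte read or written."""
    return flops_per_element / bytes_per_element

def is_memory_bound(intensity, peak_gflops, bandwidth_gb_per_s):
    """True if the kernel does too little math per byte to reach peak."""
    machine_balance = peak_gflops / bandwidth_gb_per_s  # flops per byte
    return intensity < machine_balance

PEAK_SP_GFLOPS = 350.0          # from the slide
ASSUMED_BANDWIDTH_GB_S = 86.4   # ASSUMPTION: not stated in the talk

# SAXPY, y = a*x + y on 4-byte floats: 2 flops, 12 bytes (read x, y; write y).
saxpy_intensity = arithmetic_intensity(2, 12)

# Degree-32 polynomial by Horner's rule: 64 flops, 8 bytes (read x; write y).
horner_intensity = arithmetic_intensity(2 * 32, 8)

for name, ai in [("saxpy", saxpy_intensity), ("horner-32", horner_intensity)]:
    bound = is_memory_bound(ai, PEAK_SP_GFLOPS, ASSUMED_BANDWIDTH_GB_S)
    print(f"{name}: {ai:.2f} flop/byte, memory bound: {bound}")
```

Under these assumed numbers the machine needs roughly 4 flops per byte to stay busy, which is why the slide stresses doing lots of computation per memory access.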

  4. Programming Model
     • Streaming
       • Elements (array) processed by a kernel (function)
       • Sounds like a SIMD vector processor
         • Not exactly; the term SPMD is often used (P = program)
       • No index ops on streams
       • Input stream(s) -> compute -> output stream
       • No dependencies between stream elements
         • CUDA relaxes this somewhat
     • Experimentation required
     • Balancing essential
       • Compute rather than move data
       • Maximize use of precious low-latency, high-bandwidth memory
       • Cover latencies with as much computation as possible
       • High arithmetic intensity, you will hear this a lot!
       • Often better to re-compute than cache data
     • Avoid code that is memory bound
       • Memory access speed is progressing much more slowly than the number of ALUs
       • Better to batch memory moves into large transfers
       • Complex memory access rules have a major impact on performance
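The streaming model above can be sketched on the CPU as a kernel applied to each element of the input streams, with no indexing and no dependence between elements. This is only a serial Python model of the idea; `run_kernel` and `saxpy_kernel` are hypothetical names, not part of any GPU toolkit.

```python
# Serial model of stream processing: input stream(s) -> kernel -> output stream.
# The kernel sees one element from each stream and nothing else, so elements
# can be processed in any order (or all at once on parallel hardware).

def run_kernel(kernel, *input_streams):
    """Apply `kernel` to corresponding elements of the input streams."""
    return [kernel(*elements) for elements in zip(*input_streams)]

def saxpy_kernel(x, y, a=2.0):
    """Per-element work: no indices, no access to neighbouring elements."""
    return a * x + y

xs = [1.0, 2.0, 3.0, 4.0]
ys = [10.0, 20.0, 30.0, 40.0]

out = run_kernel(saxpy_kernel, xs, ys)

# Because elements are independent, processing them in reverse order
# gives the same output stream once the results are put back in place.
out_reversed = list(reversed(run_kernel(saxpy_kernel, reversed(xs), reversed(ys))))

print(out)
print(out == out_reversed)
```

The order-independence check at the end is the property that lets the hardware schedule elements freely; a kernel that needed its neighbour's result would break it.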

  5. Tools
     • Cell SDK
       • Direct access to the hardware
       • Very low level
     • CUDA (NVIDIA, 8xxx series and newer)
       • C API – provides scalar execution model (with caveats)
       • Low level, think of MPI? Certain amount of hardware abstraction
       • User maps problem domain to processing units and memory hierarchy
       • Re-imagining of graphics hardware as programming concepts (e.g. threads, arrays)
         • GLSL, graphics tools, even lower level but not as necessary now
       • Kernel is: 1D/2D/3D Grid : Blocks : Threads
         • Threads within a block can communicate via on-chip shared memory and synchronize
         • Blocks are independent!
           • No communication between blocks
           • No execution ordering or concurrency guarantees
       • Free but specific to NVIDIA hardware (will hide future architecture changes)
     • CTM (AMD/ATI)
       • Similar, but lower level than CUDA
     • RapidMind
       • Integrates into C++ code
       • Higher-level abstractions, think OpenMP?
       • SPMD oriented: e.g. streams and kernels, more restrictive than CUDA
       • Portable?
         • Let the experts do the mapping to memory hierarchy
         • Several back-ends supported: Cell, GPUs, multicore CPUs
         • Allows tuning to specific hardware
       • Not free
     • Brook, Sh
       • Open-source tools
       • Sh is the precursor to the RapidMind kit
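The Grid : Blocks : Threads hierarchy described for CUDA can be modelled serially to show the two rules on the slide: threads in a block share memory and synchronize, while blocks are independent and run in no guaranteed order. The sketch below is a CPU-side Python model only. `launch`, `load_phase` and `sum_phase` are hypothetical helpers, not CUDA API calls; the end of each phase stands in for a block-wide barrier.

```python
import random

def launch(phases, grid_dim, block_dim, *args):
    """Serial model of a 1D launch: grid_dim blocks of block_dim threads."""
    block_order = list(range(grid_dim))
    random.shuffle(block_order)        # blocks: no execution ordering guarantee
    for block_idx in block_order:
        shared = [0.0] * block_dim     # per-block shared memory, not visible to other blocks
        for phase in phases:           # finishing a phase models a barrier
            for thread_idx in range(block_dim):
                phase(block_idx, thread_idx, block_dim, shared, *args)

def load_phase(block_idx, thread_idx, block_dim, shared, data, partial_sums):
    """Each thread copies one element into its block's shared memory."""
    i = block_idx * block_dim + thread_idx   # global element index
    shared[thread_idx] = data[i] if i < len(data) else 0.0

def sum_phase(block_idx, thread_idx, block_dim, shared, data, partial_sums):
    """After the barrier, thread 0 combines the block's shared values."""
    if thread_idx == 0:
        partial_sums[block_idx] = sum(shared)

data = [float(i) for i in range(10)]
block_dim = 4
grid_dim = (len(data) + block_dim - 1) // block_dim   # round up to cover all data
partial_sums = [0.0] * grid_dim

launch([load_phase, sum_phase], grid_dim, block_dim, data, partial_sums)

# Blocks cannot talk to each other, so the final combine happens afterwards.
total = sum(partial_sums)
print(partial_sums, total)
```

Each block writes only its own slot of `partial_sums`, so the shuffled block order does not change the result; that is the practical meaning of "blocks are independent".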

  6. Resources
     • One-stop shopping: http://www.gpgpu.org
     • More good stuff: http://www.rapidmind.com/resources.php
     • Great survey paper: http://graphics.idav.ucdavis.edu/publications/print_pub?pub_id=907
     • Cell HPC presentation: http://www.power.org/resources/devcorner/cellcorner/hpcspe.pdf
     • SIGGRAPH 2007 GPGPU course (very good): http://www.gpgpu.org/s2007/
     • IBM Cell: http://www.ibm.com/developerworks/power/cell/
     • NVIDIA: http://developer.nvidia.com/object/cuda.html
     • AMD/ATI: http://ati.amd.com/technology/streamcomputing/index.html
     • RapidMind: http://developer.rapidmind.com
     • Google is your friend, of course

  7. Conclusions
     • This is the future of the highest-performance codes
       • GPU, Cell or Larrabee-esque multi-core
       • Industry is scaling cores, not clocks
       • Industry contacts share that customers are 'in denial' and need to get on board
     • Programming is going to get a whole lot more complex
       • Memory hierarchies
       • Load and system balancing
       • More and more doing it – fewer and fewer who are any good at it
       • Education!
     • Mapping problem domains to these architectures is still evolving
       • Lots of clever solutions to lots of problems
       • Domain and algorithm level
     • Tools are currently pretty weak
       • Industry appears to be aware of this – not just the market opportunity
     • Hopefully
       • APIs will insulate us from the variety and evolution of hardware

  8. Thank you
     • Questions?
     • Please feel free to contact me
     • ATS has several resources that you can access to try some of these things out
       • Sony PlayStation 3, Cell SDK
       • NVIDIA 8800 GTX, CUDA, RapidMind
