
Last time: Runtime infrastructure for hybrid (GPU-based) platforms Task scheduling



Presentation Transcript


  1. Last time: Runtime infrastructure for hybrid (GPU-based) platforms • Task scheduling • Extracting performance models at runtime • Memory management • Asymmetric Distributed Shared Memory • StarPU: a Runtime System for Scheduling Tasks over Accelerator-Based Multicore Machines, Cédric Augonnet, Samuel Thibault, and Raymond Namyst, TR-7240, INRIA, March 2010 • An Asymmetric Distributed Shared Memory Model for Heterogeneous Parallel Systems, Isaac Gelado, Javier Cabezas, John Stone, Sanjay Patel, Nacho Navarro, Wen-mei Hwu, ASPLOS’10

  2. Today: • Bridging runtime and language support • ‘Virtualizing GPUs’ • Achieving a Single Compute Device Image in OpenCL for Multiple GPUs, Jungwon Kim, Honggyu Kim, Joo Hwan Lee, Jaejin Lee, PPoPP’11 • Supporting GPU Sharing in Cloud Environments with a Transparent Runtime Consolidation Framework, Vignesh T. Ravi et al., HPDC 2011

  3. Today: • Bridging runtime and language support • ‘Virtualizing GPUs’ • Achieving a Single Compute Device Image in OpenCL for Multiple GPUs, Jungwon Kim, Honggyu Kim, Joo Hwan Lee, Jaejin Lee, PPoPP’11 • Supporting GPU Sharing in Cloud Environments with a Transparent Runtime Consolidation Framework, Vignesh T. Ravi et al., HPDC 2011 (best paper!)

  4. Context: clouds shift to support HPC applications • initially: tightly coupled applications were not suited for the cloud • today: • Chinese cloud with 40 Gbps InfiniBand • Amazon HPC instance • GPU instances: Amazon, Nimbix • Challenge: make GPUs a shared resource in the cloud.

  5. Challenge: make GPUs a shared resource in the cloud. • Why do this? • GPUs are costly resources • Multiple VMs on a node with a single GPU • Increase utilization • app level: some apps might not use GPUs much • kernel level: some kernels can be collocated

  6. Two streams • How? • Evaluate … • opportunities • gains • overheads

  7. 1. The ‘How?’ • Preamble: Concurrent kernels are supported by today’s GPUs • Each kernel can execute a different task • Tasks can be mapped to different streaming multiprocessors (using thread-block configuration) • Problem: concurrent execution is limited to the set of kernels invoked within a single process context • Past virtualization solutions • API rerouting / intercept library
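
The API-rerouting idea on this slide can be illustrated with a toy model. This is a minimal sketch, not the framework from the paper: the class names `Backend` and `InterceptLibrary`, the method names, and the kernel names are all hypothetical, and no real GPU API is called. It only shows the shape of the approach: each application talks to a thin front-end, and every launch ends up in one back-end that would own the single process context.

```python
from dataclasses import dataclass, field


@dataclass
class Backend:
    """Hypothetical back-end: the only party that would own a GPU context."""
    launched: list = field(default_factory=list)

    def launch(self, client_id, kernel_name, blocks, threads):
        # A real back-end would issue the launch on the GPU here;
        # this model only records the request and returns a ticket number.
        self.launched.append((client_id, kernel_name, blocks, threads))
        return len(self.launched) - 1


class InterceptLibrary:
    """Hypothetical front-end loaded by each application (e.g. one per VM)."""

    def __init__(self, client_id, backend):
        self.client_id = client_id
        self.backend = backend

    def launch_kernel(self, kernel_name, blocks, threads):
        # Instead of talking to the GPU driver, forward the call to the back-end.
        return self.backend.launch(self.client_id, kernel_name, blocks, threads)


backend = Backend()
vm_a = InterceptLibrary("vm-a", backend)
vm_b = InterceptLibrary("vm-b", backend)
vm_a.launch_kernel("kernel_a", blocks=7, threads=256)
vm_b.launch_kernel("kernel_b", blocks=7, threads=256)

# Both clients' launches now sit in the same back-end, i.e. one context.
clients_in_one_context = {entry[0] for entry in backend.launched}
```

Because the two launches arrive at the same back-end, they become candidates for concurrent execution, which would not be the case if each VM launched from its own process.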

  8. 1. The ‘How?’ • Preamble: Concurrent kernels are supported by today’s GPUs • Each kernel can execute a different task • Tasks can be mapped to different streaming multiprocessors (using thread-block configuration) • Problem: concurrent execution is limited to the set of kernels invoked within a single process context

  9. 1. The ‘How?’ • Architecture

  10. 2. Evaluation – The opportunity • The opportunity • Key assumption: Under-utilization of GPUs • Space-sharing • Kernels occupy different SMs • Time-sharing • Kernels time-share the same SM (benefit from hardware support for context switches) • Note: it is not always possible

  11. 2. Evaluation – The opportunity • The opportunity • Key assumption: Under-utilization of GPUs • Sharing • Space-sharing • Kernels occupy different SMs • Time-sharing • Kernels time-share the same SM (benefit from hardware support for context switches) • Note: resource conflicts may prevent this • Molding – change kernel configuration (different number of thread blocks / threads per block) to improve collocation
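
Space-sharing, time-sharing, and molding can be made concrete with a small planning model. This is a sketch under strong simplifying assumptions and not the policy evaluated in the paper: it assumes one thread block occupies one SM, that halving the block count while doubling threads per block preserves the work done, and that a block may hold at most `max_threads_per_block` threads (1024 is used here as an illustrative limit). The names `Config`, `mold`, and `plan` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    blocks: int             # number of thread blocks
    threads_per_block: int  # threads in each block


def mold(cfg, max_threads_per_block=1024):
    """Halve the block count and double threads per block (same total threads).

    Returns None when the block count is odd or the per-block limit
    would be exceeded.
    """
    if cfg.blocks % 2 != 0 or cfg.threads_per_block * 2 > max_threads_per_block:
        return None
    return Config(cfg.blocks // 2, cfg.threads_per_block * 2)


def plan(a, b, num_sms):
    """Decide how two kernels could share a GPU with num_sms SMs."""
    if a.blocks + b.blocks <= num_sms:
        return "space-sharing", a, b
    ma, mb = a, b
    while ma.blocks + mb.blocks > num_sms:
        # Prefer molding the kernel that currently uses more blocks.
        mold_a = ma.blocks >= mb.blocks
        molded = mold(ma if mold_a else mb)
        if molded is None:
            # Fall back to molding the other kernel.
            mold_a = not mold_a
            molded = mold(ma if mold_a else mb)
        if molded is None:
            # Neither kernel can shrink further: share the same SMs in time.
            return "time-sharing", a, b
        if mold_a:
            ma = molded
        else:
            mb = molded
    return "space-sharing after molding", ma, mb


# Two kernels that each want all 14 SMs of a hypothetical GPU.
decision = plan(Config(14, 128), Config(14, 128), num_sms=14)
```

In this example both kernels are molded from 14 blocks of 128 threads to 7 blocks of 256 threads, so together they fit on the 14 SMs. When molding cannot make the kernels fit, the model falls back to time-sharing, which mirrors the note that resource conflicts may prevent collocation.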

  12. 2. Evaluation – The gains

  13. 2. Evaluation – The overheads

  14. Discussion • Limitations • Hardware support

  15. OpenCL vs. CUDA • http://ft.ornl.gov/doku/shoc/level1 • http://ft.ornl.gov/pubs-archive/shoc.pdf
