
GPUs and Accelerators

GPUs and Accelerators. Jonathan Coens, Lawrence Tan, Yanlin Li. Outline: Graphics Processing Units (features, motivation, challenges); Accelerator (methodology, performance evaluation, discussion); Rigel (methodology, performance evaluation, discussion); Conclusion.


Presentation Transcript


  1. GPUs and Accelerators
     Jonathan Coens, Lawrence Tan, Yanlin Li

  2. Outline
     • Graphics Processing Units
       • Features
       • Motivation
       • Challenges
     • Accelerator
       • Methodology
       • Performance Evaluation
       • Discussion
     • Rigel
       • Methodology
       • Performance Evaluation
       • Discussion
     • Conclusion

  3. Graphics Processing Units (GPU)
     • GPU
       • Special-purpose processors designed to render 3D scenes
       • Found in almost every desktop today
     • Features
       • Highly parallel processors
       • Better floating-point performance than CPUs
       • ATI Radeon X1900: 250 GFLOPS
     • Motivation
       • Use GPUs for general-purpose programming
     • Challenges
       • Difficult for the programmer to program
       • Trade-off between programmability and performance
     [Figure: GeForce 6600GT (NV43) GPU]

  4. Accelerator: Using Data Parallelism to Program GPUs for General Purpose Uses
     • Methodology
       • Data parallelism to program the GPU (SIMD)
       • Parallel Array C# object
       • No aspects of the GPU are exposed to the programmer
       • Programmer only needs to know how to use the Parallel Array
       • Accelerator takes care of the conversion to pixel shader code
       • Parallel programs can be represented as DAGs
     [Figure: Simplified block diagram for a GPU]
     [Figure: Expression DAG with shader breaks marked]
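The expression-DAG idea on this slide can be sketched in a few lines of Python. This is an illustration only, not Accelerator's actual C# API: the `ParallelArray` class below, its constructor arguments, and `to_list` are hypothetical names, and evaluation happens on the CPU instead of being compiled to pixel shader code. What it shows is that operators only record nodes in a DAG, and work is done when the result is requested.

```python
import operator


class ParallelArray:
    """Hypothetical lazy data-parallel array; each instance is one DAG node."""

    def __init__(self, op=None, inputs=(), data=None):
        self.op = op          # elementwise function, or None for a leaf
        self.inputs = inputs  # child nodes in the expression DAG
        self.data = data      # concrete values (leaves only)

    def _node(self, other, op):
        if not isinstance(other, ParallelArray):
            raise TypeError("operand must be a ParallelArray")
        # No computation happens here; we only record the operation.
        return ParallelArray(op=op, inputs=(self, other))

    def __add__(self, other):
        return self._node(other, operator.add)

    def __mul__(self, other):
        return self._node(other, operator.mul)

    def to_list(self, cache=None):
        """Evaluate the DAG; shared sub-expressions are computed once."""
        cache = {} if cache is None else cache
        if id(self) in cache:
            return cache[id(self)]
        if self.op is None:
            result = list(self.data)
        else:
            left, right = (child.to_list(cache) for child in self.inputs)
            if len(left) != len(right):
                raise ValueError("arrays must have the same length")
            result = [self.op(x, y) for x, y in zip(left, right)]
        cache[id(self)] = result
        return result


a = ParallelArray(data=[1, 2, 3])
b = ParallelArray(data=[10, 20, 30])
s = a + b      # DAG node, not yet evaluated
expr = s * s   # 's' is shared, so this is a DAG and not a tree
print(expr.to_list())  # [121, 484, 1089]
```

In a real system the evaluation step is where the DAG would be split and translated into shader programs; here it is a plain recursive walk with a cache for shared nodes.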

  5. Accelerator: Using Data Parallelism to Program GPUs for General Purpose Uses
     • Performance Evaluation
     [Figure: Performance of Accelerator versus hand-coded pixel shader programs on a GeForce 7800 GTX and an ATI X1800. Performance is shown as speedup relative to the C++ version of the programs.]
     [Figure: Speedup of Accelerator programs on various GPUs compared to C++ programs running on a CPU]

  6. Rigel: 1024-core Accelerator
     • Specific Architecture
       • SPMD programming model
       • Global address space
       • RISC instruction set
       • Write-back cache
       • Cores laid out in clusters of 8, each cluster with a local cache
       • Custom cores (optimized for space / power)
     • Hierarchical Task Queueing
       • Single queue from the programmer's perspective
       • Architecture handles distributing tasks
       • Customizable via API
         • Task granularity
         • Static vs. dynamic scheduling
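A minimal sketch of the hierarchical task queue idea, assuming a global queue that feeds per-cluster local queues. It is not Rigel's API: `HierarchicalTaskQueue`, `batch_size`, and `run_spmd` are hypothetical names, and the cores are simulated sequentially in one Python thread, so it illustrates only the queue structure, not real parallel execution. The programmer-facing surface is a single `enqueue`/`dequeue` pair, while distribution to clusters happens behind it.

```python
from collections import deque

CORES_PER_CLUSTER = 8  # cluster size taken from the slide


class HierarchicalTaskQueue:
    """Hypothetical two-level queue: one global queue, one local queue per cluster."""

    def __init__(self, num_clusters, batch_size):
        self.global_queue = deque()
        self.local_queues = [deque() for _ in range(num_clusters)]
        self.batch_size = batch_size  # assumed knob: tasks moved per refill

    def enqueue(self, task):
        # The programmer sees a single queue.
        self.global_queue.append(task)

    def dequeue(self, cluster_id):
        local = self.local_queues[cluster_id]
        if not local:
            # Refill the cluster's local queue from the global queue in a batch,
            # so most dequeues touch only cluster-local state.
            for _ in range(min(self.batch_size, len(self.global_queue))):
                local.append(self.global_queue.popleft())
        return local.popleft() if local else None


def run_spmd(queue, num_clusters, kernel):
    """Every simulated core runs the same kernel on whatever task it dequeues."""
    results = []
    busy = True
    while busy:
        busy = False
        for cluster_id in range(num_clusters):
            for _core in range(CORES_PER_CLUSTER):
                task = queue.dequeue(cluster_id)
                if task is not None:
                    results.append(kernel(task))
                    busy = True
    return results


tq = HierarchicalTaskQueue(num_clusters=2, batch_size=4)
for i in range(20):
    tq.enqueue(i)
squares = run_spmd(tq, num_clusters=2, kernel=lambda x: x * x)
print(sorted(squares) == [i * i for i in range(20)])  # True
```

A larger `batch_size` means fewer trips to the global queue but coarser load balancing across clusters, which is one way to think about the granularity trade-off the slide mentions.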

  7. Rigel's Performance
     • Fairly Successful
       • Achieved speedup utilizing all 1024 cores
       • Hierarchical task structure effectively scaled to 1024 cores
     • Issues
       • Cache coherence!
       • Memory invalidate broadcasts slow the system down
         • Barrier flags
         • Task enqueue / dequeue variables
       • Not done in hardware...
         • Lazy-evaluation write-through barriers at cluster level
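To see why barrier flags are a contention point, the following back-of-the-envelope sketch counts how many updates land on each shared counter for a flat barrier versus a two-level (cluster, then global) barrier. It is simple arithmetic under stated assumptions, not a model of Rigel's actual mechanism: the function names are hypothetical, and it assumes one counter update per arriving core or cluster.

```python
def flat_barrier_updates(num_cores):
    """Flat barrier: every core updates the same single shared counter."""
    return num_cores


def hierarchical_barrier_updates(num_cores, cluster_size):
    """Two-level barrier: cores update a per-cluster counter, and one
    representative per cluster then updates the global counter."""
    if num_cores % cluster_size != 0:
        raise ValueError("num_cores must be a multiple of cluster_size")
    num_clusters = num_cores // cluster_size
    return {
        "per_cluster_counter": cluster_size,  # updates seen by each cluster counter
        "global_counter": num_clusters,       # updates seen by the global counter
    }


print(flat_barrier_updates(1024))             # 1024
print(hierarchical_barrier_updates(1024, 8))  # {'per_cluster_counter': 8, 'global_counter': 128}
```

With 1024 cores in clusters of 8, the most contended location sees 128 updates instead of 1024, at the cost of a second synchronization level.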

  8. Improving Rigel
     • Will the hierarchical task structure continue to scale? If not, where will the limit be? (Think multiple cache levels, but with processor tasks.)
     • How could we implement barriers or queues to avoid contention, but still scale? (Is memory-managed cache coherence appropriate?)
     • Is specialized hardware the way to go (clusters of 8 custom cores), or can this be replaced by general-purpose cores?

  9. Generic and Custom Accelerators
     • Difficult to make a sufficiently generic programming interface between the programmer and a multi-core system
     • GPUs are limited by the SIMD programming model
     • Specific hardware platforms still have issues for SPMD
     • Efficiently scaling to more cores is still an issue
     How do we solve these issues?
