
Evaluating the Imagine Stream Processor

Jung Ho Ahn, William J. Dally, Brucek Khailany, Ujval J. Kapasi, and Abhishek Das. ISCA 2004.





Presentation Transcript


  1. Evaluating the Imagine Stream Processor. Jung Ho Ahn, William J. Dally, Brucek Khailany, Ujval J. Kapasi, and Abhishek Das. ISCA 2004

  2. Motivation • Provide efficiency of an ASIC • Provide flexibility of a programmable processor • Simplify special-purpose processor design • Lower special-purpose processor design cost • Provide better applicability • Target media applications

  3. Stream Architecture

  4. Development Board • PowerPC, 150 MHz • 2 x Imagine, 200 MHz • FPGA bridge, 66 MHz • 256 MB of SDRAM per Imagine, 100 MHz

  5. Applications

  6. Mapping

  7. Execution on a Single Stream (Figure labels: Kernel 1; SRF; Input Stream; Output Stream; Iteration 1 through Iteration n.)

  8. Execution of Multiple Kernels (Figure labels: Kernel 1, Kernel 2, Kernel 3; SRF; Stream 1 through Stream 4; processing.)
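
One plausible reading of the labels on this slide is that each kernel consumes a stream held in the stream register file (SRF) and produces the next stream there, so intermediate results pass from kernel to kernel on chip. The sketch below models that chain in Python. It is an illustration under that assumption only: the `srf` dictionary, the `run_kernel` helper, and the three arithmetic operations are hypothetical and do not come from the presentation.

```python
# Hypothetical model: the SRF is a dictionary of named streams.
srf = {"stream1": [1, 2, 3, 4]}


def run_kernel(srf, fn, src, dst):
    """Apply fn to every element of stream src and store the result as stream dst."""
    srf[dst] = [fn(x) for x in srf[src]]


# Three placeholder kernels chained through intermediate streams.
run_kernel(srf, lambda x: x + 1, "stream1", "stream2")  # stands in for Kernel 1
run_kernel(srf, lambda x: x * 2, "stream2", "stream3")  # stands in for Kernel 2
run_kernel(srf, lambda x: x - 3, "stream3", "stream4")  # stands in for Kernel 3

print(srf["stream4"])
```

In this model only `stream1` and `stream4` would need to cross the boundary to external memory; `stream2` and `stream3` exist only between kernels.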

  9. Application Performance GOPS: 18% GFLOPS: 60%

  10. Sources of Overhead

  11. Stream Length Effects

  12. Access Pattern Effects

  13. Energy Efficiency. Energy consumption per FLOP (normalized to a 0.13 µm, 1.2 V process): • Imagine @ 200 MHz: 277 pJ/FLOP • TI C67x DSP @ 225 MHz: 889 pJ/FLOP (3.2x more) • Intel Pentium M @ 1200 MHz: 3600 pJ/FLOP (13x more)
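
The "3.2x" and "13x" factors on this slide follow directly from dividing each processor's energy per FLOP by Imagine's. A short check of that arithmetic, using only the three figures quoted on the slide (the dictionary and function names are illustrative):

```python
# Energy per FLOP as quoted on the slide, in pJ/FLOP (normalized process).
ENERGY_PJ_PER_FLOP = {
    "Imagine": 277.0,
    "TI C67x DSP": 889.0,
    "Intel Pentium M": 3600.0,
}


def ratio_to_imagine(name):
    """Return how many times more energy per FLOP the named processor uses than Imagine."""
    return ENERGY_PJ_PER_FLOP[name] / ENERGY_PJ_PER_FLOP["Imagine"]


for name in ENERGY_PJ_PER_FLOP:
    print(f"{name}: {ratio_to_imagine(name):.1f}x")
```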

  14. Memory Bandwidth Requirement

  15. Host Processor Bandwidth Requirement

  16. Programming Model

  17. Compiler Optimizations: Stream Ordering

  18. Compiler Optimizations: SRF Overlapping and Packing

  19. Compiler Optimizations: Strip-mining

  20. Compiler Optimizations: Loop Unrolling and Software Pipelining
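
The slide gives only the title, so as background: strip-mining in general means splitting a long stream into fixed-size strips so that each strip fits in a limited on-chip buffer, then running the same kernel on each strip in turn. The sketch below shows the generic technique in Python. It is not the Imagine compiler's implementation; `strip_mine`, `scale_kernel`, and the strip length of 4 are hypothetical choices for illustration.

```python
def strip_mine(stream, strip_len, kernel):
    """Run kernel over stream one strip of at most strip_len elements at a time."""
    out = []
    for start in range(0, len(stream), strip_len):
        strip = stream[start:start + strip_len]  # the last strip may be shorter
        out.extend(kernel(strip))
    return out


def scale_kernel(strip):
    """Placeholder kernel: double every element of the strip."""
    return [2 * x for x in strip]


data = list(range(10))
result = strip_mine(data, 4, scale_kernel)
print(result)
```

Because the kernel here is element-wise, the strip-mined result equals what a single pass over the whole stream would produce; only the working-set size per pass changes.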

  21. Conclusions • Provides performance close to that of an ASIC, with flexibility through programming • Can sustain between 16% and 60% of peak arithmetic performance • Exposed two-level register file allows the compiler to exploit locality • Broader applicability • Requires considerable programming effort • Limited to media applications with regular control flow

  22. Collab Questions • How does the performance compare to other processors? (Dan, Marko, Jason, Prateeksha, Chris) • What is the compiler efficiency? (Mario, Liang) • How were the design decisions motivated? (Jing, Marisabel) • How does the programming model compare to that of GPUs? (Greg)

  23. Kernels
