The Imagine Stream Processor
Ujval J. Kapasi, William J. Dally, Scott Rixner, John D. Owens, and Brucek Khailany
Presenter: Lu Hao
Contents • Stream processor • Imagine Architecture • Example: FFT application • Experimental result • Conclusion
Motivation of stream processor
• Media-processing applications, such as 3-D polygon rendering and MPEG-2 encoding, are becoming an increasingly dominant portion of computing workloads today
• Properties of media-processing applications
  • Real-time performance constraints
  • High arithmetic intensity, which requires parallel solutions
  • Inherently contain a large amount of data parallelism
• Providing large numbers of ALUs to operate on data in parallel is relatively inexpensive
• Current programmable solutions cannot scale to support this many ALUs
  • Both providing instructions and transferring data at the necessary rates are problematic
  • For example, a 48-ALU single-chip processor must issue up to 48 instructions/cycle and provide up to 144 words/cycle of data bandwidth to operate at peak rate
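The 48 instructions/cycle and 144 words/cycle figures can be reproduced with a short calculation. The sketch below assumes, which the slide does not state, that each ALU operation reads two operand words and writes one result word per cycle; the function name `peak_requirements` is hypothetical.

```python
# Back-of-the-envelope peak requirements for a many-ALU processor.
# Assumption (not stated on the slide): every ALU operation reads two
# operand words and writes one result word each cycle.
NUM_ALUS = 48
OPERANDS_PER_OP = 2  # assumed
RESULTS_PER_OP = 1   # assumed


def peak_requirements(num_alus, operands=OPERANDS_PER_OP, results=RESULTS_PER_OP):
    """Return (instructions/cycle, words/cycle) needed to keep every ALU busy."""
    instructions_per_cycle = num_alus  # one instruction per ALU per cycle
    words_per_cycle = num_alus * (operands + results)
    return instructions_per_cycle, words_per_cycle


ipc, wpc = peak_requirements(NUM_ALUS)
print(f"{ipc} instructions/cycle, {wpc} words/cycle")
# prints "48 instructions/cycle, 144 words/cycle"
```

Under that three-words-per-operation assumption, both requirements grow linearly with the ALU count, which is why instruction issue and data bandwidth become the scaling bottleneck.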
What is a stream processor
• Usually SIMD
• Allows some applications to more easily exploit a limited form of parallel processing
• Uses the stream programming model to expose parallelism as well as producer-consumer locality
• Can use multiple computational units
The Imagine Processor
• Imagine is a programmable stream processor, a hardware implementation of the stream model.
• Imagine is designed to be a stream coprocessor for a general-purpose processor that acts as the host.
• The programming model organizes the computation in an application into a sequence of arithmetic kernels, and organizes the data flow into a series of data streams.
• On a variety of realistic applications, Imagine can sustain up to 50 instructions per cycle and up to 15 GOPS of arithmetic bandwidth.
• Load-store architecture for streams (SRF)
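The split into kernels and streams can be illustrated in plain Python. This is only a sketch of the idea, not Imagine's actual programming interface: the kernels `scale_kernel` and `offset_kernel` and the driver `run_pipeline` are hypothetical names, and Python generators stand in for streams.

```python
from typing import Iterable, Iterator, List


def scale_kernel(stream: Iterable[float], factor: float) -> Iterator[float]:
    """Kernel 1: consume an input stream, produce a scaled output stream."""
    for element in stream:
        yield element * factor


def offset_kernel(stream: Iterable[float], offset: float) -> Iterator[float]:
    """Kernel 2: consume the stream produced by kernel 1, add an offset."""
    for element in stream:
        yield element + offset


def run_pipeline(source: List[float]) -> List[float]:
    """Chain the kernels: the intermediate stream flows from producer to consumer."""
    scaled = scale_kernel(source, 2.0)    # producer
    shifted = offset_kernel(scaled, 1.0)  # consumer of `scaled`
    return list(shifted)


print(run_pipeline([1.0, 2.0, 3.0]))  # prints [3.0, 5.0, 7.0]
```

The intermediate stream `scaled` is consumed directly by the next kernel and never written back to a global array, which is the producer-consumer locality the stream model is meant to expose.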
Architecture of Imagine
• 32 KW stream register file (SRF)
• The microcontroller keeps track of the program counter as it broadcasts each VLIW instruction to all eight clusters in a SIMD manner.
• Each ALU cluster: six ALUs and 304 registers in several local register files (LRFs).
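A rough software analogy for the SIMD organization above: the same kernel is applied to eight stream elements per step, one per cluster. The sketch below is a simplified model with hypothetical names (`total_alus`, `simd_execute`); it ignores the VLIW packing of the six ALUs inside each cluster.

```python
NUM_CLUSTERS = 8      # from the slide: eight clusters
ALUS_PER_CLUSTER = 6  # from the slide: six ALUs per cluster


def total_alus():
    """Total ALU count implied by the cluster organization."""
    return NUM_CLUSTERS * ALUS_PER_CLUSTER


def simd_execute(kernel, stream):
    """Apply `kernel` to `stream` in batches of NUM_CLUSTERS elements.

    Each batch models one broadcast step: every cluster runs the same
    kernel on its own stream element. Returns (results, number_of_steps).
    """
    results = []
    steps = 0
    for start in range(0, len(stream), NUM_CLUSTERS):
        batch = stream[start:start + NUM_CLUSTERS]  # one element per cluster
        results.extend(kernel(x) for x in batch)
        steps += 1
    return results, steps


squares, steps = simd_execute(lambda x: x * x, list(range(20)))
print(total_alus(), steps)  # prints "48 3"
```

Twenty elements need three steps because the last batch only fills four of the eight clusters.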
Architecture of Imagine The SRF
The SRF
• Clusters <---> SRF: data that needs to be passed from kernel to kernel
• SRF <---> DRAM: data that is part of truly global data structures
• All stream operands originate in the SRF, and stream results are stored back to the SRF.
Architecture of Imagine The ALU cluster
The ALU cluster
• 256 x 32-bit register file
Example: mapping of a 1024-point radix-2 FFT to the stream model
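The slide's figure is not reproduced here, so the following is only a generic sketch of how a radix-2 FFT decomposes into a sequence of identical butterfly stages, each of which could be treated as one kernel pass over a stream. It is a textbook decimation-in-time formulation, not necessarily the mapping used on Imagine, and the function names are hypothetical.

```python
import cmath


def bit_reverse_permute(data):
    """Reorder the input so that in-order butterfly stages yield in-order output."""
    n = len(data)
    bits = n.bit_length() - 1
    out = [0j] * n
    for i, x in enumerate(data):
        rev = int(format(i, f"0{bits}b")[::-1], 2) if bits else 0
        out[rev] = x
    return out


def butterfly_stage(stream, half):
    """One radix-2 stage: combine elements that are `half` apart."""
    out = list(stream)
    for block in range(0, len(stream), 2 * half):
        for k in range(half):
            twiddle = cmath.exp(-2j * cmath.pi * k / (2 * half))
            a = stream[block + k]
            b = stream[block + k + half] * twiddle
            out[block + k] = a + b
            out[block + k + half] = a - b
    return out


def fft_radix2(data):
    """Return (spectrum, number_of_stages) for a power-of-two-length input."""
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    stream = bit_reverse_permute([complex(x) for x in data])
    half, stages = 1, 0
    while half < n:
        stream = butterfly_stage(stream, half)  # one kernel pass per stage
        half *= 2
        stages += 1
    return stream, stages


spectrum, stages = fft_radix2([1] + [0] * 1023)
print(stages)  # prints 10
```

For 1024 points there are log2(1024) = 10 stages of 512 butterflies each, and the output stream of one stage is the input stream of the next.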
Experimental Result • Speedup of 8 clusters over 1 cluster
Conclusion
• Stream processors are suitable for media-processing applications
• Imagine exploits the data-level parallelism (DLP) in streams by executing a kernel on eight successive stream elements in parallel (one on each cluster)
• SRF
• ALU clusters
• Application example: 1024-point FFT
Thanks! • Questions?