Video on DSP and FPGA John Johansson April 12, 2004
Agenda • Overview of video processing • A typical video encoder and the DCT • Requirements of DCT • Comparison of DSP and FPGA chips • Analysis and conclusions • Questions
Overview of Video Processing Video processing generally involves • Compression / Decompression • Special Effects • TV Broadcasting • Focus on Compression
Video Encoding Typical Video Encoder • Focus on DCT algorithm
The Discrete Cosine Transform • DCT is a frequency transform, closely related to the FFT: it maps spatial pixel data to frequency coefficients • Rearranges data into a more compressible format • Typically done on 64 (8x8) pixels at a time • Big nasty equation … • … But no sharp teeth (optimizes extremely well)
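The equation from this slide did not survive in the transcript, so the following is only a minimal sketch, assuming the standard orthonormal 2-D DCT-II on an 8x8 block (the form commonly used in video codecs). The name `dct2_block` and the sample input are hypothetical, chosen for illustration.

```python
import math

N = 8  # block edge length from the slide (8x8 = 64 pixels)

def dct2_block(block):
    """Direct 2-D DCT-II of an N x N block, orthonormal scaling (assumed)."""
    def c(k):
        # Scale factor: the zero-frequency term is weighted differently.
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):          # vertical frequency index
        for v in range(N):      # horizontal frequency index
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * total
    return out

# A flat block has no detail, so all of its energy lands in one coefficient.
flat = [[100] * N for _ in range(N)]
coeffs = dct2_block(flat)
```

For the flat block, only `coeffs[0][0]` is non-zero, which is the sense in which the transform makes the data more compressible. This direct form does 64 multiply-accumulates per coefficient; practical implementations use much cheaper factorizations.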
Requirements for DCT Basic Idea • Read in data (64 values, 8-24 bits signed / unsigned) • Do transformation • Write out data • Profit !!! • Easy, right ??
Requirements for DCT Memory Limitations • Load an entire frame? • One uncompressed frame can vary from 50 KB to 50 MB in size • External memory is more plentiful but much slower • Do the DCT in chunks (8x8 blocks)
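Processing in chunks can be sketched as a generator that hands out one 8x8 tile at a time, so only 64 values need to sit in fast internal memory. This is an illustrative sketch, not the presenter's code: `iter_blocks` and the 16x16 test frame are hypothetical, and the frame dimensions are assumed to be multiples of 8.

```python
def iter_blocks(frame, size=8):
    """Yield (row, col, block) for each size x size tile of a 2-D frame.

    Assumes the frame height and width are multiples of `size`;
    a real encoder would pad the edges first.
    """
    height, width = len(frame), len(frame[0])
    for r in range(0, height, size):
        for c in range(0, width, size):
            # Copy out one tile; only this tile needs to be in fast memory.
            yield r, c, [row[c:c + size] for row in frame[r:r + size]]

# Hypothetical 16x16 single-channel frame with a unique value per pixel.
frame = [[y * 16 + x for x in range(16)] for y in range(16)]
blocks = list(iter_blocks(frame))
```

Here the 16x16 frame yields four tiles, each of which could be transformed and written back before the next one is fetched.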
Requirements for DCT Degree of Parallelism • DCT can be done serially, or broken up and done in parallel • Parallelism depends largely on available memory • Price / Performance tradeoffs
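One common way to break the DCT up is the row-column decomposition: run a 1-D DCT over each row, then over each column. The eight transforms in each pass are independent of one another, which is what makes parallel hardware attractive. The sketch below assumes the orthonormal DCT-II; the function names are hypothetical, and the thread pool only stands in for parallel hardware units (Python threads will not actually speed this up).

```python
import math
from concurrent.futures import ThreadPoolExecutor

N = 8

def dct1d(vec):
    """Orthonormal 1-D DCT-II of a length-N sequence."""
    out = []
    for k in range(N):
        scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        out.append(scale * sum(v * math.cos((2 * n + 1) * k * math.pi / (2 * N))
                               for n, v in enumerate(vec)))
    return out

def transpose(m):
    return [list(col) for col in zip(*m)]

def dct2_rowcol(block, workers=1):
    """2-D DCT as 1-D DCTs over the rows, then over the columns.

    Within each pass the eight 1-D transforms do not depend on each
    other, so they can be spread over `workers` units.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        rows_done = list(pool.map(dct1d, block))
        cols_done = list(pool.map(dct1d, transpose(rows_done)))
    return transpose(cols_done)

# Arbitrary test pattern, then the same work done serially and 8-wide.
block = [[(3 * x + 5 * y) % 17 for y in range(N)] for x in range(N)]
serial = dct2_rowcol(block, workers=1)
parallel = dct2_rowcol(block, workers=8)
```

Both calls produce the same coefficients; only the scheduling differs. On an FPGA the equivalent choice is how many 1-D transform units to instantiate, which is where the memory and price tradeoffs come in.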
The Challengers Xilinx Spartan-3 FPGA • 50K – 5M gates • 326 MHz • 100 KB – 2.3 MB internal memory • 4 - 104 dedicated multipliers • Oodles of I/O pins (up to 784) Look at XC3S1000 • 1M gates, 560 KB memory, 24 multipliers, 376 I/O pins
The Challengers ADSP-BF5xx Blackfin Processor • 200 – 750 MHz • Single or dual core • DMA memory controller • 52 KB – 326 KB internal memory • Other processor goodies Look at ADSP-BF533 • 500 MHz, single core, 148 KB memory
Performance How do we correctly benchmark an algorithm between two completely different processors? • I don’t really know • Look at some rough performance indicators and try to draw a conclusion
Performance FPGA • Varies from 1-25 cycle(s) / pixel for DCT • Reading and writing of data takes additional time • Clock speed limited by degree of parallelism DSP • Roughly 5 cycles / pixel for DCT • DMA controller allows parallel reading and writing with some setup overhead
(Ideal) Performance Spartan-3 • 64 read + 64 compute + 64 write = 192 cycles / block • 326 MHz ≈ 1.70 Mblocks / second Blackfin • 319 compute + 10 DMA transfer = 329 cycles / block • 500 MHz ≈ 1.52 Mblocks / second
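The ideal throughput figures are simply clock rate divided by cycles per block. A small sketch of that arithmetic, using the per-phase cycle counts and clock speeds quoted on the slide (the helper name `blocks_per_second` is hypothetical):

```python
def blocks_per_second(clock_hz, cycles_per_block):
    """Ideal throughput: one block completes every `cycles_per_block` cycles."""
    return clock_hz / cycles_per_block

# Spartan-3: 64 read + 64 compute + 64 write cycles per 8x8 block.
fpga_cycles = 64 + 64 + 64
fpga_rate = blocks_per_second(326e6, fpga_cycles)

# Blackfin: 319 compute + 10 DMA-transfer cycles per block.
dsp_cycles = 319 + 10
dsp_rate = blocks_per_second(500e6, dsp_cycles)

print(f"FPGA: {fpga_cycles} cycles/block, {fpga_rate / 1e6:.2f} Mblocks/s")
print(f"DSP:  {dsp_cycles} cycles/block, {dsp_rate / 1e6:.2f} Mblocks/s")
```

Summing the three 64-cycle phases gives 192 cycles per block for the FPGA, or about 1.70 Mblocks per second at 326 MHz, against about 1.52 Mblocks per second for the DSP. These are best-case numbers that ignore stalls and setup overhead.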
Advantages FPGA • Potential for very high parallelism • Existing video designs available for purchase • Good middleman functionality DSP • Higher potential clock speed • Much more flexible design • DMA memory controller
Disadvantages FPGA • Low flexibility • Hard to optimize • Limited logic blocks DSP • Difficult to achieve full utilization • Higher power consumption
Conclusions FPGA • Best for well-defined roles, like DCT • Faster in situations where throughput matters • Can be very expensive DSP • Better suited to more flexible roles, like a full encoder • Situations where large amounts of (additional) memory are needed
References Xilinx Spartan III http://www.xilinx.com/xlnx/xil_prodcat_landingpage.jsp?title=Spartan-3 Analog Devices Blackfin http://www.analog.com/processors/processors/blackfin/index.html
References Other articles http://www.xilinx.com/publications/products/services/xc_pdf/xc_videoapps44.pdf http://www.xilinx.com/publications/products/sp2e/xc_dspvid43.htm http://www.reed-ectronics.com/ednmag/article/CA336860?stt=000&pubdate=11%2F27%25