SODA: A Low-power (Multi-Core) Architecture For Software Radio
Yuan Lin, Hyunseok Lee, Mark Woh, Yoav Harel, Scott Mahlke, Trevor Mudge, Chaitali Chakrabarti, Krisztian Flautner
(Slides modified from original ISCA presentation)

Software Radio: Introduction & Survey
• A radio communication system
• Software for modulation and demodulation of signals
• Applications in military and cellular services
• Can receive and transmit different protocols (W-CDMA, GSM, CDMA2000, etc.) on the same piece of hardware
• Implemented originally on TI’s C40x processors in 1992
• Commercially available in RFID readers today

Advantages of Software Defined Radio
• Multi-mode operations
• Lower costs
• Faster time to market
• Prototyping and bug fixes
• Chip volumes
• Longevity of platforms
• Protocol complexity favors software-dominated solutions
• Enables future wireless communication innovations
• Cognitive radio

Why is SDR Challenging?
• SDR design objectives for 3G and WiFi
  • Throughput requirements
    • 40 Gops peak throughput
  • Power budget
    • 100 mW to 500 mW peak power

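As a rough worked example of what these two targets imply together, the sketch below turns the slide's figures into an energy-efficiency requirement. This is plain arithmetic on the numbers above, not a result reported by the authors, and the function names are hypothetical.

```python
def gops_per_watt(throughput_gops: float, power_mw: float) -> float:
    """Required efficiency in Gops/W for a given throughput and power budget."""
    return throughput_gops * 1000.0 / power_mw


def picojoules_per_op(throughput_gops: float, power_mw: float) -> float:
    """Energy available per operation.

    mW / Gops = (1e-3 J/s) / (1e9 op/s) = 1e-12 J/op, i.e. picojoules per op.
    """
    return power_mw / throughput_gops


PEAK_GOPS = 40.0             # from the slide
POWER_BUDGETS_MW = (100.0, 500.0)  # from the slide

for budget in POWER_BUDGETS_MW:
    print(budget, "mW:",
          gops_per_watt(PEAK_GOPS, budget), "Gops/W,",
          picojoules_per_op(PEAK_GOPS, budget), "pJ/op")
```

Under these figures the design has to sustain roughly 80 to 400 Gops/W, which leaves between 12.5 pJ and 2.5 pJ for each operation.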
The Anatomy of Wireless Protocols
1. Filtering: suppress signals outside the frequency band
2. Modulation: map source information onto signal waveforms
3. Channel Estimation: estimate channel conditions for transceivers
4. Error Correction: correct errors induced by a noisy channel

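The four stages above can be illustrated with a toy baseband chain. This is a minimal sketch, not the W-CDMA or 802.11a algorithms: it assumes QPSK modulation, a single-tap channel estimated from one known pilot symbol, and a 3x repetition code standing in for real error correction. All names (`fir_filter`, `run_chain`, `PILOT`, and so on) are hypothetical.

```python
# Bit pair -> QPSK constellation point (assumed mapping).
QPSK = {(0, 0): 1 + 1j, (0, 1): 1 - 1j, (1, 0): -1 + 1j, (1, 1): -1 - 1j}
PILOT = 1 + 1j  # known reference symbol sent first (assumption)


def fir_filter(samples, taps):
    """Filtering: direct-form FIR convolution with zero initial state."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, tap in enumerate(taps):
            if n - k >= 0:
                acc += tap * samples[n - k]
        out.append(acc)
    return out


def modulate(bits):
    """Modulation: map consecutive bit pairs onto QPSK symbols."""
    return [QPSK[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]


def demodulate(symbols):
    """Hard-decision demapping: the sign of each axis gives one bit."""
    bits = []
    for s in symbols:
        bits += [0 if s.real > 0 else 1, 0 if s.imag > 0 else 1]
    return bits


def estimate_channel(rx_pilot, tx_pilot=PILOT):
    """Channel estimation: single complex gain from the known pilot."""
    return rx_pilot / tx_pilot


def encode_repetition(bits, n=3):
    """Error correction (encoder): repeat every bit n times."""
    return [b for b in bits for _ in range(n)]


def decode_repetition(bits, n=3):
    """Error correction (decoder): majority vote over each group of n."""
    return [1 if sum(bits[i:i + n]) > n // 2 else 0
            for i in range(0, len(bits), n)]


def run_chain(data_bits, channel_gain, corrupt_index=None):
    """Send data_bits (even length) through a flat channel and decode."""
    tx = [PILOT] + modulate(encode_repetition(data_bits))
    rx = [channel_gain * s for s in tx]
    if corrupt_index is not None:
        rx[corrupt_index] = -rx[corrupt_index]  # emulate a hit on one symbol
    h = estimate_channel(rx[0])
    equalized = [r / h for r in rx[1:]]
    return decode_repetition(demodulate(equalized))


print(fir_filter([1, 2, 3, 4], [0.5, 0.5]))
print(run_chain([1, 0, 1, 1], channel_gain=0.5j, corrupt_index=2))
```

The channel here rotates and attenuates every symbol; the pilot-based estimate undoes that, and the repetition decoder absorbs the single corrupted symbol. The FIR filter is shown standalone because the toy channel has no out-of-band interference to suppress.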
SODA System Architecture
• 4 PEs
  • Static kernel mapping and scheduling
  • SIMD + scalar units
• 1 ARM GPP controller
  • Scalar algorithms and protocol controls

SODA Memory Organization
• 2-level scratchpad memories
  • 12KB local scratchpad memory for stream queues
  • 64KB global scratchpad memory for large buffers
• Low-throughput shared bus
  • 200MHz 32-bit bus
  • Inter-PE communication using DMA

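To put the bus figures in perspective, the sketch below computes the peak bandwidth of a 200MHz 32-bit bus and how long it would take to move one full scratchpad over it. It assumes one 32-bit word per cycle with no arbitration or DMA setup overhead, and 1KB = 1024 bytes; neither assumption comes from the slides, and the function names are hypothetical.

```python
BUS_CLOCK_HZ = 200_000_000   # 200MHz, from the slide
BUS_WIDTH_BITS = 32          # from the slide
LOCAL_SCRATCHPAD_BYTES = 12 * 1024    # 12KB, assuming 1KB = 1024 bytes
GLOBAL_SCRATCHPAD_BYTES = 64 * 1024   # 64KB, assuming 1KB = 1024 bytes


def peak_bandwidth_bytes_per_s(clock_hz=BUS_CLOCK_HZ, width_bits=BUS_WIDTH_BITS):
    """Upper bound: one full-width word transferred every cycle."""
    return clock_hz * width_bits // 8


def transfer_time_us(num_bytes):
    """Best-case time in microseconds to move num_bytes across the bus."""
    return num_bytes / peak_bandwidth_bytes_per_s() * 1e6


print(peak_bandwidth_bytes_per_s())               # bytes per second
print(transfer_time_us(LOCAL_SCRATCHPAD_BYTES))   # microseconds
print(transfer_time_us(GLOBAL_SCRATCHPAD_BYTES))  # microseconds
```

Under these assumptions the bus peaks at 800 MB/s, so refilling a 12KB local scratchpad takes about 15 µs and draining the 64KB global scratchpad about 82 µs at best.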
SDR Performance Distribution
• 802.11a requires more total computational cycles
• W-CDMA requires more computational cycles per bit

Power Consumption at 180nm
• Wide SIMD requires a larger number of pipeline registers
• 802.11a consumes more power than W-CDMA
  • 8-bit W-CDMA computation versus 16-bit 802.11a computation

Summary
• Key features of SODA
  • Multi-PE with scratchpad memories
  • Low-throughput shared bus
  • 2-issue LIW: SIMD + (scalar or AGU)
  • 32-wide SIMD processing
  • SIMD shuffle network

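The role of a shuffle network next to a 32-wide SIMD unit can be sketched in software: a shuffle permutes lanes so that the operands of the next lane-wise operation line up. The perfect-shuffle pattern below is a common illustration of such a permutation; whether SODA's network uses this exact pattern is not stated in the slides, and every name in the block is hypothetical.

```python
SIMD_WIDTH = 32  # lanes, from the slide


def simd_shuffle(vec, pattern):
    """Permute lanes: output lane i takes its value from input lane pattern[i]."""
    if len(vec) != SIMD_WIDTH or len(pattern) != SIMD_WIDTH:
        raise ValueError("vector and pattern must both be 32 lanes wide")
    return [vec[src] for src in pattern]


def perfect_shuffle_pattern(width=SIMD_WIDTH):
    """Interleave the lower and upper halves: 0, 16, 1, 17, ..., 15, 31."""
    half = width // 2
    return [(i // 2) + (i % 2) * half for i in range(width)]


def simd_add(a, b):
    """Lane-wise add, standing in for one SIMD ALU operation."""
    return [x + y for x, y in zip(a, b)]


lanes = list(range(SIMD_WIDTH))
shuffled = simd_shuffle(lanes, perfect_shuffle_pattern())
print(shuffled[:6])
print(simd_add(lanes, shuffled)[:6])
```

Because the pattern is just a table of source lanes, a new protocol kernel would only need a different table rather than different hardware, which is the property the shuffle network is meant to provide.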
Conclusion & Future Work
• Conclusion
  • 2G and 3G SDR solutions are achievable in 90nm
  • Optimization opportunities at the algorithm, software and hardware levels
• Future Work
  • SDR for idle mode operation
  • Compiler for SODA

Questions
• Is the shuffle network robust enough to adapt to changing protocols?
• System compilation is not straightforward for a dynamically changing channel?
  • Requires quite a bit of program analysis for automation
• How well does the competition perform?
• Is heat an issue now that we want to dissipate more energy in a very short time?

Compiler work here
• The Trimaran kernel compiler
• Performance on par for wireless kernels and within 3x for speech recognition algorithms
• A scheduling algorithm based on the simplex method for minimising the energy-delay product
• Pattern matching algorithms help interconnect scheduling for face recognition

Different Levels of Software Radio
<source:http://www.sdrforum.org>

Power Methodology
• Our flow sequence was
  • Design Compiler and Silicon Ensemble
    • For initial floorplan estimation
  • Physical Compiler
    • For placement and optimization
  • Silicon Ensemble
    • Routing
• We optimized for power and delay
• Blocks like memory were generated with Artisan Memory Generators
• We used the Synopsys IP blocks as much as possible to get better compiled blocks

SDR – Application Specific Design
• Wireless protocols are systems of DSP algorithms
• System-level
  • Example: Specification of W-CDMA DCH channel
• Algorithm-level
  • Example: Implementation of a 64-point FFT

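For the algorithm-level example, a 64-point FFT can be sketched as a recursive radix-2 decimation-in-time transform. This is a textbook floating-point version for illustration only; it is not the fixed-point SIMD implementation used on SODA, and the function name `fft` is a hypothetical choice.

```python
import cmath


def fft(x):
    """Radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    if n == 1:
        return list(x)
    even = fft(x[0::2])   # transform of even-indexed samples
    odd = fft(x[1::2])    # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Butterfly: combine the two half-size transforms with a twiddle factor.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


N = 64
# A complex tone that sits exactly on bin 5 of a 64-point transform.
tone = [cmath.exp(2j * cmath.pi * 5 * n / N) for n in range(N)]
spectrum = fft(tone)
peak_bin = max(range(N), key=lambda k: abs(spectrum[k]))
print(peak_bin, round(abs(spectrum[peak_bin]), 6))
```

Each recursion level performs the same butterfly on every pair of elements, which is the kind of regular, fixed-size vector work that maps well onto wide SIMD hardware.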
DSP Algorithm Characteristics
• 8 to 16-bit precision
• Vector operations
  • Long vectors
  • Constant vector size
• Static data movement patterns
• Scalar operations

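These characteristics can be made concrete with a small fixed-point sketch: lane-wise operations on constant-length vectors of 16-bit values, plus a scalar reduction. Saturating arithmetic is assumed here because it is common in fixed-point DSP code; the slides only state the 8 to 16-bit precision, and all names below are hypothetical.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def saturate16(value):
    """Clamp an integer into the signed 16-bit range (assumed behaviour)."""
    return max(INT16_MIN, min(INT16_MAX, value))


def vec_add_sat(a, b):
    """Vector operation: lane-wise saturating add of equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [saturate16(x + y) for x, y in zip(a, b)]


def dot_product(a, b):
    """Vector multiply followed by a scalar reduction in a wide accumulator."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    acc = 0
    for x, y in zip(a, b):
        acc += x * y
    return acc


print(vec_add_sat([30000, -30000, 1], [10000, -10000, 2]))
print(dot_product([1, 2, 3], [4, 5, 6]))
```

The vector length never changes between calls and the access pattern is fixed, so both the lane-wise work and the data movement can be scheduled statically.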