A System Solution for High-Performance, Low Power SDR
Yuan Lin¹, Hyunseok Lee¹, Yoav Harel¹, Mark Woh¹, Scott Mahlke¹, Trevor Mudge¹ and Krisztian Flautner²
¹ Advanced Computer Architecture Laboratory, University of Michigan
² ARM, Ltd.
SDR Design Challenges:
• Hardware design challenges
  • High computational throughput (~40 Gops)
  • Low power consumption (~200 mW)
  • Meet real-time requirements
• DSP programming support
  • System-level development
    • Inter-algorithm communication
  • Algorithm-level development
    • Efficient DSP representations
W-CDMA Protocol: 2 Mbps
W-CDMA Characteristics
• Plenty of vector parallelism
• 8 & 16-bit DSP algorithms
• Multiplication is not dominant
• No floating-point operation
• Small instruction/data memory
• Has periodic real-time tasks
802.11a Protocol: 24 Mbps
802.11a Characteristics
• Similar to W-CDMA
  • Plenty of vector parallelism
  • No floating-point operation
  • Small instruction/data memory
• Different from W-CDMA
  • Mostly 16-bit DSP algorithms
  • Multiplication is more dominant
  • No periodic real-time tasks
System Architecture Design Tradeoffs
[Figure labels: amortized fetch, intra-processor communication, inter-processor communication]
System Architecture Design Tradeoffs
Number of processing elements × SIMD width
For W-CDMA 2 Mbps (51.2 GOP/s), 90 nm, 1 V @ 400 MHz
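The tradeoff above can be read as simple arithmetic: the workload fixes how many operations must finish per clock cycle, and that total is then split between PE count and SIMD width. The sketch below assumes each SIMD lane completes one operation per cycle at full utilization, which is an idealization not stated on the slide. The function names are hypothetical.

```python
def lanes_needed(ops_per_second, clock_hz):
    """Operations that must complete per cycle (ceiling division).

    Assumption: one operation per SIMD lane per cycle, full utilization.
    """
    return -(-ops_per_second // clock_hz)


def pe_simd_options(total_lanes, pe_counts=(1, 2, 4, 8, 16)):
    """List (PE count, SIMD width) pairs whose product equals total_lanes."""
    return [(pes, total_lanes // pes)
            for pes in pe_counts
            if total_lanes % pes == 0]


# W-CDMA 2 Mbps figure from the slide: 51.2 GOP/s at 400 MHz.
lanes = lanes_needed(51_200_000_000, 400_000_000)
options = pe_simd_options(lanes)
```

Under these assumptions `lanes` is 128, so the candidate points range from one 128-wide PE to sixteen 8-wide PEs. Which point is best depends on the fetch and communication costs the slide compares.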
System Architecture Design
PE Design (Area < 1 mm², Power < 50 mW)
Mapping DSP Algorithms: Filters
[Figure: FIR filter with input In, coefficients b3, b2, b1, b0, three z⁻¹ delay elements, and output Out]
    spread Vin, Sin
    shift z, z, up
    mac z, Vin, Sin
Mapping DSP Algorithm: Filter
    spread Vin, Sin
    shift z, z, up
    mac z, Vin, Sin
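One way to read the spread / shift / mac sequence is as a transposed-form FIR filter, where each input sample is broadcast across all lanes and accumulated into a shifting vector of partial sums. The sketch below is an interpretation, not the slide's exact operand semantics: the coefficient vector `coeffs` and the mapping of `z` to partial sums are assumptions.

```python
def fir_simd(samples, coeffs):
    """Transposed-form FIR modeled as three vector steps per input sample."""
    n = len(coeffs)
    z = [0] * n          # vector of partial sums (assumed role of 'z')
    out = []
    for s in samples:
        v_in = [s] * n                       # spread: broadcast scalar to all lanes
        z = z[1:] + [0]                      # shift: move partial sums one lane
        z = [zk + bk * vk                    # mac: per-lane multiply-accumulate
             for zk, bk, vk in zip(z, coeffs, v_in)]
        out.append(z[0])                     # lane 0 holds the finished output
    return out


def fir_reference(samples, coeffs):
    """Direct-form FIR, y[i] = sum_k coeffs[k] * samples[i - k], for checking."""
    out = []
    for i in range(len(samples)):
        acc = 0
        for k, b in enumerate(coeffs):
            if i - k >= 0:
                acc += b * samples[i - k]
        out.append(acc)
    return out
```

Both functions give the same result for any input, for example `[1, 4, 10, 20]` for samples `[1, 2, 3, 4]` and coefficients `[1, 2, 3, 4]`. The vector form needs no per-tap memory reads, only one broadcast and one lane shift per sample.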
Efficient Design
• Wide SIMD width
• Small register file with minimum ports
• Small memories
• Narrow system bus
• Data-path optimized for 8 bits
• Vector shuffle reduces memory ports
802.11a PE Mapping
Power Results
• Configuration
  • 4 PEs, 1 ARM (Cortex M3) controller
  • Global scratchpad memory (64Kb)
  • 90nm (1V @ 400 MHz)
  • Synthesized conservatively
Area Results
Software Development Flow
SPEX (Signal Processing EXtension)
• Implemented as a library extension to C
• System-level development
  • Supports concurrent DSP kernel function definitions
  • Channel variables for inter-kernel communication
• Algorithm-level development
  • Native vector & matrix variables
  • Explicit DSP variable attribute definition
  • Native vector & matrix operations
SPEX Overview
SPEX Example Code: Viterbi ACS
Concurrent DSP kernel definitions:
    void* acs(void*) {
        /* variable declaration */
        saturated char<64> metrics1, metrics2;
        saturated char<64> states;
        saturated char<64> t1, t2;
        while (!viterbi.stop()) {
            /* receiving data from BMC */
            metrics1 = bmc_to_acs.receive();
            metrics2 = bmc_to_acs.receive();
            /* add */
            metrics1 += states;
            metrics2 += states;
            /* compare and select */
            t1 = (metrics1(0,2,62),metrics2(0,2,62));
            t2 = (metrics1(1,2,63),metrics2(1,2,63));
            states(t1<t2) = t1;
            states(t1>=t2) = t2;
            /* sending data to TB */
            acs_to_tb.send(states);
        }
    }
SPEX Example Code: Viterbi ACS
Native SIMD variable definition with explicit attributes:
    saturated char<64> metrics1, metrics2;
    saturated char<64> states;
    saturated char<64> t1, t2;
SPEX variables support:
1. saturated/overflow
2. various variable bit-widths
3. vectors & matrices
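The difference between the saturated and overflow attributes can be illustrated with a small model of element-wise vector addition. This is a sketch only: it assumes `char` means a signed 8-bit element, and the helper names are hypothetical rather than part of SPEX.

```python
def saturate(value, bits=8):
    """Clamp value to the signed range of the given bit-width."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def sat_add(a, b, bits=8):
    """Element-wise add that clamps at the range limits (saturated)."""
    return [saturate(x + y, bits) for x, y in zip(a, b)]


def wrap_add(a, b, bits=8):
    """Element-wise add with two's-complement wraparound (overflow)."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    out = []
    for x, y in zip(a, b):
        r = (x + y) & mask
        out.append(r - (1 << bits) if r & sign else r)
    return out
```

For example, adding 100 + 100 in an 8-bit lane gives 127 with `sat_add` and -56 with `wrap_add`. Path metrics in a Viterbi decoder are a typical case where clamping is preferable to wrapping.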
SPEX Example Code: Viterbi ACS
Inter-kernel communication through channel operations:
    /* receiving data from BMC */
    metrics1 = bmc_to_acs.receive();
    metrics2 = bmc_to_acs.receive();
    /* sending data to TB */
    acs_to_tb.send(states);
Channel types:
1. FIFO queue
2. Broadcast queue
3. Sync/control channel
4. Random-read FIFO queue
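A FIFO channel between two concurrent kernels can be modeled with a bounded blocking queue. The sketch below is an analogy rather than SPEX itself: `FifoChannel`, `run_pipeline`, and the `STOP` sentinel are hypothetical, and the sentinel stands in for the sync/control channel that `viterbi.stop()` presumably uses.

```python
import queue
import threading


class FifoChannel:
    """Bounded FIFO: send blocks when full, receive blocks when empty."""

    def __init__(self, depth=4):
        self._q = queue.Queue(maxsize=depth)

    def send(self, item):
        self._q.put(item)

    def receive(self):
        return self._q.get()


STOP = object()  # hypothetical end-of-stream marker


def run_pipeline(blocks):
    """Run a producer kernel and a consumer kernel linked by one channel."""
    channel = FifoChannel()
    results = []

    def producer():
        for block in blocks:
            channel.send(block)
        channel.send(STOP)

    def consumer():
        while True:
            block = channel.receive()
            if block is STOP:
                break
            results.append(sum(block))  # placeholder for real kernel work

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because the queue is first-in first-out with a single consumer, results come out in the order the blocks were sent.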
SPEX Example Code: Viterbi ACS
SPEX vector operations (Matlab-like C code):
    /* add */
    metrics1 += states;
    metrics2 += states;
    /* compare and select */
    t1 = (metrics1(0,2,62),metrics2(0,2,62));
    t2 = (metrics1(1,2,63),metrics2(1,2,63));
    states(t1<t2) = t1;
    states(t1>=t2) = t2;
Supports:
1. SIMD arithmetic operations
2. SIMD permutation
3. SIMD predication
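The add, permute, and predicated-select steps of the ACS kernel can be traced with plain lists. The sketch assumes that `v(start, stride, end)` selects elements from `start` to `end` inclusive in steps of `stride`, and that `(a, b)` concatenates two vectors. Both readings are inferred from the Matlab-like syntax, not confirmed by the slides.

```python
def sat8(value):
    """Clamp to signed 8-bit range (assumed meaning of 'saturated char')."""
    return max(-128, min(127, value))


def strided(v, start, stride, end):
    """Assumed meaning of v(start, stride, end): inclusive strided slice."""
    return v[start:end + 1:stride]


def acs_step(metrics1, metrics2, states):
    """One add-compare-select step over whole vectors."""
    n = len(states)
    # add: SIMD arithmetic
    m1 = [sat8(m + s) for m, s in zip(metrics1, states)]
    m2 = [sat8(m + s) for m, s in zip(metrics2, states)]
    # permutation: gather even lanes into t1, odd lanes into t2
    t1 = strided(m1, 0, 2, n - 2) + strided(m2, 0, 2, n - 2)
    t2 = strided(m1, 1, 2, n - 1) + strided(m2, 1, 2, n - 1)
    # predication: per lane, keep t1 where t1 < t2, otherwise t2
    return [a if a < b else b for a, b in zip(t1, t2)]
```

With 4-element vectors `metrics1 = [1, 5, 2, 0]`, `metrics2 = [3, 1, 4, 4]` and zero `states`, `t1` is `[1, 2, 3, 4]`, `t2` is `[5, 0, 1, 4]`, and the selected result is `[1, 0, 1, 4]`. The slide's kernel does the same on 64 lanes.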
Summary
• Hardware & software solutions for SDR
• Hardware
  • 4 dual-issue asymmetric SIMD processing elements
  • Consumes 200 to 300 mW at 90nm
  • Meets the performance requirements for W-CDMA & 802.11a
• Software
  • SPEX provides efficient DSP algorithm and system implementation
Questions?