Bridging the Energy Gap in Size, Weight and Power Constrained Software Defined Radio: Agile Baseband Processing as a Key Enabler. Bruno Bougard, Min Li, David Novo, Liesbet Van der Perre and Francky Catthoor. The number of standards to implement in a single handset increases dramatically.
Bridging the Energy Gap in Size, Weight and Power Constrained Software Defined Radio: Agile Baseband Processing as a Key Enabler
Bruno Bougard, Min Li, David Novo, Liesbet Van der Perre and Francky Catthoor
The number of standards to implement in a single handset increases dramatically
[Figure: wireless standards placed by mobility (stationary, walking, driving) against data rate (0.1 to 100 Mbps): GSM/GPRS, EDGE, UMTS, HSxPA, 3G LTE, IEEE 802.16a,d, IEEE 802.16e, WLAN (IEEE 802.11b), WLAN (IEEE 802.11a/g/n), DECT, Bluetooth]
Bruno Bougard et al. Athens, May 2008
All cost factors point towards high-volume programmable solutions wherever possible [source: ICERA]
Two barriers remain
• The exploding complexity
• The energy gap
[Figure: trend across cellular generations (2G, 3G, 3G+, LTE?) and WLAN generations (.11b, .11g, .11n with MIMO)]
Most SWPC SDR research focuses on more energy efficient processor architectures
• NXP onDSP, EVP
• Sandbridge Sandblaster SB3011
• SiliconHive CSP2200
• Infineon MUSIC
• Icera DXP
• Nokia VectorASIP
• UMich SODA
• ULinkoping/CORESONICS BBE2
• TUDresden SAMIRA
• …
[Figure: efficiency versus flexibility trade-off, positioning ASICs, VLIW/DSPs, ASIPs, FPGAs, RISCs and GPPs]
Radio Baseband Platform Requirements
• Low Cost
  • Long HW lifespan
  • Short SW deployment time
  • Scalable HW/SW
• Energy Aware
  • Energy aware HW
  • Energy aware algorithms
  • Energy aware protocols
  • Techno-aware power management
• Spectrum Agile
  • Versatile RX digital front end
  • Versatile TX digital front end
  • Powerful MAC/RLC/QoS Ctrl
Outline
• IMEC SDR Baseband Platform
• Wanted: Platform Aware Signal Processing
• Case study 1: OFDMA transmitter
• Case study 2: OFDMA receiver
• Case study 3: Dynamic fixed-point format assignment
Where do you need flexibility? Where do you need energy efficiency?
[Figure: baseband functions (synchronization, FE steering, signal detection, modulation/demodulation (inner modem), forward error correction (outer modem)) placed by diversity/versatility and duty cycle]
Where do you need flexibility? Where do you need energy efficiency?
[Figure: the same baseband functions (synchronization, FE steering, signal detection, modulation/demodulation (inner modem), forward error correction (outer modem)), annotated with the need for flexibility and the need for energy efficiency]
IMEC MIMO-capable SDR baseband platform
Flexible platform targeting:
• 802.16e and next gen.
• 802.11n and next gen.
• 3GPP LTE
• DVB-H/T
Key figures:
• Up to 3 antennas
• Up to 200 Mbps
• <500 mW
Two Programmable CGA Processor Cores at its heart
• 4x4 64-bit 4-way SIMD CGA
• VLIW and CGA modes of operation
• C-programmable
• 25 (theoretical) GOPS
• 46 MOPS/mW
• 32 KB I$
• 128 KB IMEM
• 128-entry CMEM
• 64 KB L1 data scratchpad
• TSMC 90G
• Dual VT and substrate biasing for leakage reduction in sleep mode
• Clock rate 400 MHz WCC
• Total area: 6 mm²
• Power consumption
  • Active TC VLIW: 75 mW
  • Active TC CGA: 300 mW
  • Leakage @ T=65C: 25 mW
  • Leakage in standby: <10 mW
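The quoted 25 GOPS theoretical peak is consistent with a simple operations-per-cycle count. The sketch below assumes, which the slide does not state, that every functional unit of the 4x4 array issues one operation per SIMD lane per cycle. The function name `peak_gops` is illustrative only.

```python
def peak_gops(rows: int, cols: int, simd_lanes: int, clock_hz: float) -> float:
    """Peak throughput in GOPS, assuming one op per SIMD lane per unit per cycle."""
    ops_per_cycle = rows * cols * simd_lanes
    return ops_per_cycle * clock_hz / 1e9


# 4x4 array, 4-way SIMD, 400 MHz clock (figures taken from the slide)
print(peak_gops(4, 4, 4, 400e6))
```

Under this assumption the count comes to roughly 25.6 GOPS, which matches the rounded figure on the slide. The 46 MOPS/mW figure is not derived here because the slide does not say which power number it is based on.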
200 Mbps+ SDR application driver
• IEEE 802.11n digital inner modem receiver
• Channel bonding (40 MHz)
• 2 antennas
• MIMO SDM-OFDM
Profiling of the SDR benchmarks and of the full OFDM application proves real-time operation at 100 Mbps
[Figure: profiling results for 2-antenna SDM-OFDM at 100 Mbps]
Great benefit in area, but power is higher than for dedicated hardware solutions
[Figure: bar charts comparing area and power for 802.11n, 802.16e, DVB-H, 11n&16e and all standards combined; series: SDR (IMEC), ASIC (Atheros), ASIC (source: Intel), Reconf. (Intel)]
• Active power VLIW: 75 mW
• Active power CGA: 300 mW
• Leakage power: 25 mW
The interconnection network dominates the power consumption in VLIW and CGA modes
• VLIW mode: active power 75 mW, leakage power 25 mW
• CGA mode: active power 300 mW, leakage power 25 mW
[Figure: power breakdown per mode]
Wanted: SDR-Platform Aware Signal Processing
[Figure: two illustrations, "Horse as Platform" and "Elephant as Platform"]
Dynamic signal processing implementation
[Figure: 3GPP channel response over time, shown with the cycle count on a SoA processor]
Wanted: SDR-Platform Aware Signal Processing
ASIC as platform
• Requires simple control flow
• Requires manifest and regular computation structures
• Maximum functional reuse is a must
• Minimum data wordwidth
• Accommodates high computation loads
• Highest energy efficiency
SDR as platform
• Accommodates more complex control flows
• Accommodates complex and irregular computation structures
• Functional reuse not a must (reuse memory footprint only)
• Aligned data wordwidth
• Limited maximum computation load
• Lower energy efficiency
Algorithm-Architecture Co-Design
• Make the algorithm compatible with architecture/compiler constraints
• Exploit the opportunities of the programmable architecture
[Figure: interaction between Algorithm/Software and Architecture/Compiler]
Observation
• Wireless baseband processing implies high dynamics
• Wireless baseband processing tolerates inaccuracy
• This is already considered at system level (X-layer), but what about in the signal processing implementation?
[Figure: channel illustrations]
The opportunity
• Two viewpoints on complexity
  • Computation complexity and memory complexity
  • Structure complexity (control flow, heterogeneity, etc.)
• A wireless system can cope with inaccuracy ("scalable" QoS)
• On SDR
  • Computation complexity is much more costly than in ASIC
  • Memory complexity is as costly as in ASIC
  • Structure complexity is much less costly than in ASIC
• What can we do? Increase the structure complexity of baseband processing to reduce the average computation and memory complexity, by enabling run-time adaptation of the algorithm implementation to the dynamics in QoS requirements, environment (and platform)
[Figure: "SDR Baseband with High Structure Complexity" versus "Baseband ASIC with Low Structure Complexity"]
Motivation: OFDMA modulation error requirements vary
[Figure: modulation error requirements from the WiMAX specification]
Modulation accuracy can be relaxed for lower-order modulation
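To make the notion of modulation accuracy concrete, here is a minimal sketch of a relative constellation error (RCE) measurement in dB. It uses a simplified single-burst form (error power over reference power) and is not the full averaging procedure of the WiMAX specification. The function name `rce_db` and the test vectors are illustrative assumptions.

```python
import numpy as np


def rce_db(ideal, measured):
    """Relative constellation error in dB: error power over ideal power.

    Simplified single-burst form; the standard additionally averages over frames.
    """
    ideal = np.asarray(ideal, dtype=complex)
    measured = np.asarray(measured, dtype=complex)
    err_power = np.sum(np.abs(measured - ideal) ** 2)
    ref_power = np.sum(np.abs(ideal) ** 2)
    return 10.0 * np.log10(err_power / ref_power)


# QPSK points with a 10 % amplitude error (illustrative input, not measured data)
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j])
print(rce_db(qpsk, 1.1 * qpsk))
```

A transmitter that only needs to meet a relaxed RCE target can tolerate more implementation error, which is the slack the scalable modulator exploits.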
Relaxation of the RCE (relative constellation error) requirement can be exploited by a scalable digital OFDMA modulator
• Original: a non-scalable modulator based on a large (e.g., 1024-point) IFFT
• Transformed: a scalable OFDMA modulator with 3 cascaded components
The interpolation factor can be used as a knob to adjust the accuracy and computation load to the RCE requirement
Computation load scales smoothly with the interpolation factor
[Figure: normalized cycle count versus interpolation factor]
OFDMA mod./demod. requires an (I)FFT with partial input/output
The position and number of bins change dynamically
Efficient Partial FFT on ILP Architectures
• Exploit the partial input/output to reduce active instructions and memory accesses
• 30 years of theoretical research on partial FFT (PFFT), but few implementations
• We propose a generic and efficient scheme for PFFT on ILP architectures
• Any pattern of bin distribution can be implemented
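The idea of exploiting partial output can be illustrated with a generic output-pruned radix-2 FFT. This is a textbook-style sketch and not the scheme proposed by the authors for ILP architectures. The function name `pruned_fft` and the chosen bins are illustrative, and the input length is assumed to be a power of two.

```python
import numpy as np


def pruned_fft(x, bins):
    """Return {k: X[k]} for the requested DFT bins of x (len(x) a power of two).

    Decimation-in-time recursion that only evaluates the butterflies
    needed for the requested output bins.
    """
    n = len(x)
    if n == 1:
        return {0: complex(x[0])}
    half = n // 2
    # Bins of the half-size sub-transforms that the requested outputs depend on
    sub_bins = sorted({k % half for k in bins})
    even = pruned_fft(x[0::2], sub_bins)
    odd = pruned_fft(x[1::2], sub_bins)
    out = {}
    for k in bins:
        twiddle = np.exp(-2j * np.pi * k / n)
        out[k] = even[k % half] + twiddle * odd[k % half]
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal(64) + 1j * rng.standard_normal(64)
wanted = [3, 17, 40]  # arbitrary pattern of occupied bins
partial = pruned_fft(x, wanted)
full = np.fft.fft(x)
print(all(np.isclose(partial[k], full[k]) for k in wanted))
```

Because the set of requested bins is an ordinary argument, any bin pattern can be handled, and the number of butterflies evaluated shrinks as fewer bins are requested.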
The proposed scheme brings important gains in almost all implementation cost factors, and scales smoothly with the number of sub-carriers to be processed
The price to pay is a higher instruction cache miss rate (acceptable)
State of the art
• Automatic floating-point to fixed-point conversion (>30 years of work)
  • Commercial products: Catalytic Inc. & Mathworks
  • Recent academic contributions:
    • Simulation-based: Seoul National Univ. (‘95)
    • Analytical methods: Aachen (‘98), Northwest Univ. (‘01)
    • Hybrid methods: Imperial College (‘03), Berkeley (‘04) and ENSSAT (‘05)
• Run-time word-length selection: receiver VLSI architecture based on a control feedback loop, Hokkaido University (‘06) [Yoshizawa, S. et al., ISCAS’06]
Modeling of the fixed-point communication system
• Performance of the communication system as a function of the receiver SNR: BER = f(SNR)
• The fixed-point refined system includes quantization noise
• BER = f(SNR, na, nb, …) = f’(SNR) ≈ f(SNR’)
• Implementation scenarios are defined and optimized at design time
[Figure: signal-flow graph with blocks A, B, C, D, in which quantization noise terms na, nb, nc are added to the signals a, b, c]
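One common way to read the approximation BER ≈ f(SNR’) is that quantization noise behaves like additional noise that lowers the effective SNR seen by the receiver. The sketch below assumes, which the slide does not state, that channel noise and quantization noise are independent and additive, so that their powers sum. The function name `effective_snr_db` is hypothetical.

```python
import math


def effective_snr_db(channel_snr_db: float, sqnr_db: float) -> float:
    """Combine channel SNR and signal-to-quantization-noise ratio (both in dB).

    Assumes independent additive noise sources, so noise powers add
    (relative to a unit-power signal).
    """
    noise = 10 ** (-channel_snr_db / 10) + 10 ** (-sqnr_db / 10)
    return -10 * math.log10(noise)


# Illustrative operating points, not taken from the slides
print(effective_snr_db(20.0, 60.0))  # quantization noise is negligible here
print(effective_snr_db(20.0, 20.0))  # quantization noise as strong as channel noise
```

When the SQNR is well above the channel SNR the penalty is negligible, which is why lower word lengths become acceptable at low-SNR operating points.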
Opportunity: application dynamics and tolerance to inaccuracy can be propagated to the implementation
• Multiple link parameters trade off noise/interference robustness against data rate
• Different system configurations have different requirements in [digital] signal processing accuracy and can therefore use different implementations
• We adapt the application fixed-point mapping at run time
• By switching between the “mappings”, the average load is reduced
[Figure: system level (TX, channel with noise, RX; configurations A, B, C, D) and implementation level (analog FE, digital DSP; SNR versus number of bits)]
SDR enables more agile signal processing implementations
• Several SW implementations of the same functionality with different precision and computation load
• Monotonic relation between precision and load
• One can switch between SW implementations in a few cycles
[Figure: run-time controller that takes the QoS requirement and monitoring information (channel attenuation over frequency and time) and adapts the data format of the DSP implementation]
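The selection step of such a run-time controller can be sketched as follows. Everything in this block is an assumption made for illustration: the `Mapping` type, the three word lengths, their relative loads, the 10 dB margin and the rule-of-thumb SQNR model are not taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    name: str
    word_bits: int
    relative_load: float  # hypothetical cost, normalized to the widest mapping


def sqnr_db(word_bits: int) -> float:
    """Rule of thumb for a full-scale sinusoid through an ideal uniform quantizer."""
    return 6.02 * word_bits + 1.76


# Hypothetical set of precompiled implementations of the same function
MAPPINGS = [
    Mapping("narrow", 8, 0.5),
    Mapping("medium", 12, 0.75),
    Mapping("wide", 16, 1.0),
]


def select_mapping(required_snr_db, margin_db=10.0, mappings=MAPPINGS):
    """Pick the cheapest mapping whose SQNR exceeds the requirement plus a margin."""
    feasible = [m for m in mappings
                if sqnr_db(m.word_bits) >= required_snr_db + margin_db]
    if not feasible:
        # Fall back to the most precise mapping when nothing meets the target
        return max(mappings, key=lambda m: m.word_bits)
    return min(feasible, key=lambda m: m.relative_load)


for snr in (10.0, 45.0, 70.0):
    print(snr, select_mapping(snr).name)
```

Because precision and load are monotonically related, choosing the cheapest feasible mapping is the same as choosing the narrowest word length that still meets the requirement.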
Dynamic fixed-point format assignment increases energy efficiency in situations requiring lower performance
Increase in scalability
• Energy efficiency is increased in lower-rate modes
• Average energy consumption is reduced
Conclusions
• Energy efficiency of flexible implementations moves closer to that of their dedicated hardware counterparts
  • Has the potential to continuously best-fit the dynamism
• Does not rely on hypothetical provisions in the standards
  • Implementation centric
  • Applicable to any functional-level algorithmic solution
• Wireless systems context today, but also other domains tomorrow
  • Digital signal processing with an SNR-type constraint and dynamic variation in data resolution
  • Biomedical signal processing, multimedia, etc.