Efficient VLSI architectures for baseband signal processing in wireless base-station receivers. Sridhar Rajagopal, Srikrishna Bhashyam, Joseph R. Cavallaro, and Behnaam Aazhang. This work is supported by Nokia, TI, TATP and NSF.
Introduction • A real-time VLSI architecture for channel estimation • Usually neglected, but high computational complexity • Current DSP solutions do not meet real-time requirements • Iterative fixed-point algorithm developed • Area-Time Tradeoffs discussed • Area-Constrained (Pico-cells) • Time-Constrained (Theoretical Data Rates) • Area-Time Efficient (Real-Time Solution)
Outline • What is multiuser channel estimation? • Need for multiuser channel estimation • Implementation problems • Algorithm enhancements • VLSI architectures • Area-constrained, Time-constrained, Area-Time efficient • Comparisons with DSP solutions • Related Work and Conclusions
Evolution of mobile communications • First generation: Voice • Second/Current generation: Voice + Low-rate data (9.6 Kbps) • Third generation: Voice + High-rate data (2 Mbps/384 Kbps/128 Kbps) + multimedia
Channel estimation • [Figure: User 1 and User 2 reach the Base Station over a Direct Path and a Reflected Path; Noise + MAI is present at the receiver]
Need for channel estimation • To compensate for unknown fading amplitudes and asynchronous delays. • Detector performance depends on accuracy of channel estimator
Computing channel estimates • Computed by sending a training sequence of known bits to the receiver. • When no training sequence is available, detected bits can be used to update the estimates in a decision-feedback mode for tracking. • Importance usually neglected • May exceed detector complexity
Baseband signal processing • [Figure: Base-Station Receiver block diagram with blocks Antenna (Multiple Users), Channel estimation (Training, Tracking), Detection, and Decoding; output: Detected Bits]
Implementation complexity • Matrix inversions (size 32x32) per window • Unable to meet real-time requirements on DSPs [Asilomar’99] • VLSI fixed-point architectures for matrix inversions • Precision problems • Typically, simpler single-user sliding correlator structures are used instead.
Iterative scheme for channel estimation • Method of Gradient Descent • Stable convergence behavior • Same Performance • Simpler Bit-Streaming Hardware Implementation
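The slides do not spell out the update rule, so the following is only a minimal sketch of what a gradient-descent channel estimate could look like. It assumes a least-squares formulation in which the estimate A satisfies Rbb·A = Rbr (which would otherwise need the matrix inversion), with Rbb and Rbr the auto- and cross-correlation matrices and µ the step size named on later slides. The function names, the toy matrices, and the step size are illustrative assumptions, not values from the presentation.

```python
def matmul(X, Y):
    """Plain-Python matrix product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def gradient_descent_estimate(Rbb, Rbr, mu, iterations):
    """Iterate A <- A - mu * (Rbb @ A - Rbr), starting from A = 0.

    Assumed update rule (not stated in the slides): the fixed point
    satisfies Rbb @ A = Rbr, so no matrix inversion is required.
    """
    rows, cols = len(Rbr), len(Rbr[0])
    A = [[0.0] * cols for _ in range(rows)]
    for _ in range(iterations):
        RA = matmul(Rbb, A)
        A = [[A[i][j] - mu * (RA[i][j] - Rbr[i][j]) for j in range(cols)]
             for i in range(rows)]
    return A


# Toy 2x2 problem (hypothetical numbers, real-valued for brevity).
Rbb = [[4.0, 1.0], [1.0, 3.0]]
A_true = [[1.0, -2.0], [0.5, 3.0]]
Rbr = matmul(Rbb, A_true)

A_est = gradient_descent_estimate(Rbb, Rbr, mu=0.2, iterations=200)
max_error = max(abs(A_est[i][j] - A_true[i][j])
                for i in range(2) for j in range(2))
```

Each iteration uses only multiply-accumulate and subtract operations, which is what makes a streaming hardware implementation plausible; the step size must be small enough relative to the largest eigenvalue of Rbb for the iteration to converge.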
Comparison of Bit Error Rates (BER) • [Figure: BER (10^-3 to 10^-1) versus Signal to Noise Ratio (SNR, 4 to 12); curves: MF, ActMF, ML, ActML; complexity annotations: O(K²N) and O(K³+K²N)] • Simulations: Static multipath channel • SINR = 0 • Paths = 3 • Preamble L = 150 • Spreading N = 31 • Users K = 15
Fading channel with tracking • [Figure: BER (10^-3 to 10^0) versus SNR (4 to 12); curves: MF - Static, MF - Tracking, ML - Static, ML - Tracking] • Doppler = 10 km/h
Area-Time Tradeoffs • Design for 32 users (K) and a spreading code length (N) of 32 • Target Data Rate = 128 Kbps • Low Power Issues ignored! • Area-Constrained Architecture • Pico-cells; lower data rates • Time-Constrained Architecture • Maximum achievable data rates • Area-Time Efficient Architecture • Real-Time with minimum area overhead
Task Decomposition • [Figure: timeline (TIME) over Tracking Window L. Inputs: Pilot/Detected Bits b (2K,1) and b0 (2K,1) through a MUX; Pilot/Data r (N,8) and r0 (N,8) through a MUX. Correlation Matrices (Per Bit): Rbb O(2K²,8) and Rbr O(2KN,8). Iterate: A O(4K²N,8), producing the Channel Estimate to Detector]
Architecture Design: Auto-correlation • b ∈ {+1,-1} • Multiplication is an XNOR operation • Entire matrix can be updated sequentially or in parallel using XNOR gates • Auto-correlation matrix implemented as UP/DOWN counter(s)
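As a rough illustration of the bullets above, the sketch below encodes +1 as bit 1 and -1 as bit 0 (an assumed encoding), forms each product b(i)·b(j) with an XNOR, and uses the result as the up/down control of a counter holding Rbb(i,j). The function names and the example bit vectors are hypothetical, and counter width, overflow, and removal of old bits leaving the tracking window are not modeled.

```python
def xnor(x, y):
    """Return 1 when both bits are equal, 0 otherwise."""
    return 1 - (x ^ y)


def update_autocorrelation(Rbb, bits):
    """Accumulate the outer product b * b^T for one bit vector.

    Each entry acts as an up/down counter: count up when the XNOR of
    the two bits is 1 (product +1), count down otherwise (product -1).
    """
    n = len(bits)
    for i in range(n):
        for j in range(n):
            Rbb[i][j] += 1 if xnor(bits[i], bits[j]) else -1
    return Rbb


def to_signed(bit):
    """Map the bit encoding back to +1 / -1 for the reference check."""
    return 2 * bit - 1


# Three hypothetical bit vectors of length 4.
bit_vectors = [[1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 0, 1]]
Rbb = [[0] * 4 for _ in range(4)]
for bits in bit_vectors:
    update_autocorrelation(Rbb, bits)

# Reference computed with ordinary multiplications on +1/-1 values.
reference = [[sum(to_signed(v[i]) * to_signed(v[j]) for v in bit_vectors)
              for j in range(4)] for i in range(4)]
```

The loops here run sequentially; in the parallel variant described on the slide, every (i, j) pair would have its own XNOR gate and counter.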
Architecture Design: Cross-Correlation • b ∈ {+1,-1}, r = 8-bit integer vector (complex) • Multiplications reduce to additions/subtractions • Entire matrix (complex) can be updated sequentially or in parallel using 8-bit adders • Cross-correlation matrix stored in RAM.
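The reduction of multiplications to additions/subtractions can be sketched as follows, assuming bits are encoded as 1 for +1 and 0 for -1 and that each complex sample of r is held as a pair of integers (real, imaginary). The function name and sample values are illustrative, and the 8-bit word length and any saturation or wraparound behaviour are deliberately left out.

```python
def update_cross_correlation(Rbr, bits, r):
    """Accumulate b * r^T using only additions and subtractions.

    bits: list of 0/1 values (1 means +1, 0 means -1).
    r:    list of (real, imag) integer pairs.
    The bit acts as the add/subtract control for each accumulator.
    """
    for i, bit in enumerate(bits):
        for j, (re, im) in enumerate(r):
            acc_re, acc_im = Rbr[i][j]
            if bit:
                Rbr[i][j] = (acc_re + re, acc_im + im)
            else:
                Rbr[i][j] = (acc_re - re, acc_im - im)
    return Rbr


# Hypothetical inputs: three bits and two complex received samples.
bits = [1, 0, 1]
r = [(5, -3), (-7, 2)]
Rbr = [[(0, 0)] * len(r) for _ in bits]
update_cross_correlation(Rbr, bits, r)
```

Because the real and imaginary parts are updated independently, a complex accumulation costs two adder operations and no multiplier.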
Architecture Design: Channel Estimate • A = 8-bit integer matrix (complex) • µ << 1 : Truncated Multiplication [Schulte’93] • Matrix-matrix (real-complex) multiplication of integers • Forms the bottleneck • Can be done sequentially with a single multiplier or totally parallel or partially parallel • Concentrate on multiplication for area-time tradeoffs!
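One integer-only update step might look like the sketch below. It assumes the step size is a power of two, µ = 2^-shift, applied as an arithmetic right shift, in line with the ">>" stage that appears in the architecture block diagrams; the truncated multiplier of [Schulte’93] itself is not modeled. The update rule A ← A − ((Rbb·A − Rbr) >> shift), the function name, and all numbers are assumptions for illustration, and only a real-valued A is shown (a complex A would apply the same step to its real and imaginary parts).

```python
def fixed_point_step(A, Rbb, Rbr, shift):
    """One assumed update A <- A - ((Rbb @ A - Rbr) >> shift) on integers.

    The right shift stands in for multiplication by mu = 2**-shift.
    Python's >> on negative integers rounds toward minus infinity.
    """
    rows, cols = len(A), len(A[0])
    A_new = []
    for i in range(rows):
        new_row = []
        for j in range(cols):
            # Multiply-accumulate: row i of Rbb times column j of A.
            mac = sum(Rbb[i][k] * A[k][j] for k in range(rows))
            gradient = mac - Rbr[i][j]
            new_row.append(A[i][j] - (gradient >> shift))
        A_new.append(new_row)
    return A_new


# Hypothetical integer matrices; shift = 4 corresponds to mu = 1/16.
Rbb = [[8, 2], [2, 8]]
Rbr = [[352, -80], [208, 400]]
A0 = [[0, 0], [0, 0]]

A1 = fixed_point_step(A0, Rbb, Rbr, shift=4)  # [[22, -5], [13, 25]]
```

The inner multiply-accumulate is the part the slide identifies as the bottleneck: it can run on a single multiplier, on one multiplier per output entry, or anywhere in between, which is where the area-time tradeoff comes from.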
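Area-Constrained Architecture • [Figure: sequential datapath indexed by i and j. Inputs b, b0 (1 bit) and r, r0 (8 bits). Rbb held in an up/down (U/D) Counter with enable (EN); Rbr updated through Add/Sub units; a single MAC followed by Subtract, shifter (>>), and Subtract stages (8- and 16-bit buses) computes Anew from A; MUX/DEMUX and Load/Store logic move operands]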
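Time-Constrained Architecture • [Figure: fully parallel datapath. b and b0 (2K*1) form b*bT and b0*b0T (K(2K-1)*1), selected by a MUX to update Rbb (2K²*8); r and r0 (N*8), selected by a MUX, update Rbr (2KN*8); Mult, Subtract, shifter (>>), and Subtract stages (2KN*8 and 2KN*16 buses) produce the Channel Estimate A]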
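Auto-correlation Update in Parallel • [Figure: b (2K) feeds an Array of XNORs that forms the pairwise products of the bits (for bits a, b, c, d: a·b, a·c, a·d, b·c, b·d, c·d), giving bbT (K*{2K-1}*1); each bbT(i,j) drives the U/D# input of a counter in the Array of Counters holding Rbb (2K²*8), shown for Rbb(i,j) and Rbb(i,i)]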
Cross-Correlation Update in Parallel • [Figure: b (2K*1) and r (N*8) as inputs; b(i) drives the Add/Sub# control of an 8-bit Adder that combines r(j) with Rbr(i,j); the array holds Rbr (2KN*8)]
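Area-Time Efficient Architecture • [Figure: partially parallel datapath. b and b0 (2K*1) form b*bT and b0*b0T, selected by a MUX to update the Rbb Counters (2K*1 per step); r and r0 (N*8), selected by a MUX, update Rbr through an Adder with Load/Store; a Mult stage on 2K*8 operands, followed by Subtract, shifter (>>), and Subtract stages (1*8 and 1*16 buses), computes Anew from A via MUX/DEMUX]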
DSP Comparisons • DSPs unable to exploit bit-level parallelism • Inefficient storage of bits • Replacing multiplications by additions/subtractions
Related Work: DSP Extensions (Cross-Correlation) • [Figure: 64-bit Register D[i][j] feeding 8-bit add/subtract (+/-) units controlled by an 8-bit Control Register b[i]; results written to a 64-bit Register D[i][j]] • For i = 1..8, j = 1..8: D[i][j] = D[i][j] + b[i]*C[j]
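A software model of such an extension could look like the sketch below. It assumes that a 64-bit register holds eight signed 8-bit lanes (lane 0 in the low byte), that bit k of the 8-bit control register selects add (1) or subtract (0) for lane k, and that lanes wrap modulo 256 on overflow. The lane layout, the wraparound behaviour, the helper names, and the sample values are all assumptions; the slide does not specify them.

```python
LANES = 8
MASK = 0xFF


def pack(values):
    """Pack eight signed 8-bit integers into one 64-bit word."""
    word = 0
    for lane, value in enumerate(values):
        word |= (value & MASK) << (8 * lane)
    return word


def unpack(word):
    """Unpack a 64-bit word into eight signed 8-bit integers."""
    values = []
    for lane in range(LANES):
        byte = (word >> (8 * lane)) & MASK
        values.append(byte - 256 if byte >= 128 else byte)
    return values


def packed_add_sub(d_word, c_word, control):
    """Lane-wise D + C or D - C, selected per lane by the control bits."""
    result = 0
    for lane in range(LANES):
        d = (d_word >> (8 * lane)) & MASK
        c = (c_word >> (8 * lane)) & MASK
        lane_value = d + c if (control >> lane) & 1 else d - c
        result |= (lane_value & MASK) << (8 * lane)  # wrap modulo 256
    return result


# One column update D[i][j] += b[i] * C[j] for i = 1..8 and a fixed j:
# the sample C[j] = 3 is broadcast to every lane (hypothetical values).
D_column = pack([10, -20, 30, 0, 5, -5, 100, -100])
C_broadcast = pack([3] * LANES)
b_control = 0b10100101  # lanes 0, 2, 5, 7 add; lanes 1, 3, 4, 6 subtract

updated = unpack(packed_add_sub(D_column, C_broadcast, b_control))
```

With this layout a single instruction performs eight of the sixty-four accumulations in the loop on the slide, so the full 8x8 update would take eight such operations.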
Related Work: Online Arithmetic • Multiuser Detection • Need to compute only the Sign Bit (Most Significant Digit) • No back-conversion to conventional representation • Complex-number representation possible • Integration with channel estimation also.
Related Work: DSP-FPGA solutions • Multiple DSP-FPGA task partitioning • Bit-level parallelism on FPGAs • Multiplications on DSPs • Sundance Multi-DSP System • 2 TI C67 DSPs • 2 Xilinx Virtex FPGAs • http://www.sundance.com
Conclusions • Real-Time VLSI architecture for multiuser channel estimation • Iterative fixed-point algorithm developed to avoid matrix inversions • Area-Time Tradeoffs discussed • Area-Constrained (Pico-cells) • Time-Constrained (Data Rates) • Area-Time efficient (Real-Time) • VLSI architectures better exploit bit-level computations and parallelism to meet real-time constraints than DSPs.