Efficient VLSI architectures for baseband signal processing in wireless base-station receivers. Sridhar Rajagopal, Srikrishna Bhashyam, Joseph R. Cavallaro, and Behnaam Aazhang. This work is supported by Nokia, TI, TATP and NSF.
Introduction • A real-time VLSI architecture for channel estimation • Channel estimation is usually neglected, but has high computational complexity • Current DSP solutions do not meet real-time requirements • Iterative fixed-point algorithm developed • Area-time tradeoffs presented • Area-constrained, time-constrained, area-time efficient
Outline • What is multiuser channel estimation? • Need for multiuser channel estimation • Implementation problems • Algorithm enhancements • VLSI architectures • Area-constrained, time-constrained, area-time efficient • Conclusions
Evolution of mobile communications • First generation: voice • Second/current generation: voice + low-rate data (9.6 Kbps) • Third generation: voice + high-rate data (2 Mbps/384 Kbps/128 Kbps) + multimedia
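Channel estimation [Figure: User 1 and User 2 transmit to the base station over a direct path and a reflected path; the received signal also contains noise and multiple access interference (MAI).]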
Need for channel estimation • Compensates for unknown fading amplitudes and asynchronous delays • Detector performance depends on the accuracy of the channel estimator • Multiuser channel estimation • Jointly estimates the parameters for all users • Performs better than single-user estimates
Computing channel estimates • Computed by sending a training sequence of known bits to the receiver • When no training sequence is present, detected bits can be used to update the estimates in a decision-feedback mode for tracking • The importance of channel estimation is usually neglected • Its complexity may exceed that of the detector
Baseband signal processing [Figure: block diagram of the base-station receiver. Blocks and labels: antenna (multiple users), channel estimation (training, tracking), detection, decoding, detected bits.]
Multiuser Channel Estimation Algorithm • b ∈ {+1, -1}: training/tracking bits • r = 8-bit integer (complex): received signal • N = spreading gain (typically fixed, e.g. 32) • K = number of users (variable, K ≤ N) • A = Maximum Likelihood channel estimate
Implementation complexity • Matrix inversions (size 32x32) per window • Unable to meet real-time requirements on DSPs [Asilomar’99] • VLSI fixed-point architectures for matrix inversions • Difficult to design, with finite-precision problems • Typically, simpler single-user sliding correlator structures are used instead
Iterative scheme for channel estimation • Bit-streaming: suitable for tracking • Method of gradient descent • Stable convergence behavior • Simple fixed-point VLSI architecture
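To illustrate how a gradient-descent iteration can stand in for a matrix inversion, here is a minimal floating-point sketch. It assumes the estimate A satisfies Rbb·A = Rbr and uses the update A ← A − µ(Rbb·A − Rbr); that update form is inferred from the Mult/Subtract/>> stages in the architecture diagrams and is not stated explicitly on the slides. The matrices, the step size `mu`, and the helper names `matmul` and `iterate` are hypothetical.

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def iterate(a, rbb, rbr, mu):
    """One gradient-descent step: A <- A - mu * (Rbb*A - Rbr)."""
    prod = matmul(rbb, a)
    return [[a[i][j] - mu * (prod[i][j] - rbr[i][j])
             for j in range(len(a[0]))] for i in range(len(a))]

# Hypothetical 2x2 example: Rbb is symmetric positive definite.
rbb = [[2.0, 0.5], [0.5, 2.0]]
a_true = [[1.0, -2.0], [3.0, 0.5]]
rbr = matmul(rbb, a_true)        # chosen so that Rbb * a_true = Rbr

a = [[0.0, 0.0], [0.0, 0.0]]     # start from an all-zero estimate
for _ in range(100):
    a = iterate(a, rbb, rbr, mu=0.3)

# Largest deviation from the solution that a direct inversion would give.
max_err = max(abs(a[i][j] - a_true[i][j])
              for i in range(2) for j in range(2))
print(max_err)
```

In this toy case the eigenvalues of Rbb are 1.5 and 2.5, so the iteration is stable for any step size below 2/2.5 = 0.8. Each step needs only multiply, add and subtract operations, which is what makes the scheme attractive for a fixed-point datapath.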
Comparison of Bit Error Rates (BER) [Figure: BER (10^-3 to 10^-1) versus Signal to Noise Ratio (SNR, 4 to 12) for MF, ActMF, ML and ActML, annotated with complexities O(K²N) and O(K³+K²N).] Simulations - Static multipath channel • SINR = 0 dB • Paths = 3 • Preamble L = 150 • Spreading N = 31 • Users K = 15
Fading channel with tracking [Figure: BER (10^-3 to 10^0) versus SNR (4 to 12) for MF - Static, MF - Tracking, ML - Static and ML - Tracking.] Doppler = 10 km/h
Area-Time Tradeoffs • Design for K = 32 users and spreading gain N = 32 • Target data rate = 128 Kbps (about 4000 cycles per bit at 500 MHz) • Area-constrained architecture: pico-cells or fewer users • Time-constrained architecture: maximum data rates • Area-time efficient architecture: real-time
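The cycle budget quoted on this slide follows from dividing the clock frequency by the target data rate. A quick check, assuming 1 Kbps = 1000 bits per second and one budget per bit:

```python
# Cycle budget per bit: clock frequency divided by the target bit rate.
CLOCK_HZ = 500_000_000   # 500 MHz clock, from the slide
BIT_RATE = 128_000       # 128 Kbps target data rate, from the slide

cycles_per_bit = CLOCK_HZ / BIT_RATE
print(cycles_per_bit)    # 3906.25, which the slide rounds to 4000
```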
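Task decomposition: channel estimation [Figure: timeline (TIME axis) with a pilot phase followed by a data phase and a tracking window L. MUXes select between pilot bits and detected bits for the bit vectors b0 (2K,1) and b (2K,1), and supply the received vectors r0 (N,8) and r (N,8). The correlation matrices Rbr, O(2KN,8), and Rbb, O(2K²,8), are updated per bit; the estimate A, O(4K²N,8), is then iterated and the channel estimate is passed to the detector.]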
Architecture design: auto-correlation • b ∈ {+1, -1} • Multiplication is an XNOR operation • Matrix updated using XNOR gates • Auto-correlation matrix implemented as an array of up/down counters
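The XNOR-plus-counter idea can be sketched in software as follows. This is an illustrative model, not the hardware design: it assumes each bit is encoded as 1 for +1 and 0 for -1, and the function names `xnor`, `update_rbb` and `to_bits` are hypothetical. It does not model the counter width.

```python
def xnor(x, y):
    """XNOR of two single bits (0 or 1): 1 when the bits are equal."""
    return 1 - (x ^ y)

def to_bits(signs):
    """Encode +1 as 1 and -1 as 0 (assumed encoding)."""
    return [1 if s > 0 else 0 for s in signs]

def update_rbb(rbb, bits):
    """Add the outer product b*b^T to rbb in place, using only XNOR
    and up/down counting instead of multiplications."""
    n = len(bits)
    for i in range(n):
        for j in range(n):
            # XNOR = 1 means the product b(i)*b(j) is +1: count up.
            # XNOR = 0 means the product is -1: count down.
            rbb[i][j] += 1 if xnor(bits[i], bits[j]) else -1
    return rbb

signs = [+1, -1, +1, +1]                 # hypothetical bit vector b
rbb = [[0] * 4 for _ in range(4)]
update_rbb(rbb, to_bits(signs))

# Reference result computed with ordinary multiplications.
expected = [[si * sj for sj in signs] for si in signs]
print(rbb == expected)
```

Because the product of two ±1 values is +1 exactly when the two encoded bits are equal, the XNOR output can drive the up/down input of a counter directly.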
Architecture design: cross-correlation • b ∈ {+1, -1}, r = 8-bit integer vector (complex) • Multiplications reduce to additions/subtractions • Matrix (complex) can be updated with 8-bit adders • Cross-correlation matrix stored in RAM
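A minimal sketch of the add/subtract update, assuming the same 1/0 encoding of +1/-1 and storing the real and imaginary parts of r as separate integer lists. The function name `update_rbr` and the sample values are hypothetical, and the sketch does not model 8-bit overflow.

```python
def update_rbr(rbr_re, rbr_im, bits, r_re, r_im):
    """Add the outer product b*r to Rbr in place.

    Since each b(i) is +1 or -1, every product b(i)*r(j) is either
    +r(j) or -r(j), so the bit only selects add or subtract.
    """
    for i, bit in enumerate(bits):
        for j in range(len(r_re)):
            if bit:                      # b(i) = +1: add r(j)
                rbr_re[i][j] += r_re[j]
                rbr_im[i][j] += r_im[j]
            else:                        # b(i) = -1: subtract r(j)
                rbr_re[i][j] -= r_re[j]
                rbr_im[i][j] -= r_im[j]

bits = [1, 0]                            # encodes b = [+1, -1]
r_re = [3, -5, 7]                        # real part of r (hypothetical)
r_im = [1, 2, -4]                        # imaginary part of r
rbr_re = [[0] * 3 for _ in range(2)]
rbr_im = [[0] * 3 for _ in range(2)]
update_rbr(rbr_re, rbr_im, bits, r_re, r_im)
print(rbr_re, rbr_im)
```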
Architecture design: channel estimate • A = 8-bit integer matrix (complex) • µ << 1: truncated multiplication [Schulte’93] • Matrix-matrix (real-complex) multiplication of integers • This multiplication forms the bottleneck (8-bit multipliers) • Concentrate on multiplication for area-time tradeoffs!
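The architecture diagrams show a multiply stage, a subtract, a right shift (>>) and a final subtract. Assuming this corresponds to the update Anew = A − µ(Rbb·A − Rbr) with µ a power of two applied as a right shift, one integer iteration might look like the sketch below. The update form, the shift amount and the sample matrices are assumptions, and the sketch does not model 8-bit word lengths or the truncated multiplier of [Schulte’93]. Since Rbb is real, the real and imaginary parts of A can each be passed through the same function.

```python
def update_estimate(a, rbb, rbr, shift):
    """One integer iteration: A_new = A - ((Rbb*A - Rbr) >> shift).

    The right shift stands in for multiplication by mu = 2**-shift.
    Python's >> on negative integers is an arithmetic (floor) shift.
    """
    rows, cols = len(a), len(a[0])
    a_new = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Multiply-accumulate: row i of Rbb times column j of A.
            acc = sum(rbb[i][k] * a[k][j] for k in range(rows))
            grad = acc - rbr[i][j]
            a_new[i][j] = a[i][j] - (grad >> shift)
    return a_new

# Hypothetical integer example with mu = 1/4 (shift = 2).
rbb = [[4, 0], [0, 4]]
rbr = [[8, -4], [12, 16]]
a0 = [[0, 0], [0, 0]]

a1 = update_estimate(a0, rbb, rbr, shift=2)
a2 = update_estimate(a1, rbb, rbr, shift=2)
print(a1, a2)
```

In this toy case the first iteration already satisfies Rbb·A = Rbr, so the second iteration leaves the estimate unchanged. The inner multiply-accumulate is the part that dominates the work, which matches the slide's remark that the multiplication forms the bottleneck.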
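Area-Constrained Architecture [Figure: block diagram of a single datapath that processes one element (i, j) at a time. Inputs b and b0 (1 bit) and r and r0 (8 bits). Rbb is held in an up/down (U/D) counter with MUX, DEMUX and enable (EN); Rbr is updated by 8-bit Add/Sub units; a MAC, a subtract, a right shift (>>) and a final subtract (8-bit operands, 16-bit intermediate) produce Anew from A; Load/Store paths connect to the stored matrices.]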
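Time-constrained Architecture [Figure: block diagram of a fully parallel datapath. MUXes supply b and b0 (2K*1), which form b*bT and b0*b0T (K(2K-1)*1) to update Rbb (2K²*8); MUXes supply r and r0 (N*8), which update Rbr (2KN*8); Mult, Subtract, >> and Subtract stages (2KN*16 intermediate) produce the channel estimate A (2KN*8).]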
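Auto-correlation Update in Parallel [Figure: the bit vector b (2K) feeds an array of XNORs that forms all pairwise products (for elements a, b, c, d: a·b, a·c, a·d, b·c, b·d, c·d), giving bbT (K*{2K-1}*1). Each bbT(i,j) drives the U/D# input of a counter in the array of counters that holds Rbb (2K²*8), with entries Rbb(i,j) and Rbb(i,i).]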
Cross-Correlation Update in Parallel [Figure: inputs b (2K*1) and r (N*8). Each bit b(i) drives the Add/Sub# input of an 8-bit adder that adds or subtracts r(j) to update Rbr(i,j); Rbr is 2KN*8.]
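Area-Time Efficient Architecture [Figure: block diagram. b and b0 (2K*1) form b*bT and b0*b0T to update Rbb, held in counters (2K*8 per access) behind a MUX; r and r0 (N*8) are selected by a MUX and update Rbr through an adder (1*8); Mult, Subtract, >> and Subtract stages (1*16 intermediate) with MUX/DEMUX produce Anew from A (2K*8 per access); Load/Store paths connect to Rbr.]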
Comparisons • DSPs are unable to exploit bit-level parallelism • Bits are stored inefficiently on DSPs • Multiplications are replaced by additions/subtractions
Conclusions • Real-time VLSI architecture for multiuser channel estimation • Iterative fixed-point algorithm developed to avoid matrix inversions • Area-time tradeoffs presented • Area-constrained, time-constrained, area-time efficient • VLSI architectures exploit bit-level computations and parallelism to meet real-time requirements