310 likes | 413 Views
Efficient VLSI architectures for baseband signal processing in wireless base-station receivers. Sridhar Rajagopal, Srikrishna Bhashyam, Joseph R. Cavallaro, and Behnaam Aazhang. This work is supported by Nokia, TI, TATP and NSF. Introduction.
E N D
Efficient VLSI architectures for baseband signal processing in wireless base-station receivers Sridhar Rajagopal, Srikrishna Bhashyam, Joseph R. Cavallaro, and Behnaam Aazhang This work is supported by Nokia, TI, TATP and NSF
Introduction • Real-time VLSI architecture for multiuser channel estimation • Multiuser channel estimation usually neglected • high computational complexity - DSPs infeasible • Single user sliding correlator structures used • Iterative fixed point algorithm developed • Area-Time tradeoffs presented • Area-Constrained,Time-Constrained, Area-Time efficient
Baseband signal processing Antenna Multiple Users Detection Decoding Detected Bits Training Tracking Channel estimation Base-Station Receiver
Noise +MAI Base Station Reflected Path Direct Path User 1 User 2 Channel estimation • compensate for unknown fading amplitudes and asynchronous delays.
Need for multiuser channel estimation • Detector performance depends on accuracy of channel estimator • Multiuser Channel Estimation • Jointly estimate parameters for all users • Better performance than single user estimates
Computing multiuser channel estimates • Computed by sending a training sequence of known bits to the receiver. • When absent, detected bits can be used to update estimates in a decision feedback mode for tracking. • Importance of multiuser estimation usually neglected • May exceed detector complexity
Multiuser Channel Estimation Algorithm • = {+1, -1} : Training/Tracking bits • = 8-bit integer (complex) : Received signal • N = spreading gain (typically fixed ,e.g: 32) • K = number of users (variable, <=N) • = Maximum Likelihood channel estimate
Implementation complexity • Matrix inversions (size 32x32) per window • Unable to meet real-time on DSPs [Asilomar’99] • VLSI fixed-point architectures for matrix inversions • Difficult to design , Finite precision problems • Typically, simpler single-user sliding correlator structures used.
Outline • What is multiuser channel estimation? • Need for multiuser channel estimation • Implementation problems • Algorithm enhancements • VLSI architectures • Area-constrained,Time-constrained, Area-Time efficient • Conclusions
Iterative scheme for channel estimation • Bit-streaming : suitable for tracking (window length L) • Method of gradient descent • Stable convergence behavior • Simple fixed-point VLSI architecture
Comparison of Bit Error Rates (BER) -1 10 -2 BER 10 O(K2N) MF ActMF ML ActML O(K3+K2N) -3 10 4 5 6 7 8 9 10 11 12 Signal to Noise Ratio (SNR) Simulations - Static multipath channel SINR = 0 dB Paths =3 Preamble =150 Spreading N = 31 Users K = 15
0 10 MF - Static MF - Tracking ML - Static ML - Tracking -1 10 BER -2 10 -3 10 4 5 6 7 8 9 10 11 12 SNR Rayleigh Fading channel with tracking Doppler = 10 Kmph
Outline • What is multiuser channel estimation? • Need for multiuser channel estimation • Implementation problems • Algorithm enhancements • VLSI architectures • Area-constrained,Time-constrained, Area-Time efficient • Conclusions
Area-Time Tradeoffs • Design for 32 users (K) and spreading code (N) 32 • Target = 128 Kbps (4000 cycles at 500 MHz). • Assume single cycle addition/multiplication • Area-Constrained Architecture :Pico-cells/fewer users • Time-Constrained Architecture : Maximum data rates • Area-Time Efficient Architecture : Real-Time
Tracking Window L Correlation Matrices (Per Bit) Iterate Detected Bits M UX b0 (2K,1) Rbr O(2KN,8) Pilot Bits b(2K,1) A O(4K2N,8) Data M UX Channel Estimate to Detector r0 (N,8) Rbb O(2K2,8) Pilot r(N,8) TIME Task decomposition: channel estimation
Architecture design: auto-correlation • b = {+1,-1} • Multiplication is a XNOR operation • Matrix updated using XNOR gates • Auto-correlation matrix implemented as an UP/DOWN counter(s)
Architecture design: cross-correlation • b = {+1,-1}, r = 8-bit integer vector (complex) • Multiplications reduce to additions/subtractions • Matrix (complex) can be updated with 8-bit adders • Cross-correlation matrix stored as RAM.
Architecture design: channel estimate • A = 8-bit integer matrix (complex) • µ << 1 : Truncated multiplication [Schulte’93] • Matrix-matrix (real-complex) multiplication of integers • Forms the bottleneck (8-bit multipliers) • Concentrate on multiplication for area-time tradeoffs!
b i A(i) A(i-1) Rbb j 8 8 8 1 8 Load Store MUX EN 1 DEMUX 1 MUX Counter 1 U/D 8 8 8 b0 1 MAC Subtract i j 16 8 Rbr 1 8 >> Subtract 1 8 16 Add/ Sub Add/ Sub 1 8 8 1 j j r r0 Area-Constrained Architecture Channel Estimate b b0
Time-constrained Architecture K(2K-1)*1 2K*1 M U X b b*bT b0 b0*b0T K(2K-1)*1 Channel Estimate 2K*1 Rbb A 2K*1 2K2*8 2KN*8 MUX Mult Subtract r M U X 2K*1 2KN*8 N*8 2KN*16 >> Rbr Subtract r0 N*8 2KN*8 2KN*16 N*8
Auto-correlation Update in Parallel 1 bbT(i,j) b (2K) U/D# U/D# Counter Counter a b c d Rbb(i,j) Rbb(i,i) a·b a·c a·d b·c b·d c·d bbT(K*{2K-1}*1) Rbb (2K2*8) Array of XNORs Array of Counters
b (2K*1) a b c d r (N*8) b(i) Add/ Sub# 1 8 8 Adder Rbr(i,j) Rbr(2KN*8) Cross-Correlation Update in Parallel r(j)
Area-Time efficient architecture design • Area - constrained Architecture • Minimize area - single 8-bit multiplier • 4K2N cycles (128,000 cycles ; 3.81 Kbps) • Time-constrained Architecture • Minimize time - 4K2N 8-bit multipliers • Log2(2K) cycles (6 cycles ; 83.33 Mbps) • Aim : To meet real-time with min. area overhead • Different parallelism levels for multipliers
2K*1 Counters MUX 2K*1 2K*8 b0*b0T b*bT A(i) A(i-1) Rbb 2K*1 2K*1 1*8 2K*8 2K*8 b b0 DEMUX Mult MUX 2K*1 2K*1 2K*8 MUX 1*16 Subtract r 1*1 1*8 M U X N*8 1*8 Adder >> Subtract r0 1*8 1*8 1*16 N*8 Load Store Rbr Area-Time Efficient Architecture Channel Estimate
Outline • What is multiuser channel estimation? • Need for multiuser channel estimation • Implementation problems • Algorithm enhancements • VLSI architectures • Area-constrained,Time-constrained, Area-Time efficient • Conclusions
Comparisons • DSPs unable to exploit bit-level parallelism • Inefficient storage of bits • Replacing multiplications by additions/subtractions
Scalability of Architectures with K • Disadvantages of VLSI architectures • Design for maximum number of users in the system • If there are fewer users, • Turn off functional units to reduce power • Reconfigure hardware for higher data rates (FPGA) • Dr. Cavallaro, don’t know to handle this Question properly • We never designed an architecture/algorithm for varying number of users dynamically. (Though we had started on it) • What should be included in future work? • Please give suggestions!!
Conclusions • Real-Time VLSI architecture for multiuser channel estimation • Iterative fixed-point algorithm developed to avoid matrix inversions • Area-Time Tradeoffs presented • Area-Constrained, Time-Constrained, Area-Time efficient • VLSI architectures exploit bit-level computations and parallelism to meet real-time.