Efficient VLSI architectures for baseband signal processing in wireless base-station receivers

Sridhar Rajagopal, Srikrishna Bhashyam, Joseph R. Cavallaro, and Behnaam Aazhang • This work is supported by Nokia, TI, TATP and NSF

Introduction • Real-time VLSI architecture for multiuser channel estimation • Multiuser channel estimation is usually neglected • High computational complexity makes DSP implementations infeasible • Single-user sliding correlator structures are used instead • Iterative fixed-point algorithm developed • Area-time tradeoffs presented • Area-constrained, time-constrained, and area-time efficient architectures

Baseband signal processing • [Figure: base-station receiver block diagram with blocks for the antenna (multiple users), channel estimation (training, tracking), detection, and decoding, producing the detected bits]

Channel estimation • [Figure: User 1 and User 2 transmit to the base station over direct and reflected paths; noise and multiple access interference (MAI) are added] • Compensates for unknown fading amplitudes and asynchronous delays

Need for multiuser channel estimation • Detector performance depends on accuracy of channel estimator • Multiuser Channel Estimation • Jointly estimate parameters for all users • Better performance than single user estimates

Computing multiuser channel estimates • Computed by sending a training sequence of known bits to the receiver • When no training sequence is available, detected bits can be used to update the estimates in a decision feedback mode for tracking • Importance of multiuser estimation is usually neglected • Its complexity may exceed that of the detector

Multiuser Channel Estimation Algorithm • b = {+1, -1} : Training/Tracking bits • r = 8-bit integer (complex) : Received signal • N = spreading gain (typically fixed, e.g. 32) • K = number of users (variable, <= N) • A = Maximum Likelihood channel estimate

Implementation complexity • Matrix inversions (size 32x32) per window • Unable to meet real-time requirements on DSPs [Asilomar’99] • VLSI fixed-point architectures for matrix inversions • Difficult to design, with finite-precision problems • Typically, simpler single-user sliding correlator structures are used

Outline • What is multiuser channel estimation? • Need for multiuser channel estimation • Implementation problems • Algorithm enhancements • VLSI architectures • Area-constrained, time-constrained, area-time efficient • Conclusions

Iterative scheme for channel estimation • Bit-streaming : suitable for tracking (window length L) • Method of gradient descent • Stable convergence behavior • Simple fixed-point VLSI architecture
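The iterative scheme can be illustrated with a small floating-point sketch. It assumes, since the slides do not spell out the update, that the maximum likelihood estimate A satisfies Rbb·A = Rbr, where Rbb and Rbr are the bit auto-correlation and bit/received-signal cross-correlation matrices named on the later slides, and that gradient descent takes the form A ← A − µ(Rbb·A − Rbr). The helper names, the toy sizes, the noiseless channel, and the step size µ = 2^-8 are all illustrative choices, not values from the presentation.

```python
import random

def matmul(X, Y):
    """Plain-Python product of X (m x n) and Y (n x p)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def correlations(bits, received):
    """Accumulate Rbb = sum(b b^T) and Rbr = sum(b r^T) over the window."""
    nb, nr = len(bits[0]), len(received[0])
    Rbb = [[0] * nb for _ in range(nb)]
    Rbr = [[0j] * nr for _ in range(nb)]
    for b, r in zip(bits, received):
        for i in range(nb):
            for j in range(nb):
                Rbb[i][j] += b[i] * b[j]
            for j in range(nr):
                Rbr[i][j] += b[i] * r[j]
    return Rbb, Rbr

def gradient_step(A, Rbb, Rbr, mu):
    """One update A <- A - mu * (Rbb A - Rbr); no matrix inversion is needed."""
    RA = matmul(Rbb, A)
    return [[a - mu * (ra - rbr) for a, ra, rbr in zip(a_row, ra_row, rbr_row)]
            for a_row, ra_row, rbr_row in zip(A, RA, Rbr)]

def estimate(Rbb, Rbr, mu, iterations):
    """Run the iteration from an all-zero starting estimate."""
    A = [[0j] * len(Rbr[0]) for _ in Rbr]
    for _ in range(iterations):
        A = gradient_step(A, Rbb, Rbr, mu)
    return A

# Toy problem (assumed sizes): 4 bit entries per vector, N = 3 chips, window L = 64.
random.seed(0)
ROWS, N, L = 4, 3, 64
A_true = [[complex(random.randint(-8, 8), random.randint(-8, 8))
           for _ in range(N)] for _ in range(ROWS)]
bits = [[random.choice((-1, 1)) for _ in range(ROWS)] for _ in range(L)]
# Noiseless received vector: r[j] = sum_i b[i] * A_true[i][j].
received = [[sum(b[i] * A_true[i][j] for i in range(ROWS)) for j in range(N)]
            for b in bits]

Rbb, Rbr = correlations(bits, received)
# mu = 2**-8 = 1/trace(Rbb) here, which keeps the iteration stable.
A_hat = estimate(Rbb, Rbr, mu=2 ** -8, iterations=1000)
max_error = max(abs(a - t) for a_row, t_row in zip(A_hat, A_true)
                for a, t in zip(a_row, t_row))
print(max_error)
```

Choosing µ as a power of two is deliberate in this sketch: the scaling by µ then reduces to a right shift in fixed-point hardware. The step size is bounded by the trace of Rbb, which is an upper bound on its largest eigenvalue, so the iteration cannot diverge for this window.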

Comparison of Bit Error Rates (BER) • [Figure: BER (10^-3 to 10^-1) vs. Signal to Noise Ratio (SNR, 4 to 12) for the MF, ActMF, ML, and ActML curves, annotated with the complexities O(K^2 N) and O(K^3 + K^2 N)] • Simulations: static multipath channel • SINR = 0 dB • Paths = 3 • Preamble = 150 • Spreading N = 31 • Users K = 15

Rayleigh fading channel with tracking • [Figure: BER (10^-3 to 10^0) vs. SNR (4 to 12) for the MF - Static, MF - Tracking, ML - Static, and ML - Tracking curves] • Doppler = 10 km/h

Area-Time Tradeoffs • Design for K = 32 users and spreading gain N = 32 • Target = 128 Kbps (about 4000 cycles per bit at 500 MHz) • Assume single-cycle addition/multiplication • Area-Constrained Architecture : Pico-cells/fewer users • Time-Constrained Architecture : Maximum data rates • Area-Time Efficient Architecture : Real-Time

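Task decomposition: channel estimation • [Figure: timeline of per-bit tasks over a tracking window of length L. MUXes select between pilot and detected bits, b and b0 (2K,1), and between pilot and data samples, r and r0 (N,8). The correlation matrices Rbb, O(2K^2,8), and Rbr, O(2KN,8), are updated per bit, and the iteration on A, O(4K^2 N,8), produces the channel estimate sent to the detector.]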

Architecture design: auto-correlation • b = {+1,-1} • Multiplication is an XNOR operation • Matrix updated using XNOR gates • Auto-correlation matrix implemented as an array of up/down counters
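The XNOR and counter idea can be sketched in software as follows. The encoding (bit 1 for +1, bit 0 for -1), the function names, and the sliding-window handling, in which the bit vector leaving the window is removed with the opposite count direction, are assumptions made for illustration; the slides only state that XNOR gates drive up/down counters.

```python
def xnor(x, y):
    """1-bit XNOR: 1 when the two bits match, 0 otherwise."""
    return 1 - (x ^ y)

def count_direction(bits, i, j):
    """+1 (count up) when bits i and j match, -1 (count down) otherwise.

    With 1 encoding +1 and 0 encoding -1, this equals the product b_i * b_j.
    """
    return 1 if xnor(bits[i], bits[j]) else -1

def update_rbb(Rbb, new_bits, old_bits=None):
    """Add the newest bit vector to Rbb and optionally remove the oldest."""
    n = len(new_bits)
    for i in range(n):
        for j in range(n):
            Rbb[i][j] += count_direction(new_bits, i, j)
            if old_bits is not None:
                Rbb[i][j] -= count_direction(old_bits, i, j)
    return Rbb

def reference_rbb(window):
    """Same matrix computed with ordinary +/-1 multiplications."""
    signs = [[2 * x - 1 for x in bits] for bits in window]
    n = len(signs[0])
    return [[sum(s[i] * s[j] for s in signs) for j in range(n)]
            for i in range(n)]

# Hypothetical bit stream: 5 bit vectors of length 4, window of 3 vectors.
stream = [[1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 0, 1], [0, 1, 1, 1], [1, 0, 0, 0]]
WINDOW = 3
Rbb = [[0] * 4 for _ in range(4)]
for t, bits in enumerate(stream):
    leaving = stream[t - WINDOW] if t >= WINDOW else None
    update_rbb(Rbb, bits, leaving)

print(Rbb == reference_rbb(stream[-WINDOW:]))
```

No multiplier appears anywhere in the update: each matrix entry only ever counts up or down by one per bit vector, which is why a counter per entry suffices.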

Architecture design: cross-correlation • b = {+1,-1}, r = 8-bit integer vector (complex) • Multiplications reduce to additions/subtractions • Matrix (complex) can be updated with 8-bit adders • Cross-correlation matrix stored as RAM.
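A minimal sketch of the multiplier-free cross-correlation update, under the same assumed bit encoding (1 for +1, 0 for -1): each bit only selects whether the received sample is added to or subtracted from the stored entry. Python complex numbers stand in for the 8-bit complex integers, and the names and sample values are hypothetical.

```python
def update_rbr(Rbr, bits, r):
    """Update Rbr[i][j] by adding r[j] when bit i is 1, subtracting otherwise."""
    for i, bit in enumerate(bits):
        for j, sample in enumerate(r):
            if bit:
                Rbr[i][j] += sample   # b_i = +1: add the sample
            else:
                Rbr[i][j] -= sample   # b_i = -1: subtract the sample
    return Rbr

def reference_rbr(bit_vectors, samples):
    """Same matrix computed with explicit +/-1 multiplications."""
    rows, cols = len(bit_vectors[0]), len(samples[0])
    out = [[0j] * cols for _ in range(rows)]
    for bits, r in zip(bit_vectors, samples):
        for i in range(rows):
            for j in range(cols):
                out[i][j] += (2 * bits[i] - 1) * r[j]
    return out

# Hypothetical data: 3 bit vectors of length 2, each with 3 complex samples.
bit_vectors = [[1, 0], [0, 0], [1, 1]]
samples = [[3 + 1j, -2 + 4j, 5 - 7j],
           [1 - 1j, 6 + 0j, -4 + 2j],
           [-3 + 3j, 2 + 2j, 0 - 5j]]

Rbr = [[0j] * 3 for _ in range(2)]
for bits, r in zip(bit_vectors, samples):
    update_rbr(Rbr, bits, r)

print(Rbr == reference_rbr(bit_vectors, samples))
```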

Architecture design: channel estimate • A = 8-bit integer matrix (complex) • µ << 1 : Truncated multiplication [Schulte’93] • Matrix-matrix (real-complex) multiplication of integers • Forms the bottleneck (8-bit multipliers) • Concentrate on multiplication for area-time tradeoffs!

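Area-Constrained Architecture • [Figure: block diagram. An up/down (U/D) counter updates Rbb(i,j) from the bits b and b0; an 8-bit add/subtract unit updates Rbr from r and r0; a single MAC, two subtractors, and a right shift (>>) compute the channel estimate A(i) from A(i-1) using 16-bit intermediate values, with MUX/DEMUX selection and load/store access to memory.]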

Area-constrained Architecture: Hardware Requirements

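Time-constrained Architecture • [Figure: block diagram. The bit vectors b and b0 (2K*1) form b*b^T and b0*b0^T (K(2K-1)*1 each), which are multiplexed to update Rbb (2K^2*8); r and r0 (N*8) are multiplexed to update Rbr (2KN*8); the channel estimate A (2KN*8) is computed by a multiplier array (Mult), a subtractor, a right shift (>>), and a second subtractor operating on 2KN*16 intermediate values.]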

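Auto-correlation Update in Parallel • [Figure: the bit vector b (2K*1), shown with example bits a, b, c, d, feeds an array of XNORs that forms the pairwise products a·b, a·c, a·d, b·c, b·d, c·d, i.e. bb^T (K(2K-1)*1). Each product bb^T(i,j) drives the U/D# input of a counter in an array of counters holding Rbb (2K^2*8), with entries Rbb(i,j) and Rbb(i,i).]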

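Cross-Correlation Update in Parallel • [Figure: each bit b(i) of b (2K*1), shown with example bits a, b, c, d, drives the Add/Sub# input of an 8-bit adder that adds or subtracts the sample r(j) of r (N*8) to update the entry Rbr(i,j) of Rbr (2KN*8).]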

Time-constrained Architecture: Hardware Requirements

Area-Time efficient architecture design • Area-constrained Architecture • Minimize area: a single 8-bit multiplier • 4K^2 N cycles (131,072 cycles; 3.81 Kbps) • Time-constrained Architecture • Minimize time: 4K^2 N 8-bit multipliers • log2(2K) cycles (6 cycles; 83.33 Mbps) • Aim: to meet real-time with minimum area overhead • Different parallelism levels for multipliers
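The cycle counts and data rates on this slide can be checked with a few lines of arithmetic. The sketch below uses the design point stated earlier (K = 32, N = 32, 500 MHz clock, 128 Kbps target) and assumes one bit is produced per pass through the multiplications. Evaluating 4K^2 N gives 131,072 cycles, which is the figure consistent with the quoted 3.81 Kbps. The `min_multipliers` value is an idealized estimate that assumes the multiplications parallelize perfectly; it is not a number from the presentation.

```python
import math

CLOCK_HZ = 500e6      # clock rate from the area-time tradeoffs slide
TARGET_BPS = 128e3    # real-time target data rate
K = 32                # number of users
N = 32                # spreading gain

def data_rate(cycles_per_bit, clock_hz=CLOCK_HZ):
    """Bits per second when each bit costs the given number of cycles."""
    return clock_hz / cycles_per_bit

# Area-constrained: one 8-bit multiplier performs all 4*K^2*N multiplications.
area_cycles = 4 * K**2 * N
# Time-constrained: 4*K^2*N multipliers, leaving log2(2K) cycles per bit.
time_cycles = math.ceil(math.log2(2 * K))
# Cycle budget per bit at the target rate.
cycle_budget = CLOCK_HZ / TARGET_BPS
# Idealized lower bound on multipliers needed to fit the budget (assumption).
min_multipliers = math.ceil(area_cycles / cycle_budget)

print(area_cycles, round(data_rate(area_cycles) / 1e3, 2), "Kbps")
print(time_cycles, round(data_rate(time_cycles) / 1e6, 2), "Mbps")
print(cycle_budget, min_multipliers)
```

The two extremes bracket the target by a wide margin on either side, which is the motivation for an intermediate level of multiplier parallelism.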

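Area-Time Efficient Architecture • [Figure: block diagram. Counters update Rbb from b*b^T and b0*b0^T (2K*1 each, selected by a MUX); a multiplier stage (Mult) operates on 2K*8 operands from Rbb and A(i-1); an adder, two subtractors, and a right shift (>>) combine the 1*16 result with entries of Rbr, which is updated from r and r0 (N*8) with load/store access, to produce the channel estimate A(i).]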

Area-Time Efficient Architecture: Hardware Requirements

Comparisons • DSPs unable to exploit bit-level parallelism • Inefficient storage of bits • Replacing multiplications by additions/subtractions

Scalability of Architectures with K • Disadvantage of VLSI architectures: they are designed for the maximum number of users in the system • If there are fewer users: • Turn off functional units to reduce power • Reconfigure hardware for higher data rates (FPGA) • Dr. Cavallaro, I don’t know how to handle this question properly • We never designed an architecture/algorithm for a dynamically varying number of users (though we had started on it) • What should be included in future work? • Please give suggestions!

Conclusions • Real-Time VLSI architecture for multiuser channel estimation • Iterative fixed-point algorithm developed to avoid matrix inversions • Area-Time Tradeoffs presented • Area-Constrained, Time-Constrained, Area-Time efficient • VLSI architectures exploit bit-level computations and parallelism to meet real-time.
