
Efficient VLSI architectures for baseband signal processing in wireless base-station receivers

Efficient VLSI architectures for baseband signal processing in wireless base-station receivers. Sridhar Rajagopal, Srikrishna Bhashyam, Joseph R. Cavallaro, and Behnaam Aazhang. This work is supported by Nokia, TI, TATP and NSF.


Presentation Transcript


  1. Efficient VLSI architectures for baseband signal processing in wireless base-station receivers Sridhar Rajagopal, Srikrishna Bhashyam, Joseph R. Cavallaro, and Behnaam Aazhang This work is supported by Nokia, TI, TATP and NSF

  2. Introduction • Real-time VLSI architecture for multiuser channel estimation • Multiuser channel estimation usually neglected • High computational complexity - infeasible on DSPs • Single-user sliding correlator structures used instead • Iterative fixed-point algorithm developed • Area-Time tradeoffs presented • Area-Constrained, Time-Constrained, Area-Time efficient

  3. Baseband signal processing • [Block diagram of the base-station receiver: multiple users, antenna, channel estimation (training and tracking), detection, decoding, detected bits]

  4. Channel estimation • [Figure: User 1 and User 2 reach the base station over a direct path and a reflected path, with noise + MAI] • Compensate for unknown fading amplitudes and asynchronous delays.

  5. Need for multiuser channel estimation • Detector performance depends on accuracy of channel estimator • Multiuser Channel Estimation • Jointly estimate parameters for all users • Better performance than single user estimates

  6. Computing multiuser channel estimates • Computed by sending a training sequence of known bits to the receiver. • When absent, detected bits can be used to update estimates in a decision feedback mode for tracking. • Importance of multiuser estimation usually neglected • May exceed detector complexity

  7. Multiuser Channel Estimation Algorithm • b = {+1, -1} : Training/Tracking bits • r = 8-bit integer (complex) : Received signal • N = spreading gain (typically fixed, e.g. 32) • K = number of users (variable, <= N) • A = Maximum Likelihood channel estimate

  8. Implementation complexity • Matrix inversions (size 32x32) per window • Unable to meet real-time on DSPs [Asilomar’99] • VLSI fixed-point architectures for matrix inversions • Difficult to design, finite precision problems • Typically, simpler single-user sliding correlator structures used.

  9. Outline • What is multiuser channel estimation? • Need for multiuser channel estimation • Implementation problems • Algorithm enhancements • VLSI architectures • Area-constrained, Time-constrained, Area-Time efficient • Conclusions

  10. Iterative scheme for channel estimation • Bit-streaming : suitable for tracking (window length L) • Method of gradient descent • Stable convergence behavior • Simple fixed-point VLSI architecture
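The gradient-descent idea on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the estimate A is the solution of Rbb·A = Rbr (Rbb and Rbr being the correlation matrices named on later slides), and the function names, the step size `mu`, and the toy matrices are hypothetical.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def gradient_descent_estimate(Rbb, Rbr, mu, iterations):
    """Iterate A <- A - mu * (Rbb*A - Rbr), starting from A = 0.

    Assumed model: the fixed point satisfies Rbb*A = Rbr, so no
    explicit matrix inversion is needed.
    """
    rows, cols = len(Rbr), len(Rbr[0])
    A = [[0.0] * cols for _ in range(rows)]
    for _ in range(iterations):
        G = matmul(Rbb, A)  # Rbb*A
        A = [[A[i][j] - mu * (G[i][j] - Rbr[i][j]) for j in range(cols)]
             for i in range(rows)]
    return A


# Toy 2x2 system whose exact solution is A = [[1], [1]].
Rbb = [[2.0, 1.0], [1.0, 2.0]]
Rbr = [[3.0], [3.0]]
A = gradient_descent_estimate(Rbb, Rbr, mu=0.25, iterations=100)
```

Each iteration needs only multiply-accumulate and subtract operations, which is what makes the scheme attractive for a fixed-point datapath.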

  11. Comparison of Bit Error Rates (BER) • [Figure: BER (10^-3 to 10^-1) versus Signal to Noise Ratio (SNR, 4 to 12) for MF, ActMF, ML and ActML, with complexity annotations O(K^2N) and O(K^3+K^2N)] • Simulations - Static multipath channel • SINR = 0 dB • Paths = 3 • Preamble = 150 • Spreading N = 31 • Users K = 15

  12. Rayleigh Fading channel with tracking • [Figure: BER (10^-3 to 10^0) versus SNR (4 to 12) for MF - Static, MF - Tracking, ML - Static and ML - Tracking] • Doppler = 10 km/h

  13. Outline • What is multiuser channel estimation? • Need for multiuser channel estimation • Implementation problems • Algorithm enhancements • VLSI architectures • Area-constrained, Time-constrained, Area-Time efficient • Conclusions

  14. Area-Time Tradeoffs • Design for 32 users (K) and a spreading gain (N) of 32 • Target = 128 Kbps (about 4000 cycles per bit at 500 MHz) • Assume single-cycle addition/multiplication • Area-Constrained Architecture : Pico-cells/fewer users • Time-Constrained Architecture : Maximum data rates • Area-Time Efficient Architecture : Real-Time

  15. Task decomposition: channel estimation • [Figure: tasks laid out over TIME across tracking window L] • Inputs: b (2K,1) and b0 (2K,1), with a MUX between pilot bits and detected bits; r (N,8) and r0 (N,8), with a MUX between pilot and data • Correlation matrices (per bit): Rbr O(2KN,8), Rbb O(2K^2,8) • Iterate: A O(4K^2N,8) • Output: channel estimate to detector
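A sliding-window reading of this task decomposition can be sketched as below. It assumes that b0 and r0 are the bit and received vectors leaving a window of length L while b and r are the ones entering it; that interpretation, the class name `SlidingCorrelator`, and the real-valued toy samples (the slides use complex values) are assumptions for illustration only.

```python
from collections import deque


def outer(u, v):
    """Outer product u * v^T as a list of rows."""
    return [[ui * vj for vj in v] for ui in u]


class SlidingCorrelator:
    """Keeps Rbb = sum(b b^T) and Rbr = sum(b r^T) over the last L bits."""

    def __init__(self, num_bits, num_chips, window):
        self.window = window
        self.history = deque()
        self.Rbb = [[0] * num_bits for _ in range(num_bits)]
        self.Rbr = [[0] * num_chips for _ in range(num_bits)]

    def _accumulate(self, b, r, sign):
        bb, br = outer(b, b), outer(b, r)
        for i in range(len(b)):
            for j in range(len(b)):
                self.Rbb[i][j] += sign * bb[i][j]
            for j in range(len(r)):
                self.Rbr[i][j] += sign * br[i][j]

    def update(self, b, r):
        """Add the newest (b, r); drop the oldest (b0, r0) once the window is full."""
        self.history.append((b, r))
        self._accumulate(b, r, +1)
        if len(self.history) > self.window:
            b0, r0 = self.history.popleft()
            self._accumulate(b0, r0, -1)


corr = SlidingCorrelator(num_bits=2, num_chips=3, window=2)
corr.update([+1, -1], [5, 0, -2])
corr.update([+1, +1], [1, 1, 1])
corr.update([-1, +1], [0, 4, 0])   # the first pair leaves the window here
```

Updating per bit keeps the cost of each step independent of the window length, since only one vector pair enters and one leaves.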

  16. Architecture design: auto-correlation • b = {+1,-1} • Multiplication is a XNOR operation • Matrix updated using XNOR gates • Auto-correlation matrix implemented as an UP/DOWN counter(s)
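The XNOR and counter bullets can be illustrated with a short sketch. It assumes the common encoding of +1 as bit 1 and -1 as bit 0; the function names and the sample bit stream are hypothetical.

```python
def xnor_product(bit_a, bit_b):
    """Product of two +/-1 symbols, each encoded as one bit (1 -> +1, 0 -> -1).

    Returns 1 when the product is +1 and 0 when it is -1.
    """
    return 1 - (bit_a ^ bit_b)


def count_up_down(counter, up):
    """Up/down counter: increment when `up` is 1, otherwise decrement."""
    return counter + 1 if up else counter - 1


# Accumulate one entry Rbb(i, j) over a short stream of bit pairs.
pairs = [(1, 1), (1, 0), (0, 0), (0, 1), (1, 1)]
rbb_ij = 0
for bi, bj in pairs:
    rbb_ij = count_up_down(rbb_ij, xnor_product(bi, bj))
```

The counter ends at the same value as summing the +/-1 products directly, but no multiplier or adder is involved.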

  17. Architecture design: cross-correlation • b = {+1,-1}, r = 8-bit integer vector (complex) • Multiplications reduce to additions/subtractions • Matrix (complex) can be updated with 8-bit adders • Cross-correlation matrix stored as RAM.
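Because each b(i) is +1 or -1, the product b(i)·r(j) is just r(j) added or subtracted. A minimal sketch, with a hypothetical function name and made-up complex samples:

```python
def update_cross_correlation(Rbr, b, r):
    """Add or subtract r(j) into Rbr(i, j) depending on the sign of b(i).

    b holds +1/-1 symbols; r holds complex samples with integer parts.
    No multiplier is used: only the add/subtract select changes.
    """
    for i, bit in enumerate(b):
        for j, sample in enumerate(r):
            if bit > 0:
                Rbr[i][j] += sample
            else:
                Rbr[i][j] -= sample
    return Rbr


Rbr = [[0j, 0j], [0j, 0j]]
update_cross_correlation(Rbr, [+1, -1], [3 + 2j, -1 + 4j])
update_cross_correlation(Rbr, [-1, -1], [1 - 1j, 2 + 0j])
```

In hardware the real and imaginary parts would each go through their own adder; Python's built-in complex type hides that detail here.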

  18. Architecture design: channel estimate • A = 8-bit integer matrix (complex) • µ << 1 : Truncated multiplication [Schulte’93] • Matrix-matrix (real-complex) multiplication of integers • Forms the bottleneck (8-bit multipliers) • Concentrate on multiplication for area-time tradeoffs!
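One way to picture the integer update is sketched below. It assumes µ is a power of two, so that scaling by µ becomes an arithmetic right shift (the block diagrams show a ">>" stage); the slide itself cites truncated multiplication, which this sketch does not reproduce. The function name, the real-valued integers (the slides use complex values), and the toy matrices are assumptions.

```python
def fixed_point_step(A, Rbb, Rbr, shift):
    """One integer update A <- A - ((Rbb*A - Rbr) >> shift).

    Assumption: mu = 2**-shift, so scaling by mu is an arithmetic right
    shift. Real integers are used here; the slides use complex values.
    """
    rows, cols = len(A), len(A[0])
    new_A = []
    for i in range(rows):
        row = []
        for j in range(cols):
            acc = sum(Rbb[i][k] * A[k][j] for k in range(rows))  # MAC
            row.append(A[i][j] - ((acc - Rbr[i][j]) >> shift))
        new_A.append(row)
    return new_A


Rbb = [[4, 0], [0, 4]]
Rbr = [[40], [24]]
A = [[0], [0]]
for _ in range(20):
    A = fixed_point_step(A, Rbb, Rbr, shift=3)
```

Note that `>>` rounds toward negative infinity, so the update becomes zero as soon as the residual Rbb·A - Rbr lies in [0, 2**shift); in general the integer iteration can therefore stop slightly short of the exact solution.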

  19. Area-Constrained Architecture • [Block diagram: inputs b and b0 (1 bit), r and r0 (8 bits); Rbb held in an up/down (U/D) counter with MUX/DEMUX and enable (EN); Rbr updated through 8-bit Add/Sub units with Load/Store; a MAC with 16-bit result followed by Subtract and >> (shift) stages updates A(i) from A(i-1); output: channel estimate]

  20. Area-constrained Architecture: Hardware Requirements

  21. Time-constrained Architecture • [Block diagram: b and b0 (2K*1) form b*bT and b0*b0T (K(2K-1)*1 each) to update Rbb (2K^2*8); r and r0 (N*8) update Rbr (2KN*8); Mult, Subtract (2KN*16) and >> stages produce the channel estimate A (2KN*8)]

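  22. Auto-correlation Update in Parallel • [Block diagram: the 2K bits of b (shown as a, b, c, d) feed an array of XNORs forming the products a·b, a·c, a·d, b·c, b·d, c·d, i.e. bbT (K*{2K-1}*1); each product bbT(i,j) drives the U/D# input of a counter in an array of counters holding Rbb (2K^2*8), with entries Rbb(i,j) and Rbb(i,i)]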
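The K(2K-1) figure on this slide is the number of unordered pairs among the 2K bits. A small sketch, assuming the encoding +1 as bit 1 and -1 as bit 0, with a hypothetical function name and example bits:

```python
from itertools import combinations


def pairwise_xnor(bits):
    """XNOR every unordered pair of bits (1 -> +1, 0 -> -1).

    For a vector of 2K bits this yields K*(2K-1) products, the
    off-diagonal half of the symmetric matrix b*b^T. The diagonal is
    always +1, so it needs no XNOR gate.
    """
    return {(i, j): 1 - (bits[i] ^ bits[j])
            for i, j in combinations(range(len(bits)), 2)}


K = 2
bits = [1, 0, 0, 1]          # a, b, c, d on the slide (2K = 4)
products = pairwise_xnor(bits)
```

In software the pairs are visited one after another; in the architecture each pair would have its own gate and counter, so all entries update in the same cycle.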

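  23. Cross-Correlation Update in Parallel • [Block diagram: each bit b(i) of b (2K*1, shown as a, b, c, d) drives the Add/Sub# select of an 8-bit adder that adds or subtracts r(j), taken from r (N*8), to update Rbr(i,j); Rbr is 2KN*8]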

  24. Time-constrained Architecture: Hardware Requirements

  25. Area-Time efficient architecture design • Area-constrained Architecture • Minimize area - single 8-bit multiplier • 4K^2N cycles (131,072 cycles; 3.81 Kbps) • Time-constrained Architecture • Minimize time - 4K^2N 8-bit multipliers • log2(2K) cycles (6 cycles; 83.33 Mbps) • Aim : To meet real-time with min. area overhead • Different parallelism levels for multipliers
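The cycle counts and data rates quoted on this slide follow from K = N = 32 and the 500 MHz clock given on the Area-Time Tradeoffs slide. A quick check (variable names are illustrative):

```python
import math

CLOCK_HZ = 500e6      # clock rate from the Area-Time Tradeoffs slide
K = 32                # users
N = 32                # spreading gain


def data_rate_bps(cycles_per_bit):
    """Bit rate achievable when each bit costs `cycles_per_bit` cycles."""
    return CLOCK_HZ / cycles_per_bit


budget_cycles = CLOCK_HZ / 128e3            # cycles available at 128 Kbps
area_cycles = 4 * K**2 * N                  # single multiplier
time_cycles = int(math.log2(2 * K))         # 4K^2N multipliers

area_rate_kbps = data_rate_bps(area_cycles) / 1e3
time_rate_mbps = data_rate_bps(time_cycles) / 1e6
```

Here 4K^2N evaluates to 131,072 cycles (128 x 1024), far above the budget of roughly 3900 cycles per bit, while the fully parallel design needs only 6 cycles. The area-time efficient design has to land between these two extremes.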

  26. Area-Time Efficient Architecture • [Block diagram: counters update Rbb from b*bT and b0*b0T (2K*1); a Mult block with 2K*8 inputs and MUX/DEMUX combines Rbb with A(i-1); Subtract (1*16) and >> stages produce A(i); a 1*8 adder with Load/Store updates Rbr from r and r0 (N*8); output: channel estimate]

  27. Area-Time Efficient Architecture: Hardware Requirements

  28. Outline • What is multiuser channel estimation? • Need for multiuser channel estimation • Implementation problems • Algorithm enhancements • VLSI architectures • Area-constrained, Time-constrained, Area-Time efficient • Conclusions

  29. Comparisons • DSPs unable to exploit bit-level parallelism • Inefficient storage of bits • Replacing multiplications by additions/subtractions

  30. Scalability of Architectures with K • Disadvantages of VLSI architectures • Design for maximum number of users in the system • If there are fewer users, • Turn off functional units to reduce power • Reconfigure hardware for higher data rates (FPGA) • Dr. Cavallaro, I don’t know how to handle this question properly • We never designed an architecture/algorithm for a dynamically varying number of users (though we had started on it) • What should be included in future work? • Please give suggestions!

  31. Conclusions • Real-Time VLSI architecture for multiuser channel estimation • Iterative fixed-point algorithm developed to avoid matrix inversions • Area-Time Tradeoffs presented • Area-Constrained, Time-Constrained, Area-Time efficient • VLSI architectures exploit bit-level computations and parallelism to meet real-time.
