Efficient VLSI architectures for baseband signal processing in wireless base-station receivers

Sridhar Rajagopal, Srikrishna Bhashyam, Joseph R. Cavallaro, and Behnaam Aazhang. This work is supported by Nokia, TI, TATP and NSF.

  1. Efficient VLSI architectures for baseband signal processing in wireless base-station receivers Sridhar Rajagopal, Srikrishna Bhashyam, Joseph R. Cavallaro, and Behnaam Aazhang This work is supported by Nokia, TI, TATP and NSF

  2. Introduction
     • A real-time VLSI architecture for channel estimation
         • Channel estimation is usually neglected, but has high computational complexity
         • Current DSP solutions do not meet real-time requirements
     • Iterative fixed-point algorithm developed
     • Area-Time Tradeoffs discussed
         • Area-Constrained (Pico-cells)
         • Time-Constrained (Theoretical Data Rates)
         • Area-Time efficient (Real-Time Solution)

  3. Outline
     • What is multiuser channel estimation?
     • Need for multiuser channel estimation
     • Implementation problems
     • Algorithm enhancements
     • VLSI architectures
         • Area-constrained, Time-constrained, Area-Time efficient
     • Comparisons with DSP solutions
     • Related Work and Conclusions

  4. Evolution of mobile communications
     • First generation: Voice
     • Second/Current generation: Voice + Low-rate data (9.6 Kbps)
     • Third generation: Voice + High-rate data (2 Mbps/384 Kbps/128 Kbps) + multimedia

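  5. Channel estimation
     [Figure: User 1 and User 2 reach the base station over a direct path and a reflected path; the received signal also contains noise and multiple-access interference (MAI).]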

  6. Need for channel estimation
     • To compensate for unknown fading amplitudes and asynchronous delays
     • Detector performance depends on the accuracy of the channel estimator

  7. Computing channel estimates
     • Computed by sending a training sequence of known bits to the receiver
     • When no training sequence is present, detected bits can be used to update the estimates in decision-feedback mode (tracking)
     • Importance usually neglected
     • Complexity may exceed that of the detector

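  8. Baseband signal processing
     [Figure: base-station receiver. Signals from multiple users arrive at the antenna; channel estimation (training and tracking) feeds detection, which is followed by decoding; detected bits are fed back to channel estimation for tracking.]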

  9. Implementation complexity
     • Matrix inversions (size 32×32) per window
     • Unable to meet real-time requirements on DSPs [Asilomar’99]
     • VLSI fixed-point architectures for matrix inversions
         • Precision problems
     • Typically, simpler single-user sliding-correlator structures are used instead

  11. Iterative scheme for channel estimation
     • Method of Gradient Descent
     • Stable convergence behavior
     • Same performance
     • Simpler bit-streaming hardware implementation
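A minimal sketch of such an iteration, assuming (the slide does not say this) that the estimator solves Rbb·A = Rbr for the channel estimate A. Gradient descent then replaces the inversion A = Rbb⁻¹·Rbr with repeated updates A ← A + µ(Rbr − Rbb·A), which for a symmetric positive-definite Rbb converges whenever 0 < µ < 2/λmax(Rbb). The helper names (`matmul`, `gradient_step`, `estimate`) and the toy matrices are hypothetical, and floating point is used here only for clarity.

```python
def matmul(X, Y):
    """Plain matrix product of nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def gradient_step(A, Rbb, Rbr, mu):
    """One gradient-descent step: A <- A + mu * (Rbr - Rbb A)."""
    RA = matmul(Rbb, A)
    return [[A[i][j] + mu * (Rbr[i][j] - RA[i][j]) for j in range(len(A[0]))]
            for i in range(len(A))]

def estimate(Rbb, Rbr, mu, iters):
    """Start from A = 0 and iterate; no matrix inversion is performed."""
    A = [[0.0] * len(Rbr[0]) for _ in Rbr]
    for _ in range(iters):
        A = gradient_step(A, Rbb, Rbr, mu)
    return A

# Toy example (made-up numbers): the eigenvalues of Rbb are about 2.38 and 4.62,
# so any step size 0 < mu < 2/4.62 converges.
Rbb = [[4.0, 1.0], [1.0, 3.0]]
Rbr = [[1.0, 2.0], [2.0, 1.0]]
A = estimate(Rbb, Rbr, mu=0.2, iters=100)
```

Here the iterate approaches Rbb⁻¹·Rbr = [[1/11, 5/11], [7/11, 2/11]] using only multiplications, subtractions and additions.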

  12. Comparison of Bit Error Rates (BER)
     [Figure: BER (10^-3 to 10^-1) vs. Signal to Noise Ratio (SNR, 4 to 12) for the MF, ActMF, ML and ActML estimators; complexity labels O(K²N) and O(K³+K²N).]
     Simulations: static multipath channel, SINR = 0, Paths = 3, Preamble L = 150, Spreading N = 31, Users K = 15

  13. Fading channel with tracking
     [Figure: BER (10^-3 to 10^0) vs. SNR (4 to 12) for MF and ML, each static and with tracking.]
     Doppler = 10 Kmph

  15. Area-Time Tradeoffs
     • Design for K = 32 users and spreading-code length N = 32
     • Target data rate = 128 Kbps
     • Low-power issues ignored!
     • Area-Constrained Architecture
         • Pico-cells; lower data rates
     • Time-Constrained Architecture
         • Maximum achievable data rates
     • Area-Time Efficient Architecture
         • Real-time with minimum area overhead
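A back-of-envelope sizing for this design point, under an assumption the slide does not state: the channel estimate A is fully updated once per received bit. Rbb is 2K×2K and A is 2K×N, so Rbb·A costs (2K)²N = 4K²N multiplications, consistent with the O(4K²N) label on the task-decomposition slide. The function names are hypothetical.

```python
K = 32           # users (from the slide)
N = 32           # spreading-code length (from the slide)
RATE = 128_000   # target data rate in bits/s (from the slide)

def mults_per_update(K, N):
    """Rbb (2K x 2K) times A (2K x N): (2K)*(2K)*N = 4*K*K*N multiplications."""
    return (2 * K) * (2 * K) * N

def required_clock_hz(K, N, rate, multipliers):
    """Clock needed if each multiplier does one multiplication per cycle and
    (assumption) A is fully updated once per received bit."""
    cycles_per_bit = -(-mults_per_update(K, N) // multipliers)   # ceiling division
    return cycles_per_bit * rate

# Sequential (1 multiplier), one row of 2K multipliers, fully parallel (4K^2 N).
for p in (1, 2 * K, 4 * K * K * N):
    print(f"{p:>6} multipliers -> {required_clock_hz(K, N, RATE, p) / 1e6:,.3f} MHz")
```

Under that assumption a single multiplier would need about 16.8 GHz, 2K = 64 multipliers about 262 MHz, and a fully parallel array one cycle per bit (128 kHz), which is why multiplication dominates the area-time tradeoff.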

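  16. Task Decomposition
     [Figure: tasks laid out against time over a tracking window L. Inputs: pilot bits b (2K,1) or, during tracking, detected bits selected by a MUX; b0 (2K,1); received data r (N,8) and r0 (N,8). Per-bit correlation matrices Rbb, O(2K²,8), and Rbr, O(2KN,8), feed an iterated computation of the channel estimate A, O(4K²N,8), which is passed to the data detector.]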
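One way the per-bit correlation updates could be organized, sketched under assumptions: Rbb = Σ b·bᵀ and Rbr = Σ b·rᵀ over the last L bits of the tracking window, with b0 and r0 read as the oldest entries leaving the window, so each new bit costs one rank-1 addition and one rank-1 subtraction instead of a full recomputation. The class and method names are hypothetical, and r is shown as real integers for brevity.

```python
from collections import deque

def outer(u, v):
    """Outer product u v^T of two integer vectors."""
    return [[ui * vj for vj in v] for ui in u]

def add_inplace(M, D, sign):
    """M += sign * D, element by element."""
    for i, row in enumerate(D):
        for j, x in enumerate(row):
            M[i][j] += sign * x

class WindowedCorrelations:
    """Hypothetical sketch: Rbb and Rbr summed over the last `window` bits."""

    def __init__(self, two_k, n, window):
        self.Rbb = [[0] * two_k for _ in range(two_k)]
        self.Rbr = [[0] * n for _ in range(two_k)]
        self.window = window
        self.history = deque()

    def push(self, b, r):
        # Add the newest pair (b, r).
        add_inplace(self.Rbb, outer(b, b), +1)
        add_inplace(self.Rbr, outer(b, r), +1)
        self.history.append((b, r))
        # Once the window is full, subtract the oldest pair (b0, r0).
        if len(self.history) > self.window:
            b0, r0 = self.history.popleft()
            add_inplace(self.Rbb, outer(b0, b0), -1)
            add_inplace(self.Rbr, outer(b0, r0), -1)
```

Because every b[i] is ±1, the diagonal of Rbb always equals the number of bits currently in the window.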

  17. Architecture Design: Auto-correlation
     • b ∈ {+1, −1}
     • Multiplication is an XNOR operation
     • Entire matrix can be updated sequentially or in parallel using XNOR gates
     • Auto-correlation matrix implemented as UP/DOWN counter(s)
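The bit-level trick can be checked in a few lines. Assuming the encoding +1 → 1 and −1 → 0 (the slide does not fix an encoding), the product of two ±1 values is +1 exactly when the XNOR of their bits is 1, so each Rbb entry becomes an up/down counter. Function names are hypothetical; `sign=-1` is an assumed way to remove an old vector b0.

```python
def to_bit(x):
    """Assumed encoding: +1 -> 1, -1 -> 0."""
    return 1 if x == 1 else 0

def xnor(p, q):
    """XNOR of two bits: 1 when they are equal."""
    return 1 - (p ^ q)

def update_counters(counters, b, sign=1):
    """Up/down counter update of Rbb for a +/-1 vector b.
    XNOR = 1 means b[i]*b[j] = +1 (count up); XNOR = 0 means -1 (count down).
    sign = -1 reverses the direction, e.g. to remove an old vector b0."""
    bits = [to_bit(x) for x in b]
    n = len(bits)
    for i in range(n):
        for j in range(n):
            step = 1 if xnor(bits[i], bits[j]) else -1
            counters[i][j] += sign * step

Rbb = [[0] * 4 for _ in range(4)]
update_counters(Rbb, [1, -1, -1, 1])
update_counters(Rbb, [1, 1, -1, -1])
update_counters(Rbb, [1, -1, -1, 1], sign=-1)   # slide the first vector out
```

After the three calls, Rbb holds only the outer product of [1, 1, −1, −1], with no multiplier used.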

  18. Architecture Design: Cross-Correlation
     • b ∈ {+1, −1}; r = 8-bit integer vector (complex)
     • Multiplications reduce to additions/subtractions
     • Entire (complex) matrix can be updated sequentially or in parallel using 8-bit adders
     • Cross-correlation matrix stored in RAM
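Likewise for the cross-correlation: because b[i] is ±1, b[i]·r[j] needs no multiplier, only an addition or a subtraction of the complex sample. This sketch stores complex values as (real, imaginary) integer pairs; the function name and the `sign` argument for removing old samples are assumptions.

```python
def update_rbr(Rbr, b, r, sign=1):
    """Rbr[i][j] += sign * b[i] * r[j] using only additions and subtractions.
    b: list of +/-1; r: list of (re, im) integer pairs (8-bit samples assumed);
    Rbr: 2-D list of (re, im) accumulators."""
    for i, bi in enumerate(b):
        add = (bi == 1) == (sign == 1)        # effective product is +1
        for j, (re, im) in enumerate(r):
            acc_re, acc_im = Rbr[i][j]
            if add:
                Rbr[i][j] = (acc_re + re, acc_im + im)
            else:
                Rbr[i][j] = (acc_re - re, acc_im - im)

b = [1, -1, 1]
r = [(12, -7), (-128, 127)]
Rbr = [[(0, 0)] * len(r) for _ in b]
update_rbr(Rbr, b, r)
```

Real and imaginary parts are accumulated independently, so each complex update uses two adders.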

  19. Architecture Design: Channel Estimate
     • A = 8-bit integer matrix (complex)
     • µ << 1: Truncated Multiplication [Schulte’93]
     • Matrix-matrix (real-complex) multiplication of integers
         • Forms the bottleneck
         • Can be done sequentially with a single multiplier, fully in parallel, or partially in parallel
     • Concentrate on multiplication for area-time tradeoffs!
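A fixed-point version of the update, sketched under two assumptions: µ is a power of two, µ = 2^−s, so scaling becomes an arithmetic right shift, and A and Rbr share the same fractional scaling. The loop uses one multiply-accumulate per cycle (the sequential option above) and counts those cycles. Because Rbb is real, a complex A can be handled as two independent real matrices. Truncated multiplication [Schulte’93] is not modeled, and all names and numbers are hypothetical.

```python
def fixed_point_step(A, Rbb, Rbr, shift):
    """A <- A + ((Rbr - Rbb A) >> shift) in integers; mu = 2**-shift (assumed).
    One MAC per cycle, sequentially. Returns (A_new, mac_cycles)."""
    rows, cols = len(A), len(A[0])
    A_new = [row[:] for row in A]
    cycles = 0
    for i in range(rows):
        for j in range(cols):
            acc = 0
            for k in range(rows):
                acc += Rbb[i][k] * A[k][j]   # one multiply-accumulate per cycle
                cycles += 1
            # Python's >> on negative ints is an arithmetic (floor) shift.
            A_new[i][j] = A[i][j] + ((Rbr[i][j] - acc) >> shift)
    return A_new, cycles

SCALE = 16                                   # 4 fractional bits (assumed)
Rbb = [[4, 1], [1, 3]]
A_true = [[1, 2], [3, -1]]
# Rbb @ A_true = [[7, 7], [10, -1]], scaled to the same fixed-point format as A.
Rbr = [[SCALE * x for x in row] for row in [[7, 7], [10, -1]]]
A = [[0, 0], [0, 0]]
for _ in range(20):
    A, cycles = fixed_point_step(A, Rbb, Rbr, shift=2)   # mu = 1/4
```

In this toy run the integer iteration settles within one least-significant bit of SCALE·A_true; the floor behavior of the shift leaves that residual. Each step costs rows × cols × rows MAC cycles, i.e. 4K²N for the full design.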

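  20. Area-Constrained Architecture
     [Figure: sequential datapath built around a single MAC unit. An up/down counter updates Rbb from b and b0; add/sub units update Rbr from r and r0; subtractors, a right shift (>>) and the MAC compute Anew from A, Rbb and Rbr; memories are indexed by i and j with load/store, MUX/DEMUX and enable (EN) control; word widths are 1, 8 and 16 bits.]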

  21. Area-constrained Architecture: Hardware Requirements

  22. Time-constrained Architecture
     [Figure: parallel datapath. The bit products b·bT and b0·b0T (K(2K−1)×1 bit each) update Rbb (2K²×8); b, b0 (2K×1) and r, r0 (N×8) update Rbr (2KN×8); a multiplier array (2KN×16 outputs), subtractors and a right shift (>>) update the channel estimate A (2KN×8) from Rbb and Rbr.]

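  23. Auto-correlation Update in Parallel
     [Figure: for b (2K) = (a, b, c, d), an array of XNORs forms the pairwise products a·b, a·c, a·d, b·c, b·d, c·d of bbT (K(2K−1)×1 bit); each bbT(i,j) drives an up/down counter (U/D#) holding Rbb(i,j), with separate counters for the diagonal entries Rbb(i,i); Rbb is stored as 2K²×8 bits.]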
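In software, the same row-at-a-time parallelism can be imitated by packing b into one machine word: XNOR-ing the word with bit i replicated across it yields every product b[i]·b[j] of row i at once. Only the K(2K−1) pairs with i < j need XNORs, because b[i]·b[i] = 1 always. This is a hypothetical software analogy with an assumed encoding (+1 → 1, −1 → 0) and assumed names, not the hardware on the slide.

```python
def pack(b):
    """Pack a +/-1 vector into an int: bit i is 1 for +1, 0 for -1 (assumed encoding)."""
    word = 0
    for i, x in enumerate(b):
        if x == 1:
            word |= 1 << i
    return word

def xnor_row(word, i, width):
    """Bit j of the result is 1 exactly when b[i]*b[j] = +1, for all j at once."""
    mask = (1 << width) - 1
    rep = -((word >> i) & 1) & mask   # bit i copied into every position
    return ~(word ^ rep) & mask       # word-wide XNOR

def offdiagonal_pairs(width):
    """Number of distinct pairs i < j: K(2K-1) when width = 2K."""
    return width * (width - 1) // 2
```

For example, with 2K = 4 (as in the figure's a, b, c, d), `offdiagonal_pairs(4)` gives the six products a·b through c·d.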

  24. Cross-Correlation Update in Parallel
     [Figure: bit b(i) of b (2K×1) selects add or subtract (Add/Sub#) on an 8-bit adder that accumulates element r(j) of r (N×8) into Rbr(i,j); Rbr is 2KN×8 bits.]

  25. Time-constrained Architecture: Hardware Requirements

  26. Area-Time Efficient Architecture
     [Figure: components include 2K×1 counters updating Rbb from b·bT and b0·b0T; MUX/DEMUX; a multiplier and an adder on 2K×8 paths of Rbb and A; subtractors and a right shift (>>) producing Anew; Rbr held in load/store memory and updated from r and r0 (N×8).]

  27. Area-Time Efficient Architecture: Hardware Requirements

  29. DSP Comparisons
     • DSPs unable to exploit bit-level parallelism
     • Inefficient storage of bits
     • Replacing multiplications by additions/subtractions

  30. Related Work: DSP Extensions (Cross-Correlation)
     [Figure: an 8-bit control register b[i] selects add or subtract (+/−) in each 8-bit lane of a 64-bit register D[i][j], writing the result back to a 64-bit register D[i][j].]
     For i = 1..8, j = 1..8: D[i][j] = D[i][j] + b[i]*C[j]
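A bit-accurate simulation of such an instruction, under assumptions about the layout the slide leaves open: each 64-bit register holds one column j as eight 8-bit lanes D[0..7][j], bit i of the 8-bit control register encodes b[i] (1 for +1), and lanes wrap modulo 2^8. Indices are 0-based here, and all names are hypothetical. One call then performs eight of the 64 multiply-free updates.

```python
LANES = 8                      # eight 8-bit lanes in a 64-bit register
LANE_BITS = 8
LANE_MASK = (1 << LANE_BITS) - 1

def unpack_lanes(reg):
    """Split a 64-bit register into eight unsigned 8-bit lanes (lane 0 = low byte)."""
    return [(reg >> (LANE_BITS * i)) & LANE_MASK for i in range(LANES)]

def pack_lanes(lanes):
    """Inverse of unpack_lanes; each lane value wraps modulo 2**8."""
    reg = 0
    for i, v in enumerate(lanes):
        reg |= (v & LANE_MASK) << (LANE_BITS * i)
    return reg

def cond_add_sub(reg, ctrl, c):
    """Hypothetical instruction: lane i += c if bit i of ctrl is 1 (b[i] = +1), else -= c."""
    out = []
    for i, d in enumerate(unpack_lanes(reg)):
        delta = c if (ctrl >> i) & 1 else -c
        out.append(d + delta)
    return pack_lanes(out)

def update_all(D_regs, ctrl, C):
    """D_regs[j] holds column j; eight instructions update all 64 entries D[i][j]."""
    return [cond_add_sub(reg, ctrl, cj) for reg, cj in zip(D_regs, C)]

def to_signed(v):
    """Interpret an 8-bit lane as two's complement."""
    return v - 256 if v & 0x80 else v

b = [1, -1, 1, 1, -1, -1, 1, -1]
ctrl = sum(1 << i for i, x in enumerate(b) if x == 1)
C = [3, -5, 7, 0, 100, -100, 1, 2]
D = update_all([0] * 8, ctrl, C)
```

The lane loop inside `cond_add_sub` stands in for what would be a single-cycle operation in hardware.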

  31. Related Work: Online Arithmetic
     • Multiuser detection
         • Need to compute only the sign bit (most significant digit)
         • No back-conversion to conventional representation
     • Complex-number representation possible
     • Can also be integrated with channel estimation
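Why only the most significant digit matters can be seen with a radix-2 signed-digit representation (digits in {−1, 0, 1}), a redundant form of the kind online arithmetic uses: the first nonzero digit, read most-significant first, already fixes the sign, because the remaining digits can contribute at most one unit less than its weight. This illustrates that property only; it is not an online multiplier or detector, and the names are hypothetical.

```python
def sd_value(digits):
    """Value of a radix-2 signed-digit number; digits[0] is most significant."""
    v = 0
    for d in digits:
        v = 2 * v + d
    return v

def sd_sign(digits):
    """Sign from the first nonzero digit, scanning MSD-first; no conversion needed."""
    for d in digits:
        if d != 0:
            return d
    return 0
```

For example, `sd_sign([0, 1, -1, -1])` returns 1 as soon as the second digit arrives, and the full value is 4 − 2 − 1 = 1.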

  32. Related Work: DSP-FPGA solutions
     • Task partitioning across multiple DSPs and FPGAs
         • Bit-level parallelism on FPGAs
         • Multiplications on DSPs
     • Sundance Multi-DSP System
         • 2 TI C67 DSPs
         • 2 Xilinx Virtex FPGAs
         • http://www.sundance.com

  33. Conclusions
     • Real-time VLSI architecture for multiuser channel estimation
     • Iterative fixed-point algorithm developed to avoid matrix inversions
     • Area-Time Tradeoffs discussed
         • Area-Constrained (Pico-cells)
         • Time-Constrained (Data Rates)
         • Area-Time efficient (Real-Time)
     • VLSI architectures exploit bit-level computations and parallelism better than DSPs, which lets them meet real-time constraints
