
Advanced Wireless Receivers: Algorithmic and Architectural Optimizations

Suman Das, Rice University, Department of Electrical and Computer Engineering & Center for Multimedia Communication.


Presentation Transcript


  1. Advanced Wireless Receivers: Algorithmic and Architectural Optimizations. Suman Das, Rice University, Department of Electrical and Computer Engineering & Center for Multimedia Communication

  2. Introduction • Wireless is one of the fastest growing industries • “By 2002, a lot more cellular phones are going to have internet access than PCs.” Larry Ellison, CEO, Oracle. [Figure: millions of cell-phone users (0 to 700) by year, 1993 to 2001. Source: Ericsson]

  3. Ubiquitous wireless connectivity [Diagram labels: Wireless, Cellular, Ad-hoc Network, Bluetooth/Home Networks, Wireless LAN]

  4. Why advanced receiver algorithms? • The number of wireless subscribers is growing • Multimedia data is replacing voice traffic • Higher and more varied data rates (144 Kbps to 2 Mbps) • Stricter quality of service (QoS) • Wireless bandwidth remains a critical resource • Current-generation receivers are suboptimal

  5. Performance of advanced receivers • Huge performance improvement [Figure: bit error rate (10^0 down to 10^-4) vs. SNR (4 to 16 dB); curves: current receiver, advanced receiver, theoretical limit]

  6. Computational requirements of advanced receivers • A 15-user system transmitting at 0.5 Mbps needs • ~20 billion additions per second • ~15 billion multiplications per second • Requires 32-bit floating-point precision • Needs fifty 200 MHz floating-point DSPs!
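
The figures on this slide can be cross-checked with a little arithmetic. The sketch below only rearranges the slide's own numbers; the slide does not say how many operations a DSP sustains per cycle, so the value computed here is an implied quantity, not a measured one, and the function name is made up for illustration.

```python
# Numbers taken from the slide.
ADDS_PER_SEC = 20e9      # ~20 billion additions per second
MULTS_PER_SEC = 15e9     # ~15 billion multiplications per second
DSP_CLOCK_HZ = 200e6     # 200 MHz DSP
NUM_DSPS = 50            # "fifty" DSPs

def implied_ops_per_cycle(adds, mults, clock_hz, num_dsps):
    """Sustained operations per clock cycle each DSP would need to deliver
    for `num_dsps` devices to cover the total workload."""
    total_ops = adds + mults
    return total_ops / (num_dsps * clock_hz)

ops = implied_ops_per_cycle(ADDS_PER_SEC, MULTS_PER_SEC, DSP_CLOCK_HZ, NUM_DSPS)
print(f"implied sustained ops/cycle per DSP: {ops:.2f}")
```

In other words, the "fifty DSPs" estimate corresponds to each device sustaining a few arithmetic operations every cycle, which is plausible only for a multi-issue processor running well-scheduled code.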

  7. My research • Receiver design • High performance • Low complexity • Approach • Algorithmic simplification • Efficient architectural mapping

  8. Wireless channel model • Channel effects • Background noise • Fading • Multiple paths • Multiple users • Multiple Access Interference (MAI) [Diagram labels: Base Station, User 1, User 2, Direct Path, Reflected Paths, Noise]

  9. Code Division Multiple Access (CDMA) • Wideband CDMA: technology of choice • Users distinguished by spreading sequence • Received signal notation: K = number of users, P = number of paths, w = attenuation, t = delay, b = data bits [Figure: spreading waveform S(t) over time, one bit divided into chips; example chip sequence -1 -1 1 -1 1 1 1, spreading gain = 7]
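
As a minimal illustration of spreading, the sketch below uses the 7-chip sequence shown on the slide. It assumes a single user, no noise, no multipath and perfect chip alignment, none of which hold in the multiuser setting the rest of the talk addresses; the function names are illustrative only.

```python
# Chip sequence from the slide (spreading gain = 7).
CODE = [-1, -1, 1, -1, 1, 1, 1]

def spread(bits, code=CODE):
    """Replace each +/-1 data bit by the code word multiplied by that bit."""
    return [b * c for b in bits for c in code]

def despread(chips, code=CODE):
    """Correlate each code-length block with the code and take the sign."""
    n = len(code)
    bits = []
    for start in range(0, len(chips), n):
        block = chips[start:start + n]
        corr = sum(x * c for x, c in zip(block, code))
        bits.append(1 if corr >= 0 else -1)
    return bits

tx = spread([1, -1, 1])
print(len(tx), despread(tx))
```

Each bit occupies seven chips on the air, which is the bandwidth expansion the spreading gain refers to.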

  10. CDMA system • Proposed advanced/multiuser receiver modules • Designed in isolation • Suboptimal design [Block diagram labels: TRANSMITTER (data, ENCODING, MODULATION, SPREADING), OTHER USERS, RECEIVER (CHANNEL ESTIMATION, DETECTION, DEMODULATION, DECODING, detected bits of all K users)]

  11. Integrated receiver design • Joint channel estimation and detection • Joint detection and decoding [Block diagram labels: RECEIVER (CHANNEL ESTIMATION, DETECTION, DEMODULATION, DECODING, detected bits of all K users)]

  12. Why separate channel estimation and detection? • Operate on different statistics [Diagram labels: Received signal, Chip-matched filter, Channel Estimation, Code-matched filter, Detection; timeline showing delay, bits b_i and b_(i+1), observation r_i, and the processing window for channel estimation]

  13. Towards an integrated solution • Reuse computation from channel estimation step • Use same discretized filter output • Avoid alignment to bit interval of each user • Reduce computation • Save hardware

  14. Components of the observation vector • bit i = +1, bit i+1 = -1, code -1 -1 1 -1 1 1 1 (sent as 1 1 -1 1 -1 -1 -1 for a -1 bit) • Observation window after the delay = w_(k,p) [1 1 1 0 0 0 0] + (-w_(k,p)) [0 0 0 -1 -1 1 -1] • w_(k,p): attenuation

  15. Matrix representation • r = U Z b_preamble • Contribution of user k: U_k Z_k b_k(i), added to the other users [Figure: bit i = +1 and bit i+1 = +1, each spread by -1 -1 1 -1 1 1 1]

  16. Efficient statistics • Parametric approach • Build channel model (number of paths) • Estimate delay, attenuation • Produce the code-matched filter output • Our approach • Estimate effective spreading code (UZ) • Code-matched filter y = (UZ)^T r
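
The code-matched filter y = (UZ)^T r is a single matrix-transpose-vector product once the effective spreading code is available. The sketch below assumes a toy two-user, noise-free case with hypothetical effective codes (the second column is invented for illustration); in the talk, UZ is estimated from the preamble rather than known.

```python
def matvec(A, b):
    """A b for a matrix stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, b)) for row in A]

def matvec_transposed(A, r):
    """A^T r for a matrix stored as a list of rows."""
    n_cols = len(A[0])
    return [sum(A[i][k] * r[i] for i in range(len(A))) for k in range(n_cols)]

# Hypothetical effective spreading codes, one column per user.
code_user1 = [-1, -1, 1, -1, 1, 1, 1]    # sequence from the CDMA slide
code_user2 = [1, 1, 1, -1, -1, 1, -1]    # assumed second code
UZ = [list(pair) for pair in zip(code_user1, code_user2)]  # 7 x 2 matrix

bits = [1, -1]
r = matvec(UZ, bits)            # noise-free received vector r = (UZ) b
y = matvec_transposed(UZ, r)    # code-matched filter output y = (UZ)^T r
print(y)
```

The signs of y agree with the transmitted bits here, but each entry also contains a contribution from the other user; that cross-talk is the multiple access interference that the multiuser detector later removes.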

  17. Simulation parameters • System parameters • 15 users • 3 paths • Spreading gain - 31 • Hardware platform • TI C62 and C67 EVM boards • 64 KB each internal program & data memory • 256 KB SBSRAM, 8 MB SDRAM (external) • Code-composer 1.0 to profile code

  18. Effectiveness of integrated design • 2 dB gain in performance [Figure: bit error rate (10^0 down to 10^-4) vs. SNR (-4 to 16 dB); curves: single user, and multiuser algorithms using the parametric approach, the UZ approach, and the actual parameters]

  19. Computational savings • Avoid extraction of actual channel parameters • Avoid realignment of data for code-matched filtering • Reduce intermediate storage requirement • Avoid divisions (28 cycles) and square-root (38 cycles) in DSP.

  20. Fixed-point behavior • Fixed-point advantages • Speed • Power • Cost • Fixed-point analysis • 12 bits of precision required instead of 32 bits! • Pack two 16-bit operations in 32-bit registers • More packing with • Saturation arithmetic • User power control!
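
The packing and saturation ideas can be sketched in a few lines. This is an emulation in plain Python integers, not DSP code: the helper names are hypothetical and do not correspond to any TI intrinsic, and the example only shows the semantics of performing two saturating 16-bit additions on one 32-bit word.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def sat16(x):
    """Clamp to the signed 16-bit range instead of wrapping around."""
    return max(INT16_MIN, min(INT16_MAX, x))

def pack(hi, lo):
    """Store two signed 16-bit values in one 32-bit word."""
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)

def unpack(word):
    """Recover the two signed 16-bit halves of a 32-bit word."""
    def signed(v):
        return v - 0x10000 if v & 0x8000 else v
    return signed((word >> 16) & 0xFFFF), signed(word & 0xFFFF)

def packed_sat_add(a, b):
    """Add the high halves and the low halves independently, saturating each."""
    a_hi, a_lo = unpack(a)
    b_hi, b_lo = unpack(b)
    return pack(sat16(a_hi + b_hi), sat16(a_lo + b_lo))

result = packed_sat_add(pack(30000, -5), pack(10000, -7))
print(unpack(result))
```

The high half overflows the 16-bit range and is clamped to the maximum rather than wrapping to a negative value, which is why saturation keeps fixed-point detector outputs from flipping sign on overflow.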

  21. Time requirement • 2.39X speedup [Figure: bar chart of normalized time (0 to 100) for Original, Unified Synch + Detect, and 16-bit fixed-point; labeled values 68.5 and 41.8]

  22. Integrated receiver design • Joint channel estimation and detection • Effective spreading code approach • Optimized detector design • Joint detection and decoding [Block diagram labels: RECEIVER (CHANNEL ESTIMATION, DETECTION, DEMODULATION, DECODING, detected bits of all K users)]

  23. Linear multiuser detector • Received signal r = (UZ) b + n • Channel estimation (UZ) • Matched filter output y = (UZ)^T r • Linear detector: solve R b + n = y • R = (UZ)^T (UZ) • Size of the linear system: NK • Direct inverse takes O((NK)^3) operations • N: block length, K: number of users
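
A small worked instance of the linear detector may help. The sketch below assumes two users, one bit each, no noise, and hypothetical effective codes; it solves R b = y by plain Gaussian elimination, which is the cubic-cost direct approach the slide refers to, not the faster algorithm developed in the following slides.

```python
def gram(A):
    """R = A^T A for a matrix stored as a list of rows."""
    n_cols = len(A[0])
    return [[sum(row[p] * row[q] for row in A) for q in range(n_cols)]
            for p in range(n_cols)]

def matvec_transposed(A, r):
    """A^T r for a matrix stored as a list of rows."""
    return [sum(A[i][k] * r[i] for i in range(len(A))) for k in range(len(A[0]))]

def solve(R, y):
    """Solve R b = y by Gaussian elimination with partial pivoting."""
    n = len(R)
    M = [[float(v) for v in R[i]] + [float(y[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(col + 1, n):
            factor = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= factor * M[col][j]
    b = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * b[j] for j in range(i + 1, n))
        b[i] = (M[i][n] - tail) / M[i][i]
    return b

# Hypothetical effective codes (columns of UZ); the second one is assumed.
UZ = [list(p) for p in zip([-1, -1, 1, -1, 1, 1, 1], [1, 1, 1, -1, -1, 1, -1])]
bits = [1, -1]
r = [sum(a * x for a, x in zip(row, bits)) for row in UZ]   # r = (UZ) b, no noise
R = gram(UZ)
y = matvec_transposed(UZ, r)
b_hat = solve(R, y)
print(R, y, b_hat)
```

With noise absent, the solution of R b = y reproduces the transmitted bits exactly; the off-diagonal entries of R are the code cross-correlations that the matched filter alone leaves in y.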

  24. Outline of the Kronecker algorithm • Correlation matrix is block-Toeplitz • Approximate it as a block-circulant system

  25. Outline of the Kronecker algorithm • Kronecker representation • Isolates structure and the matrix blocks • Fourier transform converts it to a block-diagonal system • Solve N independent order-K systems iteratively • Computationally optimal
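
The reason the Fourier transform helps can be seen in the scalar case, where each block is a single number. The sketch below is a deliberately simplified stand-in, not the Kronecker algorithm itself: it assumes an exactly circulant matrix with 1 x 1 blocks, and it uses a naive O(n^2) DFT for self-containment where a real implementation would use an FFT.

```python
import cmath

def dft(x, inverse=False):
    """Naive discrete Fourier transform (an FFT would be used in practice)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * f * t / n) for t in range(n))
           for f in range(n)]
    return [v / n for v in out] if inverse else out

def circulant_matvec(c, x):
    """Multiply by the circulant matrix whose first column is c."""
    n = len(c)
    return [sum(c[(i - j) % n] * x[j] for j in range(n)) for i in range(n)]

def circulant_solve(c, y):
    """Solve C x = y: the DFT diagonalizes C, so each frequency decouples."""
    C_f, Y_f = dft(c), dft(y)
    X_f = [yf / cf for yf, cf in zip(Y_f, C_f)]
    return [v.real for v in dft(X_f, inverse=True)]

c = [4, 1, 0, 1]          # first column of a small symmetric circulant matrix
x_true = [1, -1, 1, 1]
y = circulant_matvec(c, x_true)
x = circulant_solve(c, y)
print(x)
```

In the block-circulant case the per-frequency division becomes an independent K x K solve, which is the "N independent order-K systems" step on the slide.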

  26. Speedup in detector • Complexity O(N^2 K^3) vs. O(NK^2 + KN log N) [Figure: achievable data rate (Kbps); Decorrelator: 10.4 Kbps, Kronecker: 83.1 Kbps]

  27. Pipelining and parallelization • Mostly matrix based operations • Detector - iterative algorithm • Pipeline various iterations • Parallelize operations • Add more functional units • Distribute data across functional units • Distribute computations

  28. Projected computation time [Figure: achievable data rate (Kbps), 0 to 600, for Base Multiuser Algorithm, Hardware Pipelining, and Pipelining + Parallelization; labeled values 20.75 Kbps, 154.3 Kbps, 564.5 Kbps; annotations: DSP only, DSP + Coprocessor support, 30 adders and multipliers]

  29. Integrated receiver design • Joint channel estimation and detection • Effective spreading code approach • Optimized detector design • Joint detection and decoding [Block diagram labels: RECEIVER (CHANNEL ESTIMATION, DETECTION, DEMODULATION, DECODING, detected bits of all K users)]

  30. Maximum a-posteriori (MAP) decoding • Received signal: r = UZd + n • Optimum decoding rule • Constrained optimization problem • Decode all users simultaneously • Exponential complexity in number of users [Block diagram labels: TRANSMITTER (b, ENCODING, d, MODULATION, SPREADING), OTHER USERS]

  31. Single-user detection and decoding • Suboptimum alternatives • Isolate detection and decoding [Diagram: r feeds matched filters MF 1 … MF K with outputs y_1 … y_K; Decoder 1 … Decoder K produce the estimates of b_1 … b_K]

  32. Decoding matched filter outputs • Huge performance loss! [Figure: BER (10^0 down to 10^-4) vs. SNR (1 to 8 dB); curves: MF + Viterbi, Optimal]

  33. Iterative detection and decoding • User of concern c, interfering users I • r = (UZ)_c d_c + (UZ)_I d_I + z • Estimate d_I • Eliminate interference: ŷ_c = (UZ)_c^T (r - (UZ)_I d_I) • Estimate d_c for the next step • Complexity linear in number of users
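
The interference-elimination step can be written out for one user of concern and one interferer. The sketch below assumes no noise, a single coded symbol per user, and hypothetical effective codes; it shows only the cancellation formula from the slide, not the decoder that supplies the estimate of d_I or the iteration between the two.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cancel_and_filter(r, code_c, code_i, d_i_estimate):
    """Subtract the reconstructed interferer, then matched-filter for user c."""
    cleaned = [x - a * d_i_estimate for x, a in zip(r, code_i)]
    return dot(code_c, cleaned)

# Hypothetical effective codes for the user of concern and one interferer.
code_c = [-1, -1, 1, -1, 1, 1, 1]
code_i = [1, 1, 1, -1, -1, 1, -1]
d_c, d_i = -1, 1

# Noise-free received vector r = (UZ)_c d_c + (UZ)_I d_I.
r = [a * d_c + b * d_i for a, b in zip(code_c, code_i)]

plain_mf = dot(code_c, r)                                # still contains interference
with_cancel = cancel_and_filter(r, code_c, code_i, d_i)  # interference removed
print(plain_mf, with_cancel)
```

When the interferer's symbol estimate is correct, the filtered output equals the user's symbol scaled by the code energy; a wrong estimate doubles the interference instead, which is why the estimates are refined over several iterations.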

  34. Reduction in decoding complexity • Convolutional code • Coded bits depend on past data bits • Performance improves with memory length • Viterbi algorithm for decoding • Complexity exponential in memory length • Our suboptimal approach • Maximal weight basis decoding • Complexity quadratic in memory length

  35. Joint detection and decoding performance • Rate = 1/2, k = 7 [Figure: BER (10^0 down to 10^-5) vs. SNR (1 to 8 dB); curves: MF + Viterbi, Iter1 + Subopt, Iter1 + Viterbi, Iter3 + Subopt, Optimal]

  36. Joint detection and decoding • Huge performance gain • Suboptimal approximation • Insignificant performance loss • Significant computational gain • Architecture for suboptimal decoding? • Viterbi algorithm: butterfly architecture • Have a sliding window implementation

  37. Summary of contributions • Integrated channel estimation and detection model [WCNC] • Optimized detection algorithm [PIMRC, Tr. Com] • Fixed-point implementation [ICASSP, SPIE] • Parallel architecture [Asilomar] • Joint detection and decoding [Globecom, Tr. Com] • Suboptimal decoding algorithm [Asilomar, Tr. Inf. Th.]

  38. Future research [Diagram labels: Wireless, Cellular, Ad-hoc Network, Bluetooth/Home Networks, Wireless LAN]

  39. Future research • Universal wireless receiver • Reconfigurable solution • Power efficient • Automate design? • Network level interaction • Resource allocation • Quality of service guarantee • Application level interaction

  40. Further details http://www.ece.rice.edu/~suman http://www.ece.rice.edu/CMC

  41. UZ Method

  42. System Model - Received Signal • Chip asynchronous • t_(k,p) = q + g (integer and fractional parts of the delay) • r_i(m) = w_(k,p) r_(k,p)(m) g [1 1 1 0 0 0 0] + w_(k,p) r_(k,p)(m) (1-g) [1 1 0 0 0 0 0] (Right) + w_(k,p) r_(k,p)(m) g [0 0 0 -1 -1 1 -1] + w_(k,p) r_(k,p)(m) (1-g) [0 0 -1 -1 1 -1 1] (Left) [Figure: bit i = +1 and bit i+1 = +1, each spread by -1 -1 1 -1 1 1 1]
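
The four vectors on this slide follow from shifting the code by q and by q+1 chips and weighting the two shifts by (1-g) and g. The sketch below reproduces them for the slide's code with q = 2; it assumes a single path and a single antenna (so the array response term is dropped), and the function names are illustrative.

```python
CODE = [-1, -1, 1, -1, 1, 1, 1]

def right_part(code, shift):
    """Tail of the earlier bit's code word that spills into this window."""
    n = len(code)
    return code[n - shift:] + [0] * (n - shift)

def left_part(code, shift):
    """Head of the later bit's code word, delayed by `shift` chips."""
    n = len(code)
    return [0] * shift + code[:n - shift]

def observation_window(code, q, g, w, earlier_bit, later_bit):
    """One path's contribution to the window for delay q + g chips.

    The fractional delay g splits the energy between shifts q and q + 1.
    """
    out = [0.0] * len(code)
    for shift, weight in ((q, 1.0 - g), (q + 1, g)):
        right = right_part(code, shift)
        left = left_part(code, shift)
        for i in range(len(code)):
            out[i] += w * weight * (earlier_bit * right[i] + later_bit * left[i])
    return out

print(right_part(CODE, 3), right_part(CODE, 2))
print(left_part(CODE, 3), left_part(CODE, 2))
print(observation_window(CODE, q=2, g=0.25, w=1.0, earlier_bit=1, later_bit=1))
```

With g = 0 the window reduces to the code cyclically shifted by q chips when both bits are equal, which is the chip-synchronous special case.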

  43. System Model - Received Signal • A(m) = [a_1^R a_1^L … a_k^R a_k^L … a_K^R a_K^L] • T_c: chip period; t/T_c = q + g (integer and fractional parts) • Columns of A(m): linear combinations of the c_k[q]

  44. Channel Response Model • Hence, the columns of A(m) for each user k: a_k^R = U_k^R z_k(m), a_k^L = U_k^L z_k(m), where U_k^R = [c_k^R[0] … c_k^R[q] … c_k^R[N-1]], U_k^L = [c_k^L[0] … c_k^L[q] … c_k^L[N-1]], z_k(m) = h_k ∘ p_k(m) • U_k^R, U_k^L: spreading code shifted by delay • p_k(m): array response components • h_k: multipath attenuation and delay components

  45. System Model - Received Signal • Structure of p_k(m) and h_k: for each path p = 1 … P, the two nonzero entries sit at the q_(k,p)-th and (q_(k,p)+1)-th elements • h_k = [0 … w_(k,1)(1-g_(k,1)) w_(k,1)g_(k,1) … w_(k,P)(1-g_(k,P)) w_(k,P)g_(k,P) … 0]^T • p_k(m) = [0 … r_(k,1)(m) r_(k,1)(m) … r_(k,P)(m) r_(k,P)(m) … 0]^T

  46. Channel Estimation - Maximum Likelihood Algorithm • U: spreading codes of all the users (known) • Z: all unknown parameters of all paths of all users • time delay, attenuation, array response • Goal: Estimate Z using spreading codes and preamble

  47. Channel estimation - Maximum Likelihood Algorithm • Given: L observations r_1, r_2, …, r_L • Form the joint conditional probability density function of r_1, r_2, …, r_L • Goal: Maximize this density with respect to the channel parameters (Z)

  48. Multi-step Optimization Process • Define y = UZ. Form the ML estimate ŷ of y. • Estimate z_k by a least-squares fit between y = UZ and ŷ. • Extract the individual parameters from the estimated z_k.

  49. Kronecker algorithm

  50. Iterative methods • To solve Rb = y • Solution at step k is b_k • Calculate error e_k = R b_k - y • Modify solution from error and earlier estimate • Cost • Matrix-vector product takes O(n^2) operations • Each iteration takes O(n^2) steps • Can we do better?
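
One simple instance of the scheme on this slide is a fixed-step update b_(k+1) = b_k - alpha e_k. The slide does not say which update rule was used, so this choice, the step size alpha, and the iteration count are all assumptions made for illustration; the update converges only when alpha is small enough relative to the largest eigenvalue of R.

```python
def matvec(R, b):
    """Dense matrix-vector product: the O(n^2) cost of each iteration."""
    return [sum(a * x for a, x in zip(row, b)) for row in R]

def iterate(R, y, alpha, steps):
    """Fixed-step iteration for R b = y (assumed update rule, see lead-in)."""
    b = [0.0] * len(y)                       # initial estimate
    for _ in range(steps):
        error = [rb - yi for rb, yi in zip(matvec(R, b), y)]   # e_k = R b_k - y
        b = [bi - alpha * ei for bi, ei in zip(b, error)]      # correct the estimate
    return b

# Small symmetric positive-definite example; its eigenvalues are 6 and 8.
R = [[7.0, -1.0], [-1.0, 7.0]]
b_true = [1.0, 0.5]
y = matvec(R, b_true)
b = iterate(R, y, alpha=0.125, steps=50)
print(b)
```

Every pass is dominated by the dense product R b_k, which is where the O(n^2) per-iteration cost comes from and what a structured (for example circulant) approximation of R is meant to reduce.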
