Advanced Wireless Receivers: Algorithmic and Architectural Optimizations Suman Das Rice University Department of Electrical and Computer Engineering & Center for Multimedia Communication
Introduction • Wireless is one of the fastest-growing industries [Figure: millions of cell-phone users by year, 1993 to 2001; source: Ericsson] • “By 2002, a lot more cellular phones are going to have internet access than PCs.” (Larry Ellison, CEO, Oracle)
Ubiquitous wireless connectivity [Figure: cellular, ad-hoc network, Bluetooth/home networks, wireless LAN]
Why advanced receiver algorithms? • The number of wireless subscribers is growing • Multimedia data is replacing voice traffic • Higher and varied data rates (144 kbps to 2 Mbps) • Stricter quality of service (QoS) • Wireless bandwidth remains a critical resource • Current-generation receivers are suboptimal
Performance of advanced receivers [Figure: bit error rate (10^0 to 10^-4) vs. SNR (4 to 16 dB) for the current receiver, an advanced receiver, and the theoretical limit] • Huge performance improvement
Computational requirements of advanced receivers • A 15-user system transmitting at 0.5 Mbps needs ~20 billion additions per second, ~15 billion multiplications per second, and 32-bit floating-point precision • That requires fifty 200 MHz floating-point DSPs!
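Dividing the stated load across the stated hardware shows what each DSP would have to sustain. This is plain arithmetic on the slide's numbers; it assumes additions and multiplications count as one operation each and ignores memory and control overhead.

```latex
\frac{(20 + 15)\times 10^{9}\ \text{ops/s}}{50\ \text{DSPs} \times 200\times 10^{6}\ \text{cycles/s}}
  = 3.5\ \text{floating-point operations per cycle per DSP}
```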
My research • Receiver design: high performance, low complexity • Approach: algorithmic simplification, efficient architectural mapping
Wireless channel model • Channel effects: background noise, fading, multiple paths • Multiple users: multiple-access interference (MAI) [Figure: users 1 and 2 reach the base station over direct and reflected paths, plus noise]
Code Division Multiple Access (CDMA) • Wideband CDMA: the technology of choice • Users are distinguished by their spreading sequences [Figure: spreading sequence S(t) with spreading gain 7, i.e. 7 chips per bit: -1 -1 1 -1 1 1 1] • Received signal, where K: number of users, P: number of paths, w: attenuation, τ: delay, b: data bits
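A minimal sketch of how spreading lets a receiver separate users, assuming chip-synchronous users, a single path, unit attenuation, and no noise (a strong simplification of the multipath model with K users, P paths, attenuations w and delays τ). The first code is the slide's 7-chip sequence; the second user's code and all function names are hypothetical.

```python
CODE_USER1 = [-1, -1, 1, -1, 1, 1, 1]   # spreading code from the slide (gain 7)
CODE_USER2 = [1, -1, 1, 1, -1, -1, 1]   # hypothetical second code, for illustration only


def spread(bits, code):
    """Map each +1/-1 data bit to bit * code (one chip per code element)."""
    return [b * c for b in bits for c in code]


def despread(chips, code):
    """Correlate each code-length block of chips with the code; decide by sign."""
    n = len(code)
    out = []
    for start in range(0, len(chips), n):
        corr = sum(x * c for x, c in zip(chips[start:start + n], code))
        out.append(1 if corr >= 0 else -1)
    return out


bits1, bits2 = [1, -1, -1, 1], [-1, -1, 1, 1]
received = [a + b for a, b in zip(spread(bits1, CODE_USER1), spread(bits2, CODE_USER2))]
print(despread(received, CODE_USER1))   # [1, -1, -1, 1]
print(despread(received, CODE_USER2))   # [-1, -1, 1, 1]
```

The two codes have autocorrelation 7 and cross-correlation -1, so each correlator sees its own bit with weight 7 and the other user's bit with weight -1; with delays, multipath, and unequal powers that separation degrades, which is what the multiuser receiver addresses.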
CDMA system [Figure: transmitter (encoding, modulation, spreading; other users share the channel) and receiver (channel estimation, detection, demodulation, decoding) producing the detected bits of all K users] • Proposed advanced/multiuser receiver modules are designed in isolation • The overall design is suboptimal
Integrated receiver design [Figure: receiver chain (channel estimation, detection, demodulation, decoding) producing the detected bits of all K users] • Joint channel estimation and detection • Joint detection and decoding
Why separate channel estimation and detection? [Figure: the received signal passes through a chip-matched filter; channel estimation works on a processing window of samples r_i, while detection uses a code-matched filter aligned to each user's delayed bits b_i and b_{i+1}] • The two steps operate on different statistics
Towards an integrated solution • Reuse computation from the channel estimation step • Use the same discretized filter output • Avoid alignment to each user's bit interval • Reduce computation • Save hardware
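Components of the observation vector [Figure: a delayed path of user k with attenuation w_{k,p} spans bit i = +1 and bit i+1 = -1; the observation vector is the sum of the tail of bit i's code (-1 -1 1 -1 1 1 1) scaled by w_{k,p} and the head of bit i+1's code scaled by -w_{k,p}]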
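Matrix representation [Figure: the same construction with bits i = +1 and i+1 = +1, both spread by -1 -1 1 -1 1 1 1; user k contributes $U_k Z_k b_k(i)$, and the other users add their own terms] $r = UZ\,b_{\text{preamble}}$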
Efficient statistics • Parametric approach: build a channel model (number of paths), estimate delays and attenuations, then produce the code-matched filter output • Our approach: estimate the effective spreading code $UZ$ directly • Code-matched filter: $y = (UZ)^T r$
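Once the effective spreading code UZ has been estimated, the code-matched filter is a single matrix-vector product with no per-user realignment. A minimal sketch, assuming real-valued signals and an already-estimated UZ with one column per unknown bit; the function name and the toy matrix are hypothetical.

```python
def code_matched_filter(uz, r):
    """y = (UZ)^T r: correlate r with every column of the effective code matrix UZ."""
    columns = zip(*uz)                      # iterate over the columns of UZ
    return [sum(u * x for u, x in zip(col, r)) for col in columns]


# Hypothetical 4-sample effective codes for two bits (the columns of UZ)
uz = [[1, 1], [1, -1], [-1, 1], [1, 1]]
r = [0, 2, -2, 0]                           # noiseless UZ b with b = [1, -1]
print(code_matched_filter(uz, r))           # [4, -4]
```

The two toy columns are orthogonal, so the signs of y already equal b; with non-orthogonal columns the matched filter alone can decide wrongly, which is why a linear detector follows it.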
Simulation parameters • System parameters: 15 users, 3 paths, spreading gain 31 • Hardware platform: TI C62 and C67 EVM boards; 64 KB each of internal program and data memory; 256 KB SBSRAM and 8 MB SDRAM (external) • Code Composer 1.0 used to profile the code
Effectiveness of integrated design [Figure: bit error rate (10^0 to 10^-4) vs. SNR (-4 to 16 dB) for a single user and for multiuser algorithms using the parametric approach, the UZ approach, and the actual parameters] • 2 dB gain in performance
Computational savings • Avoid extracting the actual channel parameters • Avoid realigning data for code-matched filtering • Reduce intermediate storage requirements • Avoid divisions (28 cycles) and square roots (38 cycles) on the DSP
Fixed-point behavior • Fixed-point advantages: speed, power, cost • Fixed-point analysis: 12 bits of precision required instead of 32 bits! • Pack two 16-bit operations into 32-bit registers • More packing possible with saturation arithmetic and user power control!
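Packing two 16-bit values into one 32-bit word lets one register hold two operands, and saturation keeps an overflowing lane pinned at the largest representable value instead of wrapping around to the opposite sign. A minimal sketch of both ideas in plain Python, emulating the register layout in software; the function names are hypothetical and this is not the DSP's instruction set.

```python
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1


def saturate16(x):
    """Clamp an integer to the signed 16-bit range instead of letting it wrap."""
    return max(INT16_MIN, min(INT16_MAX, x))


def pack2x16(hi, lo):
    """Pack two signed 16-bit values into one unsigned 32-bit word (hi in the upper half)."""
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)


def unpack2x16(word):
    """Recover the two signed 16-bit halves of a 32-bit word."""
    def to_signed(v):
        return v - (1 << 16) if v & 0x8000 else v
    return to_signed((word >> 16) & 0xFFFF), to_signed(word & 0xFFFF)


def add2x16_sat(a, b):
    """Add two packed words lane by lane, saturating each 16-bit lane."""
    a_hi, a_lo = unpack2x16(a)
    b_hi, b_lo = unpack2x16(b)
    return pack2x16(saturate16(a_hi + b_hi), saturate16(a_lo + b_lo))


w = add2x16_sat(pack2x16(30000, -5), pack2x16(10000, 7))
print(unpack2x16(w))   # (32767, 2): the upper lane saturates, the lower lane is exact
```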
Time requirement [Figure: normalized execution time: original 100, unified synchronization + detection 68.5, 16-bit fixed-point 41.8] • 2.39× speedup
Integrated receiver design [Figure: receiver chain (channel estimation, detection, demodulation, decoding) producing the detected bits of all K users] • Joint channel estimation and detection (effective spreading code approach, optimized detector design) • Joint detection and decoding
Linear multiuser detector • Received signal: $r = (UZ)b + n$ • Channel estimation: $UZ$ • Matched filter output: $y = (UZ)^T r$ • Linear detector: solve $y = Rb + (UZ)^T n$ for $b$, where $R = (UZ)^T UZ$ • Size of the linear system: $NK$ • Direct inversion takes $O((NK)^3)$ operations (N: block length, K: number of users)
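The direct approach forms R and y and solves R b = y by elimination, which is the O(n^3) baseline (n = NK) the structured algorithm improves on. A minimal sketch of this decorrelating (zero-forcing) detector, assuming real-valued signals and a known UZ; the helper names and the toy near-far example are hypothetical.

```python
def solve(a, y):
    """Solve a x = y by Gaussian elimination with partial pivoting (O(n^3))."""
    n = len(a)
    m = [list(map(float, a[i])) + [float(y[i])] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[piv] = m[piv], m[col]
        for i in range(col + 1, n):
            f = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= f * m[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def decorrelating_detector(uz, r):
    """Form R = (UZ)^T UZ and y = (UZ)^T r, solve R b = y, then take signs."""
    cols = [list(c) for c in zip(*uz)]          # columns of UZ
    R = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    y = [sum(p * q for p, q in zip(ci, r)) for ci in cols]
    b_soft = solve(R, y)
    return [1 if v >= 0 else -1 for v in b_soft]


# Toy near-far example: the second effective code is three times stronger.
uz = [[1, 3], [1, 3], [1, 3], [-1, 3]]
r = [-2, -2, -2, -4]                     # noiseless r = UZ b with b = [1, -1]
print(decorrelating_detector(uz, r))     # [1, -1]
```

Here a plain matched filter gives y_1 = -2 for the first user and decides the wrong bit, while solving R b = y removes the correlation between the two columns and recovers both bits.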
Outline of the Kronecker algorithm • The correlation matrix is block-Toeplitz • Approximate it as a block-circulant system
Outline of the Kronecker algorithm • The Kronecker representation isolates the structure from the matrix blocks • A Fourier transform converts the system to block-diagonal form • Solve N independent order-K systems iteratively • Computationally optimal
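The decoupling step can be illustrated on a small block-circulant system: a DFT across the block index turns it into N independent K×K systems. A minimal sketch, assuming the block-circulant approximation (first block column C_0, ..., C_{N-1}) has already been formed from the block-Toeplitz correlation matrix. It is not the author's Kronecker formulation: it uses a naive DFT instead of an FFT and direct elimination per block instead of the iterative solver on the slide, so it shows the structure, not the complexity; all names are hypothetical.

```python
import cmath


def solve(a, y):
    """Gaussian elimination with partial pivoting; works for complex entries."""
    n = len(a)
    m = [list(a[i]) + [y[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[piv] = m[piv], m[col]
        for i in range(col + 1, n):
            f = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= f * m[col][j]
    x = [0j] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def dft(seq, sign):
    """Naive DFT across the block index of a list of flat lists.
    sign=-1 is the forward transform, sign=+1 the unscaled inverse.
    An FFT would cut this from O(N^2) to O(N log N) per entry."""
    n = len(seq)
    out = []
    for f in range(n):
        acc = [0j] * len(seq[0])
        for t in range(n):
            w = cmath.exp(sign * 2j * cmath.pi * f * t / n)
            for e in range(len(acc)):
                acc[e] += w * seq[t][e]
        out.append(acc)
    return out


def block_circulant_matvec(c_blocks, x_blocks):
    """y_i = sum_j C_{(i - j) mod N} x_j, used here only to build a test system."""
    n, k = len(c_blocks), len(x_blocks[0])
    return [[sum(c_blocks[(i - j) % n][r][s] * x_blocks[j][s]
                 for j in range(n) for s in range(k))
             for r in range(k)] for i in range(n)]


def solve_block_circulant(c_blocks, y_blocks):
    """Solve a block-circulant system as N independent K x K systems."""
    n, k = len(c_blocks), len(y_blocks[0])
    c_hat = dft([[v for row in c for v in row] for c in c_blocks], -1)
    y_hat = dft(y_blocks, -1)
    x_hat = []
    for f in range(n):
        a_f = [c_hat[f][r * k:(r + 1) * k] for r in range(k)]  # K x K block at frequency f
        x_hat.append(solve(a_f, y_hat[f]))
    x = dft(x_hat, +1)
    return [[(v / n).real for v in block] for block in x]


C = [[[5, 1], [1, 4]], [[1, 0.5], [0, 1]], [[0.5, 0], [0.5, 0.5]]]  # N = 3, K = 2
x_true = [[1, -1], [-1, -1], [1, 1]]
y = block_circulant_matvec(C, x_true)
x_est = solve_block_circulant(C, y)
print([[round(v, 6) for v in block] for block in x_est])
```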
Speedup in detector • Complexity $O(N^2K^3)$ vs. $O(NK^2 + KN\log N)$ [Figure: achievable data rate: decorrelator 10.4 kbps, Kronecker 83.1 kbps]
Pipelining and parallelization • Mostly matrix-based operations • The detector is an iterative algorithm: pipeline successive iterations • Parallelize operations: add more functional units, distribute data across the functional units, distribute computations
Projected computation time [Figure: achievable data rate (kbps): base multiuser algorithm 20.75; hardware pipelining 154.3 (DSP only); pipelining + parallelization 564.5 (DSP + coprocessor support with 30 adders and multipliers)]
Integrated receiver design [Figure: receiver chain (channel estimation, detection, demodulation, decoding) producing the detected bits of all K users] • Joint channel estimation and detection (effective spreading code approach, optimized detector design) • Joint detection and decoding
Maximum a posteriori (MAP) decoding [Figure: transmitter in which data bits b are encoded into coded bits d, then modulated and spread, with other users sharing the channel] • Received signal: $r = UZd + n$ • Optimum decoding rule: a constrained optimization problem that decodes all users simultaneously • Exponential complexity in the number of users
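The exponential cost is easy to see in a brute-force joint detector that tries every bit vector for all K users. A minimal sketch, assuming uncoded, equiprobable ±1 bits and white Gaussian noise, so that the MAP rule reduces to minimizing ||r - UZ d||^2; it omits the code constraints that the slide's constrained optimization includes, and the function name is hypothetical.

```python
from itertools import product


def joint_ml_detect(uz, r):
    """Try all 2^K bit vectors d and keep the one minimizing ||r - UZ d||^2."""
    k = len(uz[0])
    best, best_cost = None, float("inf")
    for d in product((-1, 1), repeat=k):        # 2^K candidates
        residual = [ri - sum(u * di for u, di in zip(row, d)) for row, ri in zip(uz, r)]
        cost = sum(e * e for e in residual)
        if cost < best_cost:
            best, best_cost = list(d), cost
    return best


uz = [[1, 3], [1, 3], [1, 3], [-1, 3]]          # toy effective codes
r_noisy = [-1.8, -2.3, -2.1, -3.7]              # UZ [1, -1] plus small perturbations
print(joint_ml_detect(uz, r_noisy))             # [1, -1]
```

Each added user doubles the number of candidates, which is what motivates the suboptimal alternatives that follow.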
Single-user detection and decoding [Figure: r feeds matched filters MF 1 to MF K; each output y_k goes to its own decoder k, producing $\hat{b}_k$] • Suboptimum alternative: isolate detection and decoding
Decoding matched filter outputs [Figure: BER (10^0 to 10^-4) vs. SNR (1 to 8 dB) for MF + Viterbi and optimal decoding] • Huge performance loss!
Iterative detection and decoding • User of concern c, interfering users I: $r = (UZ)_c d_c + (UZ)_I d_I + z$ • Estimate $d_I$ • Eliminate interference: $\hat{y}_c = (UZ)_c^T\big(r - (UZ)_I \hat{d}_I\big)$ • Estimate $d_c$ for the next step • Complexity linear in the number of users
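The cancellation step can be written so that each pass reconstructs UZ d once and then adds back each user's own term, which keeps the cost per pass linear in K for a fixed observation length. A minimal sketch of parallel interference cancellation, assuming real-valued signals and hard ±1 decisions standing in for the decoder output (the slide feeds decoder estimates back instead); names and data are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def iterative_cancellation(uz, r, iterations=3):
    """Parallel interference cancellation: y_c = (UZ)_c^T (r - (UZ)_I d_I) for every user c."""
    cols = [list(c) for c in zip(*uz)]                # column c = (UZ)_c
    d = [1 if dot(c, r) >= 0 else -1 for c in cols]   # start from matched-filter decisions
    for _ in range(iterations):
        # Reconstruct UZ d once: O(n K)
        recon = [sum(d[j] * cols[j][i] for j in range(len(cols))) for i in range(len(r))]
        new_d = []
        for c, col in enumerate(cols):
            # r - UZ d + (UZ)_c d_c equals r - (UZ)_I d_I
            cleaned = [ri - si + d[c] * u for ri, si, u in zip(r, recon, col)]
            y_c = dot(col, cleaned)
            new_d.append(1 if y_c >= 0 else -1)       # hard decision, stand-in for decoding
        d = new_d
    return d


uz = [[1, 3], [1, 3], [1, 3], [-1, 3]]   # toy near-far effective codes
r = [-2, -2, -2, -4]                     # noiseless UZ [1, -1]
print(iterative_cancellation(uz, r, iterations=0))   # [-1, -1]: matched filter errs on user 1
print(iterative_cancellation(uz, r))                 # [1, -1]
```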
Reduction in decoding complexity • Convolutional code: coded bits depend on past data bits; performance improves with memory length • Viterbi algorithm for decoding: complexity exponential in memory length • Our suboptimal approach, maximal weight basis decoding: complexity quadratic in memory length
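The exponential term comes from the trellis: the Viterbi algorithm keeps one survivor per encoder state, and there are 2^(constraint length - 1) states. A minimal sketch of the standard hard-decision Viterbi baseline (not the maximal weight basis decoder), assuming a toy rate-1/2 code with constraint length 3 and generators (7, 5) in octal, whereas the slide's simulations use k = 7; names are hypothetical.

```python
G = (0b111, 0b101)          # generator polynomials (7, 5) in octal
K_C = 3                     # constraint length (memory K_C - 1 = 2)
N_STATES = 1 << (K_C - 1)   # 2**memory trellis states


def branch_output(state, bit):
    """Coded pair and next state for input `bit` leaving `state`."""
    reg = (bit << (K_C - 1)) | state
    return [bin(reg & g).count("1") % 2 for g in G], reg >> 1


def encode(bits):
    """Rate-1/2 convolutional encoder, terminated with K_C - 1 zeros."""
    state, out = 0, []
    for b in list(bits) + [0] * (K_C - 1):
        pair, state = branch_output(state, b)
        out += pair
    return out


def viterbi_decode(received):
    """Hard-decision Viterbi decoding with Hamming branch metrics."""
    INF = float("inf")
    metric = [0] + [INF] * (N_STATES - 1)       # encoder starts in state 0
    paths = [[] for _ in range(N_STATES)]
    for t in range(0, len(received), 2):
        pair = received[t:t + 2]
        new_metric = [INF] * N_STATES
        new_paths = [[] for _ in range(N_STATES)]
        for state in range(N_STATES):
            if metric[state] == INF:
                continue
            for b in (0, 1):
                expected, nxt = branch_output(state, b)
                m = metric[state] + sum(e != x for e, x in zip(expected, pair))
                if m < new_metric[nxt]:         # keep one survivor per state
                    new_metric[nxt] = m
                    new_paths[nxt] = paths[state] + [b]
        metric, paths = new_metric, new_paths
    survivor = paths[0]                          # termination forces state 0
    return survivor[:len(survivor) - (K_C - 1)]


msg = [1, 0, 1, 1, 0, 0, 1]
coded = encode(msg)
coded[3] ^= 1                                    # one channel bit error
print(viterbi_decode(coded) == msg)              # True
```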
Joint detection and decoding performance [Figure: BER (10^0 to 10^-5) vs. SNR (1 to 8 dB) for MF + Viterbi, Iter1 + Subopt, Iter1 + Viterbi, Iter3 + Subopt, and Optimal; rate 1/2, k = 7]
Joint detection and decoding • Huge performance gain • Suboptimal approximation: insignificant performance loss, significant computational gain • Architecture for suboptimal decoding? The Viterbi algorithm maps to a butterfly architecture; a sliding-window implementation is available
Summary of contributions • Integrated channel estimation and detection model [WCNC] • Optimized detection algorithm [PIMRC, Tr. Com] • Fixed-point implementation [ICASSP, SPIE] • Parallel architecture [Asilomar] • Joint detection and decoding [Globecom, Tr. Com] • Suboptimal decoding algorithm [Asilomar, Tr. Inf. Th.]
Future research [Figure: cellular, ad-hoc network, Bluetooth/home networks, wireless LAN]
Future research • Universal wireless receiver: reconfigurable, power-efficient; can the design be automated? • Network-level interaction: resource allocation, quality-of-service guarantees • Application-level interaction
Further details http://www.ece.rice.edu/~suman http://www.ece.rice.edu/CMC
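System Model - Received Signal • $\tau_{k,p}/T_c = q + \gamma$ (integer and fractional parts of the delay) [Figure: bits i = +1 and i+1 = +1 of user k, each spread by -1 -1 1 -1 1 1 1, received chip-asynchronously with a delay of q + γ chips] $r_i(m) = \underbrace{w_{k,p}\rho_{k,p}(m)\,\gamma\,[1\ 1\ 1\ 0\ 0\ 0\ 0]^T + w_{k,p}\rho_{k,p}(m)(1-\gamma)\,[1\ 1\ 0\ 0\ 0\ 0\ 0]^T}_{\text{right}} + \underbrace{w_{k,p}\rho_{k,p}(m)\,\gamma\,[0\ 0\ 0\ {-1}\ {-1}\ 1\ {-1}]^T + w_{k,p}\rho_{k,p}(m)(1-\gamma)\,[0\ 0\ {-1}\ {-1}\ 1\ {-1}\ 1]^T}_{\text{left}}$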
System Model - Received Signal • $A(m) = [a_1^R\ a_1^L\ \cdots\ a_k^R\ a_k^L\ \cdots\ a_K^R\ a_K^L]$ • $T_c$: chip period; $\tau/T_c = q + \gamma$ (integer and fractional parts) • Columns of $A(m)$: linear combinations of the shifted codes $c_k[q]$
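Channel Response Model • Hence the columns of $A(m)$ for each user k are $a_k^R = U_k^R z_k(m)$ and $a_k^L = U_k^L z_k(m)$, where $U_k^R = [c_k^R[0]\ \cdots\ c_k^R[q]\ \cdots\ c_k^R[N-1]]$ and $U_k^L = [c_k^L[0]\ \cdots\ c_k^L[q]\ \cdots\ c_k^L[N-1]]$ hold the spreading code shifted by each delay, and $z_k(m) = h_k \circ p_k(m)$ combines the multipath attenuation and delay components $h_k$ with the array response components $p_k(m)$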
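System Model - Received Signal • Structure of $h_k$ and $p_k(m)$: $h_k = [0\ \cdots\ w_{k,1}(1-\gamma_{k,1})\ \ w_{k,1}\gamma_{k,1}\ \cdots\ w_{k,P}(1-\gamma_{k,P})\ \ w_{k,P}\gamma_{k,P}\ \cdots\ 0]^T$, $p_k(m) = [0\ \cdots\ \rho_{k,1}(m)\ \ \rho_{k,1}(m)\ \cdots\ \rho_{k,P}(m)\ \ \rho_{k,P}(m)\ \cdots\ 0]^T$, where the nonzero entries for path p sit at elements $q_{k,p}$ and $q_{k,p}+1$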
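The model above can be checked numerically by building the shifted-code matrices, forming z_k = h_k ∘ p_k(m), and multiplying. A minimal sketch, assuming this reading of the figure: c^R[q] places the last q chips of the previous bit's code at the start of the window, c^L[q] shifts the current bit's code right by q chips, each path splits its weight between shifts q and q+1 as (1 - γ) and γ, and ρ is real-valued. Function names are hypothetical.

```python
def shift_right_part(code, q):
    """c^R[q]: last q chips of the code placed at the start of the window."""
    n = len(code)
    return code[n - q:] + [0] * (n - q)


def shift_left_part(code, q):
    """c^L[q]: the code delayed by q chips, zero-filled at the front."""
    return [0] * q + code[:len(code) - q]


def effective_columns(code, paths):
    """paths: list of (w, rho, q, gamma). Returns (a_R, a_L) = (U^R z, U^L z)."""
    n = len(code)
    z = [0.0] * n                      # z = h o p: nonzero only at q and q + 1
    for w, rho, q, gamma in paths:
        assert 0 <= q and q + 1 < n and 0 <= gamma < 1
        z[q] += w * rho * (1 - gamma)
        z[q + 1] += w * rho * gamma
    u_r = [shift_right_part(code, s) for s in range(n)]   # columns c^R[0..N-1]
    u_l = [shift_left_part(code, s) for s in range(n)]    # columns c^L[0..N-1]
    a_r = [sum(z[s] * u_r[s][i] for s in range(n)) for i in range(n)]
    a_l = [sum(z[s] * u_l[s][i] for s in range(n)) for i in range(n)]
    return a_r, a_l


code = [-1, -1, 1, -1, 1, 1, 1]
a_r, a_l = effective_columns(code, [(1.0, 1.0, 2, 0.25)])   # one path, delay 2.25 chips
print(a_r)   # [1.0, 1.0, 0.25, 0.0, 0.0, 0.0, 0.0]
print(a_l)   # [0.0, 0.0, -0.75, -1.0, 0.5, -0.5, 0.5]
```

With q = 2 the shifted vectors reproduce the ones in the received-signal figure ([1 1 0 0 0 0 0], [1 1 1 0 0 0 0], [0 0 -1 -1 1 -1 1], [0 0 0 -1 -1 1 -1]).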
Channel Estimation - Maximum Likelihood Algorithm • U: spreading codes of all the users (known) • Z: all unknown parameters (time delay, attenuation, array response) of all paths of all users • Goal: estimate Z using the spreading codes and the preamble
Channel Estimation - Maximum Likelihood Algorithm • Given: L observations $r_1, r_2, \ldots, r_L$ • Joint conditional probability density function of $r_1, r_2, \ldots, r_L$ • Goal: maximize it with respect to the channel parameters Z
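Under one common set of assumptions (real-valued samples, known preamble bits $b_l$, and i.i.d. white Gaussian noise of variance $\sigma^2$ on each of the $M$ samples of $r_l = UZ\,b_l + n_l$, where $M$ is introduced here for illustration), the joint density factorizes and maximizing it becomes a least-squares problem:

```latex
p(r_1,\ldots,r_L \mid Z) = \prod_{l=1}^{L} \frac{1}{(2\pi\sigma^2)^{M/2}}
  \exp\!\left(-\frac{\lVert r_l - UZ\,b_l\rVert^2}{2\sigma^2}\right)
\quad\Longrightarrow\quad
\hat{Z} = \arg\min_{Z} \sum_{l=1}^{L} \lVert r_l - UZ\,b_l \rVert^2
```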
Multi-step Optimization Process • Define $y = UZ$ and form the ML estimate $\hat{y}$ of $y$ • Estimate $z_k$ by a least-squares fit between $y = UZ$ and $\hat{y}$ • Extract the individual parameters from the $\hat{z}_k$
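The first step, estimating the whole effective code matrix before imposing any structure, has a closed form under white Gaussian noise: Ŷ = (Σ r_l b_l^T)(Σ b_l b_l^T)^{-1}. A minimal sketch of that unconstrained least-squares estimate, assuming real-valued signals and a preamble rich enough that Σ b_l b_l^T is invertible; the later fit of z_k and the parameter extraction are not shown, and all names are hypothetical.

```python
def solve(a, y):
    """Solve a x = y by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, a[i])) + [float(y[i])] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[piv] = m[piv], m[col]
        for i in range(col + 1, n):
            f = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= f * m[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def estimate_effective_code(r_obs, b_obs):
    """Least-squares estimate of Y = UZ from preamble pairs (r_l, b_l)."""
    k = len(b_obs[0])
    cross = [[sum(r[i] * b[j] for r, b in zip(r_obs, b_obs)) for j in range(k)]
             for i in range(len(r_obs[0]))]                                 # sum_l r_l b_l^T
    gram = [[sum(b[i] * b[j] for b in b_obs) for j in range(k)] for i in range(k)]  # sum_l b_l b_l^T
    # Y_hat gram = cross and gram is symmetric, so each row of Y_hat solves gram x = row.
    return [solve(gram, row) for row in cross]


Y_true = [[1, 3], [1, 3], [1, 3], [-1, 3]]           # toy effective codes (two bits)
b_obs = [[1, 1], [1, -1], [-1, 1], [-1, -1]]         # hypothetical preamble bits
r_obs = [[sum(Y_true[i][j] * b[j] for j in range(2)) for i in range(4)] for b in b_obs]
print(estimate_effective_code(r_obs, b_obs))         # recovers Y_true (noiseless case)
```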
Iterative methods • To solve $Rb = y$: the solution at step k is $b_k$; calculate the error $e_k = Rb_k - y$; update the solution from the error and the earlier estimate • Cost: a matrix-vector product takes $O(n^2)$ operations, so each iteration takes $O(n^2)$ steps • Can we do better?
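The generic update on this slide can be instantiated as a fixed-step (Richardson) iteration, b_{k+1} = b_k - μ e_k. This is one simple instance chosen for illustration; the slide does not say which iterative method is used. The sketch assumes R is symmetric positive definite, as (UZ)^T UZ is when UZ has full column rank; the function name is hypothetical.

```python
def richardson_solve(R, y, iterations=500):
    """Fixed-step iteration b_{k+1} = b_k - mu * e_k with e_k = R b_k - y."""
    n = len(y)
    # mu = 1 / (largest absolute row sum) is at most 1 / lambda_max, which keeps
    # the iteration convergent for a symmetric positive definite R.
    mu = 1.0 / max(sum(abs(v) for v in row) for row in R)
    b = [0.0] * n
    for _ in range(iterations):
        e = [sum(R[i][j] * b[j] for j in range(n)) - y[i] for i in range(n)]  # O(n^2)
        b = [b[i] - mu * e[i] for i in range(n)]
    return b


R = [[4, 6], [6, 36]]    # (UZ)^T UZ for a toy two-column UZ
y = [-2, -30]
print([round(v, 6) for v in richardson_solve(R, y)])   # [1.0, -1.0]
```

Here μ = 1/42 and the smallest eigenvalue of R is about 2.9, so the error shrinks by a factor of roughly 0.93 per iteration; the number of O(n^2) iterations needed grows with the condition number of R.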