Advanced Wireless Receivers: Algorithmic and Architectural Optimizations Suman Das Rice University Department of Electrical and Computer Engineering & Center for Multimedia Communication
Introduction • Wireless is one of the fastest growing industries • “By 2002, a lot more cellular phones are going to have internet access than PCs.” Larry Ellison, CEO, Oracle. [Figure: millions of cell-phone users per year, 1993 to 2001. Source: Ericsson]
Ubiquitous wireless connectivity [Figure labels: wireless cellular, ad-hoc network, Bluetooth/home networks, wireless LAN]
Why advanced receiver algorithms? • The number of wireless subscribers is growing • Multimedia data is replacing voice traffic • Higher and varied data rates (144 Kbps to 2 Mbps) • Stricter quality of service (QoS) • Wireless bandwidth remains a critical resource • Current-generation receivers are suboptimal
Performance of advanced receivers [Figure: bit error rate (10^0 down to 10^-4) versus SNR (4 to 16 dB) for the current receiver, the advanced receiver, and the theoretical limit] Huge performance improvement
Computational requirements of advanced receivers • A 15-user system transmitting at 0.5 Mbps needs • ~20 billion additions per second • ~15 billion multiplications per second • 32-bit floating-point precision • That takes 50 floating-point DSPs running at 200 MHz to sustain the computation!
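A rough sanity check of the numbers on this slide can be written down directly. The sketch assumes (this is not stated in the slides) that additions and multiplications are counted as separate operations and are spread evenly across the DSPs; the variable names are illustrative.

```python
# Back-of-the-envelope check of the DSP count quoted on the slide.
# Assumption (not from the slides): additions and multiplications count as
# separate operations and are spread evenly over all DSPs.
ADDS_PER_SEC = 20e9     # ~20 billion additions per second
MULTS_PER_SEC = 15e9    # ~15 billion multiplications per second
NUM_DSPS = 50
CLOCK_HZ = 200e6        # 200 MHz

total_ops = ADDS_PER_SEC + MULTS_PER_SEC
cycles_available = NUM_DSPS * CLOCK_HZ
ops_per_cycle_per_dsp = total_ops / cycles_available

print(f"total load: {total_ops:.2e} operations per second")
print(f"cycles available: {cycles_available:.2e} per second")
print(f"required throughput: {ops_per_cycle_per_dsp:.1f} operations per cycle per DSP")
```

Under these assumptions each DSP would have to sustain about 3.5 floating-point operations every cycle, which gives a feel for why the slide treats 50 DSPs as the scale of the problem.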
My research • Receiver design • High performance • Low complexity • Approach • Algorithmic simplification • Efficient architectural mapping
Wireless channel model • Channel effects • Background noise • Fading • Multiple paths • Multiple users • Multiple access interference (MAI) [Figure: base station receiving a direct path and reflected paths from User 1 and User 2, plus noise]
Code Division Multiple Access (CDMA) • Wideband CDMA: technology of choice • Users distinguished by spreading sequence S(t) [Figure: one bit spread over 7 chips, -1 -1 1 -1 1 1 1, along the time axis; spreading gain = 7] Received signal parameters • K: number of users • P: number of paths • w: attenuation • τ: delay • b: data bits
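The spreading idea on this slide can be illustrated with the 7-chip sequence from the figure. This is a minimal single-user, noise-free sketch; the function names `spread` and `despread` are illustrative and not from the slides.

```python
# Direct-sequence spreading with the 7-chip sequence shown on the slide.
CODE = [-1, -1, 1, -1, 1, 1, 1]   # spreading gain = 7

def spread(bits, code=CODE):
    """Replace every +/-1 data bit by the code multiplied by that bit."""
    return [b * c for b in bits for c in code]

def despread(chips, code=CODE):
    """Correlate each block of len(code) chips with the code and take the sign."""
    n = len(code)
    decisions = []
    for start in range(0, len(chips), n):
        corr = sum(x * c for x, c in zip(chips[start:start + n], code))
        decisions.append(1 if corr >= 0 else -1)
    return decisions

bits = [1, -1, -1, 1]
chips = spread(bits)          # 4 bits become 28 chips
recovered = despread(chips)   # correlation with the code recovers the bits
print(chips[:14])
print(recovered)
```

A bit of -1 simply sends the negated code, which is why correlating with the code and taking the sign recovers the bit.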
CDMA system [Figure: transmitter blocks (data, modulation, encoding, spreading, other users) and receiver blocks (channel estimation, detection, demodulation, decoding) producing the detected bits of all K users] • Proposed advanced/multiuser receiver modules • Designed in isolation • Suboptimal design
Integrated receiver design [Figure: receiver blocks (channel estimation, detection, demodulation, decoding) producing the detected bits of all K users] • Joint channel estimation and detection • Joint detection and decoding
Why separate channel estimation and detection? [Figure: the received signal feeds a chip-matched filter for channel estimation and a code-matched filter for detection; along the time axis, the processing window r_i for channel estimation spans bits b_i and b_{i+1}, offset by the delay] • They operate on different statistics
Towards an integrated solution • Reuse computation from channel estimation step • Use same discretized filter output • Avoid alignment to bit interval of each user • Reduce computation • Save hardware
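Components of the observation vector [Figure: after the delay, the spreading code -1 -1 1 -1 1 1 1 for bit i = +1 is scaled by the attenuation w_{k,p}, and the negated code 1 1 -1 1 -1 -1 -1 for bit i+1 = -1 is scaled by -w_{k,p}; the contributions are summed]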
Matrix representation [Figure: bit i = +1 and bit i+1 = +1, each carrying the code -1 -1 1 -1 1 1 1; user k contributes $U_k Z_k b_k(i)$, added to the other users] $r = U Z b_{\text{preamble}}$
Efficient statistics • Parametric approach • Build channel model (number of paths) • Estimate delay, attenuation • Produce the code-matched filter output • Our approach • Estimate the effective spreading code $(UZ)$ • Code-matched filter: $y = (UZ)^T r$
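One way to picture the effective-spreading-code approach is to estimate $A = UZ$ directly from a known preamble and then reuse it as the matched filter. The sketch below is an illustration under simplifying assumptions that are not in the slides: one bit per observation window, a correlation-based estimator, orthogonal preambles, and no noise. The names `estimate_effective_code` and `matched_filter` are hypothetical.

```python
def estimate_effective_code(received, preamble):
    """Correlate received vectors with the known preamble bits.

    received: list of L observation vectors (each of length n)
    preamble: list of L bit vectors (each of length K, entries +/-1)
    Returns an n x K estimate of A = UZ (exact only for orthogonal, noise-free preambles).
    """
    L = len(received)
    n = len(received[0])
    K = len(preamble[0])
    A_hat = [[0.0] * K for _ in range(n)]
    for r, b in zip(received, preamble):
        for row in range(n):
            for k in range(K):
                A_hat[row][k] += r[row] * b[k] / L
    return A_hat

def matched_filter(A, r):
    """Code-matched filter output y = A^T r."""
    K = len(A[0])
    return [sum(A[row][k] * r[row] for row in range(len(A))) for k in range(K)]

# Hypothetical 2-user example: the columns of A_true are the effective codes.
A_true = [[0.5, -0.25], [-0.5, 0.5], [0.5, 0.25]]
preamble = [[1, 1], [1, -1], [1, 1], [1, -1]]   # orthogonal across users
received = [[sum(a * bk for a, bk in zip(row, b)) for row in A_true] for b in preamble]

A_hat = estimate_effective_code(received, preamble)
y = matched_filter(A_hat, received[0])
print(A_hat)
print(y)
```

The point of the example is that delays and attenuations are never extracted individually: the estimated matrix is used as is for code-matched filtering.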
Simulation parameters • System parameters • 15 users • 3 paths • Spreading gain: 31 • Hardware platform • TI C62 and C67 EVM boards • 64 KB each of internal program and data memory • 256 KB SBSRAM, 8 MB SDRAM (external) • Code Composer 1.0 to profile code
Effectiveness of integrated design [Figure: bit error rate (10^0 down to 10^-4) versus SNR (-4 to 16 dB); single-user and multiuser curves for the parametric approach, the UZ approach, and the actual parameters] 2 dB gain in performance
Computational savings • Avoid extraction of actual channel parameters • Avoid realignment of data for code-matched filtering • Reduce intermediate storage requirement • Avoid divisions (28 cycles) and square-root (38 cycles) in DSP.
Fixed point behavior • Fixed point advantages • Speed • Power • Cost • Fixed point analysis • 12 bits of precision required instead of 32 bits! • Pack two 16-bit operations in 32-bit registers • More packing with • Saturation arithmetic • User power control!
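The packing and saturation ideas can be modelled in software. The sketch below is only a model, not DSP code: the helper names are hypothetical, and the 12-bit format (1 sign bit plus 11 fraction bits) is an assumption chosen for illustration.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def saturate16(x):
    """Clamp to the signed 16-bit range instead of wrapping around."""
    return max(INT16_MIN, min(INT16_MAX, x))

def pack2x16(hi, lo):
    """Pack two signed 16-bit values into one unsigned 32-bit word."""
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)

def unpack2x16(word):
    """Split a 32-bit word back into two signed 16-bit lanes."""
    def signed(v):
        return v - 0x10000 if v & 0x8000 else v
    return signed((word >> 16) & 0xFFFF), signed(word & 0xFFFF)

def packed_sat_add(a, b):
    """Lane-wise saturating add of two packed words."""
    a_hi, a_lo = unpack2x16(a)
    b_hi, b_lo = unpack2x16(b)
    return pack2x16(saturate16(a_hi + b_hi), saturate16(a_lo + b_lo))

def quantize(x, frac_bits=11):
    """Round x in [-1, 1) to a 12-bit signed integer (assumed 1 sign + 11 fraction bits)."""
    q = round(x * (1 << frac_bits))
    return max(-(1 << frac_bits), min((1 << frac_bits) - 1, q))

word = packed_sat_add(pack2x16(30000, -30000), pack2x16(10000, -10000))
lanes = unpack2x16(word)   # both lanes overflow and are clamped
print(lanes)
print(quantize(0.5), quantize(1.0), quantize(-1.0))
```

Saturation matters here because a wrapped overflow flips the sign of a soft value, whereas a clamped value is still the right decision with slightly reduced magnitude.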
Time requirement [Figure: bar chart of normalized time: original 100, unified synch + detect 68.5, 16-bit fixed-point 41.8] 2.39x speedup
Integrated receiver design [Figure: receiver blocks (channel estimation, detection, demodulation, decoding) producing the detected bits of all K users] • Joint channel estimation and detection • Effective spreading code approach • Optimized detector design • Joint detection and decoding
Linear multiuser detector • Received signal: $r = (UZ) b + n$ • Channel estimation: $(UZ)$ • Matched filter output: $y = (UZ)^T r$ • Linear detector: solve $R b + n = y$ for $b$ • $R = (UZ)^T (UZ)$ • Size of the linear system: $NK$ • Direct inverse takes $O((NK)^3)$ operations • N: block length, K: number of users
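A small numeric instance makes the role of $R$ concrete. The sketch below uses a direct solve by Gaussian elimination on a hypothetical 2-user, noise-free example with correlated codes; the matrix `A` stands in for $UZ$ and all names and values are illustrative assumptions.

```python
def correlation_matrix(A):
    """R = A^T A for an n x K matrix A given as a list of rows."""
    K = len(A[0])
    return [[sum(row[i] * row[j] for row in A) for j in range(K)] for i in range(K)]

def matched_filter(A, r):
    """y = A^T r."""
    return [sum(row[k] * x for row, x in zip(A, r)) for k in range(len(A[0]))]

def solve(R, y):
    """Solve R x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [list(R[i]) + [y[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= f * M[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

# Hypothetical example: user 1 has amplitude 1.0, user 2 has amplitude 0.5,
# and their 4-chip codes are correlated.
A = [[1.0, 0.5], [1.0, 0.5], [1.0, 0.5], [1.0, -0.5]]
b_true = [-1, 1]
r = [sum(a * b for a, b in zip(row, b_true)) for row in A]

y = matched_filter(A, r)      # user 2's output is wiped out by interference
R = correlation_matrix(A)
b_soft = solve(R, y)          # solving R b = y removes the interference
print(y, R, b_soft)
```

In this example the matched filter output for the weaker user is exactly zero, so a conventional receiver has nothing to decide on, while solving the linear system recovers both bits.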
Outline of the Kronecker algorithm • Correlation matrix is block-Toeplitz • Approximate it as a block-circulant system • Kronecker representation • Isolates structure and the matrix blocks • Fourier transform converts it to a block-diagonal system • Solve N independent order-K systems iteratively • Computationally optimal
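The block-diagonalization step can be illustrated on a small system. The sketch below assumes K = 2 and uses a naive DFT and a direct 2x2 solve per frequency for brevity, whereas the slides describe an FFT and an iterative solve; the function names and the example matrices are hypothetical.

```python
import cmath

def dft(seq, inverse=False):
    """Naive O(N^2) DFT of a scalar sequence (an FFT would be used in practice)."""
    n = len(seq)
    sign = 1 if inverse else -1
    out = [sum(x * cmath.exp(sign * 2j * cmath.pi * f * m / n) for m, x in enumerate(seq))
           for f in range(n)]
    return [v / n for v in out] if inverse else out

def solve2x2(M, v):
    """Solve a 2x2 complex system by Cramer's rule."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [(d * v[0] - b * v[1]) / det, (a * v[1] - c * v[0]) / det]

def multiply_block_circulant(C, b):
    """y = R b where block (i, j) of R is C[(i - j) % N]."""
    N, K = len(C), 2
    return [[sum(C[(i - j) % N][row][col] * b[j][col] for j in range(N) for col in range(K))
             for row in range(K)] for i in range(N)]

def solve_block_circulant(C, y):
    """Solve R b = y for a block-circulant R with N blocks of size 2x2."""
    N, K = len(C), 2
    # Transform every matrix entry and every vector entry along the block index.
    C_hat = [[dft([C[m][i][j] for m in range(N)]) for j in range(K)] for i in range(K)]
    y_hat = [dft([y[n][i] for n in range(N)]) for i in range(K)]
    # The system is now block diagonal: one independent K x K solve per frequency.
    b_hat = [solve2x2([[C_hat[i][j][f] for j in range(K)] for i in range(K)],
                      [y_hat[i][f] for i in range(K)]) for f in range(N)]
    # Inverse transform back to the block domain.
    b_cols = [dft([b_hat[f][i] for f in range(N)], inverse=True) for i in range(K)]
    return [[b_cols[i][n].real for i in range(K)] for n in range(N)]

# Hypothetical example: N = 4 blocks, K = 2 users.
C = [[[4.0, 1.0], [1.0, 4.0]], [[0.5, 0.2], [0.1, 0.5]],
     [[0.0, 0.0], [0.0, 0.0]], [[0.5, 0.1], [0.2, 0.5]]]
b_true = [[1, -1], [-1, -1], [1, 1], [-1, 1]]
y = multiply_block_circulant(C, b_true)
b_est = solve_block_circulant(C, y)
print(b_est)
```

Because a block-circulant product is a circular convolution over the block index, the transform turns one NK x NK system into N separate K x K systems, which is where the complexity reduction comes from.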
Speedup in detector [Figure: bar chart of achievable data rate (Kbps): decorrelator 10.4 Kbps, Kronecker 83.1 Kbps] Complexity: $O(N^2 K^3)$ vs. $O(NK^2 + KN \log N)$
Pipelining and parallelization • Mostly matrix based operations • Detector - iterative algorithm • Pipeline various iterations • Parallelize operations • Add more functional units • Distribute data across functional units • Distribute computations
Projected computation time [Figure: bar chart of achievable data rate (Kbps): base multiuser algorithm 20.75 Kbps, hardware pipelining 154.3 Kbps, pipelining + parallelization 564.5 Kbps; annotations: DSP only, DSP + coprocessor support, 30 adders and multipliers]
Integrated receiver design [Figure: receiver blocks (channel estimation, detection, demodulation, decoding) producing the detected bits of all K users] • Joint channel estimation and detection • Effective spreading code approach • Optimized detector design • Joint detection and decoding
Maximum a posteriori (MAP) decoding [Figure: transmitter blocks (encoding, modulation, spreading, other users) with data bits b and coded bits d] • Received signal: $r = UZd + n$ • Optimum decoding rule • Constrained optimization problem • Decode all users simultaneously • Exponential complexity in number of users
Single-user detection and decoding [Figure: r feeds matched filters MF 1 ... MF K; each output $y_k$ goes to Decoder k, which produces the estimate $\hat{b}_k$] • Suboptimum alternatives • Isolate detection and decoding
Decoding matched filter outputs [Figure: BER (10^0 down to 10^-4) versus SNR (1 to 8 dB) for MF + Viterbi and for the optimal receiver] Huge performance loss!
Iterative detection and decoding • User of concern c, interfering users I • $r = (UZ)_c d_c + (UZ)_I d_I + z$ • Estimate $d_I$ • Eliminate interference using the estimate of $d_I$: $\hat{y}_c = (UZ)_c^T (r - (UZ)_I d_I)$ • Estimate $d_c$ for the next step • Complexity linear in number of users
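The cancellation step can be sketched for a toy system. The example below uses plain hard decisions in place of the decoder output, a noise-free channel, and a hypothetical 2-user matrix `A` standing in for $UZ$; it illustrates the interference-elimination formula only, not the full detector and decoder loop.

```python
def sign(x):
    return 1 if x >= 0 else -1

def column(A, k):
    return [row[k] for row in A]

def cancel_and_detect(A, r, d_prev):
    """One parallel interference-cancellation pass.

    For each user c, subtract the reconstructed signals of all other users
    (built from the previous estimates) and apply user c's matched filter.
    """
    K = len(A[0])
    d_new = []
    for c in range(K):
        residual = list(r)
        for k in range(K):
            if k != c:
                residual = [x - a * d_prev[k] for x, a in zip(residual, column(A, k))]
        y_c = sum(a * x for a, x in zip(column(A, c), residual))
        d_new.append(sign(y_c))
    return d_new

# Hypothetical example: a strong user (amplitude 1.0) and a weak user (0.4)
# with correlated 4-chip codes.
A = [[1.0, 0.4], [1.0, 0.4], [1.0, 0.4], [1.0, -0.4]]
d_true = [-1, 1]
r = [sum(a * d for a, d in zip(row, d_true)) for row in A]

# Stage 0: plain matched filters; the weak user is decided wrongly.
d_stage0 = [sign(sum(a * x for a, x in zip(column(A, k), r))) for k in range(2)]
# Stage 1: cancel the other user's estimated signal, then detect again.
d_stage1 = cancel_and_detect(A, r, d_stage0)
print(d_stage0, d_stage1)
```

Each pass costs one reconstruction and one matched filter per user, which matches the slide's remark that the complexity is linear in the number of users.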
Reduction in decoding complexity • Convolutional code • Coded bits depend on past data bits • Performance improves with memory length • Viterbi algorithm for decoding • Complexity exponential in memory length • Our suboptimal approach • Maximal weight basis decoding • Complexity quadratic in memory length
Joint detection and decoding performance [Figure: BER (10^0 down to 10^-5) versus SNR (1 to 8 dB) for MF + Viterbi, Iter1 + Subopt, Iter1 + Viterbi, Iter3 + Subopt, and optimal; rate = 1/2, k = 7]
Joint detection and decoding • Huge performance gain • Suboptimal approximation • Insignificant performance loss • Significant computational gain • Architecture for suboptimal decoding? • Viterbi algorithm: butterfly architecture • Have a sliding window implementation
Summary of contributions • Integrated channel estimation and detection model [WCNC] • Optimized detection algorithm [PIMRC, Tr. Com] • Fixed point implementation [ICASSP, SPIE] • Parallel architecture [Asilomar] • Joint detection and decoding [Globecom, Tr. Com] • Suboptimal decoding algorithm [Asilomar, Tr. Inf. Th.]
Future research [Figure labels: wireless cellular, ad-hoc network, Bluetooth/home networks, wireless LAN]
Future research • Universal wireless receiver • Reconfigurable solution • Power efficient • Automate design? • Network level interaction • Resource allocation • Quality of service guarantee • Application level interaction
Further details http://www.ece.rice.edu/~suman http://www.ece.rice.edu/CMC
Convolutional codes • Rate: 1/2, memory (k): 2 • Data bits b are encoded into systematic bits $d_{odd}$ and parity bits $d_{even}$ • $d_2 = d_1$ • $d_4 = d_1 + d_3$ • $d_6 = d_1 + d_3 + d_5$ • $d_8 = d_3 + d_5 + d_7$ • $d_{10} = d_5 + d_7 + d_9$
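The parity equations on this slide translate into a short encoder. The sketch assumes that "+" means addition modulo 2 (XOR) and that the encoder starts from an all-zero state; the function name `encode` is illustrative.

```python
def encode(bits):
    """Rate-1/2 systematic encoder following the parity equations on the slide.

    bits: list of 0/1 data bits.
    Returns [d1, d2, d3, d4, ...]: odd positions carry the data bits,
    even positions carry parity = current bit XOR the two previous bits.
    """
    coded = []
    for i, b in enumerate(bits):
        prev1 = bits[i - 1] if i >= 1 else 0   # assumed zero initial state
        prev2 = bits[i - 2] if i >= 2 else 0
        coded += [b, b ^ prev1 ^ prev2]
    return coded

data = [1, 0, 1, 1, 0]
codeword = encode(data)
print(codeword)
```

With memory 2, each parity bit depends on the current data bit and the two before it, so every even-position equation on the slide involves at most three systematic bits.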
Suboptimal single user channel decoder • $y = (y_1, \dots, y_N)$ • $d = (d_1, \dots, d_N)$ • Viterbi algorithm: • Complexity grows exponentially with k • If there is no codeword constraint, $d = \mathrm{sgn}(y)$ • The estimated d may not be a codeword!
Maximum weight basis decoding • $d_2 = d_1$ • $d_4 = d_1 + d_3$ • $d_6 = d_1 + d_3 + d_5$ • $d_8 = d_3 + d_5 + d_7$ • $d_{10} = d_5 + d_7 + d_9$ • More variables than equations • NR independent variables (N: block length, R: rate) • Choice depends on $y_i$ • y = 7.5: d = 1 • y = -4.5: d = -1 • y = 0.5: d = ? • Want to choose the maximally independent subset with the largest total weight
Selection of maximally independent subset • Set $I = \emptyset$ • Given y, sort the weights $|y_i|$, $i \in \{1..N\}$ • While $|I| < NR$ • Choose the location e from {1..N} with the largest weight such that $I \cup \{e\}$ is still an independent subset of {1..N} • Set $I = I \cup \{e\}$
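The greedy selection can be sketched by treating each codeword position as a bitmask of the data bits it depends on and testing independence over GF(2). The bitmask representation, the XOR-basis test, and the function names are assumptions made for illustration; the generator columns follow the rate-1/2, memory-2 code from the earlier slide, and positions are 0-based.

```python
def generator_columns(num_bits):
    """Bitmask of data bits feeding each codeword position (systematic, parity, ...)."""
    cols = []
    for i in range(num_bits):
        systematic = 1 << i
        parity = systematic
        if i >= 1:
            parity |= 1 << (i - 1)
        if i >= 2:
            parity |= 1 << (i - 2)
        cols += [systematic, parity]
    return cols

def reduce_against(basis, v):
    """Reduce bitmask v by an XOR basis keyed by highest set bit; 0 means dependent."""
    while v:
        top = v.bit_length() - 1
        if top not in basis:
            return v
        v ^= basis[top]
    return 0

def select_max_weight_basis(y, cols, rank):
    """Greedily pick the most reliable positions that stay linearly independent."""
    basis, chosen = {}, []
    for e in sorted(range(len(y)), key=lambda i: -abs(y[i])):
        v = reduce_against(basis, cols[e])
        if v:                                   # e is independent of the chosen set
            basis[v.bit_length() - 1] = v
            chosen.append(e)
            if len(chosen) == rank:
                break
    return chosen

cols = generator_columns(3)                     # N = 6 positions, NR = 3
y = [7.5, -7.0, 0.5, 3.0, 0.2, -6.0]
subset = select_max_weight_basis(y, cols, rank=3)
print(subset)
```

In the example, position 1 has the second-largest weight but is skipped because it carries the same data bit as position 0, so the next most reliable independent positions are taken instead.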
Suboptimal decoding algorithm • Choose M maximum independent subsets • For each independent subset I • Set $d_e = \mathrm{sgn}(y_e)$ for the locations e in I • Compute the codeword $d_I$ • Compute the likelihood $p(y|d_I)$ • Choose the codeword with the largest likelihood • Decoding complexity reduced from $O(2^k)$ to $O(k^2)$
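The per-subset decoding step can be sketched as follows. Several choices here are assumptions for illustration and are not specified in the slides: bit 0 maps to +1 and bit 1 to -1, the candidate subsets are supplied by the caller, the data bits are recovered by Gauss-Jordan elimination over GF(2), and the correlation between y and the candidate codeword stands in for the likelihood (for equal-energy signals in Gaussian noise the two rank candidates identically).

```python
def generator_columns(num_bits):
    """Bitmask of data bits feeding each codeword position (systematic, parity, ...)."""
    cols = []
    for i in range(num_bits):
        parity = 1 << i
        if i >= 1:
            parity |= 1 << (i - 1)
        if i >= 2:
            parity |= 1 << (i - 2)
        cols += [1 << i, parity]
    return cols

def solve_data_bits(cols, positions, hard_bits, num_bits):
    """Gauss-Jordan elimination over GF(2); rows are (mask, right-hand side)."""
    rows = [(cols[p], h) for p, h in zip(positions, hard_bits)]
    for bit in range(num_bits):
        pivot = next(i for i in range(bit, num_bits) if (rows[i][0] >> bit) & 1)
        rows[bit], rows[pivot] = rows[pivot], rows[bit]
        for i in range(num_bits):
            if i != bit and (rows[i][0] >> bit) & 1:
                rows[i] = (rows[i][0] ^ rows[bit][0], rows[i][1] ^ rows[bit][1])
    return [rows[bit][1] for bit in range(num_bits)]

def codeword_from_subset(y, cols, positions, num_bits):
    """Hard-decide the chosen positions, solve for the data bits, re-encode."""
    hard = [0 if y[p] >= 0 else 1 for p in positions]
    data = solve_data_bits(cols, positions, hard, num_bits)
    data_mask = sum(b << i for i, b in enumerate(data))
    return [bin(c & data_mask).count("1") & 1 for c in cols]

def correlation(y, codeword):
    """Sum of y_i * s_i with s_i = +1 for bit 0 and -1 for bit 1."""
    return sum(v * (1 - 2 * c) for v, c in zip(y, codeword))

def decode(y, cols, subsets, num_bits):
    candidates = [codeword_from_subset(y, cols, s, num_bits) for s in subsets]
    return max(candidates, key=lambda cw: correlation(y, cw))

cols = generator_columns(3)
# Codeword [1, 1, 0, 1, 1, 0] was sent; position 3 arrives weak with the wrong sign.
y = [-0.9, -1.1, 0.8, 0.2, -1.2, 0.7]
hard_all = [0 if v >= 0 else 1 for v in y]       # sgn(y) alone is not a codeword
decoded = decode(y, cols, subsets=[[4, 1, 2], [3, 5, 0]], num_bits=3)
print(hard_all, decoded)
```

The first subset contains only reliable positions and reproduces the transmitted codeword, while the second relies on the weak position and yields a codeword with a much lower correlation, so it is rejected.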
Performance improvement [Figure: BER (10^0 down to 10^-4) versus SNR (1 to 8 dB) for MF + MAP, 2-stage + MAP, and single user] Performance approaches single-user bound