Implementing algorithms for advanced communication systems -- My bag of tricks

Sridhar Rajagopal
Electrical and Computer Engineering
This work is supported by Nokia, TI, TATP and NSF

Motivation
• Build wireless multimedia communication systems
  - Kbps to Mbps
• Sophisticated algorithms - exponential complexity
• Approaches:
  - Sub-optimal algorithms - O(n^2), O(n^3) complexity
  - Better hardware implementations needed

Contributions
• Develop algorithms suitable for implementation
• Bit-level extensions to microprocessors
• Pipelining to reduce latency and memory
• On-line arithmetic for Most Significant Digit First computations

Outline
• Advanced communication systems
• Algorithms for efficient implementation
• Pipelining
• On-line arithmetic
• Bit-level extensions to microprocessors
• Summary

Communication System - Physical layer: Transmitter
[Figure: transmitter block diagram. Information bits (from higher layers) enter the digital section (coding, spreading), followed by D/A conversion, the analog RF unit and the antenna.]

Communication System - Physical layer: Channel
Multipath reflections, attenuation, noise, multiple-user interference

Communication System - Physical layer: Receiver
[Figure: receiver block diagram. Antenna and analog RF unit, A/D conversion, then the digital section (channel estimation, detection, decoding) delivering information bits to higher layers.]

Questions
• Higher data rates => sophisticated algorithms => strain on hardware => lower data rates
1. Which is the best algorithm to use for implementation?
2. How best to do the digital part?
  - VLSI, DSP, FPGA, microprocessor
  - a combination of these?

Multiuser Channel Estimation Algorithm
• Training/tracking bits: values in {+1, -1}
• Received signal: 8-bit integer (complex)
• N = spreading gain (typically fixed, e.g. 32)
• K = number of users (variable, <= N)
• Output: Maximum Likelihood channel estimate

Iterative hardware-efficient scheme
• Bit-streaming: suitable for tracking (window length L)
• Method of gradient descent
• Stable convergence behavior
• Simple fixed-point VLSI architecture
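The slide names gradient descent over a sliding window but does not give the update, so the following is only a minimal sketch under stated assumptions. It assumes a synchronous model r = A*b, windowed correlations Rbb = sum(b*b^T) and Rbr = sum(r*b^T), and the step A = A - mu*(A*Rbb - Rbr), which avoids a matrix inversion. The names `gradient_channel_estimate`, `Rbb`, `Rbr`, `mu` and the training pattern are all hypothetical, not taken from the talk.

```python
def gradient_channel_estimate(bits, received, window, mu):
    """Track an N x K channel matrix A with one gradient step per bit time.

    bits:     list of length-K lists with entries in {+1, -1}
    received: list of length-N lists of complex samples
    window:   sliding window length L (assumed)
    mu:       step size (assumed; must keep mu * max_eigenvalue(Rbb) below 2)
    """
    K, N = len(bits[0]), len(received[0])
    Rbb = [[0.0] * K for _ in range(K)]   # windowed bit auto-correlation
    Rbr = [[0j] * K for _ in range(N)]    # windowed bit/received cross-correlation
    A = [[0j] * K for _ in range(N)]      # running channel estimate

    def accumulate(b, r, sign):
        # Rank-one update: add (sign=+1) or remove (sign=-1) one bit vector.
        for k in range(K):
            for m in range(K):
                Rbb[m][k] += sign * b[m] * b[k]
            for n in range(N):
                Rbr[n][k] += sign * r[n] * b[k]

    for i in range(len(bits)):
        accumulate(bits[i], received[i], +1)
        if i >= window:  # slide the window: drop the oldest bit vector
            accumulate(bits[i - window], received[i - window], -1)
        # Gradient of the squared error: A*Rbb - Rbr
        grad = [[sum(A[n][m] * Rbb[m][k] for m in range(K)) - Rbr[n][k]
                 for k in range(K)] for n in range(N)]
        for n in range(N):
            for k in range(K):
                A[n][k] -= mu * grad[n][k]
    return A


# Noiseless toy example: K = 2 users, N = 3 chips, deterministic training bits.
A_true = [[3 + 1j, -2 + 0j], [1 - 4j, 0 + 2j], [-1 + 0j, 5 - 3j]]
bits = [[1 if (i // 2) % 2 == 0 else -1, 1 if i % 2 == 0 else -1]
        for i in range(40)]
received = [[sum(A_true[n][k] * b[k] for k in range(2)) for n in range(3)]
            for b in bits]
A_hat = gradient_channel_estimate(bits, received, window=8, mu=1 / 16)
max_err = max(abs(A_hat[n][k] - A_true[n][k])
              for n in range(3) for k in range(2))
```

With this training pattern the windowed Rbb becomes 8 times the identity once the window is full, so each step halves the remaining error. Real received data adds noise and would need a fixed-point treatment that this sketch does not model.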

Comparison of Bit Error Rates (BER)
[Figure: BER (10^-3 to 10^-1) versus Signal to Noise Ratio (SNR, 4 to 12) for MF, ActMF, ML and ActML, with complexity labels O(K^2 N) and O(K^3 + K^2 N).]
Simulations - Static multipath channel: SINR = 0 dB, Paths = 3, Preamble = 150, Spreading N = 31, Users K = 15

Multiuser interference
[Figure: timing diagram of received windows r(i-2), r(i-1), r(i), r(i+1) for User 1 through User j. Within r(i), the desired user's bit b(i) sees interference from previous bits and from future bits b(i+1) of other users.]

Block Based Detector
[Figure: a block of bits 1-12 passes through the Matched Filter and Stages 1, 2 and 3 to yield bits 2-11; the next block, bits 11-22, passes through the same stages to yield bits 12-21.]

Detection
• Matched filter
• Iterate for convergence
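The two steps on this slide can be illustrated with a small parallel interference cancellation (PIC) loop. This is a sketch only: it assumes a synchronous, real-valued model r = A*b with no interference from previous or future bits, which is simpler than the asynchronous case shown in the figures. The function names and the toy matrix `A` are hypothetical.

```python
def sign(x):
    """Hard decision: map a soft value to a bit in {+1, -1}."""
    return 1 if x >= 0 else -1


def matched_filter(A, r):
    """Correlate the received vector r with each user's column of A."""
    N, K = len(A), len(A[0])
    return [sum(A[n][k] * r[n] for n in range(N)) for k in range(K)]


def pic_detect(A, r, stages=3):
    """Matched filter followed by `stages` rounds of interference cancellation."""
    N, K = len(A), len(A[0])
    y = matched_filter(A, r)
    # Cross-correlation between users' columns: R = A^T A
    R = [[sum(A[n][k] * A[n][j] for n in range(N)) for j in range(K)]
         for k in range(K)]
    bits = [sign(v) for v in y]  # stage 0: matched filter decisions
    for _ in range(stages):
        # Subtract the interference implied by the other users' current bits.
        bits = [sign(y[k] - sum(R[k][j] * bits[j] for j in range(K) if j != k))
                for k in range(K)]
    return bits


# Near-far toy case: user 0 is strong and its code is correlated with user 1's.
A = [[4, 1], [4, 1], [4, 1], [4, -1]]
true_bits = [1, -1]
r = [sum(A[n][k] * true_bits[k] for k in range(2)) for n in range(4)]
mf_bits = [sign(v) for v in matched_filter(A, r)]
pic_bits = pic_detect(A, r)
```

In this toy case the matched filter alone misdetects the weak user, and one cancellation stage is enough to recover it.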

Pipelined detection scheme
[Figure: timing diagram of received windows r(i-2), r(i-1), r(i), r(i+1) for User 1 through User j, marking interference on the desired user's bit b(i) from previous bits and from future bits b(i+1) of other users.]

Pipelined Detector
[Figure: bits 1-12 stream through the Matched Filter, Stage 1, Stage 2 and Stage 3 as a pipeline.]

Chip being built as part of the Elec 422 VLSI course project

On-line arithmetic
• Only the sign of the dot-product computations is needed
• High-precision operations are performed just to find the sign
• These can be avoided with Most Significant Digit First computation using redundant number systems
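To illustrate why digits arriving most significant first allow an early sign decision, here is a small sketch. It assumes a radix-2 signed-digit representation with digits in {-1, 0, +1}. It shows only the early-termination idea, not an on-line adder or multiplier, and the function names are hypothetical.

```python
def value_of(digits):
    """Integer value of radix-2 signed digits listed most significant first."""
    value = 0
    for d in digits:
        value = 2 * value + d
    return value


def sign_msd_first(digits):
    """Return (sign, digits_examined), consuming digits most significant first.

    The first nonzero digit fixes the sign, because all lower digits together
    have magnitude strictly smaller than that digit's weight.
    """
    for examined, d in enumerate(digits, start=1):
        if d != 0:
            return (1 if d > 0 else -1), examined
    return 0, len(digits)


digits = [0, 1, -1, 0, -1, 1, 0, 1]      # one redundant encoding of 29
sign, examined = sign_msd_first(digits)  # decided after 2 of 8 digits
```

The remaining digits never need to be produced once the sign is known, which is the saving the slide refers to.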

DSP/microprocessor implementations
• Further acceleration needed for real-time performance
• Matrix-based, massively parallel algorithms
• Detection of bits {+1, -1}: bit-level operations
• DSPs
  - Bit multiplications not needed (add/subtract on FPGA)
  - Bit storage not convenient
  - Not fully able to exploit parallelism

FPGAs for acceleration
[Figure: multiprocessor partitioning across DSP1, DSP2, FPGA1 and FPGA2. Functions shown: multiuser estimation, code matched filter detector, PIC (Stage 1), PIC (Stage 2); received bits in, detected bits out.]
• Flexibility of ASICs
• Good for parallelism and bit-level operations

Multiprocessor simulations
[Figure: execution time (in seconds, 10^-6 to 10^-2) versus number of users (0 to 35) for a single DSP implementation, a 2 DSP implementation, and 2 DSPs + 2 FPGAs, against the target data rate of 128 Kbps/user.]

Instruction Set Extensions
• To accelerate bit-level computations in wireless
• Real/Complex Integer - Bit Multiplications
  - Used in Multiuser Detection, Decoding
• Bit - Bit Multiplications
  - Used in Outer Product Updates
  - Correlation, Channel Estimation
• Complex Integer - Integer Multiplications
  - Useful in other Signal Processing applications
  - Speech, Video, ...

SIMD Parallelism
[Figure: 64-bit Register A and 64-bit Register B, each split into 8-bit lanes, feed parallel + or x units; the results are written to 64-bit Register C.]

Integer - Bit Multiplications
For i = 1..8, j = 1..8: D[i][j] = D[i][j] + b[i]*C[j] (Cross-Correlation)
[Figure: 64-bit Register D[i][j] and 64-bit Register C[j], in 8-bit lanes, feed +/- units selected by the 8-bit Control Register b[i]; the result is written back to 64-bit Register D[i][j].]
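The update D[i][j] = D[i][j] + b[i]*C[j] needs no multiplier when b[i] is a bit: the control bit simply selects add or subtract. The sketch below models that behaviour in software. It assumes a set control bit encodes +1 and a clear bit encodes -1, uses 0-based indices, and ignores 8-bit lane overflow; `integer_bit_update` is a hypothetical name, not an instruction from the talk.

```python
def integer_bit_update(D, control, C):
    """In-place D[i][j] += b[i] * C[j], with b[i] taken from an 8-bit control word.

    Bit i of `control` set means b[i] = +1 (add); clear means b[i] = -1 (subtract).
    """
    for i in range(8):
        add = (control >> i) & 1
        for j in range(8):      # one 64-bit register row: 8 lanes of 8 bits
            if add:
                D[i][j] += C[j]
            else:
                D[i][j] -= C[j]
    return D


D = [[0] * 8 for _ in range(8)]
C = [1, 2, 3, 4, 5, 6, 7, 8]
integer_bit_update(D, 0b00000101, C)  # b[0] = b[2] = +1, all others -1
```

In hardware the inner loop would be a single SIMD add/subtract across the 8 lanes; the Python loop only mirrors the arithmetic.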

Computational Savings
• Avoid bit multiplications and control structures
• Four 8-bit multiplies
  - Latency 3 cycles
• Eight 8-bit adds
  - Latency 1 cycle
• Cross-Correlation Example
  - 64 multiplies, 64 adds

Bit-Bit Multiplications
D = D + b*b^T (e.g. Auto-Correlation)
[Figure: 64-bit Register A = b1 and 64-bit Register B = b2 are combined by XNOR into 64-bit Register C = b1*b2.]
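Multiplying two values in {+1, -1} gives +1 exactly when they are equal, which is why XNOR can replace the multiplier. A minimal sketch, assuming +1 is stored as bit 1 and -1 as bit 0 in a 64-bit word; the helper names are hypothetical.

```python
MASK64 = (1 << 64) - 1  # keep results within a 64-bit register


def pack_bits(values):
    """Pack a list of +1/-1 values into an integer, element 0 in bit 0."""
    word = 0
    for pos, v in enumerate(values):
        if v == 1:
            word |= 1 << pos
    return word


def unpack_bits(word, count):
    """Inverse of pack_bits for the lowest `count` bit positions."""
    return [1 if (word >> pos) & 1 else -1 for pos in range(count)]


def xnor_multiply(a, b):
    """Element-wise product of two packed +1/-1 vectors (up to 64 elements)."""
    return ~(a ^ b) & MASK64


x = [1, -1, -1, 1]
y = [1, 1, -1, -1]
products = unpack_bits(xnor_multiply(pack_bits(x), pack_bits(y)), len(x))
```

One XNOR thus stands in for up to 64 bit-bit multiplications.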

8-bit to 64-bit conversions
D = D + b*b^T (e.g. Auto-Correlation)
• b1 = b(1:8), b(1:8), ..., b(1:8)
• b2 = b(1)b(1)...b(8)b(8)
[Figure: 8-bit Register b, holding b(1)..b(8), is expanded into 64-bit Register A either by repeating the whole byte b(1)..b(8) or by repeating each bit, b(1) b(1) ... b(8) b(8).]

Increment/Decrement
D = D + b*b^T (e.g. Auto-Correlation)
[Figure: the 8-bit Register b1*b2 selects +/- 1 on each lane of 64-bit Register D, giving 64-bit Register (D + b1*b2).]
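The whole update D = D + b*b^T can be put together from three pieces: an 8-bit to 64-bit expansion, an XNOR, and a per-lane increment/decrement. The sketch below is a software model under assumptions: +1 is stored as bit 1, entry (i, j) of the product sits at bit 8*i + j, and lane overflow is ignored. All function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def repeat_byte(b):
    """b1: the 8-bit pattern b copied into each of the 8 bytes."""
    return sum(b << (8 * i) for i in range(8))


def repeat_each_bit(b):
    """b2: bit i of b fills all 8 bits of byte i."""
    return sum((0xFF if (b >> i) & 1 else 0) << (8 * i) for i in range(8))


def autocorr_update(D, b):
    """In-place D += b*b^T for an 8-element +1/-1 vector packed in byte b."""
    product = ~(repeat_byte(b) ^ repeat_each_bit(b)) & MASK64  # 64 XNORs
    for i in range(8):
        row_bits = (product >> (8 * i)) & 0xFF  # 8-bit control for row i
        for j in range(8):
            # Product bit 1 means +1 (increment), 0 means -1 (decrement).
            D[i][j] += 1 if (row_bits >> j) & 1 else -1
    return D


D = [[0] * 8 for _ in range(8)]
b = 0b10110010
autocorr_update(D, b)
```

After one update the diagonal of D is all ones, as expected for an auto-correlation of +1/-1 values.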

Truncated Multipliers
• Many applications need only approximate computations
• Adaptive algorithms: Y = Y + mu*(Y*C)
• Truncate lower bits
• Truncated multipliers - half the area/half the delay
• Can do 2 truncated multiplies in parallel with a regular multiply
[Figure: ALU and multipliers: Multiplier 1, Multiplier 2, Truncated Multiplier.]
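"Truncate lower bits" can be made concrete with a bit-level model. The sketch below assumes unsigned n-bit operands and a multiplier that omits every partial-product bit in the lower n columns, with no correction constant (practical designs often add one). `truncated_multiply` is a hypothetical name, not a design from the talk.

```python
def truncated_multiply(a, b, n=8):
    """Approximate high half of a*b using only partial-product columns >= n."""
    total = 0
    for i in range(n):
        if (a >> i) & 1:
            for j in range(n):
                # Keep the partial-product bit only if its weight is >= 2**n.
                if (b >> j) & 1 and i + j >= n:
                    total += 1 << (i + j)
    return total >> n


def exact_high(a, b, n=8):
    """Exact high half of the full 2n-bit product, for comparison."""
    return (a * b) >> n


worst_case = exact_high(255, 255) - truncated_multiply(255, 255)
```

In this model the truncated result never exceeds the exact high half and falls short by at most n - 1, which is 7 for 8-bit operands. That kind of bounded error is what an adaptive update such as Y = Y + mu*(Y*C) is expected to tolerate.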

Open Questions
• VLIW simulator?
• Showing performance improvement for different algorithms
• Compiler and software support

Conclusions
• Data rates for advanced communication systems are limited by hardware, not by algorithms
• Need to find efficient solutions to tackle this problem
  - Hardware-software co-design
• Presented my ways of attacking this problem

Future Work
• RENÉ:
  - Single re-configurable hardware to switch between 2 communication standards
  - Designing algorithms, conditioned on the availability of only finite precision
• http://www.ece.rice.edu/~sridhar/research.htm
• http://cmc.rice.edu
