Sub-Nyquist Sampling DSP & SCD Modules

Winter 2010 • High Speed Digital Systems Lab, Electrical Engineering Faculty, Technion - Israel Institute of Technology • Presented by: Omer Kiselov, Daniel Primor • Supervised by: Ina Rivkin, Moshe Mishali

Outline • Overview – Goals and discussion • Algorithm review • Implementation in hardware • Changes for Adaptation to hardware • Evaluation • Possible Optimization & Future Work

Overview • The goal system • The module's objectives • Interface [Block diagram. Blocks shown: Analog Back-end (Realtime), Expand 1:q, CTF (Support recovery), Memory, Delay FIFO, DSP (Baseband), Support Change Detector. Interface signals shown: A matrix vector (432 bits), A Matrix Address (9 bits), Support Analysis vector (101 bits), Valid Supports (1 bit), Samples Bundle (432 bits), Valid samples (1 bit), First Beta for QR decomposition (36 bits), Support Changed (1 bit)]

Algorithm Review • Pseudo-Inverse • Matrix Decomposition • Matrix Inversion • Matrix Multiplication • Support Change Detection • Support threshold evaluation attempt [Diagram blocks: Pseudo Inverse, Real Time Vector Multiplier, Support Change Detector]

Algorithm Review – Pseudo Inverse • Matrix Decomposition • QR Decomposition • Using Householder Reflections
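As a software reference for the decomposition step named above, the following is a minimal floating-point sketch of QR decomposition with Householder reflections. It is not the module's fixed-point VHDL; the function name `householder_qr` and the scalar `beta = 2 / (v^T v)` are the textbook formulation and are assumptions here, so they may not match the "Beta" value computed by the hardware.

```python
import math

def householder_qr(a):
    """Return (q, r) with a = q * r for a real m x n matrix (m >= n), given as a list of rows."""
    m, n = len(a), len(a[0])
    r = [row[:] for row in a]
    q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for k in range(n):
        # Column k, from the diagonal down, defines the reflection vector v.
        x = [r[i][k] for i in range(k, m)]
        norm_x = math.sqrt(sum(t * t for t in x))
        if norm_x == 0.0:
            continue  # nothing to eliminate in this column
        v = x[:]
        v[0] += math.copysign(norm_x, x[0])  # sign choice avoids cancellation
        beta = 2.0 / sum(t * t for t in v)   # textbook Householder scalar (assumption)
        # R <- H R, with H = I - beta * v * v^T acting on rows k..m-1.
        for j in range(n):
            dot = sum(v[i] * r[k + i][j] for i in range(len(v)))
            for i in range(len(v)):
                r[k + i][j] -= beta * v[i] * dot
        # Q <- Q H, so that Q accumulates the product of all reflections.
        for i in range(m):
            dot = sum(q[i][k + j] * v[j] for j in range(len(v)))
            for j in range(len(v)):
                q[i][k + j] -= beta * dot * v[j]
    return q, r
```

Calling `householder_qr` on a tall matrix returns a square orthogonal `q` and an `r` whose entries below the diagonal are zero up to rounding.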

Algorithm Review – Pseudo Inverse • Matrix Inversion – Gaussian Elimination • Matrix Multiplication [Diagram labels: Matrix Multiplier's Common Interface, Matrix Multiplier, Vector Multiplier]
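For the upper-triangular R produced by a QR decomposition, Gaussian elimination reduces to back substitution. The sketch below shows that special case together with a plain matrix multiply. The helper names `invert_upper_triangular` and `matmul` are hypothetical, and the code is a floating-point illustration rather than the hardware algorithm.

```python
def invert_upper_triangular(r):
    """Invert an n x n upper-triangular matrix (list of rows) by back substitution."""
    n = len(r)
    inv = [[0.0] * n for _ in range(n)]
    for col in range(n):
        # Solve R x = e_col, starting from the bottom row and moving up.
        for i in range(n - 1, -1, -1):
            rhs = 1.0 if i == col else 0.0
            acc = rhs - sum(r[i][j] * inv[j][col] for j in range(i + 1, n))
            inv[i][col] = acc / r[i][i]
    return inv

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]
```

For a full-column-rank matrix, multiplying the inverse of the square upper-triangular part of R by the transpose of the matching columns of Q gives the pseudo-inverse.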

Algorithm Review - SCD • The support change detector is a vector multiplier: it takes one row of the pseudo-inverse of the A matrix and multiplies it by the signal to check whether the energy there is more than noise. • Threshold generation attempt: • If there was no support change • If we replace W with the average: • The generated value shows no false alarms, but may miss detections in some cases where the SNR is low. *Eventually the threshold was defined as a user input. Our estimated threshold is 000001000110010100 (for the AM demo), ~0.3
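The detection step described above can be sketched as a dot product followed by a comparison. The function name `support_changed` is hypothetical, and comparing the squared magnitude (rather than the magnitude) against the threshold is an assumption, since the slide does not state which quantity is used.

```python
def support_changed(pinv_row, samples, threshold):
    """Return True when the energy of <pinv_row, samples> exceeds the threshold."""
    # Vector multiplier: one row of the pseudo-inverse times the sample vector.
    y = sum(a * x for a, x in zip(pinv_row, samples))
    # Energy taken as squared magnitude (assumption; also works for complex y).
    energy = abs(y) ** 2
    return energy > threshold
```

With a threshold of 0.3, a projection of 0.1 (energy 0.01) is treated as noise, while a projection of 1.0 flags a support change.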

DSP & SCD system operation [Block diagram. Pseudo-inverse blocks: A Matrix RAM, support indexes from CTF, A_s, QR Decomposition (auxiliary multiplications, reflections creation, reflection multiplication) producing R and Q', upper triangular matrix inverse producing R inversed, Matrix multiplier producing A dagger, Ping-Pong Buffer (RAM). Real-time blocks: samples from Expand, Delay FIFO, Real Time Matrix-Samples Multiplier producing the reconstructed signal, Control Vector, Support Change Detector]

Implementation In Hardware • Block (Entities) Definition – Pseudo Inverse [Diagram blocks: QR Decomposition, Matrix Inversion (inverting an upper triangular matrix), Matrix Multiplier]

Implementation In Hardware • Block (Entities) Definition – Pseudo Inverse • QR Decomposition [Diagram labels: Phase 1, Phase 2, Aux 2, Beta calculation unit, 24 Multipliers]

Implementation In Hardware • Block (Entities) Definition – Pseudo Inverse [Diagram labels: Matrix Inversion Unit, Vector Inversion Unit, Vector Inverter, FIFO for Original R Matrix]

Implementation In Hardware [Diagram blocks: Real Time Mult, Matrix Multiplier, RAM, SCD]

Outline - Adaptation to Hardware • Overview – Goals and discussion • Algorithm review • Implementation in hardware • Adaptation to hardware • Complex Enhance • Normalizing the Input • Resolution (Overflow) discussion • SCD – running average • Timing issues • Evaluation • Possible Optimization & Future Work

Complex Enhance • To avoid complex multiplications, we changed the structure of the matrix. • The matrix becomes 4 times bigger. Every complex vector multiplication can then be carried out as an ordinary real vector-by-vector multiplication and still give the correct result.
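One standard way to obtain a matrix with four times as many entries that reproduces complex products with real arithmetic is the block form [[Re, -Im], [Im, Re]]. The slide does not give the exact layout used in the module, so the layout and the helper names below are assumptions for illustration.

```python
def complex_to_real_matrix(a):
    """Map an m x n complex matrix to the 2m x 2n real matrix [[Re, -Im], [Im, Re]]."""
    top = [[z.real for z in row] + [-z.imag for z in row] for row in a]
    bottom = [[z.imag for z in row] + [z.real for z in row] for row in a]
    return top + bottom

def complex_to_real_vector(x):
    """Stack real parts on top of imaginary parts."""
    return [z.real for z in x] + [z.imag for z in x]

def real_matvec(m, v):
    """Ordinary real matrix-vector product, one row-by-vector dot product per output."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

# Usage: the real product equals the stacked real and imaginary parts of A x.
a = [[1 + 2j, 3 - 1j]]
x = [2 - 1j, 1j]
y = real_matvec(complex_to_real_matrix(a), complex_to_real_vector(x))
```

Each output entry is produced by a real row-by-vector product, which matches the slide's point that the ordinary vector multiplier can be reused unchanged.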

Normalizing the Input • Accuracy falls with a smaller mantissa • Matrices can be normalized before and after the inversion • Hence: • Motivation • The real data differed from the synthetic data we were given, so 18 bits are not enough (we need to represent both a number and 1 divided by that number). • Normalizing the matrix lets us adjust the fraction to minimize error and underflow.
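The property that makes normalization before and after the inversion possible is the scaling rule of the pseudo-inverse. The scalar c below is notation introduced here for illustration (for example, the reciprocal of the largest entry magnitude); the module's actual normalization factor is not specified on the slide.

```latex
(cA)^{\dagger} = \frac{1}{c}\,A^{\dagger}
\quad\Longrightarrow\quad
A^{\dagger} = c\,(cA)^{\dagger}, \qquad c \neq 0
```

In other words, the matrix can be scaled by c into a range that fits the fixed-point word, inverted, and the result multiplied by c to recover the pseudo-inverse of the original matrix.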

Support Change Detection – with running average [Block diagram. Blocks shown: Samples, Control vector RAM, Vector multiplier, adder (+), registers REG1 to REG8, MUX, Cycle counter, Threshold, comparator (>), Detection output]
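A software analogue of a running-average detector is sketched below. The window length of 8 is inferred from the eight registers in the diagram and is an assumption, as are the class name `RunningAverageDetector` and the choice to average energies before comparing with the threshold.

```python
from collections import deque

class RunningAverageDetector:
    """Compare the average of the most recent energies against a threshold."""

    def __init__(self, threshold, window=8):
        self.threshold = threshold
        self.window = window      # assumed to match the eight registers
        self.values = deque()
        self.total = 0.0

    def update(self, energy):
        """Push one energy value and return True if the running average exceeds the threshold."""
        self.values.append(energy)
        self.total += energy
        if len(self.values) > self.window:
            self.total -= self.values.popleft()  # drop the oldest value
        return self.total / len(self.values) > self.threshold
```

Averaging trades detection latency for robustness: a single moderate spike is diluted across the window, while a sustained or large rise in energy still crosses the threshold.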

Timing • Deep pipeline • We incorporated a deeper pipeline so the module works at the desired high frequency. Quartus currently reports that the module can run only up to the given frequency. It can be raised by adding pipeline levels at the bottlenecks found in the design. • Clocks • Main clock – 20 MHz, may rise to 70 MHz • Working clock for pseudo inverse – 100 MHz – currently not flexible • Hardware reuse • The matrix multiplier and the inverse unit reuse a single vector-sized unit over many iterations, hence they are the bottlenecks.

Bottlenecks in the design • Matrix Inverse • Matrix Multiplier • Beta calculation in the QR – heavy arithmetic operations take place. • If we replace the arithmetic units within these entities with more deeply pipelined units (the division is 23 cycles, the square root is 11 cycles and the multiplier is 2), the maximal frequency will rise. • There is no real reason to run at a higher clock unless on-chip memory is insufficient for the delay FIFO or speed is an actual necessity.

Resource Consumption • Total numbers taken from Stratix III FPGA EP3SE260F1152C2

DSP – Runtime Analysis • Worst-case pseudo inverse timing (for 11 support vectors) is a delay of 0.5 milliseconds. Hence an appropriately sized delay FIFO is required. • The SCD and reconstruction multiplier work in real time (1 cycle, 50 ns).
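The FIFO requirement can be estimated from the figures above. The sketch assumes the delay FIFO is clocked at the 50 ns cycle and stores one 432-bit samples bundle per cycle; neither assumption is stated explicitly on the slide, and the function names are hypothetical.

```python
def fifo_depth(delay_ns, cycle_ns):
    """Number of entries needed to cover the delay, rounded up."""
    return -(-delay_ns // cycle_ns)  # ceiling division on integers

def fifo_bits(depth, word_bits):
    """Total storage in bits for a FIFO of the given depth and word width."""
    return depth * word_bits

depth = fifo_depth(delay_ns=500_000, cycle_ns=50)   # 0.5 ms at 50 ns per cycle
bits = fifo_bits(depth, word_bits=432)              # assumed 432-bit bundle per entry
print(depth, bits)  # prints: 10000 4320000
```

Under these assumptions the worst-case delay corresponds to 10,000 entries, or about 4.32 Mbit of storage.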

Outline • Overview – Goals and discussion • Algorithm review • Implementation in hardware • Changes for Adaptation to hardware • Evaluation • Testing method • Results • Discussion • Conclusions • Possible Optimization & Future Work

Evaluation - Testing • Logical Testing • VHDL – Test bench • Matlab (fixed point) = VHDL [Diagram labels: Input text files (Expanded samples, A matrix memory, CTF output support), Functional module (DSP, SCD), Status parser, Output text files]

Evaluation - Testing [Diagram labels: Input text files, Output text files, Analysis & Comparison to ModelSim]

Evaluation - Results • Results of the run on FPGA with the following signals • Fm259_252_sin824_809 • Fm259_252_am872.697 • Am_872.697_sin824 • SCD test

Evaluation - Results [Figure: Matlab simulation compared with FPGA output]

Evaluation - Results [Figure: Support Change experiment, with the point where the support changed marked]

Evaluation - Discussion • Correctness was inspected by comparison to Matlab using the following: • Maximal MSE of the calculated pseudo-inverse matrix values • Maximal and averaged values of the difference between the results of the Matlab simulation and the actual results • Visual inspection of the differences • The SCD experiment was composed of two sample bundles with different supports, put together to inspect correctness and to draw further conclusions about the support threshold.
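The comparison metrics listed above can be computed as in the following sketch. The function name `error_metrics` and the dictionary keys are hypothetical, and the inputs are assumed to be equal-length sequences of reference (Matlab) and measured (FPGA) values.

```python
def error_metrics(reference, measured):
    """Return MSE, maximal and mean absolute difference between two equal-length sequences."""
    diffs = [abs(r - m) for r, m in zip(reference, measured)]
    return {
        "mse": sum(d * d for d in diffs) / len(diffs),
        "max_abs_diff": max(diffs),
        "mean_abs_diff": sum(diffs) / len(diffs),
    }
```

Using `abs` on the difference keeps the same code valid for complex-valued samples.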

Evaluation – Conclusions • The MSE inspected for the inverted matrix is 10^-3 • The MSE for the reconstructed signal: • Maximal 0.04 • Averaged ~10^-6 • No firm conclusions were drawn about the support change function; its behavior is predictable only at the support changes.

Future Work • Possible Optimizations • Modification of the inversion algorithm for higher parallelism. • Scaling hardware to increase performance. • Possibly changing the resolution of the calculations to 22 or more bits for better accuracy, at a great cost in hardware. • Integration

Summary • We managed to run the DSP and SCD module on the FPGA and obtained satisfactory results. • We introduced an algorithm for calculating the support threshold. • We changed most of the architecture to support pipelining and use minimal hardware (vector resolution). • We changed the debug environment to support a different FPGA.
