1 / 29

Adaptive beamforming using QR in FPGA

Adaptive beamforming using QR in FPGA. Richard Walke, Real-time System Lab Advanced Processing Centre S&E Division. Contents. 1 Architecture of adaptive beamformer 2 FPGA components Digital receiver QR processor – for adaptive weight calculation 3 Design methodology

Download Presentation

Adaptive beamforming using QR in FPGA

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Adaptive beamforming using QR in FPGA Richard Walke, Real-time System Lab Advanced Processing Centre S&E Division

  2. Contents 1 Architecture of adaptive beamformer 2 FPGA components • Digital receiver • QR processor – for adaptive weight calculation 3 Design methodology 4 Demonstration overview 5 Conclusions

  3. Architecture of adaptive beamformer Section 1

  4. Architecture of adaptive beamformer Adaptive beamformer Rx DRx W1 Rx DRx Beamformed signal + W2 Rx DRx Antenna Array Analogue receivers Digital receivers Spatial filter Wn Adaptive Weight Calculation

  5. FPGA components Section 2

  6. Radar receiver specification Runtime configuration parameters Image reject Image reject Bandwidth control Bandwidth control Complex equaliser  Array of programmable ‘tap-processors’ 2 Image reject Image reject Bandwidth control Bandwidth control FPGA Components Software configurable FIR

  7. FPGA Components Software configurable FIR • Software programmable parameters include: • filter length • decimation ratio • complex/real arithmetic • number of channels • time varying filtering (inter & intra-pulse) • Performance 20-30 GOPS on XC2V6000-5 • 100 GOPS on Virtex2 Pro (2003)

  8. FPGA Components Weight calculation using QR • Building block for a range of adaptive algorithms • Sample matrix inversion (SMI) • Soft constraints • ESMI • STAP Beamforming weights Antenna data Constraints Pre-processor Post-processing RISC/DSP QR decomposition FPGA

  9. FPGA Components QR decomposition Vectorise cell:11 real floating-point operations Rotate cell:16 real floating-pointoperations Input data x1 x2 x3 x4 y (r,x) r1,1 r1,2 r1,3 r1,4 u1  r2,2 r2,3 r2,4 u2 (r’,0)  r3,3 r3,4 u3 (u’,y’) (u,y) r r4,4 u4 cos  = r2+x2 u’ u x cos  sin  sin  = = y’ -sin  cos  y r2+x2 For SMI: Rw=u

  10. FPGA Components Features of QR • Good numerical properties. Arithmetic choices: • CORDIC: shift-add • Fixed-point: multiply-add • Floating-point: Higher dynamic range, allows algorithms with fewer operations & lower wordlength. Smallest! • Highly parallel (Givens rotations) • Suits FPGA • Need to reduce parallelism for many applications!

  11. 1. Mixed mapping 1,1 1,2 1,3 1,4 1,5 1,6 100% 67% utilisation 2,3 2,4 2,5 2,6 83% 2,2 3,3 3,4 3,5 3,6 67% 2. Discrete mapping 4,4 4,5 4,6 50% 20% 40% 5,5 5,6 33% 60% 80% 67% utilisation 100% 100% FPGA Components Obtaining lower-levels of parallelism

  12. Linear systolic array FPGA Components Novel mapping of QR

  13. FPGA Components FPGA implementation XCV3200E-8 11 FP FP vectorise FP divider 8xFP multipliers add add processor 2 x Complex Complex rotate complex 16 16 multiplier multiplier processor adder rotate rotate 16 16 processor processor rotate rotate 16 16 processor processor rotate rotate 16 16 processor processor Number of Ops 139 14-bit FP operators @ 160MHz = 22 GFLOPS

  14. FPGA Components QR processor - main features Size Mantissa wordlength Clock Utilisation (XC2V6000) Operations Power 1 Boundary 3 Internal 14-bit mantissa 101MHz5 Mults 32 22% 23% 6 GFLOPS 2.24W4 Rams3 34 23% LUTS 15K 22% FFs 16K 23% 1 Boundary 12 Internal 14-bit mantissa 100MHz2 82%2 20.3 GFLOPS 8W6 1 Boundary 9 Internal 17-bit mantissa 97MHz 74% 15 GFLOPS 7W6 PentiumTM 4 2GHz1 4 GFLOPS 70W x50 1 Estimated (based on data from Richard Linderman) 2 Estimated (design too large for PC) 3 Also depends upon number of inputs 4 Obtained via Xpower 5 For XC2V6000-5 6 Extrapolated

  15. Heterogeneous design methodology Section 3

  16. Graphically specify system Heterogeneous design methodology GEDAETM

  17. Heterogeneous design methodology GEDAETM • Graphically specify system • primitive functions in ‘c’

  18. Heterogeneous design methodology GEDAETM • Graphically specify system • primitive functions in ‘c’ • executable specification

  19. Heterogeneous design methodology GEDAETM • Graphically specify system • primitive functions in ‘c’ • executable specification • Auto-code generation • parallel programme constructed by GEDAE

  20. Heterogeneous design methodology GEDAETM • Graphically specify system • primitive functions in ‘c’ • executable specification • Auto-code generation • parallel programme constructed by GEDAE • Currently no support for FPGA • highly compatible model

  21. Heterogeneous design methodology Core based methodology • Cores used for key functions • FFT, QR, FIR filter ... • Build in parallelism (manually) • Parameterised Processor

  22. Heterogeneous design methodology Core based methodology • Cores used for key functions • FFT, QR, FIR filter ... • Build in parallelism (manually) • Parameterised • Automatically generated system • communications inserted Processor FFT core from library FPGA Processor

  23. Heterogeneous design methodology Core based methodology • Cores used for key functions • FFT, QR, FIR filter ... • Build in parallelism (manually) • Parameterised • Automatically generated system • communications inserted • Architectural exploration • Compaan gives Matlab NLP to VHDL • RTL output in future version Processor FFT core from library FPGA Processor

  24. Adaptive beamformer demonstration overview Section 4

  25. Host PC (laptop) Verification and display QR Demonstration overview System mapping Host PC (laptop) PowerPC (DNA VQG4 VME) Delay Environment synthesis and system configuration Sample Back-substitution Virtual Channel API PCI IF Virtual channel IF Chan0 Chan1 Chan2 Chan3 DRx FPGA (TransTech PMC)

  26. Conclusion • FPGAs • Performance dependent upon level of optimisation • Floating-point is realistic • 10x compute improvement • 5 - 20x power improvement • Design is main issue • Hardware design: High levels of parallelism required • Core-based design approaches offer interim solution • Architectural synthesis tools are emerging

  27. Acknowledgements • The project to develop a core-based design methodology is a collaboration between: • QinetiQ Ltd • poc: rlwalke@qinetiq.com • BAE SYSTEMS ATC, Gt Baddow • poc: ian.alston@baesystems.com • Contributions have been made by John McAllister under contract with the Queen’s University of Belfast. • This work was sponsored by the United Kingdom Ministry of Defence Corporate Research Programme.

More Related