Cutting-Edge FPGA Tools for Swift Optimization

Explore how cutting-edge FPGA tools like Viva and Celoxica DK are revolutionizing neural network and genetic algorithm implementations. Dive into real-world application examples and discover the benefits and drawbacks of these powerful tools.

davidhlewis

Presentation Transcript


  1. Overview
     • NASA/MSFC and UAH
     • Viva 2.4.1/3.0
     • Starbridge Hypercomputer (HC-36)
     • Clint Patrick/EV23
     Application
     • Neural Network Implementation
     • Optimization
       • Genetic Algorithm
       • Particle Swarm
     Implementation Results
     • Optimization
       • GA more than an order of magnitude faster for TSP on one FPGA vs. SW solution
       • Very fast synthesis time reported
     • Neural Network and Particle Swarm Optimization
       • Work in progress
     Tool Pluses and Minuses
     • Fast learning curve compared to standard HDL
     • Recursive methods employ a different paradigm
     • Better debugging support needed
     • Tool still relatively new
     • DMA just coming online

  2. Overview
     • Reconfigurable Computing Group at Utah State University
     • Tool Used: Viva
     • System Used: Starbridge HC-62 Hypercomputer
     • Presenter: Dr. Aravind Dasu
     Application
     • Two projects were developed:
       • A single instruction, multiple data (SIMD) soft-core floating-point processor
       • A 2-D Wavelet Transform implementation
     Implementation Results
     • Project 1:
       • When more than 5 data paths are present, the FPGA implementation outperforms a mid-grade laptop
       • Viva yields arithmetic units that consume about twice the number of slices as those obtained with VHDL
     • Project 2:
       • Extracting 32-way parallelism yields performance comparable with high-end desktop PCs
     Tool Pluses and Minuses
     • Pluses
       • Graphical drag-and-drop interface allows for quick circuit construction and easy hierarchical design
       • Polymorphic objects allow for run-time implementation using different data types
       • EDIF import/export adds compatibility with designs developed using different EDA tools
     • Minuses
       • Virtually no documentation on what objects do, or on how to resolve compile-time errors
       • No simulation waveforms are available, making debugging difficult

  3. Questions?

  4. Overview
     • Institute of Information Theory and Automation (UTIA), Prague, CZ
     • Tools: Celoxica DK 4 Handel-C; Mathworks Matlab 14.1; Synplicity Synplify Pro 8.1; Xilinx ISE 7.1
     • System: Microsoft W2K, XP
     • Jiri Kadlec, www.utia.cas.cz
     Application
     • 1024-tap FIR filter, 376 MFLOP, 18-bit floating point <18m11>
     • One Master and four Workers (KCPSM3, PicoBlaze processors)
     • DPBRAM-connected accelerators
     • Designed and tested in Simulink using bit-exact models of floating-point modules exported from DK4
     Implementation Results
     • 376 MFLOP, FIR, 18-bit FP scalable, 2-stage latency, 50 MHz
     • No I/O (only master RS232C line)
     • Power consumption, same package (fg456):
         Virtex 2:    xc2v1000-4    1020 mW
         Spartan 3:   xc3s1000-4     234 mW
         Spartan 3L:  xc3s1000L-4    190 mW
     DK4 Pluses and Minuses
     ++ Generates scalable bit-exact models for Simulink and EDIF blocks from an identical macro
     ++ HW debug at register level
     ++ Easier design of control paths
     ++ Easy to learn and use, saves time
     -- Lacks examples of interfaces; closely linked to board-specific libraries
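
The FIR filter named on this slide can be illustrated with a plain software reference model. The sketch below is a generic direct-form FIR in Python, not the UTIA design: it uses ordinary Python floats rather than the 18-bit floating-point format, and it does not model the master/worker PicoBlaze partitioning or the DPBRAM accelerators. The function name `fir_filter` and the example coefficients are hypothetical.

```python
def fir_filter(samples, taps):
    """Direct-form FIR: y[n] = sum over k of taps[k] * x[n - k].

    Samples before the start of the input are treated as zero.
    """
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k < 0:
                break  # no earlier input samples are available
            acc += h * samples[n - k]
        out.append(acc)
    return out


# Feeding a unit impulse returns the tap coefficients themselves.
impulse_response = fir_filter([1.0, 0.0, 0.0, 0.0], [0.5, 0.25])
```

A model like this is mainly useful as a golden reference to compare against hardware or Simulink output; a 1024-tap filter would simply pass a 1024-element `taps` list.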

  5. Overview
     • Heriot-Watt University, Edinburgh, UK
     • Celoxica’s DK, Spin Model Checker
     • SpartanIIE & 3, Coolrunner II
     • Stefanos Skoulaxinos
     Application
     • Design of Long Range Identification (LRID) Tag (RFID) in Celoxica’s DK
     • Formal Verification in Spin Model Checker
     • Reliability Estimation in CASRE
     Implementation Results
     We found DK exceptionally valuable in developing the LRID Tag application. It provides tight control of time and space and avoids the impractical abstraction encountered in other HLLCs. Unlike HDL, it provides a more productive, readable and testable development route, ideal for complex embedded applications.
     Tool Pluses and Minuses
     + Increased productivity, fast turn-arounds, highly readable and testable C-based code, ideal for embedded software for FPGA applications, joins SW and HW development processes
     - System has not been tested under high radiation; a TMR scheme should be supported by the compiler in the future

  6. Application Overview
     • ECIT Institute - Queen’s University, Belfast
     • Celoxica DK, Handel-C
     • RC 1000 prototyping board
     • Dr. Abbes Amira
     Implementation Results
     Handel-C Pluses
     • Easy to learn; accessible to software engineers
     • Good simulator provided
     • Easy to visualize HW/SW partition
     • Rapid prototyping: shortens design time by a factor of 3-4
     • Ease of hybrid design/configuration
     Handel-C Minuses
     • Lower clock speeds?
     • VHDL gives finer control, e.g. delay balancing

  7. Questions?

  8. Overview
     • Organization: Air Force Research Laboratory, Advanced Computer Architecture Branch
     • Tool Used:
       • Impulse Accelerated Technologies Co_Developer, Windows, Generic version
       • Mentor Graphics ModelSim, Xilinx Edition-III, ISE 6.1i SP3, targeting XC2V6000
     • System: PC Dell DIM4500, 2.26 GHz P4, 1 GB
     • Presenters: Dan Burns or Ginger Ross
     Application
     • FPGA core for Genetic Algorithm optimization engine
     • Fitness function evaluator for DNA Code Word Library Generation Problem
     • Speed-up spirals for languages on PC > Cluster > FPGA
     Implementation Results
     • Converted to Impulse_C by adding streams and process constructs to C
     • Simulates OK in Application Monitor
     • No successful synthesis of VHDL yet
     • Hand-crafted VHDL version synthesizes and clocks at 9.8 ns, 500x speed-up over C in software on 2.26 GHz P4
     Tool Pluses and Minuses
     • Positives:
       • Good set of example applications
       • Familiar look and feel
     • Negatives:
       • No MPI support for distributed apps
       • No floating point support
       • Maturity/error diagnostics for newbie
     • How easy is the tool to learn and use? Very.
     • How user friendly? Very.

  9. Overview
     • Draper Laboratory
     • Tool Used: ImpulseC
     • System: Alpha-Data/Virtex II
     • Presenter: John Ardini
     Application
     • Complex fixed-point FFT (1024 pt) and FIR for use in HW/SW Runtime Partitioning Studies
     Implementation Results
     Tool Pluses and Minuses
     + ANSI-C
     + Very short learning curve (days)
     + Days, not months (HDL), to implement a design
     + C sim/testbench environment
     + $
     + Some bus wrappers included (Xilinx EDK)
     + Processes/signals
     + Still maturing
     - Still maturing
     - No function calls
     - Only coarse control over number of pipeline stages

  10. Questions?

  11. Roger Chamberlain
      • Systems Used:
        • SGI Altix/RASC MOATB, and
        • Xilinx FPGA on PCI-X bus in Altix
      • Tools Used:
        • Mentor’s Modelsim
        • Synplicity’s synthesis
        • Xilinx’s place-and-route
      Applications
      • Text search (both exact and approximate search)
      • Encryption/decryption
      • Structured record search
      • Biosequence search
      • Science data mining
      • Signature hashing
      Implementation Results
      • Text search at > 800 MB/s
      • Encrypt (3DES):
      Take Home Messages
      • RASC incompatible with our existing applications
        • Initial focus of SGI was signal processing apps.
        • SGI currently altering RASC to address our issues
      • HW/SW interface is critical
      • Port from Xeon to Opteron to Itanium was straightforward
      BOF-H-2

  12. Questions?

  13. Overview
      • George Mason University / The George Washington University
      • SRC Carte
      • SRC-6
      • Kris Gaj, Tarek El-Ghazawi, Allen Michalski, Miaoqing Huang
      Application
      • Input-output intensive: IDEA cipher encryption
      • Computationally intensive: IDEA cipher breaking
      Implementation Results
      Speed-up vs. Pentium IV 2.8 GHz:
        Encryption without streaming    38x   (limited by I/O)
        Encryption with streaming       47x   (limited by I/O)
        Cipher breaking                556x   (limited by FPGA resources)
      Tool Pluses and Minuses
      + very easy to learn and use
      + standard ANSI C
      + hides implementation details
      + good support for debugging
      + allows user libraries
      - subset of C
      - legacy C code requires rewriting
      - C limitations in describing HW
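
For readers unfamiliar with the IDEA cipher used in both benchmarks, one of its core primitives is multiplication of 16-bit words modulo 2^16 + 1, where the all-zero word stands for 2^16. The sketch below shows only that primitive in Python as a software illustration; it is not the SRC Carte code from the presentation, and the function name `idea_mul` is hypothetical.

```python
def idea_mul(a, b):
    """Multiply two 16-bit words modulo 2**16 + 1 (IDEA convention).

    The word 0 represents 2**16, so every operand is a nonzero residue
    modulo the prime 65537 and the result is always invertible.
    """
    a = a or 0x10000  # map the word 0 to 2**16
    b = b or 0x10000
    product = (a * b) % 0x10001
    return product & 0xFFFF  # map 2**16 back to the word 0


# 0 stands for 2**16, which is congruent to -1 modulo 65537.
example = idea_mul(0, 2)  # (-1 * 2) mod 65537
```

The full cipher combines this operation with 16-bit addition and XOR over several rounds; those steps are omitted here.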

  14. Overview
      • The George Washington University / George Mason University
      • Tarek El-Ghazawi, Kris Gaj, Esam El-Araby, Mohamed Taher
      • SRC-6
      • Tools
        • End-to-End Applications
          • SRC Carte
        • User Macros
          • SRC Carte, VHDL/Verilog, Simulink/SysGen
      Applications
      • Remote Sensing
        • Discrete Wavelet Transform (DWT)
        • Wavelet-Based Hyperspectral Dimension Reduction
        • Image Registration
        • Cloud Detection
      • Bioinformatics
        • Smith-Waterman
      Implementation Results
      Tool Pluses and Minuses
      • SRC Carte
        + HLL interface (5x less than HDLs) and no hardware background needed
        + Optimization/Parallelization
          • I/O overlapping
          • Forwarding/Chaining
        + Optimized Libraries
        + User Macros
        - Little more to do than HLLs (1.5x)
        - Little architecture knowledge

  15. Overview
      • Duncan Buell, U of South Carolina
      • SRC 6, 6E, using Carte
      DARPA Benchmark Apps
      • Pattern matching
      • Data transposition
      • Sorting
      Implementation Results
      • 1. 40x speed increase (C code)
      • 2. 25x (C), 61x (Verilog)
      • 3. Minimal speedup
      Tool Pluses and Minuses
      • Plus: This is programming
      • Plus: Software simulation
      • Minus: HLL doesn’t always provide control of the bits

  16. Questions?
