Explore how cutting-edge FPGA tools like Viva and Celoxica DK are revolutionizing neural network and genetic algorithm implementations. Dive into real-world application examples and discover the benefits and drawbacks of these powerful tools.
Overview
• NASA/MSFC and UAH
• Viva 2.4.1/3.0
• Starbridge Hypercomputer (HC-36)
• Clint Patrick/EV23

Application
• Neural Network Implementation
• Optimization
  • Genetic Algorithm
  • Particle Swarm

Implementation Results
• Optimization
  • GA more than an order of magnitude faster for TSP on one FPGA vs. SW solution
  • Very fast synthesis time reported
• Neural Network and Particle Swarm Optimization
  • Work in progress

Tool Pluses and Minuses
• Fast learning curve compared to standard HDL
• Recursive methods employ a different paradigm
• Better debugging support needed
• Tool still relatively new
• DMA just coming online
Overview
• Reconfigurable Computing Group at Utah State University
• Tool Used: Viva
• System Used: Starbridge HC-62 Hypercomputer
• Presenter: Dr. Aravind Dasu

Application
• Two projects were developed:
  • A single-instruction, multiple-data (SIMD) soft-core floating-point processor
  • A 2-D wavelet transform implementation

Implementation Results
• Project 1:
  • When more than 5 data paths are present, the FPGA implementation outperforms a mid-grade laptop
  • Viva yields arithmetic units that consume about twice the number of slices as those obtained with VHDL
• Project 2:
  • Extracting 32-way parallelism yields performance comparable with high-end desktop PCs

Tool Pluses and Minuses
• Pluses
  • Graphical drag-and-drop interface allows for quick circuit construction and easy hierarchical design
  • Polymorphic objects allow for run-time implementation using different data types
  • EDIF import/export adds compatibility with designs developed in other EDA tools
• Minuses
  • Virtually no documentation on what objects do, or on how to resolve compile-time errors
  • No simulation waveforms are available, making debugging difficult
Overview
• Institute of Information Theory and Automation (UTIA), Prague, CZ
• Tools: Celoxica DK 4 Handel-C; Mathworks Matlab 14.1; Synplicity Synplify Pro 8.1; Xilinx ISE 7.1
• System: Microsoft W2K, XP
• Jiri Kadlec, www.utia.cas.cz

Application
• 1024-tap FIR filter, 376 MFLOP, 18-bit floating point <18m11>
• One Master and four Workers (KCPSM3, PicoBlaze processors)
• DPBRAM-connected accelerators
• Designed and tested in Simulink using bit-exact models of the floating-point modules exported from DK4

Implementation Results
• 376 MFLOP, FIR, 18-bit FP, scalable, 2-stage latency, 50 MHz
• No I/O (only master RS232C line)
• Same package (fg456) for all devices
• Power consumption:
  • Virtex 2: xc2v1000-4, 1020 mW
  • Spartan 3: xc3s1000-4, 234 mW
  • Spartan 3L: xc3s1000L-4, 190 mW

DK4 Pluses and Minuses
• ++ Generates scalable bit-exact models for Simulink and EDIF blocks from an identical macro
• ++ HW debug at register level
• ++ Easier design of control paths
• ++ Easy to learn and use, saves time
• -- Lacks examples of interfaces; closely linked to board-specific libraries
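The FIR filter named on this slide computes each output sample as a weighted sum of the most recent input samples. A minimal software sketch of that direct-form computation follows. It uses ordinary Python floats, not the 18-bit floating-point format from the slide, and the name `fir_filter` and the 4-tap example are illustrative assumptions, not part of the UTIA design.

```python
def fir_filter(samples, taps):
    """Direct-form FIR: y[n] = sum_k taps[k] * x[n - k], with x[n] = 0 for n < 0."""
    history = [0.0] * len(taps)  # delay line, newest sample first
    out = []
    for x in samples:
        # Shift the new sample into the delay line and drop the oldest one.
        history = [x] + history[:-1]
        # One multiply-accumulate per tap; hardware can run these in parallel.
        out.append(sum(h * t for h, t in zip(history, taps)))
    return out


# Hypothetical example: a 4-tap moving average applied to a unit step.
taps = [0.25] * 4
print(fir_filter([1.0] * 6, taps))
```

A 1024-tap filter uses the same loop with a 1024-entry `taps` list. Each output then costs 1024 multiplications and 1023 additions, which is the work an FPGA implementation spreads across parallel or pipelined arithmetic units.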
Overview
• Heriot-Watt University, Edinburgh, UK
• Celoxica’s DK, Spin Model Checker
• SpartanIIE & 3, Coolrunner II
• Stefanos Skoulaxinos

Application
• Design of Long Range Identification Tag (RFID) in Celoxica’s DK
• Formal Verification in Spin Model Checker
• Reliability Estimation in CASRE

Implementation Results
We found DK exceptionally valuable in developing the LRID Tag application. It provides tight control of time and space and avoids the impractical abstraction encountered in other HLLCs. Unlike HDL, it offers a more productive, readable and testable development route, ideal for complex embedded applications.

Tool Pluses and Minuses
• + Increased productivity, fast turnarounds, highly readable and testable C-based code, ideal for embedded software for FPGA applications, joins SW and HW development processes
• - System has not been tested under high radiation; a TMR scheme should be supported by the compiler in the future
Application

Overview
• ECIT Institute, Queen’s University, Belfast
• Celoxica DK, Handel-C
• RC 1000 prototyping board
• Dr. Abbes Amira

Implementation Results

Handel-C Pluses
• Easy to learn, accessible to software engineers
• Good simulator provided
• Easy to visualize HW/SW partition
• Rapid prototyping: shortens design time by a factor of 3-4
• Ease of hybrid design/configuration

Handel-C Minuses
• Lower clock speeds?
• VHDL gives finer control, e.g.
  • delay balancing
Overview
• Organization: Air Force Research Laboratory, Advanced Computer Architecture Branch
• Tool Used:
  • Impulse Accelerated Technologies Co_Developer, Windows, Generic version
  • Mentor Graphics ModelSim, Xilinx Edition-III, ISE 6.1i SP3, targeting XC2V6000
• System: PC Dell DIM4500, 2.26 GHz P4, 1 GB
• Presenters: Dan Burns or Ginger Ross

Application
• FPGA core for Genetic Algorithm optimization engine
• Fitness function evaluator for DNA Code Word Library Generation Problem
• Speed-up spirals for languages on PC > Cluster > FPGA

Implementation Results
• Converted to Impulse_C by adding streams & process constructs to C
• Simulates OK in Application Monitor
• No successful synthesis of VHDL yet
• Hand-crafted VHDL version synthesizes and clocks at 9.8 ns, 500x speed-up over C in software on 2.26 GHz P4

Tool Pluses and Minuses
• Positives:
  • Good set of example applications
  • Familiar look and feel
• Negatives:
  • No MPI support for distributed apps
  • No floating point support
  • Maturity/error diagnostics for newbies
• How easy is the tool to learn and use? Very.
• How user friendly? Very.
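The slide names a fitness function evaluator for DNA code word library generation but does not describe it. One common formulation scores a candidate library by how many word pairs fall below a required Hamming distance; the sketch below assumes that formulation only for illustration. The names `hamming` and `library_fitness`, the distance threshold, and the example words are hypothetical and are not taken from the AFRL design.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length words differ."""
    if len(a) != len(b):
        raise ValueError("words must have equal length")
    return sum(x != y for x, y in zip(a, b))


def library_fitness(words, min_distance):
    """Count word pairs closer than min_distance (assumed metric; 0 is best)."""
    return sum(
        1 for a, b in combinations(words, 2) if hamming(a, b) < min_distance
    )


# Hypothetical 4-word library over the DNA alphabet {A, C, G, T}.
library = ["ACGT", "TGCA", "ACGA", "GGCC"]
print(library_fitness(library, 3))
```

Every pairwise comparison is independent of the others, so an evaluator of this shape parallelizes naturally, which is consistent with the large FPGA speed-up the slide reports for the hand-crafted VHDL version.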
Overview
• Draper Laboratory
• Tool Used: ImpulseC
• System: Alpha-Data/Virtex II
• Presenter: John Ardini

Application
• Complex fixed-point FFT (1024 pt) and FIR for use in HW/SW Runtime Partitioning Studies

Implementation Results

Tool Pluses and Minuses
• + ANSI-C
• + Very short learning curve (days)
• + Days, not months (HDL), to implement a design
• + C sim/testbench environment
• + $
• + Some bus wrappers included (Xilinx EDK)
• + Processes/signals
• + Still maturing
• - Still maturing
• - No function calls
• - Only coarse control over number of pipeline stages
Roger Chamberlain
• Systems Used:
  • SGI Altix/RASC MOATB, and
  • Xilinx FPGA on PCI-X bus in Altix
• Tools Used:
  • Mentor’s Modelsim
  • Synplicity’s synthesis
  • Xilinx’s place-and-route

Applications
• Text search (both exact and approximate search)
• Encryption/decryption
• Structured record search
• Biosequence search
• Science data mining
• Signature hashing

Implementation Results
• Text search at > 800 MB/s
• Encrypt (3DES):

Take Home Messages
• RASC incompatible with our existing applications
  • Initial focus of SGI was signal processing apps.
  • SGI currently altering RASC to address our issues
• HW/SW interface is critical
• Port from Xeon to Opteron to Itanium was straightforward

BOF-H-2
Overview
• George Mason University / The George Washington University
• SRC Carte
• SRC-6
• Kris Gaj, Tarek El-Ghazawi, Allen Michalski, Miaoqing Huang

Application
• Input-output intensive: IDEA cipher encryption
• Computationally intensive: IDEA cipher breaking

Implementation Results
Speed-up vs. Pentium IV 2.8 GHz
• Encryption without streaming: 38x (limited by I/O)
• Encryption with streaming: 47x (limited by I/O)
• Cipher breaking: 556x (limited by FPGA resources)

Tool Pluses and Minuses
• + very easy to learn and use
• + standard ANSI C
• + hides implementation details
• + good support for debugging
• + allows user libraries
• - subset of C
• - legacy C code requires rewriting
• - C limitations in describing HW
Overview
• The George Washington University / George Mason University
• Tarek El-Ghazawi, Kris Gaj, Esam El-Araby, Mohamed Taher
• SRC-6
• Tools
  • End-to-End Applications: SRC Carte
  • User Macros: SRC Carte, VHDL/Verilog, Simulink/SysGen

Applications
• Remote Sensing
  • Discrete Wavelet Transform (DWT)
  • Wavelet-Based Hyperspectral Dimension Reduction
  • Image Registration
  • Cloud Detection
• Bioinformatics
  • Smith-Waterman

Implementation Results

Tool Pluses and Minuses
• SRC Carte
  • + HLL interface (5x less than HDLs) and no hardware background needed
  • + Optimization/Parallelization
    • I/O overlapping
    • Forwarding/Chaining
  • + Optimized Libraries
  • + User Macros
  • - A little more to do than with HLLs (1.5x)
  • - A little architecture knowledge needed
Overview
• Duncan Buell, U of South Carolina
• SRC 6, 6E, using Carte

DARPA Benchmark Apps
• Pattern matching
• Data transposition
• Sorting

Implementation Results
• 1. 40x speed increase (C code)
• 2. 25x (C), 61x (Verilog)
• 3. Minimal speedup

Tool Pluses and Minuses
• Plus: This is programming
• Plus: Software simulation
• Minus: HLL doesn’t always provide control of the bits