450 likes | 710 Views
Recent Advances in Architectures and Tools for Complex FPGA-based Systems. Michael Schulte University of Wisconsin TWEPP 2009. Slides provided by various sources. Overview of FPGA Architectures Recent FPGA Trends Tools for FPGA-based Systems FPGAs in Particle Physics Experiments
E N D
Recent Advances in Architectures and Tools for Complex FPGA-based Systems Michael Schulte University of Wisconsin TWEPP 2009 Slides provided by various sources
Overview of FPGA Architectures • Recent FPGA Trends • Tools for FPGA-based Systems • FPGAs in Particle Physics Experiments • Conclusions Outline
FPGA Architectures High-performance Clocking Configurable Logic Blocks Embedded Memories Digital Signal Processing Slices Parallel I/O Ethernet Controllers Embedded Microprocessors Serial I/O Transceivers PCIeInterfaces FPGA Architecture • Modern FPGAs contain a wide variety of resources • These resources and their interconnections can be reconfigured for target applications Basic Architecture 3
Configurable Logic Blocks Configurable Logic Blocks Have reconfigurable hardware functionality Have reconfigurable interconnections Virtex-6 Configurable Logic Block Each CLB contains two slices Each slice contains four 6-input Lookup Tables (LUT6) Slices also contain fast carry-logic, multiplexers, and registers Each LUT6 can implement Wide range of Boolean functions Memories of 64 bits Shift registers Up to 118,560 slices per FPGA
Block RAM Up to 38 Mb of RAM in 36-Kb blocks Each 36-Kb block can be configured as two 18-Kb blocks Both the depth and the width of memory can be configured Dual-port memory Each port has synchronous read and write capability Different clocks for each port Several options for changing block RAM configuration 38 Kb block SelectRAM memory DIA DIPA ADDRA WEA ENA SSRA DOA CLKA DOPA DIB DIPB ADDRB WEB ENB SSRB DOB CLKB DOPB Basic Architecture 5
DSP Slices DSP slices provide fast arithmetic operations Virtex-6 DSP48E1 architecture 30-bit add/subtract unit 25x18-bit multiplier 48-bit arithmetic and logic unit (ALU) Add/substract/logic functions SIMD operation Final comparator Chaining and feedback increase functionality Up to 2K DSP Slices per FPGA Basic Architecture 6
Multi-Gigabit Transceiver Multi-Gigabit transceivers provide high-speed I/O GTX provides up to 6.5 Gbps per transceiver GTH provides up to 11 Gbps per transceiver XC6VHX565T has 24 GTH transceivers and 48 GTX transceivers Up to 576 Gbps of Serial I/O per FPGA FPGA Fabric Interface Tx PMA PCS Rx PMA PCS Transceiver PA
FPGA Trends – Logic Cells 595x Increase in Logic Cells IB
FPGA Trends – Block RAMs 270x Increase in Block RAM Bits IB
FPGA Trends – Clock Rate 24x Increase in Clock Rate IB
FPGA Trends – Power 5x Increase in Power IB
FPGA Trends – Energy/LC 54x Decrease in Energy/Logic Cell IB
Virtex-6 FPGA Characteristics Basic Architecture 13
Tools for FPGA-based Systems • FPGA-based systems are becoming increasingly complex • New tools can help manage this growing complexity • These tools come from a variety of sources • FPGA companies (Xilinx and Altera) • EDA companies (Synopsys, Cadence, Mentor Graphics) • Open source projects • University research projects • Several tools available to specify, design, test, and implement complex FPGA-based system • This talk focuses on • Tools for C-to-HDL conversion and synthesis • Xilinx DSP tools
C-to-HDL Converters • Several tools have been develop that perform C-to-HDL conversion • Impulse-C (www.impulseaccelerated.com) • C-to-Verilog (www.c-to-verilog.com) • PICO Extreme FPGA (www.synfora.com) • C-to-Hardware Acceleration Compiler (www.altera.com) • NISC Technology (www.ics.uci.edu/~nisc) • SPARK Toolset (mesl.ucsd.edu/spark) Requirements Specialized C Code Generated HDL Code C-to-HDL Tool
Impulse-C • Intended for hardware/ software co-design • Designs specified using C plus extensions • Impulse-C • Optimizes C code for parallelism • Generates hardware/ software interfaces • Generates VHDL or Verilog • Provides interactive tools to improve results
PICO Express FPGA • Takes a C algorithm and a set of design requirements and automatically creates a series of implementation models • Also creates testbenches, synthesis and verification scripts, and software drivers
C-to-Verilog.Com • Provides free online tool for C-to-Verilog conversion • Converts standard C code with some restrictions • No recursive functions, structures, pointers to functions, or library function calls • Separately specify implementation details • Word sizes, amount of loop unrolling, amount of hardware, FPGA family • Automatically generates Verilog code and a Verilog testbench (with random inputs) • Performance of generated code can be quite good • Is difficult to read and debug • Several design examples available online
NISC Technology • Provides an automated method for generating custom processors for a target application • Generates the processor datapath, control bits, and data memory • These are then mapped to an ASIC or FPGA using conventional tools
Summary of C-to-HDL Tools • Simplify the design process • Especially for people not familiar with HDLs • Often generate useful resource – C support code, test benches, hardware interfaces, scripts, etc. • Facilitate design-space exploration • Can generate surprisingly good designs • As good or better than hand-optimized designs in many cases • However, • C code has restrictions and/or needs extensions • Need to be careful with coding style • Often designed for hardware acceleration • Generated code is usually difficult to read and debug • Tools should continue to improve
Xilinx DSP Tools • Xilinx tools for digital signal processing include • System Generator for DSP • AccelDSP • Integrated with other Xilinx Tools, MATLAB, and Simulink • Facilitate exploration and implementation of DSP systems • Useful in other data-intensive applications
MATLAB • MATLAB provides an environment to perform mathematical modeling of complex systems • Extensive libraries for math functions, signal processing, scientific computing, etc. • Large variety of functions to plot and visualize data, systems, and designs
Simulink • Simulink provides a model-based design environment • Fully integrated with MATLAB • Graphical block editor • Event-driven simulator • Models parallelism • Extensive library of parameterizable functions
Xilinx System Generator • Provides a system-level design environment for Xilinx FPGAs • Enables a design flow from Simulink software to FPGA implementation. It leverages • MATLAB , Simulink • HDL synthesis, IP Core libraries • FPGA implementation tools • Features include • Synthesizable VHDL/Verilog with hierarchy preservation • Automatic invocation of the CORE Generator software • Project generation to simplify the design flow • HDL testbench and test vector generation • Constraint file and simulation file generation
Basic SysGen Design Flow Enables designs specified using Simulink to be quickly implemented on FPGAs
System Generator DSP Blockset and the Simulink Library • Provide a variety of arithmetic, logic operators, and DSP functions • Bit and cycle-accurate to FPGA implementation • Arbitrary precision fixed-point arithmetic including quantization and overflow • Allow instantiation of controllers, memories, and IP blocks • Facilitate input waveforms and output value sampling
Design Integration with SysGen • System Generator is integrated with the rest of the Xilinx Tool Flow
AccelDSP Design Flow Floating-Pt. Algorithm Fixed-Point Conversion Architecture Definition Create / Integrate IP Blocks Create RTL Design Refine Architecture Verify RTL RTL Synthesis • Replaces manual steps • Floating to fixed-point conversion • RTL creation • RTL and gate verification back to the original algorithm • IP creation and integration Typical Design Flow Steps performed by AccelDSP Floating-Pt. Algorithm AccelDSP / AccelWare RTL Synthesis AccelDSP Design Flow
AccelDSP Design • AccelDSP Synthesis • MATLAB to RTL or Simulink • Design Inferred from MATLAB code • Quantization changes • Loop rolling / unrolling • Pipelining • Device-specific memory mapping • AccelWare IP Generators for DSP Cores • Signal processing • Linear algebra • Communications AccelDSP - MATLAB to RTL DSP Synthesis for Creation of DSP Cores AccelWare - Parameterized DSP IP Generator Toolkits
AccelDSP Design & Verification • AccelDSP Synthesis automatically generates a fixed-point model from the floating-point source • Process is user interactive and controllable • High level MATLAB functions and operators are automatically replaced by hardware accurate models • Accurate estimates are made of resources and performance • IP-Explorer • Fixed-point analysis features are provided to address reduced-precision arithmetic errors • Plots and histograms • Overflow and underflow reporting AccelDSP - MATLAB to RTL DSP Synthesis for Creation of DSP Cores
Summary of Xilinx DSP Tools • Provide a high-level of abstraction for design • Developed using MATLAB and/or Simulink • Facilitate design-space exploration • Can quickly examine different architectures and get resource and performance estimates • Generate good designs in terms of both resource usage and performance • Especially for DSP-related tasks, but also for other data-intensive systems • Well integrated with other Xilinx tools • However, • MATLAB code has restriction and Simulink has limited capabilities • Need to be careful with coding style • Generated code can difficult to read • Portions of the generated code are Xilinx specific
FPGA-based Systems for Particle Physics • Recent FPGA architectures and tools have tremendous potential for particle physics systems • Over 10,000 FPGAs currently used in CMS electronics • Can provide improved • System integration • Design space exploration • Design verification • Overall design quality • Examples • Configurable feedback damper systems for the Spallation Neutron Source (SNS) at ORNL • Initial designs for upgrades to the Regional Calorimeter Trigger (RCT) for the CMS detector at CERN
Feedback Damper Systems • A mixed-signal feedback damper system to control Electron-Proton (E-P) instabilities • Helps control ring oscillations due to E-P instabilities • Compensates for various source of error in the system • Includes a mixed of analog and FPGA-based hardware • Has very high-speed signal processing requirements
Feedback Damper Systems • Preliminary FPGA design • Includes programmable delays, comb filters, parallel FIR filters, and gain control • Allows the system to be monitored and adapted at runtime • Implemented using VHDL and conventional Xilinx tools • Later design using System Generator, AccelDSP, and ChipScope Pro had • Reduced resource utilization (roughly 25% reduction) • Improved performance (increase clock frequency by 20%) • Greater opportunities for design space exploration • Degree of parallelism, number of taps in FIR filter • Changes to arithmetic precision and rounding • Improved debug capabilities
Regional Calorimeter Trigger • Investigating FPGA architectures, tools, and techniques to help upgrade the RCT of the CMS detector at CERN • Original RCT implemented using ASICs • Upgraded RCT to be implemented using FPGAs with new algorithms for increased beam luminosity • Desired features • Greatly reduced component counts • Ability to quickly explore design alternatives • Rigorous testing and verification infrastructure
Regional Calorimeter Trigger • Investigating several tools and techniques including • The DICE environment for testing and verification • Dataflow languages (DIF and OpenDF) for design specification, validation, and implementation • Several Xilinx tools for design specification, implementation, debugging, and verification • Open source tools for version control (SVN) and firmware documentation (doxygen and doxverilog) • Modular and parameterized designs to facilitate design space exploration and algorithm changes • Considerable work is still needed, but initial results are promising
Conclusions • Recent FPGAs provide tremendous processing and communication resources • Future FPGAs will provide even more • Recent tools help manage the design, test, and integration of complex FPGA-based systems • Additional tools are needed to cope with increasing design and verification complexity • Recent FPGAs and tools can provide substantial benefits to experiments for high-energy physics • Further investigation and collaboration is needed to realize their full potential
Floating-Point to Fixed-Point Conversion and Verification AccelProbe • AccelDSP Synthesis automatically generates a fixed-point model from the floating-point source • Process is user interactive and controllable • High level MATLAB functions and operators are automatically replaced by hardware accurate models • Accurate estimates are made of resources and performance • IP-Explorer • Fixed-point analysis features are provided to address reduced-precision arithmetic errors • Plots and histograms • Overflow and underflow reporting
ISE Foundation • Enter, simulate, synthesize and implement designs • PlanAhead Design and Analysis Tool • Perform I/O pin planning, design analysis, floorplanning, and design-space exploration • ChipScope Pro • Debug and validation post-implementation designs • Embedded Development Kit (EDK) • Generate complete embedded processor systems • System Generator for DSP • Design and analyze DSP systems • See http://www.xilinx.com/tools/system.htm Xilinx FPGA Tools
FPGA Design Flow Create Code/ Schematic HD Simulation Plan & Budget Implement Functional Simulation Synthesize to create netlist Translate Map Place & Route Timing Simulation GenerateBIT File Attain Timing Closure ConfigureFPGA This tutorial focuses on implementation
FPGA Trends 1000 Memory Bandwidth Virtex-II Pro 100 Capacity Speed Price Virtex-II Pro 10x 10 Spartan-3 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 1/91 1/92 1/93 1/94 1/95 1/96 1/97 1/98 1/99 1/00 1/01 1/02 1/03 1/04 1x 1 Price: 300x Speed: 20x Capacity: 200x Bandwidth: 700x Year