DSP for FPGA SYSC5603 (ELG6163) Digital Signal Processing Microprocessors, Software and Applications Miodrag Bolic
Objectives • Comparison between PDSP and FPGA • Virtex II Pro • Altera Stratix FPGA • Stratix DSP Block and its configuration • Altera design flow
What Is an FPGA? • Field Programmable Gate Array • Device that Has a Regular Architecture (Set of Blocks) that Can Be Programmed for Various Functions • “Glue” Logic • Customizable Hardware Solution • Configurable Processors
Why Use FPGAs in DSP Applications? • 10x More DSP Throughput Than DSP Processors • Parallel vs. Serial Architecture • Cost-Effective for Multi-Channel Applications • Flexible Hardware Implementation • Single-Chip Solution • System (Hardware/Software) Integration Benefits
[Figure: DSP system partitioning: software on a DSP plus an FPGA, versus a single FPGA with software on an embedded processor]
DSP Processors vs. FPGAs
High-Speed DSP Processor • 1-8 multipliers • Needs looping for more than 8 multiplications • Needs multiple clock cycles because of serial computation • A 200-tap FIR filter would need 25+ clock cycles per sample with an 8-MAC-unit processor
High Level of Parallel Processing in FPGA • Can implement hundreds of MAC functions in an FPGA • Parallel implementation allows for faster throughput • A 200-tap FIR filter would need 1 clock cycle per sample
[Figure: array of parallel MAC units in the FPGA]
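The cycle counts above follow from dividing the tap count by the number of MAC units that run in parallel. A minimal sketch of that arithmetic, assuming one multiply-accumulate per MAC unit per clock and no loop or memory overhead (which is why the slide says "25+" rather than exactly 25); the function names are illustrative only:

```python
import math


def cycles_per_sample(taps: int, mac_units: int) -> int:
    """Clock cycles to produce one FIR output sample.

    Assumes each MAC unit completes one multiply-accumulate per clock
    and ignores loop, fetch, and pipeline overhead.
    """
    if taps <= 0 or mac_units <= 0:
        raise ValueError("taps and mac_units must be positive")
    return math.ceil(taps / mac_units)


def max_sample_rate_hz(clock_hz: float, taps: int, mac_units: int) -> float:
    """Upper bound on the sample rate for a given clock frequency."""
    return clock_hz / cycles_per_sample(taps, mac_units)


# 8-MAC DSP processor vs. a fully parallel FPGA implementation
print(cycles_per_sample(200, 8))    # serial looping over 8 MAC units
print(cycles_per_sample(200, 200))  # one MAC per tap
```

At the same clock frequency, the fully parallel version sustains a sample rate 25 times higher in this idealized model.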
Extending Range of Altera Reconfigurable DSP Solutions
[Figure: performance axis in MMACs/sec, marked at 100 and 600, spanning Embedded Processors, Embedded Processors with Hardware Acceleration, and Complete Hardware Implementation, with a "New!" callout]
TriMatrix™ Memory [1] • Dedicated External Memory Interface
• M512 Blocks (512 bits per block + parity): Small FIFOs; Shift Register; Rake Receiver Correlator; FIR Filter Delay Line
• M4K Blocks (4 Kbits per block + parity): Header / Cell Storage; Channelized Functions; ATM cell–packet processing; Nios Program Memory; Look-Up Schemes; Packet & Cell Buffering; Cache
• M-RAM (512 Kbits per block + parity): Packet / Data Storage; Nios Program Memory; System Cache; Video Frame Buffers; Echo Canceller Data Storage
• Trade-off: More Bits for Larger Memory Buffering (M-RAM) vs. More Data Ports for Greater Memory Bandwidth (M512)
Logic Element (LE) [2]
[Figure: LE functional diagram. Inputs: data1, data2, data3, data4, addnsub, cin, LUT chain input, register chain input, register control signals. Blocks: 4-input LUT, sync load & clear logic, register (with register feedback). Outputs: row, column & DirectLink routing; local routing; LUT chain output; register chain output.]
• Note: Functional diagram only. Please see the datasheet for more details. • addnsub and data1 are connected via XOR logic
Dynamic Arithmetic Mode
[Figure: LE functional diagram in dynamic arithmetic mode. Inputs: data1, data2, data3, addnsub, LAB carry-in, Carry-In0, Carry-In1, register chain input, register control signals. Blocks: carry-in logic, sum calculator, carry calculator, carry-out logic, sync load & clear logic, register. Outputs: row, column & DirectLink routing; local routing; register chain output; Carry-Out0; Carry-Out1.]
• Note: Functional diagram only. Please see the datasheet for more details.
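The XOR of addnsub with data1 lets the same sum and carry logic perform both addition and subtraction. A bit-level sketch of that idea, not a model of the actual Stratix LE: the polarity of the control signal (here a hypothetical `subtract` flag, 1 = subtract) and the operand roles are assumptions, and the carry-select scheme with Carry-In0/Carry-In1 is omitted.

```python
def le_arith_bit(data1: int, data2: int, carry_in: int, invert: int):
    """One-bit full adder with an XOR on data1, as in the LE note."""
    d1 = data1 ^ invert                      # XOR with the add/subtract control
    total = d1 ^ data2 ^ carry_in            # sum calculator
    carry_out = (d1 & data2) | (d1 & carry_in) | (data2 & carry_in)  # carry calculator
    return total, carry_out


def add_sub(x: int, y: int, width: int, subtract: int) -> int:
    """Chain `width` one-bit cells to compute x + y or x - y (mod 2**width).

    Subtraction is x + (~y) + 1: every bit of y is inverted by the XOR
    and the initial carry-in supplies the +1.
    """
    carry = subtract
    result = 0
    for i in range(width):
        bit, carry = le_arith_bit((y >> i) & 1, (x >> i) & 1, carry, subtract)
        result |= bit << i
    return result


print(add_sub(5, 3, 8, 0))  # addition
print(add_sub(5, 3, 8, 1))  # subtraction
```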
Logic Array Blocks (LAB) [2] • 10 LEs • Local Interconnect • LAB-Wide Control Signals
[Figure: LAB with LE1 through LE10 connected to the local interconnect, which is fed by 30 LAB input lines and 10 LE feedback lines; control signals are shared across the LAB]
Avalon Switch Fabric Contents • Avalon Switch Fabric provides the following to the peripherals it connects • Data-Path Multiplexing • Address Decoding • Wait-State Generation • Dynamic Bus Sizing • Interrupt-Priority Assignment • Latent Transfer Capabilities • Streaming Read and Write Capabilities • Avalon Switch Fabric tailors transactions to the characteristics of the attached peripherals
SOPC Design Example • Avalon Switch Fabric allows masters and slaves to communicate without knowledge of each other's interface details
• Masters: CPU (32-bit; instruction master and data master); DMA controller with streaming (control port is a slave; read port and write port are streaming masters)
• Slaves: instruction memory (32-bit data path); data memory (32-bit data path); UART; VGA controller; Avalon tri-state bridge
• External memories: FLASH (1 MB, 16-bit data path); SRAM (256 KB, 32-bit data path)
Data Path Multiplexing & Slave Arbitration • 1- Data-Path Multiplexing (MUX) • 2- Slave Arbitration (Arbiter) • 3- Address Decoding
[Figure: the SOPC design example, with the MUX and arbiter shown in the Avalon Switch Fabric between the masters (CPU, DMA controller) and the slaves (instruction memory, data memory, UART, Avalon tri-state bridge, VGA controller, external FLASH, external SRAM)]
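Address decoding and dynamic bus sizing can be illustrated with a small software model. This is only a sketch: the base addresses are hypothetical (the slides give sizes and data-path widths but no memory map), and the real switch fabric is generated hardware, not software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slave:
    name: str
    base: int        # hypothetical base address, not from the slides
    size: int        # bytes
    width_bits: int  # data-path width


# Sizes and widths are from the design example; bases are assumed.
SLAVES = [
    Slave("ext_flash", 0x0000_0000, 1 << 20, 16),    # 1 MB, 16-bit
    Slave("ext_sram", 0x0010_0000, 256 * 1024, 32),  # 256 KB, 32-bit
]


def decode(address: int):
    """Address decoding: pick the slave whose range contains the address."""
    for slave in SLAVES:
        if slave.base <= address < slave.base + slave.size:
            return slave, address - slave.base
    raise ValueError(f"no slave mapped at {address:#x}")


def slave_accesses(address: int, master_width_bits: int = 32):
    """Dynamic bus sizing: split one master access into slave-width accesses."""
    slave, offset = decode(address)
    count = max(1, master_width_bits // slave.width_bits)
    step = slave.width_bits // 8
    return [(slave.name, offset + i * step) for i in range(count)]


print(slave_accesses(0x0000_0004))  # 32-bit read from 16-bit FLASH: two accesses
print(slave_accesses(0x0010_0008))  # 32-bit read from 32-bit SRAM: one access
```

The master issues a single 32-bit transfer in both cases; in this model the narrower slave simply sees two consecutive 16-bit accesses.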
DSP Blocks • Each DSP block can be configured as: • Eight 9 × 9 bit multipliers, or • Four 18 × 18 bit multipliers, or • One 36 × 36 bit multiplier
DSP Blocks (cont.) The DSP block consists of • A multiplier block • An adder/subtractor/accumulator block • A summation block • An output interface • Output registers • Routing and control signals
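One way to see why four 18 × 18 multipliers plus adders are enough for one 36 × 36 multiplication is to split each operand into 18-bit halves and sum the shifted partial products. The sketch below shows this for unsigned operands only; it illustrates the arithmetic and is not a description of the block's internal wiring.

```python
MASK18 = (1 << 18) - 1


def mul18(a: int, b: int) -> int:
    """Stand-in for one unsigned 18 x 18 multiplier (36-bit result)."""
    assert 0 <= a <= MASK18 and 0 <= b <= MASK18
    return a * b


def mul36(a: int, b: int) -> int:
    """Unsigned 36 x 36 multiply built from four 18 x 18 partial products."""
    assert 0 <= a < (1 << 36) and 0 <= b < (1 << 36)
    a_hi, a_lo = a >> 18, a & MASK18
    b_hi, b_lo = b >> 18, b & MASK18
    return (
        (mul18(a_hi, b_hi) << 36)
        + ((mul18(a_hi, b_lo) + mul18(a_lo, b_hi)) << 18)
        + mul18(a_lo, b_lo)
    )


a, b = (1 << 36) - 1, 0x123456789
print(mul36(a, b) == a * b)
```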
Stratix DSP Blocks • High Performance Dedicated Multiplier Circuitry • 18x18 Functions at 280 MHz • Variable Operand Widths with Full Precision Outputs • 9x9 (8 Max.) • 18x18 (4 Max.) • 36x36 (1 Max.) • Add, Accumulate or Subtract • Signed & Unsigned Operations • Dynamically Change between Add & Subtract • Supports DSP Requirements Including Complex Numbers
[Figure: DSP block data path: input register unit, multipliers with optional pipelining, add/subtract/accumulate units, summation unit, output multiplexer, output register unit]
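Complex multiplication is a natural fit for four multipliers followed by one subtract and one add. A minimal sketch of the arithmetic, assuming signed 18-bit real and imaginary parts; how the operations map onto a specific DSP block mode is not covered here.

```python
def fits_signed(value: int, bits: int) -> bool:
    """True if `value` is representable as a signed `bits`-bit integer."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def complex_mul(ar: int, ai: int, br: int, bi: int):
    """(ar + j*ai) * (br + j*bi) using four real multiplications."""
    assert all(fits_signed(v, 18) for v in (ar, ai, br, bi))
    real = ar * br - ai * bi   # two multipliers + subtract
    imag = ar * bi + ai * br   # two multipliers + add
    return real, imag


print(complex_mul(3, 4, 1, -2))  # (3 + 4j) * (1 - 2j)
```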
Resource Savings with DSP Blocks • DSP Block • Reduces LE Usage • Reduces Routing Congestion • Reduces Power • Maintains Performance • 90% of your problems are hidden under the surface! • SAVES 652 ROUTING NETS!
[Figure: 18-bit inputs feeding multipliers with 36-bit products, which are summed by adders into a 38-bit result]
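The widths in the figure (18, 36, 38) follow from standard bit-growth rules: a product needs the sum of the operand widths, and summing N values adds ceil(log2(N)) bits. A small sketch of that rule, assuming four products are summed, which matches 38 bits but is an inference from the figure labels:

```python
def product_width(a_bits: int, b_bits: int) -> int:
    """Full-precision width of a product."""
    return a_bits + b_bits


def sum_width(term_bits: int, terms: int) -> int:
    """Width needed to add `terms` values of `term_bits` bits without overflow."""
    if terms < 1:
        raise ValueError("terms must be at least 1")
    return term_bits + (terms - 1).bit_length()  # ceil(log2(terms))


p = product_width(18, 18)
print(p, sum_width(p, 4))
```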
Design Flow Overview • Create Design in Simulink Using Altera Libraries • Simulate in Simulink • Add SignalCompiler to Model • Create HDL Code & Generate Testbench • Perform RTL Simulation • Synthesize HDL Code & Place & Route • Program Device • SignalTap II Logic Analyzer
Step 1 - Create Design in Simulink Using Altera Libraries • Drag & Drop Library Blocks into Simulink Design & Parameterize Each Block
Step 3 - Add “SignalCompiler” to Model to Generate HDL Code
• Device family: APEX20K/E/C; APEX II; Stratix & Stratix GX; Cyclone & ACEX 1K; Mercury; FLEX10K & FLEX 6000; DSP Boards
• Synthesis tool: Leonardo Spectrum; Synplify; Quartus II
• Optimization: Speed vs. Area
• Testbench Generation
• Message Window
Step 4 - Create HDL Code & Generate Testbench • Model file: AltrFir32.mdl • Enable the "Generate Stimuli for VHDL Testbench" button • Generated HDL file: AltrFir32.vhd
DSP Builder Report File • Lists All Converted Blocks • Port Widths • Sampling Frequencies • Warnings & Messages
Step 5 – Perform RTL Simulation (ModelSim) • Set working directory (File => Change Directory) • Run Tcl file (Tools => Execute Macro)
Perform Verification: ModelSim vs. Simulink
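Comparing the two simulations amounts to checking that the RTL output matches the Simulink output sample by sample, after accounting for any pipeline latency. A minimal sketch, assuming both outputs have already been exported as integer sequences; the sample values and the `latency` parameter are hypothetical, and no tool-specific export format is implied.

```python
def compare_outputs(reference, rtl, latency=0):
    """Return (index, reference, rtl) for every mismatching sample.

    `latency` is the number of leading RTL samples to skip so that the
    two sequences line up.
    """
    aligned = rtl[latency:]
    count = min(len(reference), len(aligned))
    return [
        (i, reference[i], aligned[i])
        for i in range(count)
        if reference[i] != aligned[i]
    ]


# Hypothetical data: the RTL output lags the reference by two samples.
simulink_out = [0, 12, 40, 77, 103]
modelsim_out = [0, 0, 0, 12, 40, 77, 104]

mismatches = compare_outputs(simulink_out, modelsim_out, latency=2)
print(mismatches if mismatches else "outputs match")
```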
Step 6 - Synthesize HDL & Place & Route • Leonardo Spectrum • Synplify • Quartus II – Synthesis – Quartus II Fitter
Step 7 – Program Device • Download Design to DSP Development Kits
Stratix DSP Development Board
[Figure: board photo with callouts: Nios Expansion Prototype Connector; MAX 7000 Device; Prototyping Area; D/A Converters; Mictor-Type Connectors for HP Logic Analyzers; A/D Converters; Analog SMA Connectors; 40-Pin Connectors for Analog Devices; Texas Instruments Connectors on Underside of Board]
Stratix DSP Board – Key Features
• Devices: Stratix EP1S25F780C5 (Starter Version); Stratix EP1S80B956C7 (Professional Version)
• Analog I/O: Two 12-bit, 125 MHz A/D Converters; Two 14-bit, 165 MHz D/A Converters
• Digital I/O: Two 40-pin Connectors for Analog Devices A/D Converter Evaluation Boards; Connector for TI TMS320 Cross-Platform Daughter Card; 3.3V Expansion/Prototype Headers; RS-232 Serial Port
• Memory: 2 Mbytes of 7.5-ns Synchronous SRAM; 32 Mbytes of FLASH
Step 8 - SignalTap II Logic Analyzer • Embedded Logic Analyzer • Downloads into Device with Design • Captures State of Internal Nodes • Uses JTAG for Communication
SignalTap II Logic Analyzer: Analysis of Imported Data
[Figure: imported data and the corresponding imported plot]