Existing DSP Solutions
• Fixed-function DSP devices
• ASICs
• Standard DSP processors (the only programmable solution)
But what do you do when the fastest DSP processor is not fast enough?
• Add more DSP processors?
• Design a custom gate array?
Performance Through Parallel Processing
[Figure: a DSP processor (CPU and MAC(s), RAM, ROM, peripherals) beside a Xilinx FPGA (RAM, ROM, peripherals, and a large array of MACs)]
• DSP processor: time-share 1, 2, or 4 MACs
• Xilinx FPGA: as many MACs in parallel as you need
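The trade-off between time-sharing a few MACs and dedicating one MAC per tap can be put in rough numbers. The sketch below is an idealized model, assuming each MAC unit completes one multiply-accumulate per clock with no control or memory overhead and that both options run at the same clock; the 100 MHz clock and 64-tap filter are hypothetical values chosen for illustration, not figures from the slide.

```python
import math


def cycles_per_sample(taps: int, macs: int) -> int:
    """Clock cycles per output sample when `taps` multiply-accumulates are
    shared among `macs` units, each doing one MAC per clock (idealized)."""
    if taps <= 0 or macs <= 0:
        raise ValueError("taps and macs must be positive")
    return math.ceil(taps / macs)


def max_sample_rate(clock_hz: float, taps: int, macs: int) -> float:
    """Highest sustainable output sample rate under the same idealized model."""
    return clock_hz / cycles_per_sample(taps, macs)


if __name__ == "__main__":
    CLOCK_HZ = 100e6  # hypothetical clock rate
    TAPS = 64         # hypothetical filter length
    # Time-shared MACs (DSP processor style) versus one MAC per tap (fully parallel).
    for macs in (1, 2, 4, 64):
        rate = max_sample_rate(CLOCK_HZ, TAPS, macs)
        print(f"{macs:>2} MAC(s): {rate / 1e6:8.4f} MSPS")
```

Under these assumptions a single MAC sustains 100 MHz / 64 = 1.5625 MSPS, while 64 parallel MACs produce one output per clock (100 MSPS). Real devices differ in clock rate and overhead, so this only illustrates how throughput scales with the number of MACs.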
Xilinx DSP Advantages
• 10X the performance
• 1/5th the cost
[Chart: 16-bit FIR filter benchmark, performance in giga-MACs (axis 1 to 5) with a cost axis from $0.05 to $0.25, comparing the 320C6x DSP processor with Xilinx Spartan (S30, S40) and XC4000XL (4036, 4062, 4085, 40125) devices]
• Plus faster time-to-market
Xilinx DSP: Complete High-Performance Programmable Solution
• Xilinx FPGAs: Spartan, XC4000, Virtex
• Design tools and DSP IP
  • LogiCORE and AllianceCORE
  • CORE Generator software
  • Reference designs on the PreLINX internal web
  • Elanix SystemView integration
• DSP prototyping boards
  • DSP Starter Kit
• DSP support
  • Ph.D. engineers
  • DSP FAEs
  • Design services
[Diagram labels: DSP functions, modeling tools]
2D Discrete Cosine Transform Reference Design
[Block diagram: input data (64 12-bit pixels, one 8 x 8 block) → 1D-DCT engine → corner-turn buffer (64 x 12 x 2 distributed RAM, an example PreLINX reference design) → 1D-DCT engine → data out (64 pixels); data-path widths labeled 12 bits and 16 bits; control signals Clock, Load Data, Valid (data on output), and Busy (don't write data into core)]
Features
• Under 2 µs continuous transform time
• 8 x 8 point 2D DCT
• 12-bit two's-complement input data
• 16-bit resolution internal coefficients
• Efficient one-bit-per-clock distributed arithmetic algorithm
• Distributed RAM corner-turning buffer
• Non-parameterizable core
• Sign extension for inputs narrower than 12 bits
Applications
• Video image compression
• JPEG and MPEG building block
• Video conferencing
Deliverables
• Design source: netlist (with RLOCs)
• Behavioral model: no
• Instantiation code: VHDL and Verilog
• Test bench: test vectors available
• Data sheet
Core source: PreLINX (Turney)
Device family: XC4000
Size: 1200 CLBs, including buffers
Performance: 75 MHz, -08 speed grade
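The block diagram follows the usual row-column structure for a 2D DCT: a 1D DCT over each row, a corner turn (transpose) through the buffer, then a second 1D DCT. The sketch below models only that data flow, in floating point, assuming the orthonormal DCT-II definition; it does not model the core's fixed-point distributed arithmetic, 16-bit coefficients, or timing, and the function names are hypothetical.

```python
import math

N = 8  # block size (8 x 8)


def dct_1d(x):
    """Orthonormal 1D DCT-II of a sequence."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        out.append(scale * acc)
    return out


def corner_turn(block):
    """Transpose the block: written row by row, read out column by column."""
    return [list(col) for col in zip(*block)]


def dct_2d(block):
    """Row-column 2D DCT: 1D DCT on rows, corner turn, 1D DCT again, turn back."""
    if len(block) != N or any(len(row) != N for row in block):
        raise ValueError("expected an 8 x 8 block")
    stage1 = [dct_1d(row) for row in block]   # first 1D-DCT engine (rows)
    turned = corner_turn(stage1)              # corner-turn buffer
    stage2 = [dct_1d(row) for row in turned]  # second 1D-DCT engine (columns)
    return corner_turn(stage2)                # restore row-major order


if __name__ == "__main__":
    # 12-bit two's-complement pixels lie in [-2048, 2047]; use a flat block here.
    flat = [[100] * N for _ in range(N)]
    coeffs = dct_2d(flat)
    print(round(coeffs[0][0], 6))  # all energy lands in the DC coefficient
```

For a flat block of value 100 the DC term is 8 × 100 = 800 and every other coefficient is numerically zero. The corner turn is what lets the same 1D engine design serve both dimensions.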
DSP Market Opportunity
[Pie chart: 1998 total DSP ICs, $10B: DSP processors $3.8B, FASICs $5.8B, and a $0.4B segment; segment labels include building blocks, multimedia µP, custom, embedded µP, and µP]
Xilinx DSP TAM:
• Medium volume
• Gate arrays
• Embedded processors
• Prototyping
• Multiple processors
• High-cost processors
• Building blocks
DSP Processor Market Growth
[Chart: DSP processor market in $ millions ($2,000 to $12,000), ‘97 to ‘02, with the high-performance portion highlighted]
• High-performance DSP processors: >33% CAGR
• Xilinx DSP TAM grows faster than the overall market
Source: Forward Concepts and Xilinx estimate
Xilinx Portion of the DSP Processor Market
[Pie chart: new DSP design starts by data sample rate, > 1 MHz versus < 1 MHz, with 26% labeled]
• More new designs require more performance
• Sample rates above 1 MHz = Xilinx DSP opportunity
Survey source: Forward Concepts
FPGAs Now a Factor in DSP
[Bar chart: chips employed for DSP functions (number of responses): fixed-point DSP processors, floating-point DSP processors, ASICs, RISC, FPGAs, Xilinx DSP. Source: Forward Concepts ‘98 survey]
“FPGAs represent a fast-growing market segment and are being increasingly employed for high-performance computation, often with the added benefit of reconfigurability”
DSP Processor Market Segments (DSP processors: $3.8B)
• Communications: 74% (wireless, modems)
• Computer: 13% (hard disk drives)
• Consumer: 2%
• Industrial: 3%
• Instrumentation: 2.3%
• Military: 3.5%
• Office automation: 2%
Xilinx DSP = high performance, > 1 processor, MHz sample rates
Communications: Largest FPGA Segment
• Satellite modems
• Cable modems
• Copper (twisted pair)
  • xDSL
  • Modem banks
  • Telecom test equipment
• Wireless
  • Cellular / PCS
  • Base stations
  • Test equipment
  • Spread spectrum
  • Wireless local loop
  • Microwave internet
  • Smart antennas
Communications Applications
Common functions required:
• Filters: interpolation, decimation, standard
  • Sample rates from 2 MSPS to 80 MSPS
  • From 7 to 128 taps
• NCO and mixer
  • 32-bit phase accumulator, 1024-point table, 8- to 12-bit input data
  • 10 to 60 MSPS
• Other
  • Rectangular-to-polar conversion, power measurement, delay elements
  • 8- to 16-bit input, 1 to 20 MSPS
• Multipliers
• Reed-Solomon encoders/decoders, turbo codes
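The NCO entry above specifies a 32-bit phase accumulator and a 1024-point table. A minimal Python model is sketched below, assuming the table holds one full sine cycle quantized to a hypothetical 12-bit signed amplitude and is addressed by truncating the accumulator to its top 10 bits (no dithering or interpolation). The mixer multiplies each real input sample by the cosine and negated sine outputs to form I and Q products. All names are illustrative.

```python
import math

ACC_BITS = 32    # phase accumulator width (from the slide)
TABLE_BITS = 10  # 2**10 = 1024-point table (from the slide)
AMP_BITS = 12    # hypothetical signed output width of the table
TABLE_SIZE = 1 << TABLE_BITS
ACC_MASK = (1 << ACC_BITS) - 1
PEAK = (1 << (AMP_BITS - 1)) - 1  # 2047 for 12 bits

# One full sine cycle, quantized to signed integers.
SINE_TABLE = [round(PEAK * math.sin(2 * math.pi * i / TABLE_SIZE)) for i in range(TABLE_SIZE)]


def phase_increment(f_out: float, f_clk: float) -> int:
    """Tuning word: f_out / f_clk expressed as a fraction of 2**ACC_BITS."""
    return round(f_out / f_clk * (1 << ACC_BITS)) & ACC_MASK


def nco(f_out: float, f_clk: float, n_samples: int):
    """Yield (cos, sin) table outputs, one pair per clock."""
    inc = phase_increment(f_out, f_clk)
    quarter = TABLE_SIZE // 4  # cos(theta) = sin(theta + 90 degrees)
    phase = 0
    for _ in range(n_samples):
        idx = phase >> (ACC_BITS - TABLE_BITS)  # keep the top 10 phase bits
        yield SINE_TABLE[(idx + quarter) % TABLE_SIZE], SINE_TABLE[idx]
        phase = (phase + inc) & ACC_MASK       # accumulator wraps modulo 2**32


def mix(samples, f_lo: float, f_clk: float):
    """Mixer: multiply each real input sample by the NCO to get (I, Q) products."""
    return [(x * c, -x * s) for x, (c, s) in zip(samples, nco(f_lo, f_clk, len(samples)))]


if __name__ == "__main__":
    # Hypothetical: 100 MSPS clock, local oscillator at 12.5 MHz.
    for pair in nco(12.5e6, 100e6, 8):
        print(pair)
```

With a 32-bit accumulator the frequency resolution is f_clk / 2^32, about 0.023 Hz at a 100 MHz clock, while the 10-bit table address sets the phase resolution.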
Other High-Performance Applications
• Video and image processing
• Medical: ultrasound, MRI, CT
• Industrial: security, manufacturing
• Set-top boxes
• Digital TV broadcast equipment
• HDTV
• Military: radar, sonar, encryption, guidance, navigation, software radios
• Instrumentation
• Office automation
Altera DSP Cores
3rd-party IP:
• *Reed-Solomon (HammerCore)
• Adaptive equalizer (HammerCore)
• ISS: square root; floating-point add, divide, multiply; rank order; median filter; IIR; FIR; decimating FIR; integer divide; DCT, FFT; image processing library; JPEG; ADPCM; adaptive FIR; Reed-Solomon; Viterbi; trellis; convolutional encoder; wavelet filter; adaptive equalizer; block & convolutional interleaver
• Viterbi (CAST)
• Decimating filter, wavelet filter (FASTMAN)
• Convolutional interleaver, cable & PCS (Ktech Telecom)
• Telephone tone generator (Ncomm)
• NCO, digital modulator, linear feedback shift register, binary correlator (Nova Engineering)
Altera IP:
• LPM multiplier
• CFFT (‘97DS, FS7)
Reference designs:
• SDA and parallel FIR filter (FS1): 8, 16, 24, 32, and 64 taps; any data width; pipelining, symmetry
• 3x3 video filter
• Floating-point add/sub (FS2)
• Floating-point multiply (FS4)
• Integer divide (FS3)
• Rounder (FS5)
• Saturator (FS6)
• RGB2YCrCb, YCrCb2RGB (‘97DS)
New cores:
• CORDIC
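The list above ends with a new CORDIC core, and the communications functions earlier include rectangular-to-polar conversion; CORDIC in vectoring mode is one common shift-and-add way to compute that. The sketch below uses floating point for readability, where hardware would use fixed-point shifts and a small arctangent table; the iteration count of 16 is an assumption and `cordic_to_polar` is a hypothetical name.

```python
import math


def cordic_to_polar(x: float, y: float, iterations: int = 16):
    """Vectoring-mode CORDIC: rotate (x, y) onto the +x axis with micro-rotations
    of +/- atan(2**-i), accumulating the angle; returns (magnitude, angle)."""
    # Pre-rotate by 180 degrees into the right half-plane, where CORDIC converges.
    z = 0.0
    if x < 0:
        x, y, z = -x, -y, math.pi
    gain = 1.0
    for i in range(iterations):
        t = 2.0 ** -i  # a right shift by i bits in hardware
        if y > 0:      # rotate clockwise toward the x axis
            x, y, z = x + y * t, y - x * t, z + math.atan(t)
        else:          # rotate counter-clockwise toward the x axis
            x, y, z = x - y * t, y + x * t, z - math.atan(t)
        gain *= math.sqrt(1.0 + t * t)  # each micro-rotation stretches the vector
    if z > math.pi:    # wrap the angle into (-pi, pi]
        z -= 2.0 * math.pi
    return x / gain, z


if __name__ == "__main__":
    print(cordic_to_polar(3.0, 4.0))  # close to (5, atan2(4, 3))
```

Each iteration adds roughly one bit of angle accuracy, and the gain is a constant for a fixed iteration count, so hardware can fold the division into a single final scaling.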
Altera Competitive Analysis
• Cores
  • Good multipliers and parallel FIR
  • Inefficient SDA FIR filters
  • Most Altera reference designs are parameterizable (easy to do with AHDL code)
  • Not Smart-IP
• More 3rd-party DSP IP, but not effective
• Little internal DSP IP
• Customer evaluation capability for 3rd-party IP (security)
• No announced DSP system-level tools
SDA FIR Filter
• Most commonly used core
• Parallel in, parallel out
• Bit-serial processing
• All taps processed in parallel
• Full precision through the entire core
• One clock cycle required for each data bit
• One additional clock cycle for symmetric filters
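The bullets above describe a bit-serial distributed arithmetic (DA) filter. The sketch below is a bit-accurate integer model under stated assumptions: samples are two's-complement integers of a fixed width, one bit of every tap's sample is consumed per simulated clock (LSB first), and a single look-up table holding all 2^N coefficient partial sums is used. A practical core would usually split that table into smaller groups; the function names here are hypothetical.

```python
def build_da_lut(coeffs):
    """Entry `a` holds the sum of coeffs[k] over every bit k that is set in `a`."""
    return [sum(c for k, c in enumerate(coeffs) if (a >> k) & 1)
            for a in range(1 << len(coeffs))]


def da_inner_product(lut, samples, bits):
    """Bit-serial DA evaluation of sum(coeffs[k] * samples[k]).

    One 'clock' per data bit, LSB first: bit b of every sample forms the LUT
    address, and the looked-up partial sum is weighted by 2**b. The sign bit
    (b == bits - 1) has negative weight in two's complement, so it is subtracted."""
    mask = (1 << bits) - 1
    words = [s & mask for s in samples]  # two's-complement bit patterns
    acc = 0
    for b in range(bits):
        addr = 0
        for k, w in enumerate(words):
            addr |= ((w >> b) & 1) << k
        partial = lut[addr] << b  # in hardware: shift-accumulate
        acc = acc - partial if b == bits - 1 else acc + partial
    return acc


def sda_fir(coeffs, x, bits):
    """Parallel-in, parallel-out FIR: all taps handled together on every bit."""
    lut = build_da_lut(coeffs)
    delay_line = [0] * len(coeffs)  # newest sample first
    out = []
    for sample in x:
        if not -(1 << (bits - 1)) <= sample < (1 << (bits - 1)):
            raise ValueError("sample does not fit in the given bit width")
        delay_line = [sample] + delay_line[:-1]
        out.append(da_inner_product(lut, delay_line, bits))
    return out


if __name__ == "__main__":
    coeffs = [3, -1, 4, 2]    # hypothetical integer coefficients
    x = [5, -8, 7, 0, -1, 3]  # 8-bit two's-complement input samples
    direct = [sum(c * (x[n - k] if n - k >= 0 else 0) for k, c in enumerate(coeffs))
              for n in range(len(x))]
    print(sda_fir(coeffs, x, 8) == direct)
```

Each output takes exactly `bits` passes of the inner loop regardless of the number of taps, which mirrors "one clock cycle for each data bit" with all taps processed in parallel. For a symmetric filter, presumably the pairs x[k] + x[N-1-k] are pre-added first; that widens each word by one bit, which would account for the one additional clock cycle.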
The Solution to High-Performance DSP
• Higher performance
  • 10X faster, parallel processing
• Lower power
  • 50% to 80% less than DSP processors
• Lower price
  • 1/5th the cost, Spartan FPGAs at ASIC prices
• Faster time to market
  • Simpler design flow, no real-time software
Add a Xilinx FPGA, not more processors