Literature Review. Measuring the Gap Between FPGAs and ASICs Ian Kuon, Jonathan Rose University of Toronto IEEE TCAD/ICAS February 2007. Henry Chen February 26, 2010. Introduction. Trade-offs between FPGAs and standard-cell ASICs Decreased NRE, design time
Literature Review Measuring the Gap Between FPGAs and ASICs Ian Kuon, Jonathan Rose University of Toronto IEEE TCAD/ICAS February 2007 Henry Chen February 26, 2010
Introduction • Trade-offs between FPGAs and standard-cell ASICs • Decreased NRE, design time • Increased silicon area, power; decreased performance • FPGA inefficiencies known and accepted, but largely unquantified
Previous Comparisons • Jones et al. (1986): MPGAs to standard cells • 1.5‒2.6x area, ~1.1x delay • Estimates based on only 5 circuits • Brown et al. (1992): FPGAs to MPGAs • 8‒12x area, ~3x delay • Optimistic FPGA gate counting? • Anecdotal evidence • Doesn’t consider “hard” macros (multipliers, memories) • Combine for FPGAs to standard cells • 12‒38x area, ~3.4x delay • Dated; based on (questionable?) extractions
Previous Comparisons (2000s) • Zuchowski et al. (2002): LUT to ASIC gate (0.25μm‒90nm) • ~1/45 gate density, 12‒14x delay, ~500x dynamic power • Unexplained process-dependent density/power variation • Dependent on gates implemented per LUT • Wilton et al. (2005): Partial programmable replacement • 88x area, 2x delay • Single logic module • Compton & Hauck (2007): FPGA apps. to standard-cell • Avg 7.2x area • Scaled FPGA 0.15μm to 0.18μm standard-cell
Methodology • Implement in both FPGA and standard-cell • Altera Stratix II FPGA: TSMC 90nm multi-Vt, 1.2V • Standard-cell: ST CMOS090 90nm, dual-Vt, 1.2V • Empirical results from 23 benchmarks • Rejected if different synthesis tools resulted in >5% register count deviation • Mix of logic, memory, DSP • Analyze gains from FPGA’s DSP and memory blocks • Exclude I/Os • Have device data from Altera
Implementations • FPGA • Altera-provided CAD flow • Speed/area balanced optimization; optimize critical-path performance, otherwise optimize area • Automatic DSP, memory block inference • Set to mimic effects of high resource utilization • ASIC • Synopsys/Cadence synthesis/PAR flow • Free to choose from high/standard-Vt cells • Timing-driven placement; target 75‒85% utilization • Emphasized performance in compiled memories
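The benchmark rejection rule on the Methodology slide can be sketched as a small filter. This is only an illustration: the slide does not say how the 5% deviation is normalized, so dividing by the larger of the two register counts is an assumption, and the function names and register counts are hypothetical.

```python
def register_deviation(fpga_regs: int, asic_regs: int) -> float:
    """Relative difference between the register counts of the two flows.

    Assumption: normalize by the larger count (the slide does not specify).
    """
    largest = max(fpga_regs, asic_regs)
    if largest == 0:
        return 0.0  # no registers in either result, so nothing to compare
    return abs(fpga_regs - asic_regs) / largest


def accept_benchmark(fpga_regs: int, asic_regs: int, threshold: float = 0.05) -> bool:
    """Keep a benchmark only if the deviation is at most the threshold."""
    return register_deviation(fpga_regs, asic_regs) <= threshold


# Hypothetical register counts, for illustration only.
print(accept_benchmark(1000, 1040))  # deviation of about 3.8%
print(accept_benchmark(1000, 1100))  # deviation of about 9.1%
```

The point of such a check is that a large register-count mismatch suggests the two synthesis tools produced structurally different circuits, which would make the area and delay comparison unfair.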
Area Comparison • ASIC • Post PAR’d core area • Include memory macros • FPGA • Count only silicon area for used resources • Include surrounding routing resources • Count full block area even if only partially used • Area data from Altera
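A minimal sketch of the FPGA area accounting described on this slide: only used resources are counted, but a partially used block is charged at its full area. The per-block areas below are hypothetical placeholders in arbitrary units (the real figures came from Altera and are not on the slides), and the block-type keys are illustrative names.

```python
import math

# Hypothetical silicon area per block, including its surrounding routing.
# Arbitrary units; these are NOT Altera's figures.
BLOCK_AREA = {"lab": 1.0, "dsp": 12.0, "m4k": 3.0}


def fpga_area(usage: dict) -> float:
    """Sum the area of every block that is touched at all.

    `usage` maps a block type to the (possibly fractional) number of
    blocks the design occupies; fractions are rounded up so a partially
    used block counts as a whole block.
    """
    return sum(math.ceil(count) * BLOCK_AREA[kind] for kind, count in usage.items())


# Hypothetical design: 10.2 logic array blocks' worth of logic, half a DSP block.
print(fpga_area({"lab": 10.2, "dsp": 0.5}))
```

Rounding up is what makes the estimate pessimistic for the FPGA: half a DSP block is charged as a full one.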
Area Comparison Results • Logic only: 35x avg (17‒54x) • Logic + DSP: 25x avg (12‒58x) • Logic + Memory: 33x avg (19‒70x) • Logic + Memory + DSP: 18x avg (9.5‒26x)
Impact of Hard Macros on Area • Smaller area penalty for designs using hard macros • Hard macro close to ASIC implementation (plus programmable interface & routing)
Area Comparison Caveats • Pessimistic FPGA area estimation; count full resource area even if only partially used (~5‒10% reduction) • ASIC density may decrease for larger designs, while FPGAs are designed to handle large designs
Delay Comparison • Altera Quartus II / Synopsys PrimeTime SI • Static timing analysis to extract max. clock frequency • Compare for different FPGA speed grades • FPGAs are binned for performance • ASICs tend to be designed for worst-case
Delay Comparison Results (Fastest Speed Grade) • Logic only: 3.4x avg (1.9‒5.0x) • Logic + DSP: 3.5x avg (2.4‒4.7x) • Logic + Memory: 3.5x avg (2.8‒4.3x) • Logic + Memory + DSP: 3.0x avg (2.6‒3.5x)
Delay Comparison Results (Slowest Speed Grade) • Logic only: 4.6x avg (2.5‒6.7x) • Logic + DSP: 4.6x avg (3.0‒6.3x) • Logic + Memory: 4.8x avg (3.8‒5.7x) • Logic + Memory + DSP: 4.1x avg (3.8‒4.7x)
Impact of Hard Macros on Delay • Almost no benefit—sometimes penalty! • Fixed positions in FPGA; extra routing to use • Fixed architecture; some apps. may not use efficiently
Power Comparison • Altera Quartus II Power Analyzer / Synopsys PrimePower • Compare power, not energy consumption • FPGAs slower; need more time or parallelism • Implement for highest speed possible • Simulate at same operating frequency, voltage • Measure only core power • Assume constant toggle rates for all nets in design • Meaningful test vectors not available for all designs • FPGA static power consumption scaled by used fraction
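The last bullet, scaling FPGA static power by the used fraction, can be sketched as below. It assumes "used fraction" means the share of core silicon area the design occupies, which the slide does not spell out; the function name and all numbers are hypothetical.

```python
def scaled_static_power(device_static_w: float, used_area: float, total_area: float) -> float:
    """Charge the design only for its share of the device's static power.

    Assumption: the share is proportional to the used core area.
    """
    if total_area <= 0 or not 0 <= used_area <= total_area:
        raise ValueError("used_area must lie between 0 and total_area")
    return device_static_w * (used_area / total_area)


# Hypothetical device: 0.8 W static power, design occupies 25 of 100 area units.
print(scaled_static_power(0.8, 25, 100))
```

Without this scaling, a small benchmark on a large device would be charged for leakage in logic it never uses.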
Power Comparison Results • Logic only: 14x avg (5.7‒52x) • Logic + DSP: 12x avg (7.5‒16x) • Logic + Memory: 14x avg (12‒16x) • Logic + Memory + DSP: 7.1x avg (5.3‒8.3x)
Impact of Hard Macros on Power • Slight benefit—primarily from area savings? • Less area and interconnect
Power Consumption Caveats • May be disproportionate power in FPGA clock network • “Overdesigned” for tested circuits • Could have small incremental power increase • ASIC clock network would have to grow with designs
Static Power Comparison • Unable to draw useful conclusions about static power • 87x for typical silicon, typical temp. (25°C) • 5.4x for worst-case silicon, worst-case temp. (85°C) • Had to scale worst-case silicon temp. characterization • Subthreshold leakage is process-dependent • Little information on leakage estimate factors • Different processes from different foundries • Some correlation between static power and area gap (correlation coefficient ~0.8) • Hard macros likely reduced static power penalty
Conclusions • Disparity hard to quantify; very application dependent • Avg. max/min gap ratio 3x; max/min gap ratio range 1.3‒9.1x • All-LUT designs avg. 35x area, 3.4‒4.6x delay, 14x power • 119x area, 47.6x power gap for equal performance (assuming ideal parallelization) • Hard macros reduce area and power, but have little performance benefit • Avg. 18x area, 3‒4.1x delay, 7.1x power • 54x area, 21.3x power for equal performance
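The derived figures on the Conclusions slide can be reproduced from the reported results. The sketch below makes two assumptions. First, the equal-performance numbers are the average area and power gaps multiplied by the fastest-speed-grade delay gap, since ideal parallelization would need that many copies of the FPGA design. Second, the spread figures are the ratio of the largest to the smallest gap within each result category, using the fastest-speed-grade delay ranges. The variable and function names are illustrative.

```python
from statistics import mean

# Average gaps copied from the slides (delay at the fastest speed grade).
AVG_GAPS = {
    "logic only": {"area": 35, "delay": 3.4, "power": 14},
    "logic + memory + DSP": {"area": 18, "delay": 3.0, "power": 7.1},
}


def equal_performance(gaps: dict) -> dict:
    """Scale the area and power gaps by the delay gap (ideal parallelization)."""
    delay = gaps["delay"]
    return {"area": gaps["area"] * delay, "power": gaps["power"] * delay}


# (min, max) gap ranges from the area, delay (fastest grade) and power slides.
RANGES = {
    "area": [(17, 54), (12, 58), (19, 70), (9.5, 26)],
    "delay": [(1.9, 5.0), (2.4, 4.7), (2.8, 4.3), (2.6, 3.5)],
    "power": [(5.7, 52), (7.5, 16), (12, 16), (5.3, 8.3)],
}

# Max/min ratio for each of the twelve reported ranges.
spreads = [high / low for pairs in RANGES.values() for low, high in pairs]

for name, gaps in AVG_GAPS.items():
    scaled = equal_performance(gaps)
    print(name, round(scaled["area"], 1), round(scaled["power"], 1))
print(round(mean(spreads), 1), round(min(spreads), 1), round(max(spreads), 1))
```

Under these assumptions the products match the slide's 119x/47.6x and 54x/21.3x, and the max/min ratios average about 3x with a range of roughly 1.3x to 9.1x.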
References • Jones, Jr., H. S., Nagle, P. R., Nguyen, H. T., “A Comparison of Standard Cell and Gate Array Implementations in a Common CAD System,” Proc. IEEE CICC, 1986, pp. 228‒232 • Brown, S. D., Francis, R., Rose, J., Vranesic, Z., Field-Programmable Gate Arrays, Norwell, MA: Kluwer, 1992 • Zuchowski, P. S., Reynolds, C. B., Grupp, R. J., Davis, S. G., Cremen, B., Troxel, B., “A Hybrid ASIC and FPGA Architecture,” Proc. ICCAD, Nov. 2002, pp. 187‒194 • Wilton, S. J., Kafafi, N., Wu, J. C. H., Bozman, K. A., Aken’Ova, V., Saleh, R., “Design Considerations for Soft Embedded Programmable Logic Cores,” IEEE JSSC, vol. 40, no. 2, pp. 485‒497, Feb. 2005 • Compton, K., Hauck, S., “Automatic Design of Area-Efficient Configurable ASIC Cores,” IEEE Trans. Comp., vol. 56, no. 5, pp. 662‒672, May 2007