1 / 37

CS294-6 Reconfigurable Computing

CS294-6 Reconfigurable Computing. Day 5 September 8, 1998 Comparing Computing Devices. Quotes. An engineer is a man who can do for a dime what any fool can do for a dollar. If it can’t be expressed in figures, it is not science; it is opinion. -- Lazarus Long. Motivation.

libba
Download Presentation

CS294-6 Reconfigurable Computing

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CS294-6Reconfigurable Computing Day 5 September 8, 1998 Comparing Computing Devices

  2. Quotes • An engineer is a man who can do for a dime what any fool can do for a dollar. • If it can’t be expressed in figures, it is not science; it is opinion. -- Lazarus Long

  3. Motivation • Need to understand • How costly (big) is a solution • How compare to alternatives • Cost and benefit of flexbility

  4. What we really want: • Complete implementation of our application • For each architectural alternatives • In same implementation technology • w/ multiple area-time points

  5. Reality • Seldom get it packaged that nicely • much work to do so • technology keeps moving • Deal with • estimation from components • technology differences • few area-time points

  6. Today Empirical • Start sorting out • custom vs. configurable • spatial configurable vs. temporal

  7. FPGA Table

  8. How many gates?

  9. “gates” in 2-LUT

  10. Now how many?

  11. Gates/unit area? Usable gates?

  12. Gates Required? Depth=3, Depth=2048?

  13. Gate metric for FPGAs? • Day3: several components for computations • compute element • interconnect: • space • time • instructions • Not all applications need in same balance • Assigning a single “capacity” number to device is an oversimplification

  14. Exercise Admin • Simulation slow -> see fastsim alternative • Exercise 1 more effort than anticipated • Drop exercise 4 and rearrange due date • SPACE1 --- ASAP • SPACE2 --- original 9/10, try before 9/15 • CYCLE --- 9/24

  15. Density vs. Binding Time Full Custom Gate Array Density FPGA Final Mask(s) “startup” Processor Pre-mask cycle Binding Time

  16. AMI CICC’83 MPGA 1.0 Std-Cell 0.7 Custom 0.5 Toshiba DSP Custom 0.3 Mosaid RAM Custom 0.2 GE CICC’86 MPGA 1.0 Std-Cell 0.4--0.7 FF/counter 0.7 FullAdder 0.4 RAM 0.2 MPGA vs. Custom?

  17. Metal Programmable Gate Arrays

  18. MPGAs • Modern -- “Sea of Gates” • yield 35--70% • maybe 5kl2/gate ? (quite a bit of variance)

  19. MPGA (SOG GA) 5Kl2/gate 35-70% usable (50%) 7-17Kl2/gate net Ratio: 2--10 (5) Xilinx XC4K 1.25Ml2 /CLB 17--48 gates (26?) 26-73Kl2/gate net MPGA vs. FPGA Adding ~2x Custom/MPGA, Custom/FPGA ~10x

  20. MPGA (SOG GA) l=0.6m tgd~1ns Ratio: 1--7 (2.5) Xilinx XC4K l=0.6m 1-7 gates in 7ns 2-3 gates typical MPGA vs. FPGA

  21. Processors and FPGAs

  22. Processors and FPGAs

  23. Degrade from Peak: Processors • Ops w/ no gate evaluations (interconnect) • Ops use limited word width • Stalls waiting for retimed data

  24. Degrade from Peak: FPGAs • Long path length --> not run at cycle • Limited throughput requirement • bottlenecks elsewhere limit throughput req. • Insufficient interconnect • Insufficient retiming resources (bandwidth)

  25. Degrade from Peak: Custom/MPGA • Solve more general problem than required • (more gates than really need) • Long path length • Limited throughput requirement • Not needed or applicable to a problem

  26. Raw Density Summary • Area • MPGA 2-3x Custom • FPGA 5x MPGA • Area-Time • Gate Array 6-10x Custom • FPGA 15-20x Gate Array • Processor 10x FPGA

  27. Raw Density Caveats • Processor/FPGA may solve more specialized problem • Problems have different resource balance requirements • …can lead to low yield of raw density

  28. Broadenning Picture • Compare larger computations • For comparison • throughput density metric: results/area-time • normalize out area-time point selection • high throughput density • -> most in fixed area • -> least area to satisfy fixed throughput target

  29. Multiply

  30. FIR

  31. IIR/Biquad

  32. DES Keysearch <http://www.cs.berkeley.edu/~iang/isaac/hardware/>

  33. DNA Sequence Match • Problem: “cost” of transform S1->S2 • Given: cost of insertion, deletion, substitution • Relevance: similarity of DNA sequences • evolutionary similarity • structure predict function • Typically: new sequence compared to large databse

  34. DNA Sequence Match

  35. Floating-Point Add (single prec.)

  36. Floating-Point Mpy (single prec.)

  37. Summary • Raw densities (Area-Time) • FPGA/custom = 100x • Processor/custom = 1000x • Special-purpose functional units in processors/DSPs, much lower net benefit since need to control and interconnect • Gap narrows (closes) as programmable can be specialized

More Related