
1 Q. Wang is now with Huawei Technologies (America).

Area-Efficient FPGA Logic Elements: Architecture and Synthesis. Jason Anderson and Qiang Wang (1). IEEE/ACM ASP-DAC, Yokohama, Japan, January 26-28, 2011.


Presentation Transcript


  1. Area-Efficient FPGA Logic Elements: Architecture and Synthesis. Jason Anderson and Qiang Wang (1). IEEE/ACM ASP-DAC, Yokohama, Japan, January 26-28, 2011. (1) Q. Wang is now with Huawei Technologies (America).

  2. Want More for Same Money!

  3. Logic Density Objective Goal: Implement more gates per unit area of silicon.

  4. FPGA Logic Blocks • Use LUTs to implement logic functions. • A 4-LUT can realize any 4-input function. • LUT size doubles with each input added. [Figure: 2-LUT, 3-LUT, and 4-LUT built from SRAM cells (S) and multiplexers on inputs i1-i4.]

  5. Today’s FPGAs • Modern FPGAs use 6-LUTs: • Fewer levels of logic vs. 4-LUTs leads to higher speed. • But many logic functions use < 6 inputs, so it is hard to utilize 6-LUTs efficiently. • 6-LUTs in modern chips are “fracturable” to improve logic density.

  6. 6-LUT Implementation • 64 SRAM cells. • 63 2-to-1 multiplexers in the tree. [Figure: two 5-LUTs on inputs i1-i5, with i6 selecting between them.]
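
As a rough illustration of the SRAM-plus-multiplexer-tree structure on slides 4 and 6, the Python sketch below models a k-LUT. The names `lut_resources` and `lut_eval`, and the bit ordering (i1 drives the first multiplexer level), are assumptions made for this example rather than details taken from the slides.

```python
def lut_resources(k):
    """Return (SRAM cells, 2-to-1 multiplexers) for a k-input LUT."""
    return 2 ** k, 2 ** k - 1


def lut_eval(sram, inputs):
    """Evaluate a LUT by walking its multiplexer tree.

    sram   : list of 2**k configuration bits
    inputs : tuple of k bits; inputs[0] (i1) drives the first mux level
             (assumed ordering)
    Returns (output bit, number of 2-to-1 multiplexers in the tree).
    """
    level, muxes = list(sram), 0
    for bit in inputs:
        # Each level uses one input to halve the number of candidate values.
        level = [level[j + bit] for j in range(0, len(level), 2)]
        muxes += len(level)
    return level[0], muxes


# A 2-LUT configured as AND: only the entry for i1 = i2 = 1 holds a 1.
and_sram = [0, 0, 0, 1]
print(lut_eval(and_sram, (1, 1)))   # (1, 3)
print(lut_resources(6))             # (64, 63)
```

The counts follow the slide: each added input doubles the SRAM cells, and a full tree over 2^k cells needs 2^k - 1 multiplexers.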

  7. 6-LUTs in Commercial Chips • Xilinx Virtex-6 LUT (dual-output 6-LUT): one 6-input function, or any two functions that use up to 5 inputs. • Altera Stratix-IV ALM: one 6-input function, two 4-input functions, one 5-input function + one 3-input function, others.

  8. Shannon Decomposition • Consider Boolean function of n variables: g = f(x_1, x_2, …, x_i, …, x_n). • Can decompose w.r.t. one of its variables x_i: g = x_i · f(x_1, x_2, …, 1, …, x_n) + x_i’ · f(x_1, x_2, …, 0, …, x_n).

  9. Shannon Decomposition • Consider Boolean function of n variables: g = f(x_1, x_2, …, x_i, …, x_n). • Can decompose w.r.t. one of its variables x_i: g = x_i · f(x_1, x_2, …, 1, …, x_n) + x_i’ · f(x_1, x_2, …, 0, …, x_n). • 1-cofactor: g_{x_i} = f(x_1, x_2, …, 1, …, x_n). • 0-cofactor: g_{x_i’} = f(x_1, x_2, …, 0, …, x_n).
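
The decomposition can be checked exhaustively on small functions. The sketch below is only an illustration: it represents a Boolean function as a Python callable over a tuple of bits, and the helper names `cofactor` and `shannon_holds`, as well as the majority example, are hypothetical choices not found in the slides.

```python
from itertools import product


def cofactor(f, i, value):
    """Fix variable i of f to `value`; the result ignores position i of its input."""
    return lambda x: f(x[:i] + (value,) + x[i + 1:])


def shannon_holds(f, n, i):
    """Check g = x_i * g_{x_i} + x_i' * g_{x_i'} on every input assignment."""
    g1, g0 = cofactor(f, i, 1), cofactor(f, i, 0)
    for x in product((0, 1), repeat=n):
        expanded = (x[i] and g1(x)) or ((not x[i]) and g0(x))
        if bool(f(x)) != bool(expanded):
            return False
    return True


# Hypothetical 3-variable example: majority of (x1, x2, x3).
majority = lambda x: (x[0] and x[1]) or (x[0] and x[2]) or (x[1] and x[2])
print(all(shannon_holds(majority, 3, i) for i in range(3)))  # True
```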

  10. Co-Factor Properties • Use |g_{x_i}| to represent the # of variables in the co-factor: • Size of the co-factor.

  11. Trimming Input Concept • A k-variable logic function, g = f(x_1, …, x_k), has a trimming input x_i if either |g_{x_i}| < k-1, or |g_{x_i’}| < k-1. • Example (4 variables): g = ac + a’b’d + d’a. 1-cofactor: g_d = ac + a’b’ (3 variables). 0-cofactor: g_d’ = ac + a = a (1 variable). • d is a trimming input.
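
The trimming-input test can be sketched by brute force on a truth table. This is an illustrative sketch, not the authors' implementation: `support_size`, `cofactor`, and `trimming_inputs` are hypothetical helper names, and variables a, b, c, d are mapped to tuple positions 0 to 3.

```python
from itertools import product


def support_size(f, n):
    """Count the variables that f actually depends on."""
    count = 0
    for j in range(n):
        for x in product((0, 1), repeat=n):
            flipped = x[:j] + (1 - x[j],) + x[j + 1:]
            if bool(f(x)) != bool(f(flipped)):
                count += 1
                break
    return count


def cofactor(f, i, value):
    """Fix variable i of f to `value`."""
    return lambda x: f(x[:i] + (value,) + x[i + 1:])


def trimming_inputs(f, k):
    """Indices i for which the 1-cofactor or 0-cofactor has fewer than k-1 variables."""
    return [i for i in range(k)
            if min(support_size(cofactor(f, i, 1), k),
                   support_size(cofactor(f, i, 0), k)) < k - 1]


# Slide example: g = ac + a'b'd + d'a, with (a, b, c, d) = (x[0], x[1], x[2], x[3]).
g = lambda x: (x[0] and x[2]) or ((not x[0]) and (not x[1]) and x[3]) or ((not x[3]) and x[0])
print(support_size(cofactor(g, 3, 1), 4))  # 3  (g_d  = ac + a'b')
print(support_size(cofactor(g, 3, 0), 4))  # 1  (g_d' = a)
print(trimming_inputs(g, 4))               # [0, 3]
```

Besides d (index 3), the sketch also reports a (index 0), because fixing a = 1 leaves c + d’, which has only two variables.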

  12. How Common Are Trimming Inputs? • Extremely common! [Chart by logic circuit.]

  13. Proposed 5+4-LUT Architecture • 6-variable functions with a trimming input can fit into the 5+4-LUT (a 5-LUT plus a 4-LUT). • Contains: 32 + 16 + 1 = 49 SRAM cells (vs. 64 in 6-LUT); 31 + 15 + 2 = 48 2-to-1 multiplexers (vs. 63 in 6-LUT). [Figure: 5-LUT and 4-LUT on inputs i1-i6 with an extra SRAM cell s.]

  14. Gating Input Concept • A k-variable logic function, g = f(x_1, …, x_k), has a gating input x_i if either |g_{x_i}| = 0, or |g_{x_i’}| = 0. • Interpretation: The (gating) state of x_i causes g to be logic-0 or logic-1. • Example (4 variables): g = (a+b+c)d. 1-cofactor: g_d = a+b+c (3 variables). 0-cofactor: g_d’ = logic-0 (0 variables).
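
A gating input is one whose cofactor is constant for one of its two values. The following sketch checks this by enumeration; `gating_inputs` is a hypothetical helper name, and variables a, b, c, d are again mapped to tuple positions 0 to 3.

```python
from itertools import product


def gating_inputs(f, k):
    """Return (input index, gating state, forced output) for every gating input of f."""
    found = []
    for i in range(k):
        for value in (0, 1):
            # Collect all outputs of the cofactor with x_i fixed to `value`.
            outputs = {bool(f(x[:i] + (value,) + x[i + 1:]))
                       for x in product((0, 1), repeat=k)}
            if len(outputs) == 1:          # cofactor is constant: 0 variables
                found.append((i, value, int(outputs.pop())))
    return found


# Slide example: g = (a + b + c) d.
g = lambda x: (x[0] or x[1] or x[2]) and x[3]
print(gating_inputs(g, 4))  # [(3, 0, 0)]
```

The result reads: input d (index 3) in state 0 forces g to logic-0, matching the 0-cofactor on the slide.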

  15. How Common are Gating Inputs? • Still significant! [Chart by logic circuit.]

  16. Proposed Extended 5-LUT Architecture • 6-variable functions with a gating input can fit into an extended 5-LUT: i6 is the gating input, and an SRAM cell (s) holds the gating state. • Contains: 32 + 2 = 34 SRAM cells (vs. 64); 31 + 2 = 33 2-to-1 multiplexers (vs. 63). [Figure: 5-LUT on inputs i1-i5 with gating input i6 and two SRAM cells s.]

  17. Additional Architectures • Two extended 4-LUTs: 35 SRAM cells. • Extended 4-LUT + extended 3-LUT: 28 SRAM cells. • Extended 4+3-LUT: 27 SRAM cells. [Figures: each element built from 4-LUTs and/or 3-LUTs, multiplexers M1-M3, and SRAM cells s on inputs i1-i6.]

  18. ABC Logic Synthesis • Mainly developed at UC Berkeley [Primary developer: Mishchenko, starting 2005/2006]. • Uses an AND-INVERTER graph (AIG): • Quality LUT mapping solutions. [Figure: AIG for output z over inputs a, b, c, d, with complemented edges marked.]

  19. Finding Shannon Co-Factors is Easy in ABC • Observation: setting an AIG input to logic-0 or logic-1 simply “eliminates” a chunk of the AIG. [Figure: AIG for z (labels x, c, f, a, b, d, e) before and after taking the 0-cofactor w.r.t. b, with b tied to logic-0.]

  20. Finding Shannon Co-Factors is Easy in ABC • Observation: setting an AIG input to logic-0 or logic-1 simply “eliminates” a chunk of the AIG. • After the 0-cofactor w.r.t. b: z = [(de)’f]’. [Figure: AIG for z (labels x, c, f, a, b, d, e) before and after taking the 0-cofactor w.r.t. b, with b tied to logic-0.]
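
The observation can be illustrated with a toy AIG in which constants are propagated through AND nodes. The slide's figure is not preserved in this transcript, so the graph below is a hypothetical one chosen so that its 0-cofactor w.r.t. b is [(de)’f]’; the edge encoding `(node, complemented)` and the helper names are likewise assumptions, and none of this uses ABC's actual API.

```python
FALSE, TRUE = (False, False), (True, False)   # constant edges


def negate(edge):
    """Complement an edge; constants are folded instead of flagged."""
    node, neg = edge
    return (not node, False) if isinstance(node, bool) else (node, not neg)


def AND(a, b):
    """Build an AND node, simplifying when either fan-in is constant."""
    if a == FALSE or b == FALSE:
        return FALSE
    if a == TRUE:
        return b
    if b == TRUE:
        return a
    return (("and", a, b), False)


def restrict(edge, var, value):
    """Cofactor: tie input `var` to `value` and propagate constants upward."""
    node, neg = edge
    if isinstance(node, tuple):                       # AND node
        out = AND(restrict(node[1], var, value), restrict(node[2], var, value))
    else:                                             # input name or constant
        out = (value, False) if node == var else (node, False)
    return negate(out) if neg else out


def count_ands(edge):
    """Count AND nodes reachable from an edge (tree count, no sharing)."""
    node, _ = edge
    return 1 + count_ands(node[1]) + count_ands(node[2]) if isinstance(node, tuple) else 0


a, b, c, d, e, f = ((name, False) for name in "abcdef")
x = AND(a, b)                                  # hypothetical internal node
y = AND(negate(AND(d, e)), f)                  # (de)'f
z = negate(AND(negate(AND(x, c)), y))          # hypothetical output

z0 = restrict(z, "b", False)                   # 0-cofactor w.r.t. b
print(count_ands(z), count_ands(z0))           # 5 2
print(z0 == negate(y))                         # True: z0 = [(de)'f]'
```

Tying b to logic-0 removes the x and xc nodes along with the top AND, which is the "eliminated chunk" the slide refers to.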

  21. FPGA Technology Mapping • Based on notion of “cuts”. • Example: mapping to 3-LUTs. [Figure: “3-feasible” cuts for node z in a circuit with nodes z, x and inputs a, b, c, d, e.]

  22. Tech Mapping to K-LUTs • Walk circuit DAG from PIs to POs: • Compute K-feasible cuts for each node. • Select a “best” cut for each node: • Best: area, depth, power, etc. • Walk circuit DAG from POs to PIs: • Construct mapped network by instantiating the selected best cuts.
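
The two passes above can be sketched as follows. This is a minimal illustration, not the ABC mapper: it assumes every internal node has exactly two fan-ins, keeps all cuts rather than a priority subset, and picks the "best" cut by depth first and size second. The function name `map_to_luts` and the small example circuit are hypothetical.

```python
def map_to_luts(fanins, order, outputs, k):
    """Map a 2-input-node DAG to K-LUTs.

    fanins  : dict node -> (fanin, fanin); primary inputs are absent
    order   : all nodes in topological order, primary inputs first
    outputs : primary output nodes
    Returns (cuts per node, selected LUTs as dict root -> leaf set).
    """
    cuts, depth, best = {}, {}, {}
    # Forward pass (PIs to POs): enumerate K-feasible cuts and pick a best one.
    for node in order:
        if node not in fanins:
            cuts[node], depth[node] = {frozenset([node])}, 0
            continue
        left, right = fanins[node]
        merged = {cl | cr for cl in cuts[left] for cr in cuts[right]
                  if len(cl | cr) <= k}
        best[node] = min(merged, key=lambda cut: (max(depth[v] for v in cut), len(cut)))
        depth[node] = 1 + max(depth[v] for v in best[node])
        cuts[node] = merged | {frozenset([node])}
    # Backward pass (POs to PIs): instantiate the selected cuts as LUTs.
    luts, todo = {}, list(outputs)
    while todo:
        node = todo.pop()
        if node in fanins and node not in luts:
            luts[node] = best[node]
            todo.extend(best[node])
    return cuts, luts


# Hypothetical circuit: x = op(a, b), z = op(x, c), mapped to 3-LUTs.
fanins = {"x": ("a", "b"), "z": ("x", "c")}
cuts, luts = map_to_luts(fanins, ["a", "b", "c", "x", "z"], ["z"], 3)
print(sorted(sorted(cut) for cut in cuts["z"]))  # [['a', 'b', 'c'], ['c', 'x'], ['z']]
print(sorted(luts["z"]))                         # ['a', 'b', 'c']
```

Here node x is absorbed into the LUT rooted at z, so the mapped network uses a single 3-LUT.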

  23. Technology Mapping • Discard cuts that do not meet requirements of target architecture. • Easy check with AIGs/Shannon decomposition. • Example: 5+4-LUT architecture. • Consider a cut C for function g: • If |Inputs(C)| < 6, C qualifies. • If |Inputs(C)| = 6, C qualifies iff ∃ i ∈ Inputs(C) such that |g_i| < 5 or |g_i’| < 5. [Figure: 5+4-LUT with a 5-LUT and a 4-LUT on inputs i1-i6.]
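
The cut filter for the 5+4-LUT can be sketched with a truth-table check in place of the AIG-based one. The helper names `cofactor_support` and `fits_5_plus_4_lut` are hypothetical, the cut function is assumed to be given as a callable over its n cut inputs, and rejecting cuts with more than 6 inputs is an assumption of this sketch.

```python
from itertools import product


def cofactor_support(f, n, i, value):
    """Number of variables the cofactor of f (x_i = value) still depends on."""
    g = lambda x: bool(f(x[:i] + (value,) + x[i + 1:]))
    size = 0
    for j in range(n):
        if j == i:
            continue
        if any(g(x) != g(x[:j] + (1 - x[j],) + x[j + 1:])
               for x in product((0, 1), repeat=n)):
            size += 1
    return size


def fits_5_plus_4_lut(f, n):
    """Apply the slide's rule: fewer than 6 inputs, or 6 inputs with a trimming input."""
    if n < 6:
        return True
    if n > 6:
        return False          # assumption: such cuts are never generated
    return any(cofactor_support(f, n, i, v) < 5
               for i in range(6) for v in (0, 1))


and6 = lambda x: all(x)        # any input at 0 forces the output: qualifies
xor6 = lambda x: sum(x) % 2    # every cofactor keeps 5 variables: rejected
print(fits_5_plus_4_lut(and6, 6))  # True
print(fits_5_plus_4_lut(xor6, 6))  # False
```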

  24. Technology Mapping • Example 2: two extended 4-LUTs. • Which functions can fit? • Any function that uses ≤ 4 inputs. • Any 5-input function. • Any 6-input function where: a) both co-factors w.r.t. an input require no more than 4 distinct variables, OR b) both co-factors w.r.t. an input have a (different) gating input. [Figure: two extended 4-LUTs with multiplexers M1-M3 and SRAM cells s on inputs i1-i6.]

  25. Methodology • Two metrics: • # of logic elements (area). • Mapped circuit depth [# of logic elements on the critical path] (speed).

  26. Methodology • 20 MCNC circuits; 13 VPR 5.0 circuits. • VPR 5.0 circuits synthesized to BLIF using Quartus 9.1 / QUIP. • Baseline mapper: priority cuts [ICCAD’07] (ABC “if” command in depth mode). • resyn2 technology-independent synthesis. • Technology-independent synthesis + mapping run 6 times for each circuit; best result taken.

  27. Results [Charts: VPR 5.0 circuits; MCNC circuits.]

  28. Results • 5+4-LUT achieves nearly identical #LEs/depth to 6-LUT. [Chart: VPR 5.0 circuits. Figure: 5+4-LUT on inputs i1-i6.]

  29. Results [Figure: extended 5-LUT on inputs i1-i6 with SRAM cells s.]

  30. Results • Better than extended 5-LUT for almost same silicon area. [Figure: two extended 4-LUTs with multiplexers M1-M3 and SRAM cells s on inputs i1-i6.]

  31. Results [Figure: logic element with a 4-LUT, 3-LUTs, multiplexers M1-M3, and SRAM cells s on inputs i1-i6.]

  32. Results [Figure: logic element with a 4-LUT, 3-LUTs, multiplexers M1-M2, and SRAM cells s on inputs i1-i6.]

  33. Overall Logic Density Estimate • Assume baseline 6-LUT FPGA architecture with core tile silicon area breakdown: • 50% routing; 25% LUT; 25% other (FF, carry, etc). • Assume LUT portion reduced in proportion to # SRAM cells in LE. • Si area relative to baseline arch = f × (75% + 25% × s), where f = ratio of # of LEs needed (≥ 1) and s = ratio of # SRAM cells (≤ 1).
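
The area estimate is simple enough to evaluate directly. In the sketch below, s for the 5+4-LUT comes from the SRAM counts on slide 13 (49 vs. 64), while f = 1.0 is a placeholder standing in for "no extra logic elements needed", not a measured result; the function name `relative_area` is also an assumption.

```python
def relative_area(f, s):
    """Silicon area relative to the baseline 6-LUT architecture.

    f : ratio of logic elements needed vs. the 6-LUT baseline (>= 1)
    s : ratio of SRAM cells per logic element vs. a 6-LUT (<= 1)
    """
    return f * (0.75 + 0.25 * s)


s_5_plus_4 = 49 / 64                       # 5+4-LUT: 49 SRAM cells vs. 64
print(relative_area(1.0, s_5_plus_4))      # 0.94140625

# Largest LE-count ratio f at which the 5+4-LUT still breaks even on area.
break_even_f = 1 / (0.75 + 0.25 * s_5_plus_4)
print(break_even_f)
```

Because only the 25% LUT portion shrinks, even a large cut in SRAM cells yields a modest area gain, and any increase in the number of logic elements (f > 1) eats into it.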

  34. Area & Area-Depth Product • Average across 33 circuits. [Chart annotations: 14%, 5%; axis label: Depth.]

  35. Summary • Logic functions very often have trimming inputs. • Inspire new logic element architectures. • New elements can achieve estimated 14% improvement in logic density vs. 6-LUTs; 5% improvement in area-depth product. • May also offer power benefits. • Results for 7-input elements in paper. • Future work: don’t cares; fracturable LUTs.

  36. Back-Up Slides
