
Mapping into LUT Structures

This paper introduces an efficient Boolean matching algorithm and a modified FPGA architecture to reduce delay in FPGA mapping using new LUT structures. Experimental results show that the LUT structures improve traditional mapping by roughly 10% in delay and 6% in area, and that dedicated hardware with direct connections between adjacent LUTs can reduce delay by up to 40% at the cost of some area increase.


Presentation Transcript


  1. Mapping into LUT Structures • Alan Mishchenko, Stephen Jang, Chao Chen • UC Berkeley, Agate Logic Inc

  2. Overview • Introduction • Contributions • Algorithms • Experimental results • Conclusion

  3. Introduction • Delay optimization is a high priority • It can be addressed by • CAD algorithms • FPGA hardware • We propose a two-fold solution • Improved LUT mapping algorithm • Modification to the hardware

  4. Two LUT Structures • 10-input LUT structure “444” • 7-input LUT structure “44”
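
The input counts on this slide follow from simple arithmetic: each direct connection between two LUTs uses up one LUT input. A minimal sketch, assuming a structure of k LUTs joined by k - 1 direct connections (the function name `max_inputs` is hypothetical and not part of ABC):

```python
def max_inputs(structure: str) -> int:
    """Return the largest support size a LUT structure can accept.

    Assumption: `structure` is a string of LUT sizes such as "44" or "444",
    and the LUTs are joined by len(structure) - 1 direct connections,
    each of which occupies one input of the LUT it feeds.
    """
    sizes = [int(ch) for ch in structure]
    internal_connections = len(sizes) - 1
    return sum(sizes) - internal_connections


if __name__ == "__main__":
    for name in ("44", "444"):
        print(name, max_inputs(name))
```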

  5. Contributions • Developed an efficient matching algorithm to check whether a given Boolean function can be implemented using a given LUT structure • Modified the priority-cut-based technology mapper in ABC to perform mapping into the LUT structures • Evaluated the new algorithm and the new LUT structures • Collected statistics on implementable K-input Boolean functions appearing in industrial designs

  6. “44” Matching Algorithm • Input: an N-input Boolean function (4 ≤ N ≤ 7). • Output: the configuration of two 4-LUTs. • Implementation: • If N = 4, the function can be trivially implemented using one 4-LUT. • If N = 5, the function can usually be implemented using two 4-LUTs; relatively few 5-input functions cannot be implemented using the “44” structure (see Theorem 1). • If N = 6, a naïve decomposition check tries each group of four variables. • If N = 7, the function can be implemented only if it is DSD-decomposable and its DSD structure can be mapped into the given LUT structure.
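
The idea of trying groups of variables can be illustrated with a small truth-table check. The sketch below is not the authors' matching algorithm. It is a simplified, disjoint-only test (classic column-multiplicity check): it looks for a bound set of at most four variables whose cofactors take at most two distinct values, leaving at most three free variables for the second 4-LUT. It ignores variables shared between the two LUTs, so a `False` result does not prove that a function cannot be implemented in the “44” structure. The names `cofactor_columns` and `has_disjoint_44` are hypothetical.

```python
from itertools import combinations


def cofactor_columns(tt: int, n: int, bound: tuple) -> set:
    """Return the distinct cofactors of an n-input function w.r.t. `bound`.

    `tt` is the truth table as an integer: bit m holds f(m), where bit v of
    the minterm index m is the value of variable v.
    """
    free = [v for v in range(n) if v not in bound]
    columns = set()
    for b in range(1 << len(bound)):
        base = sum(1 << v for i, v in enumerate(bound) if (b >> i) & 1)
        column = []
        for f in range(1 << len(free)):
            m = base | sum(1 << v for i, v in enumerate(free) if (f >> i) & 1)
            column.append((tt >> m) & 1)
        columns.add(tuple(column))
    return columns


def has_disjoint_44(tt: int, n: int) -> bool:
    """Sufficient (not necessary) test for fitting two cascaded 4-LUTs.

    Assumption: only disjoint decompositions are considered, i.e. no
    variable feeds both LUTs.
    """
    if n <= 4:
        return True   # one 4-LUT is enough
    if n > 7:
        return False  # "44" has at most 7 inputs
    # Bound set goes into the first LUT (size <= 4); the free set plus the
    # first LUT's output must fit the second LUT (free size <= 3).
    for k in range(n - 3, 5):
        for bound in combinations(range(n), k):
            if len(cofactor_columns(tt, n, bound)) <= 2:
                return True
    return False


if __name__ == "__main__":
    and7 = 1 << 127  # 7-input AND: only the all-ones minterm is 1
    print(has_disjoint_44(and7, 7))
```

A full matcher would also have to consider shared variables for N = 5 and N = 6, which is where most of the additional implementable functions come from.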

  7. Experimental Setup
     Mapping into dedicated hardware: 1.20 1.20 1.20 1.20 1.20
     Improved traditional mapping:
     • Baseline (if) and mapping with structural choices (MSC): (dch; if -j)4
         # k    area   delay
         1-4    1.00   1.00
     • 44: runs (dch; if -j)4 with LUT library:
         # k    area   delay
         1-4    1.00   1.00
         5-7    2.00   2.00
     • 444: runs (dch; if -j)4 with LUT library:
         # k    area   delay
         1-4    1.00   1.00
         5-10   3.00   2.00
     • Best 444: runs (dch; if -j)4 with LUT library:
         # k    area   delay
         1-4    1.00   1.00
         5-6    2.00   2.00
         7      2.50   2.00
         8-10   3.00   2.00
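
To make the LUT library rows concrete, here is a minimal sketch that expands rows such as `5-7 2.00 2.00` into a per-input-count cost table. It assumes the range notation used on the slide (`1-4`, `8-10`) and treats `#` as a comment marker; it does not claim to reproduce ABC's own library file parser, and `parse_lut_library` is a hypothetical helper.

```python
def parse_lut_library(text: str) -> dict:
    """Map each input count k to an (area, delay) pair.

    Assumption: every non-comment line has three fields, "k area delay",
    where k is a single number or an inclusive range such as "5-7".
    """
    library = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        k_field, area, delay = line.split()
        low, _, high = k_field.partition("-")
        for k in range(int(low), int(high or low) + 1):
            library[k] = (float(area), float(delay))
    return library


# The "Best 444" library as listed on the slide.
BEST_444 = """
# k area delay
1-4 1.00 1.00
5-6 2.00 2.00
7 2.50 2.00
8-10 3.00 2.00
"""

if __name__ == "__main__":
    for k, (area, delay) in sorted(parse_lut_library(BEST_444).items()):
        print(k, area, delay)
```

Read this way, the “Best 444” library charges a 7-input function an area of 2.50, between the two-LUT cost of 2.00 and the three-LUT cost of 3.00, while the delay stays at 2.00 for every function with more than four inputs.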

  8. Experimental Results Summary • Table 4.1. Improvements to the traditional FPGA mapping. • Table 4.2. Delay optimization using dedicated FPGA architecture with direct connections between adjacent LUTs.

  9. Table 4.1

  10. Table 4.2

  11. Ratios of Implementable Functions • “44” structure • 4-input – 100% • 5-input – 99.99% • 6-input – 99% • 7-input – 84% • “444” structure • 4-input – 100% • 5-input – 100% • 6-input – 99.99% • 7-input – 97.6% • 8-input – 94.5% • 9-input – 75.5% • 10-input – 39.7%

  12. Conclusions • Motivated delay improvement • Introduced several LUT structures • Proposed fast truth-table-based Boolean matching • Evaluated improvements and got promising results • In traditional mapping, -10% in delay and -6% in area • With dedicated hardware, -41% in delay and +24% in area • Future work: Measure improvements after P&R

  13. Abstract • Mapping into K-input lookup tables (K-LUTs) is an important step in synthesis for Field-Programmable Gate Arrays (FPGAs). The traditional FPGA architecture assumes routable interconnect between individual LUTs. We propose a modified FPGA architecture that allows for direct (non-routable) connections between adjacent LUTs. The delay between such LUTs can be shorter. The improvement in delay may come with a restriction on the fanout of LUTs connected using direct connections. As a result, delay can be reduced while area may increase, compared to traditional mapping. This paper investigates two types of LUT structures and the associated tradeoffs. Experimental results indicate that when the LUT structures are used, the results of traditional mapping can be improved by roughly 10% in delay and 6% in area. When the dedicated hardware is used, the delay improvement can be up to 40% at the cost of some area increase.
