
LMS: A New Logic Synthesis Method Based on Pre-Computed Library

Wenlong Yang, Lingli Wang (State Key Lab of ASIC and System, Fudan University, Shanghai, China) and Alan Mishchenko (Department of EECS, University of California, Berkeley).


Presentation Transcript


  1. LMS: A New Logic Synthesis Method Based on Pre-Computed Library • Wenlong Yang, Lingli Wang (State Key Lab of ASIC and System, Fudan University, Shanghai, China) • Alan Mishchenko (Department of EECS, University of California, Berkeley)

  2. Outline • Introduction • Previous Work • Lazy Man’s Logic Synthesis (LMS) • Experimental Results • Conclusion & Future Work

  3. Introduction • Goal of logic synthesis: deriving a circuit or improving an available circuit • We propose a “lazy” approach that reuses optimal structures derived by other synthesis tools, stored in a precomputed library • [Figure: flow diagram with the labels: other tools, precomputed library, LMS, a function with N variables, AIG]

  4. Outline • Introduction • Previous Work • Lazy Man’s Logic Synthesis (LMS) • Experimental Results • Conclusion

  5. Previous Work • Logic synthesis based on precomputed libraries has been proposed in several papers, but all of them differ from LMS: • LMS: • Precomputes structures in terms of AIGs • Uses public benchmarks and existing tools • Looks at 6-16 input functions • Stores many equivalent structures • Previous work: • Precomputes structures in terms of LUTs [Kennings, IWLS, 2010] • Did not use preexisting benchmarks or tools [Bjesse, ICCAD, 2004] • Looks at only 4-5 input functions [Li, IWLS, 2011] • Only computes multiple structure choices [Chatterjee, TCAD, 2006]

  6. Previous Work – SOP Balancing • For each node: • Compute several k-input cuts • Perform delay-optimal tree balancing of the SOP • Select the best one to replace the current structure • [Figure: an AIG subgraph found in benchmark s27.blif where SOP balancing loses to the proposed approach; the two forms shown are F = !c*!b + !c*a and F’ = !c*!(b*!a)]
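The two expressions on this slide can be checked against each other exhaustively. The sketch below is only an illustration: the function names and the AND-node counts are our own reading of the formulas (an OR is counted as one AND node with inverted edges, as is usual in an AIG), not figures taken from the slides.

```python
from itertools import product

def f_sop(a, b, c):
    # F = !c*!b + !c*a  (sum-of-products form from the slide)
    return ((not c) and (not b)) or ((not c) and a)

def f_factored(a, b, c):
    # F' = !c*!(b*!a)  (factored form from the slide)
    return (not c) and not (b and (not a))

def equivalent():
    # Compare the two forms on all 8 input assignments.
    return all(f_sop(a, b, c) == f_factored(a, b, c)
               for a, b, c in product([False, True], repeat=3))

# AND-node counts when each formula is built literally as an AIG
# (our own count, assuming no sharing beyond what the formula shows).
SOP_AND_NODES = 3       # (!c*!b), (!c*a), and the OR realised as one AND
FACTORED_AND_NODES = 2  # (b*!a), then !c * !(b*!a)
```

Calling `equivalent()` confirms that both forms implement the same function, so the difference between them is purely structural.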

  7. Outline • Introduction • Previous Work • Lazy Man’s Logic Synthesis (LMS) • Equivalence Classes • Library Representation/Construction • Implementation • Experimental Results • Conclusion

  8. Equivalence Classes • LMS is based on collecting, storing, and re-using circuit structures of Boolean functions with 6-16 input variables. • The total number of completely specified Boolean functions of N variables is 2^(2^N). • Experiments show that even for practical functions, this number can be very large. To reduce the number of functions and the memory needed to store them in a library, a canonical form is used to group them into equivalence classes.
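To get a feel for the 2^(2^N) growth, the short sketch below evaluates the count and the size of a single truth table for a few values of N. The helper names are ours, and the byte count assumes one bit per minterm with no compression.

```python
def num_functions(n):
    # Number of completely specified Boolean functions of n variables.
    return 2 ** (2 ** n)

def truth_table_bytes(n):
    # Storage for one truth table at one bit per minterm (at least one byte).
    return max(1, (2 ** n) // 8)

for n in (2, 4, 6, 16):
    digits = len(str(num_functions(n)))
    print(f"N={n}: count has {digits} decimal digits, "
          f"one truth table takes {truth_table_bytes(n)} bytes")
```

Already at N = 6 the count equals 2^64, which is why storing one entry per function is out of the question and classes are stored instead.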

  9. NPN • Two functions are NPN-equivalent if one of them can be obtained from the other by negation and/or permutation of the inputs and outputs. • Drawbacks of NPN computation: • Time-consuming • Complicated • Computing a complete NPN canonical form is therefore not affordable for LMS
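The cost of exact NPN canonicization can be seen from a brute-force sketch that tries every input permutation, input negation, and output negation and keeps the smallest truth table. This is an illustrative baseline we wrote for small N, not the method used in ABC; truth tables are assumed to be Python integers in which bit m holds the function value at minterm m.

```python
from itertools import permutations
from math import factorial

def transform(tt, n, perm, in_neg, out_neg):
    # New variable i drives old variable perm[i], inverted if bit i of in_neg is set.
    res = 0
    for m in range(1 << n):
        src = 0
        for i in range(n):
            bit = ((m >> i) & 1) ^ ((in_neg >> i) & 1)
            src |= bit << perm[i]
        val = (tt >> src) & 1
        res |= (val ^ out_neg) << m
    return res

def npn_canon(tt, n):
    # Smallest truth table over the whole NPN orbit (exhaustive search).
    return min(transform(tt, n, p, neg, o)
               for p in permutations(range(n))
               for neg in range(1 << n)
               for o in (0, 1))

def npn_transform_count(n):
    # n! permutations * 2^n input negations * 2 output negations.
    return factorial(n) * 2 ** (n + 1)
```

With this encoding, AND (`0b1000`) and OR (`0b1110`) of two variables fall into the same class, while XOR (`0b0110`) does not. The number of transforms grows as N! * 2^(N+1), which is 92160 for N = 6 and far larger for N = 16.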

  10. Semi-Canonical Form • The idea is to order the input variables and the polarities of inputs/outputs using the number of positive minterms and cofactors w.r.t. each variable. • Input: truth table F • Determine the polarity of F by the number of 1s in the truth table • Determine the polarity of each variable by the number of 1s in the negative cofactor w.r.t. that variable • Sort input variables by the number of 1s in their negative cofactors and permute inputs accordingly • Output: canonicized truth table F • A reasonable trade-off between accuracy and speed
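The three steps above can be sketched on integer truth tables (bit m holds the value at minterm m). The slide does not state the exact polarity rules, so two of them are assumptions here: the output is complemented when more than half of the minterms are 1, and an input is flipped when its negative cofactor has more 1s than its positive cofactor. All function names are ours.

```python
def count_ones(tt):
    return bin(tt).count("1")

def cofactor_ones(tt, n, var, value):
    # Number of 1s among the minterms where variable `var` equals `value`.
    return sum((tt >> m) & 1 for m in range(1 << n) if ((m >> var) & 1) == value)

def flip_var(tt, n, var):
    # Truth table of the function with input `var` complemented.
    res = 0
    for m in range(1 << n):
        res |= ((tt >> (m ^ (1 << var))) & 1) << m
    return res

def permute(tt, n, order):
    # New variable i takes the role of old variable order[i].
    res = 0
    for m in range(1 << n):
        src = 0
        for i in range(n):
            src |= ((m >> i) & 1) << order[i]
        res |= ((tt >> src) & 1) << m
    return res

def semi_canonicize(tt, n):
    size = 1 << n
    # Step 1 (assumed rule): complement the output if more than half the minterms are 1.
    if count_ones(tt) > size // 2:
        tt = ~tt & ((1 << size) - 1)
    # Step 2 (assumed rule): flip an input whose negative cofactor has more 1s.
    for v in range(n):
        if cofactor_ones(tt, n, v, 0) > cofactor_ones(tt, n, v, 1):
            tt = flip_var(tt, n, v)
    # Step 3: sort inputs by the number of 1s in their negative cofactors.
    order = sorted(range(n), key=lambda v: cofactor_ones(tt, n, v, 0))
    return permute(tt, n, order)
```

For two variables, `a*!b`, `!a*b`, `!a*!b`, and `a+b` all map to the same table as `a*b`. Ties between cofactor counts are not resolved, which is why the form is only semi-canonical: some NPN-equivalent functions can end up in different classes.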

  11. Library Representation • An N-input library contains functions of up to N variables. • Structures of all functions are represented as a shared AIG. • Each output of the AIG is the root node of one logic structure. • When a library is loaded, the following actions are performed: • A hash table is created to hash the outputs by their semi-canonical forms. • For each structure, the area and pin-to-output delays are computed and stored.

  12. Pin-To-Output Delay & Dominated Structure • Example of using pin-to-output delays to compute structure delay: • Suppose arrival times: {3, 2, 4, 5, 2, 3, 1} • Pin-to-output delays: {3, 3, 3, 5, 5, 4, 1} • Sum: {6, 5, 7, 10, 7, 7, 2} • If one structure’s pin-to-output delay is worse than another’s with respect to every input, the structure is dominated.
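The example on this slide can be reproduced in a few lines. Taking the maximum of the per-pin sums as the structure delay is the usual timing convention and is assumed here; the treatment of ties in the dominance check (equal delays count as "no better") is also our assumption, since the slide only says "worse with respect to every input".

```python
def pin_sums(arrival, pin_delay):
    # Arrival time at the output through each pin.
    return [a + d for a, d in zip(arrival, pin_delay)]

def structure_delay(arrival, pin_delay):
    # Assumed convention: the output arrives when the slowest path does.
    return max(pin_sums(arrival, pin_delay))

def is_dominated(candidate, other):
    # candidate is dominated if `other` is no slower on every pin (ties assumed to count).
    return all(o <= c for c, o in zip(candidate, other))

arrival = [3, 2, 4, 5, 2, 3, 1]
pin_delay = [3, 3, 3, 5, 5, 4, 1]
print(pin_sums(arrival, pin_delay))         # [6, 5, 7, 10, 7, 7, 2]
print(structure_delay(arrival, pin_delay))  # 10
```

Because dominance is checked per pin, a structure that is slower on one pin but faster on another is kept: which of the two wins depends on the arrival times seen at the cut.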

  13. Library Construction • The LUT mapper "if" in ABC is used as a structural cut browser to generate K-input cuts whose logic structures are added to the library. • Input: cut C • If cut C does not meet the requirements, return • Compute the Boolean function F of cut C as a truth table • Compute the semi-canonical form of F • Rebuild the structure of the cut in the library • If the structure already exists or is dominated, return • Add a new primary output to store the structure in the hash table
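A minimal sketch of this procedure is given below, with a Python dict standing in for the hash table. Everything here is hypothetical scaffolding: the `Structure` and `Library` classes, the `canon` parameter (which defaults to the identity and would be replaced by a semi-canonicization routine), and the delay-only dominance test are our assumptions. The step that rebuilds the cut inside the shared AIG is omitted; a structure is reduced to its area and pin-to-output delays.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Structure:
    area: int            # number of AND nodes (hypothetical metric)
    pin_delays: tuple    # pin-to-output delays, in canonical input order

@dataclass
class Library:
    max_inputs: int
    table: dict = field(default_factory=dict)  # (num_inputs, canonical tt) -> structures

    def add_cut(self, truth_table, num_inputs, structure, canon=lambda tt, n: tt):
        # Reject cuts that do not meet the requirements (here: too many inputs).
        if num_inputs > self.max_inputs:
            return False
        key = (num_inputs, canon(truth_table, num_inputs))
        bucket = self.table.setdefault(key, [])
        for s in bucket:
            same = (s == structure)
            # Assumed dominance rule: an existing structure is no slower on every pin.
            dominated = all(o <= c for c, o in zip(structure.pin_delays, s.pin_delays))
            if same or dominated:
                return False
        bucket.append(structure)
        return True

lib = Library(max_inputs=6)
lib.add_cut(0b1000, 2, Structure(area=1, pin_delays=(1, 1)))
lib.add_cut(0b1000, 2, Structure(area=2, pin_delays=(2, 2)))  # dominated, rejected
lib.add_cut(0b1000, 2, Structure(area=3, pin_delays=(0, 3)))  # faster on pin 0, kept
```

Keeping several non-dominated structures per class is what lets the mapper later pick the one that best fits the arrival times of a particular cut.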

  14. A Case Study of LMS: AIG Level Minimization • Input: And-Inverter Graph • For each node, in topological order: • Compute several K-input cuts • For each cut: • Compute the truth table • Look up the function in the library • If there is no structure for this function, mark the cut to ensure it is not selected as the best cut • Else, if the best structure found leads to a smaller AIG level, save the cut as the best cut • If there is an improvement in level, update the AIG
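The per-node decision in this loop can be sketched as follows. The sketch assumes that cut enumeration and truth-table canonicization have already happened, so each cut arrives as a pair of a library key and the levels of its leaves, and that the library maps a key to a list of pin-to-output delay tuples measured in AIG levels. These data shapes and the function name are hypothetical.

```python
def best_cut_for_node(cuts, library, current_level):
    # cuts: list of (key, leaf_levels); library: dict key -> list of pin-delay tuples.
    best_cut, best_level = None, current_level
    for index, (key, leaf_levels) in enumerate(cuts):
        structures = library.get(key)
        if not structures:
            continue  # no structure for this function: the cut cannot be selected
        # Level of the node if rebuilt with the best library structure for this cut.
        level = min(max(l + d for l, d in zip(leaf_levels, delays))
                    for delays in structures)
        if level < best_level:
            best_cut, best_level = index, level
    return best_cut, best_level

library = {"and3": [(2, 2, 1), (1, 2, 2)]}
cuts = [("and3", (0, 0, 3)), ("xor2", (0, 0))]
print(best_cut_for_node(cuts, library, current_level=5))  # (0, 4)
```

In the usage example the late-arriving third leaf is matched with the structure that gives it a pin delay of 1, so the node level drops from 5 to 4; the second cut is skipped because the library has no entry for it.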

  15. Implementation • The LMS algorithm is implemented in ABC. The LUT mapper "if" in ABC is used as: • (a) A cut browser for computing the libraries • (b) A mapper in the case study on AIG level minimization • Commands related to library construction: • rec_start: Starts the LMS recorder • rec_add: Adds structures from benchmarks • rec_filter: Removes the less frequently used structures • rec_merge: Merges two previously computed libraries • rec_ps: Prints statistics for the currently loaded library • rec_use: Transforms the internal library to the current network in ABC • rec_stop: Deletes the current library • Command used to perform LMS mapping: • if -y -K <num> -C <num> • -y enables level optimization by LMS • -K <num> is the cut size • -C <num> is the number of cuts used at each node

  16. Outline • Introduction • Previous Work • Lazy Man’s Logic Synthesis (LMS) • Experimental Results • Library Coverage • 6-input Library • Optimize Delay After LUT Mapping • Conclusion

  17. Library Coverage • This experiment was performed to show that LMS has practical memory requirements for functions of up to 12 inputs. • Semi-canonical classes of all functions appearing in the cuts of the benchmark circuits, without synthesis, were collected, and the frequency of their appearance was recorded. • ~2 M classes in total • ~740 K classes for 90% of the functions • ~400 MB for truth tables • [Figure: occurrence frequency plotted against function number]

  18. Constructing Library for 6-input Functions • The goal of this experiment is to derive a 6-input library for use in the following case study of AIG level minimization. • The following ABC scripts are used to collect structures: • read file; st; rec_add; • dc2; rec_add; • if -K 8; bidec; st; rec_add; • if -K 8; mfs; st; rec_add; • if -K 8; bidec; st; rec_add; • if -g -K 6; st; rec_add; • if -g -K 6; st; rec_add; • ~77 MB AIGER file • [Table: statistics of the precomputed 6-input library]

  19. Optimize Delay After LUT Mapping • Two sets of benchmarks are used in this paper: 20 MCNC benchmarks and 10 large Altera benchmarks. • LUT mapping was performed by the following scripts: • Map: st; resyn2; if -K 4 or 6 • MapC: st; resyn2; dch -f; if -K 4 or 6 • SOPBC: st; if -gm -K 6; st; resyn2; dch -f; if -K 4 or 6 • LMSC: st; if -ym -K 6; st; resyn2; dch -f; if -K 4 or 6 • Benchmarks were run on a workstation with an Intel Xeon quad-core CPU and 256 GB of RAM (~4 GB used for the experiment). • The resulting networks were verified by the command cec in ABC.

  20. Mapping Results for Altera Benchmarks (4-LUTs) • LMSC reduced delay by 37% with an area increase of 13%

  21. Mapping Results for Altera Benchmarks (6-LUTs) • LMSC reduced delay by 26% with an area increase of 13%

  22. Mapping Results for MCNC Benchmarks • 4-LUTs: LMSC reduced delay by 10% with an area increase of 3% • 6-LUTs: LMSC reduced delay by 12% with an area increase of 8%

  23. Conclusion • A new method to harvest and re-use circuit structures produced by different tools on benchmark circuits • The “lazy” approach is made practical by: • A semi-canonical form to reduce the number of equivalence classes • Using AIGs to store precomputed libraries in memory and on disk • Using truth tables to manipulate Boolean functions • As a case study, the proposed approach was applied to improve delay after FPGA mapping • For industrial benchmarks, compared to SOP balancing: • the delay was reduced by 17% (18%) for LUT4 (LUT6) • the area penalty was 2% (5%)

  24. Future work • Improving implementation • Reducing memory by using a low-memory AIG • Building libraries in terms of multi-input gates • Filtering libraries based on their performance • Giving the user control over the area increase • Continuing experiments • Performing case studies with larger functions • Evaluating delay improvements after P&R

  25. Q&A • Authors' e-mail: • Wenlong Yang: allanwin@hotmail.com • Lingli Wang: llwang@fudan.edu.cn • Alan Mishchenko: alanmi@eecs.berkeley.edu
