Mapping into LUT Structures
Sayak Ray, Alan Mishchenko, Niklas Een, Robert Brayton
Department of EECS, UC Berkeley
Stephen Jang, Chao Chen
Agate Logic Inc.
Contributions (in a nutshell)
• New mapping algorithm for FPGAs, which maps into LUT structures instead of individual LUTs
• It has two applications:
  (1) Improving the quality of mapping into LUTs
      • Area improves by 7.4% on average
      • Delay improves by 11.3% on average
  (2) Improving delay for specialized hardware that supports non-routable connections
      • Delay improves by 40% on average
      • With some area penalty
LUT Structure
• LUT structure: a group of LUTs connected by direct, non-routable wires
[Figure: a 7-input LUT structure "44" and a 10-input LUT structure "444", with the non-routable wires between the LUTs labeled]
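The input counts of the two example structures are consistent with each non-routable wire occupying one input of the LUT it drives: 4 + 4 - 1 = 7 and 4 + 4 + 4 - 2 = 10. A minimal sketch of that arithmetic follows, assuming the structure is a simple chain named by one digit per LUT; the function name structure_inputs is hypothetical and not taken from the slides.

```python
def structure_inputs(structure: str) -> int:
    """Count the external inputs of a chained LUT structure such as "44".

    Assumption (for illustration only): the LUTs form a chain, and every
    non-routable wire uses up exactly one input of the LUT it feeds.
    """
    sizes = [int(ch) for ch in structure]   # "444" -> [4, 4, 4]
    internal_wires = len(sizes) - 1         # one wire between neighbours
    return sum(sizes) - internal_wires


print(structure_inputs("44"))    # 7
print(structure_inputs("444"))   # 10
```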
Some Terminology
• Let f(X) be a Boolean function
• Let X1 ⊆ X be a subset of its support
• Suppose {q1(X), q2(X), …, qμ(X)} is the set of distinct cofactors of f w.r.t. X1
• μ is called the column multiplicity of f w.r.t. X1
• Given a partition of X into two disjoint subsets X1 and X2, we say that an Ashenhurst-Curtis decomposition of f(X) exists if f(X) can be expressed as
  f(X) = h(g1(X1), g2(X1), …, gk(X1), X2)
• X1: bound set
• X2: free set
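To make column multiplicity concrete, the sketch below counts the distinct cofactors of a function with respect to a bound set by brute force over its truth table. It is an illustration only, not the implementation behind the slides: the truth-table encoding (a Python integer whose bit m holds the function value on minterm m, with variable i stored in bit i of m) and the names truth_table and column_multiplicity are assumptions made here.

```python
from itertools import product


def truth_table(fn, num_vars):
    """Pack fn(x0, ..., x_{n-1}) into an integer, one bit per minterm."""
    tt = 0
    for m in range(1 << num_vars):
        bits = [(m >> i) & 1 for i in range(num_vars)]
        if fn(*bits):
            tt |= 1 << m
    return tt


def column_multiplicity(tt, num_vars, bound):
    """Number of distinct cofactors of tt w.r.t. the bound-set variables."""
    free = [v for v in range(num_vars) if v not in bound]
    columns = set()
    for b_vals in product((0, 1), repeat=len(bound)):
        # Fix the bound-set variables to one assignment ...
        base = sum(bit << var for bit, var in zip(b_vals, bound))
        # ... and record the resulting function of the free variables.
        column = tuple(
            (tt >> (base | sum(bit << var for bit, var in zip(f_vals, free)))) & 1
            for f_vals in product((0, 1), repeat=len(free))
        )
        columns.add(column)
    return len(columns)


# f = (x0 AND x1) XOR x2: fixing {x0, x1} leaves either x2 or NOT x2.
f = truth_table(lambda x0, x1, x2: (x0 & x1) ^ x2, 3)
print(column_multiplicity(f, 3, [0, 1]))    # 2

# 2:1 multiplexer, x0 selects between x1 and x2.
mux = truth_table(lambda x0, x1, x2: x1 if x0 else x2, 3)
print(column_multiplicity(mux, 3, [0, 1]))  # 3
print(column_multiplicity(mux, 3, [1, 2]))  # 4
```

In the classical theory, a column multiplicity of at most 2 means a single function g1 of the bound set suffices, and in general at least ceil(log2 μ) functions g are needed.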
Flow of performLutMatchingXY
1 SupportMinimize: removes vacuous variables
2 findOutputDecomposition: checks for f = x G
• Variable reordering in truth table
• Allows cases μ = 2, 3, 4
• For μ = 3, 4, consider special decomposition with one shared variable only
3 findGoodBoundSet
4 checkSpecialNonDisjoint
5 reverseVariableOrder: a heuristic to find a suitable decomposition
6 findGoodBoundSet
7 checkSpecialNonDisjoint
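The first step of the flow, removing vacuous variables, can be illustrated on a truth table. The sketch below is not the SupportMinimize routine itself; it only shows the idea under an assumed encoding (an integer whose bit m is the function value on minterm m, with variable i in bit i of m), and the names depends_on and support_minimize are hypothetical.

```python
def depends_on(tt, num_vars, var):
    """True if flipping `var` changes the function on at least one minterm."""
    for m in range(1 << num_vars):
        if not (m >> var) & 1:
            if ((tt >> m) & 1) != ((tt >> (m | (1 << var))) & 1):
                return True
    return False


def support_minimize(tt, num_vars):
    """Drop vacuous variables; return (compacted truth table, kept variables)."""
    support = [v for v in range(num_vars) if depends_on(tt, num_vars, v)]
    new_tt = 0
    for m in range(1 << len(support)):
        # Expand the compact minterm back to the original variable positions.
        # Vacuous variables are left at 0, which cannot change the value.
        full = sum(((m >> i) & 1) << v for i, v in enumerate(support))
        new_tt |= ((tt >> full) & 1) << m
    return new_tt, support


# f = x0 XOR x2 over three variables (x1 is vacuous): truth table 0b01011010.
print(support_minimize(0b01011010, 3))   # (6, [0, 2])
```

After this step the function is expressed over its true support only, so later bound-set searches work on fewer variables.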
Checking for XYZ decomposition
• X, Y, and Z are the sizes of the main/fanin LUTs
• Two-step process:
  • Check for XW, where W = Y + Z - 2
  • If it exists, check the remainder function G for YZ
• The priority-cut-based technology mapper is modified to accommodate the algorithms for XY and XYZ
• The results of decomposition checking are cached
  • This substantially reduces runtime on large designs
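The caching mentioned above can be sketched as a table keyed on the function being checked. This is only an illustration of the idea, not the mapper's actual data structure: the class name DecompositionCache, the key layout (truth table, variable count, structure name), and the pluggable check callback are all assumptions introduced here.

```python
from typing import Callable, Dict, Tuple


class DecompositionCache:
    """Memoize decomposition checks so each distinct function is tested once."""

    def __init__(self, check: Callable[[int, int, str], bool]) -> None:
        self._check = check                 # the (expensive) decomposition test
        self._results: Dict[Tuple[int, int, str], bool] = {}
        self.hits = 0
        self.misses = 0

    def lookup(self, tt: int, num_vars: int, structure: str) -> bool:
        key = (tt, num_vars, structure)
        if key in self._results:
            self.hits += 1
        else:
            self.misses += 1
            self._results[key] = self._check(tt, num_vars, structure)
        return self._results[key]


def intermediate_lut_size(y: int, z: int) -> int:
    """Size W of the second LUT in the first step (XW), per W = Y + Z - 2."""
    return y + z - 2


# Placeholder check for demonstration; a real one would test decomposability.
cache = DecompositionCache(lambda tt, num_vars, structure: True)
cache.lookup(0b0110, 2, "44")
cache.lookup(0b0110, 2, "44")
print(cache.hits, cache.misses)        # 1 1
print(intermediate_lut_size(4, 4))     # 6
```

Because a mapper enumerates many cuts that compute the same function, repeated lookups of this kind are where the runtime savings would come from.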
Future Work
• Improving the implementation
  • Handling delay-driven decomposition
    • Currently we ignore arrival time and only care about detecting any decomposition
  • Using a semi-canonical form to increase the number of hits in the hash table of computed results
  • Making truth-table-based decomposition even faster
• Combining Boolean decomposition into LUT structures with structural mapping of LUTs into clusters
• Evaluating results after place and route
  • This will be especially interesting when specialized hardware is available
Questions?