FPGA Area Reduction by Multi-Output Sequential Resynthesis. Yu Hu¹, Victor Shih², Rupak Majumdar² and Lei He¹ (¹Electrical Engineering Dept., UCLA; ²Computer Science Dept., UCLA). Presented by Yu Hu. Address comments to lhe@ee.ucla.edu
Outline • Background and Motivation • Combinational Resynthesis with MIMO Blocks • Sequential Resynthesis • Experimental Results • Conclusion and Future Work
Background • Area-optimal technology mapping for LUT-based FPGAs is NP-hard [Farrahi, TCAD’94] • Post-mapping resynthesis is effective at reducing area (LUT#) [Ling, DAC’05] • Resynthesis targets: area reduction (this work), fault tolerance, power optimization, physical-aware optimization, and many others
Boolean Matching Based Resynthesis • Attempts to re-map a logic block to reduce LUT# • BM can handle both homogeneous and heterogeneous PLBs (Source: Andrew Ling, University of Toronto, DAC'05)
Overall Flow of BM-based Resynthesis • Multiple iterations of block-based Boolean matching (Source: Andrew Ling, University of Toronto, DAC'05)
Limitations of Existing Work • Considers only single-output logic blocks • Considers only the combinational portion of the circuit • A larger solution space can be explored, and area further reduced, if • Multiple-output logic blocks are considered • FF boundaries are eliminated
Motivation Example – Retiming • Resynthesis is restricted by FF boundaries … • Retiming creates chances for resynthesis [Figure: 2-LUT network]
Motivation Example – MISO Resynthesis • Function of O2 has to be preserved … • Only 1-LUT reduction [Figure: 2-LUT network]
Motivation Example – MIMO Resynthesis • 60% area reduction is obtained by sequential MIMO resynthesis! [Figure: 2-LUT network]
Major Contributions • Present a Boolean matching based resynthesis algorithm that considers multi-output logic blocks • Propose a sequential resynthesis technique • Reduce area by up to 10% compared to combinational resynthesis, when both use MIMO blocks
Outline • Background and Motivation • Combinational Resynthesis with MIMO Blocks • SAT-based Boolean Matching for Multiple Output Functions • Resynthesis Algorithm • Experimental Results • Sequential Resynthesis • Experimental Results • Conclusion and Future Work
Existing Boolean Matching for MISO [Figure: can function f be implemented by circuit g, a network of 2-LUTs?] • Formulate the resynthesis sub-problem as Boolean matching (BM) • BM: Can function f be implemented in circuit g? • Resynthesis: Is there a configuration of g such that, for all inputs to g, f is equivalent to g? (Source: Andrew Ling, University of Toronto, DAC'05)
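SAT-BM for Multi-Output Functions
• Characteristic function of a 2-LUT (configuration bits $L_0, \dots, L_3$ are encoded as SAT literals):
$$
\begin{aligned}
G_{LUT}[i_1, i_2, F] ={}& (i_1 + i_2 + \neg L_0 + F)\,(i_1 + i_2 + L_0 + \neg F) \\
& (i_1 + \neg i_2 + \neg L_1 + F)\,(i_1 + \neg i_2 + L_1 + \neg F) \\
& (\neg i_1 + i_2 + \neg L_2 + F)\,(\neg i_1 + i_2 + L_2 + \neg F) \\
& (\neg i_1 + \neg i_2 + \neg L_3 + F)\,(\neg i_1 + \neg i_2 + L_3 + \neg F)
\end{aligned}
$$
• Characteristic function of the block:
$$
G = G_{LUT1}[x_1, x_2, F_2] \cdot G_{LUT2}[F_2, x_3, F_1]
$$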
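The clause pattern on this slide generalizes to any LUT size: each truth-table row contributes two clauses tying the output to one configuration bit. The following is a minimal sketch of that encoding, not the authors' implementation. The function names `lut_cnf` and `satisfied`, and the DIMACS-style signed-integer literals, are assumptions made for illustration.

```python
from itertools import product

def lut_cnf(inputs, out, cfg):
    """Return CNF clauses forcing `out` to equal the configuration bit
    selected by `inputs`. Variables are positive ints, a negative int is a
    negated literal. cfg[m] is the variable of bit L_m, with the first
    input as the most significant index bit (as on the slide)."""
    clauses = []
    for m, bits in enumerate(product([0, 1], repeat=len(inputs))):
        # These literals are all false only when the inputs equal this row.
        mismatch = [v if b == 0 else -v for v, b in zip(inputs, bits)]
        clauses.append(mismatch + [-cfg[m], out])  # row selected, L_m = 1 -> out = 1
        clauses.append(mismatch + [cfg[m], -out])  # row selected, L_m = 0 -> out = 0
    return clauses

def satisfied(clauses, assign):
    """Evaluate a CNF under a complete assignment {variable: bool}."""
    return all(any(assign[abs(l)] == (l > 0) for l in c) for c in clauses)

# Variables: i1 = 1, i2 = 2, F = 3, L0..L3 = 4..7 (numbering chosen for this sketch).
clauses = lut_cnf([1, 2], 3, [4, 5, 6, 7])
```

With this numbering, the first clause is `[1, 2, -4, 3]`, which corresponds to (i1 + i2 + ¬L0 + F) above. The clause list can be handed to any SAT solver that accepts DIMACS-style input.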
SAT-BM for Multi-Output Functions
$$
G = G_{LUT1}[x_1, x_2, F_2] \cdot G_{LUT2}[F_2, x_3, F_1]
$$
• Replicated SAT problem:
$$
\begin{aligned}
G_{expand} ={}& G[X/000, F_1/0, F_2/0] \cdot G[X/001, F_1/0, F_2/0] \\
& G[X/010, F_1/1, F_2/0] \cdot G[X/011, F_1/0, F_2/0] \\
& G[X/100, F_1/1, F_2/0] \cdot G[X/101, F_1/0, F_2/0] \\
& G[X/110, F_1/1, F_2/1] \cdot G[X/111, F_1/1, F_2/1]
\end{aligned}
$$
• The solution of this SAT problem corresponds to the Boolean matching result (SAT!)
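The replicated problem asks for one set of configuration bits that produces the required values of both F1 and F2 on every input pattern. For a block this small, the same question can be answered by exhaustive search, which is used here as a stand-in for the SAT solver. This is a sketch under stated assumptions: the function name `match_two_lut_chain` and the target functions are hypothetical and are not taken from the slides.

```python
from itertools import product

def match_two_lut_chain(target):
    """Search configurations of LUT1[x1, x2] -> F2 feeding LUT2[F2, x3] -> F1.

    `target` maps each input pattern (x1, x2, x3) to the required (F1, F2).
    Returns (cfg1, cfg2), each a 4-tuple of configuration bits, or None if
    no configuration matches on every input pattern.
    """
    for cfg1, cfg2 in product(product([0, 1], repeat=4), repeat=2):
        for (x1, x2, x3), (f1, f2) in target.items():
            g2 = cfg1[2 * x1 + x2]  # output of LUT1
            g1 = cfg2[2 * g2 + x3]  # output of LUT2
            if (g1, g2) != (f1, f2):
                break               # configuration fails on this pattern
        else:
            return cfg1, cfg2       # matched on all patterns
    return None

# Hypothetical multi-output target: F2 = x1 AND x2, F1 = F2 OR x3.
target = {x: ((x[0] & x[1]) | x[2], x[0] & x[1])
          for x in product([0, 1], repeat=3)}
result = match_two_lut_chain(target)
```

Because F2 is treated as an output, LUT1 cannot be reconfigured freely: both outputs constrain the search, which is what distinguishes the multi-output formulation from the single-output one.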
Unique Problem of MIMO Synthesis • MIMO resynthesis can generate new paths in the block • A new path might cause a combinational cycle • Conservative solution: detect combinational cycles and discard resynthesis solutions with cycles [Figure: example block with PI, nodes 1–5, and PO, annotated "False path?" and "Combinational cycle!"]
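The conservative check amounts to cycle detection on the FF-free part of the netlist after a candidate replacement is inserted. The sketch below uses a plain depth-first search. The adjacency-map representation and the name `has_combinational_cycle` are assumptions for illustration, since the slides do not describe how the check is implemented.

```python
def has_combinational_cycle(fanouts):
    """Return True if the directed graph contains a cycle.

    `fanouts` maps each node to the nodes it drives through FF-free
    (purely combinational) connections.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    color = {}

    def visit(u):
        color[u] = GREY
        for v in fanouts.get(u, ()):
            c = color.get(v, WHITE)
            if c == GREY:                # back edge: v is on the current path
                return True
            if c == WHITE and visit(v):
                return True
        color[u] = BLACK
        return False

    return any(color.get(u, WHITE) == WHITE and visit(u) for u in list(fanouts))

# Hypothetical netlists: the second adds a new path n2 -> n1 that closes a loop.
before = {"PI": ["n1"], "n1": ["n2"], "n2": ["PO"]}
after = {"PI": ["n1"], "n1": ["n2"], "n2": ["PO", "n1"]}
```

A purely structural check like this also rejects cycles that run through false paths, which is why the slide calls the solution conservative.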
Experimental Settings • Implemented in OAGear • SAT-BM uses MiniSat 2.0 • The 20 biggest MCNC benchmarks are tested • 10 combinational • 10 sequential • mapped with 4-LUTs by Berkeley ABC • Resynthesis settings • One traversal is performed • Blocks with up to 10 inputs are considered • Results are verified by ABC equivalence checkers
Experimental Settings – PLB templates • All three possible structures for PLBs with up to 10 inputs and fewer than four 4-LUTs [Ling, DAC’05] • All intermediate wires are treated as outputs in MIMO resynthesis
Combinational Resynthesis: MISO vs. MIMO • MIMO does not outperform MISO significantly, probably due to • Rejecting “false paths” introduced by MIMO resynthesis • Narrow PLB templates • Small block size and LUT size • No iterations of resynthesis
Outline • Background and Motivation • Combinational Resynthesis with MIMO Blocks • Sequential Resynthesis • Experimental Results • Conclusion and Future Work
Structure Impact on Sequential Resynthesis • The structure of a logic block determines the sequential resynthesis strategy • Retiming • Classic retiming • All edges have non-negative weights after retiming • Peripheral retiming • Results in a negative number of FFs at peripheral edges • Logic duplication • Duplication allowed • Duplication not allowed
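The difference between classic and peripheral retiming can be read off the edge weights after retiming. The sketch below assumes the standard Leiserson–Saxe convention, in which a lag r(v) is assigned to each node and an edge u → v with w FFs ends up with w + r(v) − r(u) FFs. The slides do not state this formula, and the names `retime` and `is_classic` are hypothetical.

```python
def retime(edges, lag):
    """Apply a retiming to FF counts on edges.

    `edges` maps (u, v) to the number of FFs on that edge. `lag` maps a node
    to its lag r(node); nodes not listed have lag 0.
    """
    return {(u, v): w + lag.get(v, 0) - lag.get(u, 0)
            for (u, v), w in edges.items()}

def is_classic(edges):
    """Classic retiming requires every edge weight to be non-negative."""
    return all(w >= 0 for w in edges.values())

# Hypothetical chain a -> b -> c with one FF in front of b.
edges = {("a", "b"): 1, ("b", "c"): 0}
forward = retime(edges, {"b": -1})   # moves the FF from b's input to b's output
borrowed = retime(edges, {"b": 1})   # leaves a negative FF count on b -> c
```

Here `forward` is a legal classic retiming, while `borrowed` leaves a negative weight on one edge. Negative weights are only acceptable in the peripheral setting, and only on peripheral edges.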
Case I: Classic Retiming w/o Duplication • Step 1: backward retiming • Step 2: combinational resynthesis • Step 3: forward retiming
Case II: Peripheral Retiming w/o Duplication • Step 1: peripheral retiming (borrow FFs from outside) • Step 2: combinational resynthesis • Step 3: check feasibility of forward retiming [Figure: a resynthesis solution w/ feasible retiming]
Case II: Peripheral Retiming w/o Duplication • Step 4: forward retiming
Case III: Retiming w/ Duplication • FF not movable! • Duplication is required to enable retiming! [Figure labels: FF# = 0, FF# = 1]
Case III: Peripheral Retiming w/ Duplication • FF not movable! • Identical configuration for LUT-c and LUT-d
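Duplication or Not? – A Sufficient and Necessary Condition • An acyclic block is feasible for retiming w/o duplication iff [Brayton, TCAD’91] • All paths between the same input-output pair have the same FF# • There exist numbers αi and βj for input i and output j, s.t. the FF# on the (i,j) path is equal to (αi+βj) • Example:
$$
\begin{array}{c|cccc}
 & \alpha_1 & \alpha_2 & \alpha_3 & \alpha_4 \\ \hline
\beta_1 & \alpha_1+\beta_1 & \alpha_2+\beta_1 & \alpha_3+\beta_1 & \alpha_4+\beta_1 \\
\beta_2 & * & * & \alpha_3+\beta_2 & \alpha_4+\beta_2
\end{array}
=
\begin{array}{cccc}
1 & 0 & 1 & 1 \\
* & * & 0 & 0
\end{array}
$$
$$
\alpha_1 = 1,\ \alpha_2 = 0,\ \alpha_3 = 1,\ \alpha_4 = 1,\ \beta_1 = 0,\ \beta_2 = -1
$$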
Duplication or Not? – A Sufficient and Necessary Condition • An acyclic block is feasible for retiming w/o duplication iff [Brayton, TCAD’91] • All paths between the same input-output pair have the same FF# • There exist numbers αi and βj for input i and output j, s.t. the FF# on the (i,j) path is equal to (αi+βj) • Time complexity • O(e min(m,n)) • Negligible for small blocks • Classic or peripheral retiming? • Classic retiming iff there exist non-negative αi and βj
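The second condition can be checked by propagating values across the bipartite graph of inputs and outputs: fix one value, derive the rest from the FF counts, and reject on any contradiction. The sketch below is a simple breadth-first version, not the algorithm from [Brayton, TCAD’91]. It assumes the first condition has already been verified, so each connected input-output pair has a single FF count, and that pairs with no path are simply absent from the input. The names `solve_potentials` and `classic_possible` are hypothetical.

```python
from collections import deque

def solve_potentials(ff):
    """Find alpha, beta with ff[(i, j)] == alpha[i] + beta[j] for every pair.

    `ff` maps (input i, output j) to the FF count on paths from i to j.
    Returns (alpha, beta) as dicts, or None if no such numbers exist.
    """
    adj = {}
    for (i, j), w in ff.items():
        adj.setdefault(("in", i), []).append((("out", j), w))
        adj.setdefault(("out", j), []).append((("in", i), w))
    pot = {}
    # Anchor each connected component at its lowest-numbered output (beta = 0).
    for start in sorted(adj, key=lambda n: (n[0] != "out", n[1])):
        if start in pot:
            continue
        pot[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, w in adj[u]:
                want = w - pot[u]        # alpha_i + beta_j must equal w
                if v not in pot:
                    pot[v] = want
                    queue.append(v)
                elif pot[v] != want:
                    return None          # contradiction: duplication needed
    alpha = {n[1]: p for n, p in pot.items() if n[0] == "in"}
    beta = {n[1]: p for n, p in pot.items() if n[0] == "out"}
    return alpha, beta

def classic_possible(alpha, beta):
    """For a connected block, a non-negative solution exists iff the smallest
    alpha and smallest beta sum to >= 0 (shift alpha by +c, beta by -c)."""
    return min(alpha.values()) + min(beta.values()) >= 0

# FF counts from the example slide, assuming `*` marks pairs with no path.
ff = {(1, 1): 1, (2, 1): 0, (3, 1): 1, (4, 1): 1, (3, 2): 0, (4, 2): 0}
alpha, beta = solve_potentials(ff)
```

On the slide's example this reproduces α = (1, 0, 1, 1) and β = (0, −1). Since no shift makes every value non-negative, the block falls into the peripheral retiming case.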
Can We Accept Every Single Resynthesis? – Feasibility Checking for Sequential Resynthesis • Initial State Computation • Filter out some of the rewriting steps so that an equivalent initial state for the synthesized machine can be computed from a given initial state of the original machine. • Rewriting invariant [Brayton, IWLS’07] • Can be reduced to a SAT problem • Clock Period Preservation • A New Retiming-based Technology Mapping Algorithm for LUT-based FPGAs [Pan, FPGA’98] • Sequential arrival time: l-values
Experimental Results – Sequential vs. Combinational Resynthesis • Seq-resynthesis obtains up to 9% area reduction • Factors affecting seq-resynthesis • Sequential structure • All factors in combinational resynthesis
Outline • Background and Motivation • Combinational Resynthesis with MIMO Blocks • SAT-based Boolean Matching for Multiple Output Functions • Resynthesis Algorithm • Sequential Resynthesis • Conclusion and Future Work
Conclusions and Future Work • Proposed a new resynthesis technique considering both MIMO blocks and retiming • Results indicate that sequential resynthesis obtains more gain than MIMO resynthesis • Future work • PLBs from [Ling, DAC’05] are optimal only for MISO, and we will develop new PLB structures for MIMO resynthesis • Study resynthesis for heterogeneous FPGAs
Thanks FPGA Area Reduction by Multi-Output Sequential Resynthesis Yu Hu, Victor Shih, Rupak Majumdar and Lei He