This presentation describes a method for delay optimization in AIG-based logic synthesis. The method adds structural choices for timing-critical logic and lets a technology mapper select among them. Timing analysis is performed only once, and the existing AIG package and LUT mapper are reused, so the approach is simple and fast. Experimental results compare delay and runtime with and without the proposed optimization. Future work may involve extending the algorithm to sequential circuits and optimizing other cost functions.
Global Delay Optimization using Structural Choices
Alan Mishchenko, Robert Brayton (UC Berkeley)
Stephen Jang (Xilinx Inc.)
Overview
• Motivation
• Timing criticality
• Restructuring for delay
• Algorithm
• Experimental results
• Conclusions
• Future work
Motivation
• AIG is an And-Inverter Graph
• AIG-based combinational logic synthesis is fast and effective
• AIG-based synthesis is area-oriented (except for balancing)
• Needed: delay optimization in AIG-based synthesis
• AIGs allow for the accumulation of structural choices [Lehman et al., TCAD’97; Chatterjee et al., ICCAD’05]
• Can leverage an efficient technology mapper that works with choices
• Can lead to fast delay optimization (~10% of mapping time)
Distinctive Features
• Traditional approach
   • For all timing-critical areas:
      • Perform timing analysis
      • Generate alternative structures
      • Evaluate the improvement and decide whether the transformation is accepted
• Proposed approach
   • Perform timing analysis only once
   • For all timing-critical areas:
      • Generate and store structural choices
   • Use the technology mapper to pick good structures
• Characteristics of the proposed approach
   • Fast: there is no repeated timing analysis
   • Simple: it leverages the AIG package and the LUT mapper
   • Effective: it makes decisions in the global space
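Why defer the decision to the mapper? A toy illustration, under our own simplifying assumptions: the AIG, the choice class `CHOICES`, and the helpers `level` and `pick` are hypothetical, and a real mapper enumerates cuts across all choices instead of picking whole structures. Two equivalent structures for a 4-input AND are stored side by side, and the better one depends on input arrival times that are only known globally.

```python
# Toy AIG: node -> ("PI",) or ("AND", fanin0, fanin1).
# Inverters are omitted because they add no AND levels.
AIG = {
    "a": ("PI",), "b": ("PI",), "c": ("PI",), "d": ("PI",),
    "ab": ("AND", "a", "b"),
    "abc": ("AND", "ab", "c"),
    "chain": ("AND", "abc", "d"),   # ((a & b) & c) & d
    "cd": ("AND", "c", "d"),
    "tree": ("AND", "ab", "cd"),    # (a & b) & (c & d)
}

# Hypothetical choice class: two functionally equivalent structures for f = abcd.
CHOICES = {"f": ["chain", "tree"]}


def level(node, arrival):
    """AND-level arrival time of `node`, given arrival times of the inputs."""
    entry = AIG[node]
    if entry[0] == "PI":
        return arrival.get(node, 0)
    return 1 + max(level(entry[1], arrival), level(entry[2], arrival))


def pick(choice, arrival):
    """Select the alternative with the earliest arrival time."""
    return min(CHOICES[choice], key=lambda n: level(n, arrival))


print(pick("f", {}))        # -> tree  (inputs arrive together)
print(pick("f", {"d": 5}))  # -> chain (d arrives late)
```

Neither structure is best in all contexts, which is the motivation for storing both and letting the mapper choose.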
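Timing Criticality
• Critical nodes
   • Used by many traditional algorithms
• Critical edges
   • Used by our algorithm
   • We pre-compute the critical edges of the critical nodes, which reduces computation
   • An edge between critical nodes may not be critical: if its source arrives much earlier than the other fanins of its sink, the path through that edge has positive slack
[Figure: network from primary inputs to primary outputs with nodes at levels 1 to 4, illustrating edge 13, which connects critical nodes but is not critical]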
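A minimal sketch of unit-delay timing analysis that returns critical nodes and critical edges for a timing window `w`. Assumptions: the network is a DAG given as node-to-fanins lists, every non-input node has delay 1, and the edge slack of (u, v) is defined as required(v) − 1 − arrival(u). The function name and the example network are hypothetical, not taken from ABC.

```python
from collections import defaultdict


def critical_edges(fanins, outputs, w=0):
    """Unit-delay timing analysis of a DAG (node -> list of fanins).

    Returns (critical_nodes, critical_edges) for timing window w.
    """
    # Topological order: every fanin appears before its fanouts.
    order, seen = [], set()

    def visit(n):
        if n not in seen:
            seen.add(n)
            for f in fanins[n]:
                visit(f)
            order.append(n)

    for n in fanins:
        visit(n)

    # Forward pass: primary inputs arrive at 0, other nodes add delay 1.
    arrival = {}
    for n in order:
        arrival[n] = 1 + max(arrival[f] for f in fanins[n]) if fanins[n] else 0

    # Backward pass: outputs are required at the circuit delay.
    delay = max(arrival[o] for o in outputs)
    fanouts = defaultdict(list)
    for n in order:
        for f in fanins[n]:
            fanouts[f].append(n)
    required = {}
    for n in reversed(order):
        reqs = [required[v] - 1 for v in fanouts[n]]
        if n in outputs:
            reqs.append(delay)
        required[n] = min(reqs) if reqs else delay

    slack = {n: required[n] - arrival[n] for n in order}
    nodes = {n for n in order if slack[n] <= w}
    # An edge whose slack is within w automatically has a critical source.
    edges = {(u, v) for v in nodes for u in fanins[v]
             if required[v] - 1 - arrival[u] <= w}
    return nodes, edges


# Hypothetical network with levels 1..4; the edge n1 -> n3 skips level 2.
FANINS = {
    "a": [], "b": [], "c": [], "d": [],
    "n1": ["a", "b"], "n2": ["n1", "c"],
    "n3": ["n2", "n1"], "n4": ["n3", "d"],
}
nodes, edges = critical_edges(FANINS, {"n4"})
print(("n1", "n3") in edges)  # -> False: both endpoints critical, edge is not
```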
Delay-Oriented Restructuring
• Uses traditional MUX-based restructuring, also known as the generalized select transform
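The idea behind MUX-based restructuring can be written as a Shannon expansion that moves a late-arriving variable x next to the output. The formulas below are a standard statement of that idea rather than a copy from the slides; d_MUX is an assumed per-MUX delay. Because the cofactors do not depend on x, the path from x to f passes through a single MUX; with k selected variables the MUX tree has k levels.

```latex
f = \bar{x}\, f_{\bar{x}} + x\, f_{x},
\qquad f_{x} = f|_{x=1},\quad f_{\bar{x}} = f|_{x=0}

t(f) = \max\bigl(t(x),\, t(f_{x}),\, t(f_{\bar{x}})\bigr) + d_{\mathrm{MUX}}

f = \sum_{m \in \{0,1\}^{k}} \Bigl(\prod_{i=1}^{k} x_i^{m_i}\Bigr)\,
    f|_{x_1 = m_1, \ldots, x_k = m_k},
\qquad x^{1} = x,\; x^{0} = \bar{x}
```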
Overall Algorithm

mapped netlist performSpeedup(
    subject graph S,     // S is an And-Inverter Graph
    mapped netlist M,    // M was previously derived by technology mapping of S
    timing window w,     // w is used to detect the critical paths
    logic depth l,       // l is used to detect a logic cone rooted at a node
    edge count p )       // p limits the number of critical edges of the cone
{
    perform timing analysis of M with the unit-delay or LUT-library model;
    pre-compute the critical section of M as nodes n such that 0 ≤ slack(n) ≤ w;
    pre-compute the timing-critical edges connecting these nodes;
    for each timing-critical node n {
        find cone C of M that extends l levels down from n;
        pick the set of timing-critical edges V feeding into C;
        if the number of edges in V exceeds p, continue;
        find logic cone C’ in S corresponding to C in M;
        find variables V’ in S corresponding to V in M;
        derive cofactors of the function of C’ w.r.t. the variables in V’;
        build multiplexer tree C’’ of the cofactors using the variables in V’;
        add structural choice C’ = C’’ to the subject graph S;
    }
    return mapped netlist M’ derived by mapping subject graph S with the added choices;
}
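The cofactor and multiplexer-tree steps of the loop can be sketched at the truth level. This is a minimal illustration, assuming Boolean functions are Python callables over 0/1 arguments and selected variables are given by argument index; `cofactor`, `mux_tree`, `evaluate`, and the example cone are hypothetical, and ABC builds the structure as AIG nodes instead. The first variable in `select` drives the root MUX; the slides do not state which variable order the authors use.

```python
from itertools import product


def cofactor(f, fixed):
    """Cofactor of Boolean function f w.r.t. `fixed` (index -> 0/1).

    The result takes the same arguments as f but ignores the fixed positions.
    """
    def g(*xs):
        args = list(xs)
        for i, v in fixed.items():
            args[i] = v
        return f(*args)
    return g


def mux_tree(f, select, fixed=None):
    """Build a MUX tree over the cofactors of f w.r.t. the variables in `select`.

    select[0] drives the root MUX, i.e. it is one MUX level from the output.
    """
    fixed = dict(fixed or {})
    if not select:
        return ("LEAF", cofactor(f, fixed))
    x, rest = select[0], select[1:]
    lo = mux_tree(f, rest, {**fixed, x: 0})
    hi = mux_tree(f, rest, {**fixed, x: 1})
    return ("MUX", x, lo, hi)


def evaluate(node, xs):
    """Evaluate a MUX tree on an input assignment xs (tuple of 0/1)."""
    if node[0] == "LEAF":
        return node[1](*xs)
    _, x, lo, hi = node
    return evaluate(hi if xs[x] else lo, xs)


# Example cone: f(a, b, c, d) = ((a & b) | c) & d, with a assumed late-arriving.
def f(a, b, c, d):
    return ((a & b) | c) & d


tree = mux_tree(f, [0])
# The choice C' = C'' is valid only if both structures are equivalent.
assert all(evaluate(tree, xs) == f(*xs) for xs in product((0, 1), repeat=4))
```

The exhaustive check plays the role that equivalence by construction plays in the real flow: the new MUX tree must compute the same function as the original cone before it can be recorded as a choice.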
Experimental Setup
• Implemented in ABC as command speedup
• Used the FPGA technology mapper if
• Verified the results using the combinational equivalence checker cec
• Experiments targeting 6-LUTs were run on a 2-CPU, 4-core Intel Xeon computer with 8 GB of RAM
• Experimentally compared the following scripts:
   • Without delay optimization:
      • (st; dchoice; if -C 16 -F 2)^8
   • With delay optimization:
      • (st; dchoice; if -C 16 -F 2)^4
      • (speedup; if -C 16 -F 2)^3
      • (st; dchoice; if -C 16 -F 2)^4
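Reading the trailing exponents as repetition counts (our interpretation of the flattened superscripts), the two scripts expand into flat, semicolon-separated command strings as sketched below. The helper `repeat` is hypothetical; the commands and flags are copied from the slide.

```python
def repeat(script, times):
    """Expand '(script)^times' into a flat ABC command string."""
    return "; ".join([script] * times)


MAP = "st; dchoice; if -C 16 -F 2"
SPEEDUP = "speedup; if -C 16 -F 2"

baseline = repeat(MAP, 8)
with_speedup = "; ".join([repeat(MAP, 4), repeat(SPEEDUP, 3), repeat(MAP, 4)])

print(baseline)
print(with_speedup)
```

Under this reading, both flows contain the same eight choice-computation passes, and the delay-oriented flow adds three speedup passes in the middle.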
Examples of LUT Libraries
Columns: LUT size, LUT area, LUT pin delays

The unit-delay LUT library
1  1.0  1.0
2  1.0  1.0  1.0
3  1.0  1.0  1.0  1.0
4  1.0  1.0  1.0  1.0  1.0
5  1.0  1.0  1.0  1.0  1.0  1.0
6  1.0  1.0  1.0  1.0  1.0  1.0  1.0

A variable-pin-delay LUT library
1  1.0  0.2
2  1.0  0.2  0.3
3  1.0  0.2  0.3  0.4
4  1.0  0.2  0.3  0.4  0.45
5  1.0  0.2  0.3  0.4  0.45  0.55
6  1.0  0.2  0.3  0.4  0.45  0.55  0.65

A variable-pin-delay LUT library with wire delays
1  1.0  0.4
2  1.0  0.4  0.5
3  1.0  0.4  0.5  0.6
4  1.0  0.4  0.5  0.6  0.65
5  1.0  0.4  0.5  0.6  0.65  0.75
6  1.0  0.4  0.5  0.6  0.65  0.75  0.85
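A sketch of how a library in this format yields LUT arrival times. Assumptions: each row is "size area delay_1 ... delay_size", and the mapper may permute LUT inputs. Under that assumption, wiring the latest-arriving fanin to the fastest pin minimizes the maximum of arrival plus pin delay (a standard exchange argument). `parse_lut_library` and `lut_arrival` are hypothetical helpers, not ABC functions.

```python
UNIT_LIB = """
1 1.0 1.0
2 1.0 1.0 1.0
3 1.0 1.0 1.0 1.0
4 1.0 1.0 1.0 1.0 1.0
5 1.0 1.0 1.0 1.0 1.0 1.0
6 1.0 1.0 1.0 1.0 1.0 1.0 1.0
"""

PIN_LIB = """
1 1.0 0.2
2 1.0 0.2 0.3
3 1.0 0.2 0.3 0.4
4 1.0 0.2 0.3 0.4 0.45
5 1.0 0.2 0.3 0.4 0.45 0.55
6 1.0 0.2 0.3 0.4 0.45 0.55 0.65
"""


def parse_lut_library(text):
    """Parse rows 'size area pin_delay_1 ... pin_delay_size' into a dict."""
    lib = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        size, area = int(fields[0]), float(fields[1])
        delays = [float(x) for x in fields[2:]]
        if len(delays) != size:
            raise ValueError(f"LUT size {size} needs {size} pin delays")
        lib[size] = (area, delays)
    return lib


def lut_arrival(fanin_arrivals, lib):
    """Arrival at the output of a LUT with these fanins.

    Assumes inputs may be permuted: the latest fanin is wired to the fastest
    pin, which minimizes the maximum of (arrival + pin delay).
    """
    _, delays = lib[len(fanin_arrivals)]
    late_first = sorted(fanin_arrivals, reverse=True)
    fast_first = sorted(delays)
    return max(a + d for a, d in zip(late_first, fast_first))


unit = parse_lut_library(UNIT_LIB)
pin = parse_lut_library(PIN_LIB)
print(lut_arrival([0, 0, 3], unit))  # -> 4.0
print(lut_arrival([0, 0, 3], pin))   # late fanin on the 0.2 pin, about 3.2
```

With pin-dependent delays, the same late fanin costs less when it lands on a fast pin, which is one reason the unit-delay and pin-delay libraries can rank the same mapping differently.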
Experimental Results
Time1: the runtime of AIG restructuring only
Time2: the total runtime of Speedup
Geomean: geometric means of the columns
Ratios: ratios of the geometric means
LUT: number of LUTs
Lev: number of LUT levels
Delay: delay using the LUT library
Total: total runtime of Baseline
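The Geomean and Ratios rows follow the usual definitions; the slides do not say how they were computed, so the sketch below simply states the standard formulas. The function names are hypothetical. When both columns cover the same benchmarks, the ratio of geometric means equals the geometric mean of the per-benchmark ratios, so no single large circuit dominates the summary.

```python
import math


def geomean(values):
    """Geometric mean of positive values, computed in log space."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def ratio_of_geomeans(new, base):
    """Ratio of geometric means of two columns (e.g. Speedup vs Baseline)."""
    return geomean(new) / geomean(base)
```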
Conclusions and Future Work
• Developed a method that is
   • Fast: there is no repeated timing analysis
   • Simple: it leverages the AIG package and the LUT mapper
   • Effective: it makes decisions in the global space
• Future work may include
   • measuring improvements after place-and-route
   • extending the algorithm to sequential circuits
   • applying similar optimization to cost functions other than delay (e.g., switching activity minimization)