Timing Driven Gate Duplication: Complexity Issues and Algorithms. Ankur Srivastava, Ryan Kastner and Majid Sarrafzadeh. Embedded & Reconfigurable System Design (ER-Group), UCLA.
Motivation • New delay-improvement methodologies are needed to meet the stringent timing constraints that designers face • Gate duplication has been studied primarily for cut-set minimization; its applicability to delay improvement has not been studied by the research community
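Load Dependent Delay Model (LDDM) • Delay of gate i through input pin j: $\delta_j(i) = \alpha_j^i + \beta_j^i \cdot C_{OUT}$, where $\alpha_j^i$ is the load-independent term, $\beta_j^i$ is the load coefficient, and $C_{OUT}$ is the capacitive load at the output of gate i • Wire delays are assumed to be zero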
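Gate Duplication for Delay Improvement (before duplication) • r = input pin required time = required time at O/P - gate delay • Fanout gates A, B and C: r = 2 and input capacitance 5 each • Gate D drives A, B and C: load CD = 15, delay parameters 1 and 1, input capacitance 0.1, so r = 2 - (1 + 1 * 15) = -14 • Gate E drives D: load CE = 0.1, so r = -14 - (1 + 1 * 0.1) = -15.1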
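Gate Duplication for Delay Improvement (after duplicating D) • Fanout gates A, B and C are unchanged: r = 2 and input capacitance 5 each • D drives two of the fanouts: load CD = 10, so r = 2 - (1 + 1 * 10) = -9 • The replica D' drives the remaining fanout: load CD' = 5 • E now drives both D and D': load CE = 0.2, so r = -9 - (1 + 1 * 0.2) = -10.2, compared with -15.1 before duplication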
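The arithmetic of this example can be checked with a short sketch. It assumes, as the required times on the slides imply, that D, D' and E all have load-independent term 1, load coefficient 1 and input capacitance 0.1; the function names are illustrative and not taken from the talk.

```python
# Sketch of the slide's example under the load dependent delay model (LDDM).
# Assumption: alpha = 1 and beta = 1 for D, D' and E, input capacitance 0.1,
# inferred from the required times shown on the slides.

def gate_delay(alpha, beta, c_out):
    """LDDM delay: load-independent term plus load coefficient times output load."""
    return alpha + beta * c_out

def input_required_time(fanout_required, fanout_caps, alpha=1.0, beta=1.0):
    """Required time at a gate's input = required time at its output - gate delay."""
    output_required = min(fanout_required)  # the tightest fanout dominates
    return output_required - gate_delay(alpha, beta, sum(fanout_caps))

# Before duplication: D drives A, B and C (each r = 2, input capacitance 5).
r_d = input_required_time([2, 2, 2], [5, 5, 5])   # 2 - (1 + 15)
r_e = input_required_time([r_d], [0.1])           # r_d - (1 + 0.1)

# After duplication: D keeps two fanouts, D' takes one, E drives D and D'.
r_d_keep = input_required_time([2, 2], [5, 5])    # 2 - (1 + 10)
r_d_copy = input_required_time([2], [5])          # 2 - (1 + 5)
r_e_dup = input_required_time([r_d_keep, r_d_copy], [0.1, 0.1])
```

Duplication halves the load seen by each copy of D at the cost of a slightly larger load on E, which is why the required time at E improves.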
Complexity Issues • Theorem: Global Gate Duplication is NP-Complete under the LDDM • Proof idea: an instance of MONO3SAT is transformed into an instance of the global problem • Theorem: Local Gate Duplication is NP-Complete • Proof idea: an instance of the PARTITION problem is transformed into an instance of the local problem
Complexity Issues (Comparison with Buffer Insertion) • Local Buffer Insertion Problem: polynomially solvable if the net topology is fixed • Global Buffer Insertion Problem: polynomially solvable if the delay model has the same pin-to-pin parameters • In the situations where buffer insertion is polynomially solvable, gate duplication is NP-Complete
Algorithm for Gate Duplication • Based on the structure of dynamic programming • Considers every gate in the circuit for duplication, and hence works in a pro-active mode • Assumption: the circuit contains only single-output combinational gates
Algorithm for Gate Duplication • Stage 1: Traverse the network from POs to PIs in reverse topological order, evaluating tuples at every gate • Stage 2: Traverse the network from PIs to POs in topological order, deciding which gates to duplicate • Stage 3: Traverse the network from POs to PIs, physically duplicating the gates
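The three traversal orders can be sketched with Python's standard `graphlib` module. This is only a skeleton of the visiting order: the netlist format (a dictionary mapping each gate to the gates that drive it) and the gate names are hypothetical, and the per-gate work of each stage (tuple evaluation, duplication decisions, physical duplication) is not specified here.

```python
from graphlib import TopologicalSorter

def stage_orders(fanins):
    """Return the gate visiting order of each stage.

    fanins maps every gate to the list of gates driving it
    (a hypothetical netlist format chosen for this sketch).
    """
    # static_order() yields drivers before the gates they drive: PI to PO.
    pi_to_po = list(TopologicalSorter(fanins).static_order())
    po_to_pi = list(reversed(pi_to_po))
    return {
        "stage1": po_to_pi,  # evaluate tuples, POs towards PIs
        "stage2": pi_to_po,  # decide which gates to duplicate, PIs towards POs
        "stage3": po_to_pi,  # physically duplicate, POs towards PIs
    }

# Tiny netlist: E drives D, and D drives A, B and C.
netlist = {"E": [], "D": ["E"], "A": ["D"], "B": ["D"], "C": ["D"]}
orders = stage_orders(netlist)
```

Visiting fanouts before their driver in Stage 1 means that, when a gate is processed, the information about all of its fanouts is already available.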
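Stage 1: • For input pin i of gate g, find the best duplication strategy of the fanouts and the best fanout partitioning between g and its replica g’ such that the input pin required time is maximized • Tuples evaluated for pin i of gate g: tup(i,g).nodup, tup(i,g).dup.r_small, tup(i,g).dup.r_large • (Figure: gate g with input pin i and its replica g’ with input pin i’)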
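To make the fanout-partitioning subproblem concrete, the sketch below tries every assignment of fanouts to g or g’ by brute force. It is not the algorithm of the talk: it reuses the setup of the earlier A, B, C, D, E example, assumes delay parameters 1 and 1 and input capacitance 0.1 for the duplicated gate and its driver, and measures the required time at the input of the driving gate. All names are illustrative.

```python
from itertools import product

def driver_required_time(fanouts, assignment, alpha=1.0, beta=1.0, c_in=0.1):
    """Required time at the input of the gate that drives g and g'.

    fanouts: list of (required_time, input_capacitance) pairs.
    assignment[k] is 0 if fanout k stays on g, 1 if it moves to g'.
    Assumption: g, g' and their driver share alpha, beta and c_in.
    """
    copy_required = []
    for side in (0, 1):
        group = [f for f, a in zip(fanouts, assignment) if a == side]
        if group:  # a copy with no fanouts is simply not created
            r_out = min(r for r, _ in group)
            load = sum(c for _, c in group)
            copy_required.append(r_out - (alpha + beta * load))
    driver_load = c_in * len(copy_required)  # one input pin per existing copy
    return min(copy_required) - (alpha + beta * driver_load)

def best_partition(fanouts):
    """Exhaustive search over all 2**n assignments; only viable for tiny n."""
    return max(product((0, 1), repeat=len(fanouts)),
               key=lambda a: driver_required_time(fanouts, a))

fanouts = [(2, 5), (2, 5), (2, 5)]  # A, B and C from the earlier example
best = best_partition(fanouts)
best_r = driver_required_time(fanouts, best)
no_dup_r = driver_required_time(fanouts, (0, 0, 0))
```

The search space doubles with every additional fanout, so an exhaustive search like this one is only an illustration for very small cases.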
Stage 1: • NODUP: Sort the fanouts and duplicate in that order (n+1 duplication strategies in total) • Result: this algorithm is optimal
Stage 1: • DUP: (Figure: gate g and its replica g’)
Stage 2: • Forward traversal in topologically sorted order • (Figure: gates labeled 1 or 0)
Stage 3: • Traverse the circuit backwards from PO to PI, physically duplicating the gates
Experimental Results • The circuit was first optimized using script.rugged of SIS, followed by speed_up • Results were obtained in two categories: one with minimum-delay technology mapping (map -n 1), the other with minimum-delay technology mapping with fanout optimization (map -n 1 -AFG)
Conclusion • We presented an algorithm for gate duplication and showed its effectiveness in reducing circuit delay, both with and without buffer insertion • We proved that the local problem is NP-Complete • Future work includes extending this algorithm to a layout-driven framework