
Timing Driven Gate Duplication: Complexity Issues and Algorithms

This paper discusses the complexity issues and algorithms related to gate duplication for delay improvement, including load dependent delay models and different strategies for gate duplication. It also compares gate duplication with buffer insertion and presents experimental results.


Presentation Transcript


  1. Timing Driven Gate Duplication: Complexity Issues and Algorithms. Ankur Srivastava, Ryan Kastner, and Majid Sarrafzadeh. Embedded & Reconfigurable System Design, ER-Group, UCLA

  2. Motivation • New delay-improvement methodologies are needed in light of the stringent timing constraints designers face • Gate duplication has been studied primarily for cut-set minimization; its applicability to delay improvement has not been studied by the research community

  3. Load Dependent Delay Model (LDDM) • delay(i) = intrinsic delay of gate i + load coefficient of gate i * COUT, where COUT is the capacitive load on the gate's output • Wire delays are assumed to be zero

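  4. Gate Duplication for Delay Improvement • Gate E drives gate D, which drives gates A, B and C • A, B, C: input required time r = 2 and input capacitance = 5 each • D: intrinsic delay = 1, load coefficient = 1, input capacitance = 0.1 • r = input pin required time = required time at O/P - gate delay • D: load CD = 15, r = -14 • E: load CE = 0.1, r = -15.1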

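  5. Gate Duplication for Delay Improvement • D is duplicated into D and D’ • A, B, C are unchanged: r = 2 and input capacitance = 5 each • D keeps two of the fanouts (CD = 10) and D’ takes the third (CD’ = 5) • E now drives both D and D’ (CE = 0.2) • D: r = -9 • E: r = -10.2, improved from -15.1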
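The required times on the two example slides can be checked with a few lines of Python. This is a minimal sketch, assuming the symbols lost from the transcript are an intrinsic delay of 1 and a load coefficient of 1 for D, D’ and E, an input capacitance of 0.1 for D and D’, and an input capacitance of 5 for each of A, B and C. The function and constant names are illustrative and do not come from the presentation.

```python
def gate_delay(intrinsic, load_coeff, c_out):
    """Load dependent delay: intrinsic delay plus load coefficient times output load."""
    return intrinsic + load_coeff * c_out


def input_required_time(output_required, intrinsic, load_coeff, c_out):
    """Required time at a gate input = required time at its output - gate delay."""
    return output_required - gate_delay(intrinsic, load_coeff, c_out)


# Assumed parameters (see the lead-in): every fanout A, B, C has r = 2, cap = 5.
FANOUT_R, FANOUT_CAP = 2.0, 5.0
D_INPUT_CAP = 0.1
INTRINSIC = LOAD_COEFF = 1.0  # assumed identical for D, D' and E

# Before duplication: D drives A, B and C; E drives D.
r_d = input_required_time(FANOUT_R, INTRINSIC, LOAD_COEFF, 3 * FANOUT_CAP)
r_e = input_required_time(r_d, INTRINSIC, LOAD_COEFF, D_INPUT_CAP)

# After duplication: D drives two fanouts, D' drives one, E drives D and D'.
r_d_two = input_required_time(FANOUT_R, INTRINSIC, LOAD_COEFF, 2 * FANOUT_CAP)
r_d_one = input_required_time(FANOUT_R, INTRINSIC, LOAD_COEFF, 1 * FANOUT_CAP)
r_e_dup = input_required_time(min(r_d_two, r_d_one), INTRINSIC, LOAD_COEFF,
                              2 * D_INPUT_CAP)

print(round(r_d, 2), round(r_e, 2))          # slide values: -14 and -15.1
print(round(r_d_two, 2), round(r_e_dup, 2))  # slide values: -9 and -10.2
```

Under these assumptions the duplicate raises the load on E from 0.1 to 0.2, but the lighter load on each copy of D more than compensates.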

  6. Complexity Issues • Theorem: Global Gate Duplication is NP-Complete in LDDM • Proof idea: MONO3SAT is reduced to an instance of the global problem • Theorem: Local Gate Duplication is NP-Complete • Proof idea: the PARTITION problem is reduced to an instance of the local problem
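To give a feel for the local problem, the sketch below exhaustively tries every split of a gate's fanouts between the gate g and its duplicate g’ and keeps the split that maximizes the worse of the two input required times. It is an illustration of the 2^n search space under the load dependent delay model, not the authors' reduction or algorithm. It assumes each fanout is described by a hypothetical (required time, input capacitance) pair, and it ignores the extra load that the duplicate places on the driving gate.

```python
from itertools import product


def input_required(fanouts, intrinsic, load_coeff):
    """Required time at the input of a gate driving `fanouts` = [(r, cap), ...]."""
    if not fanouts:
        return float("inf")  # a copy with no fanout imposes no constraint
    load = sum(cap for _, cap in fanouts)
    return min(r for r, _ in fanouts) - (intrinsic + load_coeff * load)


def best_partition(fanouts, intrinsic, load_coeff):
    """Try all 2^n assignments of fanouts to g (0) or its duplicate g' (1)."""
    best_r, best_assign = float("-inf"), None
    for assign in product((0, 1), repeat=len(fanouts)):
        on_g = [f for f, side in zip(fanouts, assign) if side == 0]
        on_dup = [f for f, side in zip(fanouts, assign) if side == 1]
        # The driver must meet the tighter of the two copies' requirements.
        r = min(input_required(on_g, intrinsic, load_coeff),
                input_required(on_dup, intrinsic, load_coeff))
        if r > best_r:
            best_r, best_assign = r, assign
    return best_r, best_assign


# Three identical fanouts (r = 2, cap = 5), intrinsic delay 1, load coefficient 1.
result = best_partition([(2.0, 5.0)] * 3, 1.0, 1.0)
print(result)
```

The all-zero assignment corresponds to not duplicating at all, so the search also covers the no-duplication case. The number of assignments doubles with every added fanout, which is why exhaustive search is only practical for small fanout counts.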

  7. Complexity Issues (Comparison with Buffer Insertion) • Local Buffer Insertion Problem: polynomially solvable if the net topology is fixed • Global Buffer Insertion Problem: polynomially solvable if the delay model has the same pin-to-pin parameters • In situations where buffer insertion is polynomially solvable, gate duplication is NP-Complete

  8. Algorithm for Gate Duplication • Based on the structure of dynamic programming • Considers duplication for all the gates in the circuit, hence works in a proactive mode • Assumption: the circuit has only single-output combinational gates

  9. Algorithm for Gate Duplication • Stage 1: Traverse the network from primary outputs (POs) to primary inputs (PIs) in topological order, evaluating tuples at every step • Stage 2: Traverse the network from PIs to POs in topological order, deciding which gates to duplicate • Stage 3: Traverse the network from POs to PIs, physically duplicating the gates
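The backward pass of the first stage can be pictured as a required-time propagation from the outputs toward the inputs. The sketch below covers only the traversal order and the required-time bookkeeping under the load dependent delay model; the tuple evaluation and the duplication decisions of the actual algorithm are not reproduced. The dictionary layout, the function name and the example parameters are assumptions made for illustration.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def required_times(fanouts, params, sink_required):
    """Propagate input required times from the sinks back toward the inputs.

    fanouts:       gate -> list of gates it drives
    params:        gate -> (intrinsic delay, load coefficient, input capacitance)
    sink_required: gate -> required time at the input of each sink gate
    """
    r_in = dict(sink_required)
    # Passing fanouts as "predecessors" makes static_order() yield every
    # gate after all the gates it drives, i.e. a PO-to-PI order.
    for gate in TopologicalSorter(fanouts).static_order():
        if gate in r_in:
            continue  # sink: required time is given
        outs = fanouts[gate]
        intrinsic, load_coeff, _ = params[gate]
        load = sum(params[f][2] for f in outs)
        r_out = min(r_in[f] for f in outs)
        r_in[gate] = r_out - (intrinsic + load_coeff * load)
    return r_in


# Example network: E drives D, D drives A, B and C (assumed parameter values).
fanouts = {"E": ["D"], "D": ["A", "B", "C"]}
params = {
    "A": (0.0, 0.0, 5.0), "B": (0.0, 0.0, 5.0), "C": (0.0, 0.0, 5.0),
    "D": (1.0, 1.0, 0.1),
    "E": (1.0, 1.0, 0.0),  # E's input capacitance is unused here
}
times = required_times(fanouts, params, {"A": 2.0, "B": 2.0, "C": 2.0})
print({gate: round(t, 2) for gate, t in times.items()})
```

A forward pass in the opposite order, as in the second stage, could reuse the same sorter by passing each gate's fanins instead of its fanouts.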

  10. Stage 1: • Need to find the best duplication strategy of the fanouts such that the input pin required time is maximized • Tuple entries: tup(i,g).dup.r_small, tup(i,g).dup.r_large, tup(i,g).nodup

  11. Stage 1: • Need to find the best duplication strategy of the fanouts and the best fanout partitioning between g and g’ such that the input pin required time is maximized • Tuple entries: tup(i,g).dup.r_small, tup(i,g).dup.r_large, tup(i,g).nodup

  12. Stage 1: • NODUP: Sort the fanouts and duplicate in that order (n+1 duplication strategies in total) • Result: this algorithm is optimal

  13. Stage 1: • DUP: (figure: gate g and its duplicate g’)

  14. Stage 2: • Forward traversal in topologically sorted order (figure: network annotated with 1/0 labels)

  15. Stage 3: • Traverse the circuit backwards from PO to PI, physically duplicating the gates

  16. Experimental Results • The circuit was first optimized using script.rugged of SIS, followed by speed_up • Results were obtained in two categories: one with minimum-delay technology mapping (map -n 1), the other with minimum-delay technology mapping with fanout optimization (map -n 1 -AFG)

  17. Experimental Results (map -n 1)

  18. Experimental Results (map -n 1 -AFG)

  19. Conclusion • We presented an algorithm for gate duplication and showed its effectiveness in reducing circuit delay, both with and without buffer insertion • We proved the local problem NP-Complete • Future work includes extending this algorithm to a layout-driven framework

