Timing Driven Gate Duplication: Complexity Issues and Algorithms

Ankur Srivastava, Ryan Kastner and Majid Sarrafzadeh
Embedded & Reconfigurable System Design (ER-Group), UCLA

Motivation
• Stringent timing constraints on designers call for new delay-improvement methodologies
• Gate duplication has been studied primarily for cut-set minimization; its applicability to delay improvement has not been studied by the research community

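Load Dependent Delay Model (LDDM)
• Delay of gate i from input pin j: $\delta_j(i) = \alpha_j^i + \beta_j^i \cdot C_{OUT}$, a load-independent term plus a term proportional to the output load $C_{OUT}$
• Wire delays are assumed to be zero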

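Gate Duplication for Delay Improvement (before duplication)
• r = input pin required time = required time at O/P - gate delay
• Circuit: E drives D; D drives A, B and C
• A, B, C: r = 2 and input capacitance 5 each, so CD = 15
• D: α = 1, β = 1, input capacitance 0.1; r = 2 - (1 + 1 * 15) = -14
• E: CE = 0.1; r = -14 - (1 + 1 * 0.1) = -15.1, which is consistent with α = 1, β = 1 for E as well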

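Gate Duplication for Delay Improvement (after duplicating D)
• A, B, C: r = 2 and input capacitance 5 each; D and D': α = 1, β = 1, input capacitance 0.1
• D' takes one of the three fanouts (CD' = 5); D keeps the other two (CD = 10)
• D: r = 2 - (1 + 1 * 10) = -9
• E now drives both D and D', so CE = 0.2; r = -9 - (1 + 1 * 0.2) = -10.2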
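
The required times on the two slides above can be reproduced with a few lines of arithmetic. This is a minimal sketch, not the authors' code. It assumes that the unlabeled values on the slides are the input capacitance of A, B and C (5), the two delay coefficients of D (1 and 1) and the input capacitance of D (0.1), and that E has the same coefficients as D. All function and constant names are hypothetical.

```python
def gate_delay(alpha, beta, c_out):
    """LDDM delay: load-independent term plus load-dependent term."""
    return alpha + beta * c_out


def input_required_time(r_out, alpha, beta, c_out):
    """Input pin required time = required time at output - gate delay."""
    return r_out - gate_delay(alpha, beta, c_out)


# Values read off the slides (labels partly assumed, see the text above).
R_SINK = 2.0        # input pin required time of A, B and C
C_SINK = 5.0        # input capacitance of A, B and C
ALPHA = BETA = 1.0  # delay coefficients of D, D' and E
C_IN_D = 0.1        # input capacitance of D (and of its copy D')


def before_duplication():
    # D drives all three sinks, E drives only D.
    r_d = input_required_time(R_SINK, ALPHA, BETA, 3 * C_SINK)
    r_e = input_required_time(r_d, ALPHA, BETA, C_IN_D)
    return r_d, r_e


def after_duplication():
    # D keeps two sinks, D' takes one; E now drives both D and D'.
    r_d = input_required_time(R_SINK, ALPHA, BETA, 2 * C_SINK)
    r_d_prime = input_required_time(R_SINK, ALPHA, BETA, 1 * C_SINK)
    r_e = input_required_time(min(r_d, r_d_prime), ALPHA, BETA, 2 * C_IN_D)
    return r_d, r_d_prime, r_e


if __name__ == "__main__":
    print(before_duplication())
    print(after_duplication())
```

Duplication lowers the load on D from 15 to 10 at the price of doubling the load on E from 0.1 to 0.2. The required time at E's input improves because the first effect dominates here.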

Complexity Issues
• Theorem: Global Gate Duplication is NP-Complete in LDDM
• MONO3SAT is transformed into an instance of the global problem
• Theorem: Local Gate Duplication is NP-Complete
• The PARTITION problem is transformed into an instance of the local problem

Complexity Issues (Comparison with Buffer Insertion)
• Local Buffer Insertion Problem: polynomially solvable if the net topology is fixed
• Global Buffer Insertion Problem: polynomially solvable if the delay model has the same pin-to-pin parameters
• In situations where buffer insertion is polynomially solvable, gate duplication is NP-Complete

Algorithm for Gate Duplication
• Based on the structure of dynamic programming
• Considers all the gates in the circuit for duplication, and hence works in a proactive mode
• Assumption: the circuit has only single-output combinational gates

Algorithm for Gate Duplication
• Stage 1: Traverse the network from primary outputs (POs) to primary inputs (PIs) in topological order, evaluating tuples at every step
• Stage 2: Traverse the network from PIs to POs in topological order, deciding which gates to duplicate
• Stage 3: Traverse the network from POs to PIs, physically duplicating the gates
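
The PO-to-PI traversal order of Stage 1 can be sketched as follows. This is only an illustration of the traversal: it computes plain LDDM input pin required times and does not evaluate the duplication tuples the algorithm stores. The data layout (`fanouts`, `gates`, `sinks`), the function name and the example circuit are assumptions made for this sketch, and every gate is assumed to drive at least one node.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def backward_required_times(fanouts, gates, sinks):
    """Compute input pin required times in PO-to-PI order.

    fanouts: node -> list of nodes it drives
    gates:   gate -> (alpha, beta, input capacitance)
    sinks:   endpoint -> (required time, input capacitance), fixed boundary
    """
    graph = {n: list(fanouts.get(n, [])) for n in list(gates) + list(sinks)}

    def input_cap(node):
        return sinks[node][1] if node in sinks else gates[node][2]

    r_in = {}
    # TopologicalSorter treats the mapped values as predecessors, so passing
    # the fanout lists yields every node after all the nodes it drives.
    for node in TopologicalSorter(graph).static_order():
        if node in sinks:
            r_in[node] = sinks[node][0]
            continue
        alpha, beta, _ = gates[node]
        load = sum(input_cap(f) for f in graph[node])
        r_out = min(r_in[f] for f in graph[node])
        r_in[node] = r_out - (alpha + beta * load)
    return r_in


# Hypothetical circuit: G1 drives G2 and G3, which drive endpoints X and Y.
example = backward_required_times(
    fanouts={"G1": ["G2", "G3"], "G2": ["X"], "G3": ["Y"]},
    gates={"G1": (1.0, 2.0, 1.0), "G2": (1.0, 1.0, 1.0), "G3": (2.0, 0.5, 3.0)},
    sinks={"X": (10.0, 2.0), "Y": (8.0, 1.0)},
)
```

In the algorithm itself each visited input pin would receive a tuple of candidate required times rather than a single number; the visiting order is the part this sketch is meant to show.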

Stage 1:
(Figure: gate g with input i, and its duplicate g' with input i')
• Need to find the best duplication strategy of the fanouts such that the input pin required time is maximized
• Values evaluated for input i of gate g: tup(i,g).dup.r_small, tup(i,g).dup.r_large, tup(i,g).nodup

Stage 1:
(Figure: gate g with input i, and its duplicate g' with input i')
• Need to find the best duplication strategy of the fanouts and the best fanout partitioning between g and g' such that the input pin required time is maximized
• Values evaluated for input i of gate g: tup(i,g).dup.r_small, tup(i,g).dup.r_large, tup(i,g).nodup

Stage 1:
• NODUP: Sort the fanouts and duplicate in that order (n+1 duplication strategies in total)
• RESULT: This algorithm is optimal
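
The n+1 strategies of the NODUP case can be illustrated by enumerating prefixes of the sorted fanout list. The sketch below rests on assumptions the slide does not state: fanouts are sorted by ascending non-duplicated required time, a duplicated fanout presents twice its input capacitance to g (g drives both copies), and each fanout is described by a hypothetical triple `(r_nodup, r_dup, cap)`. It is not the authors' implementation.

```python
from itertools import combinations


def input_required_time(fanouts, dup_set, alpha, beta):
    """Required time at g's input when the fanouts in dup_set are duplicated."""
    load = 0.0
    r_out = float("inf")
    for idx, (r_nodup, r_dup, cap) in enumerate(fanouts):
        if idx in dup_set:
            load += 2 * cap  # assumption: g drives both copies
            r_out = min(r_out, r_dup)
        else:
            load += cap
            r_out = min(r_out, r_nodup)
    return r_out - (alpha + beta * load)


def best_by_prefix(fanouts, alpha, beta):
    """Try the n+1 prefixes of the fanouts sorted by ascending r_nodup."""
    order = sorted(range(len(fanouts)), key=lambda i: fanouts[i][0])
    best = None
    for k in range(len(order) + 1):
        chosen = frozenset(order[:k])
        r = input_required_time(fanouts, chosen, alpha, beta)
        if best is None or r > best[0]:
            best = (r, chosen)
    return best


def best_exhaustive(fanouts, alpha, beta):
    """Cross-check over all 2^n subsets (only practical for small n)."""
    n = len(fanouts)
    return max(
        input_required_time(fanouts, set(c), alpha, beta)
        for k in range(n + 1)
        for c in combinations(range(n), k)
    )


# Hypothetical fanouts of g: (r_nodup, r_dup, input capacitance).
sample = [(-10.0, -4.0, 1.0), (-3.0, -1.0, 1.0), (0.0, 1.0, 1.0)]
prefix_result = best_by_prefix(sample, alpha=1.0, beta=1.0)
exhaustive_result = best_exhaustive(sample, alpha=1.0, beta=1.0)
```

On this sample, duplicating only the most critical fanout is best, and the exhaustive search over all eight subsets finds the same required time as the four prefix candidates.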

Stage 1:
• DUP: (figure showing gate g and its duplicate g')

Stage 2:
• Forward traversal in topologically sorted order
(Figure: network with gates annotated 1 or 0)

Stage 3:
• Traverse the circuit backwards from PO to PI, physically duplicating the gates

Experimental Results
• The circuit was first optimized using script.rugged of SIS, followed by speed_up
• Results were obtained in two categories: one with minimum-delay technology mapping (map -n 1), the other with minimum-delay technology mapping with fanout optimization (map -n 1 -AFG)

Experimental Results (map -n 1)

Experimental Results (map -n 1 -AFG)

Conclusion
• We presented an algorithm for gate duplication and showed its effectiveness in reducing circuit delay, both with and without buffer insertion
• We proved the local problem NP-Complete
• Future work includes extending this algorithm to a layout-driven framework
