ε-Optimal Minimum-Delay/Area Zero-Skew Clock Tree Wire-Sizing in Pseudo-Polynomial Time

Jeng-Liang Tsai, Tsung-Hao Chen, Charlie Chung-Ping Chen (National Taiwan University)
University of Wisconsin-Madison
http://vlsi.ece.wisc.edu

Outline
• Background
  • Motivation and contribution
  • Literature overview
• ClockTune algorithm
  • Problem formulation
  • ClockTune algorithm overview
  • Optimality and complexity analysis
• Experimental results
  • Runtime, memory usage, and optimality
  • Power/Delay trade-off
  • Incremental refinement

Motivation
• Clock skew causes a cycle-time penalty
• Start with a zero-skew clock tree
• Minimizing clock delay reduces system-level skew (Kuh et al. [DAC '90])
• The clock tree is power-hungry (30% of power in Intel McKinley: 0.18 µm, 1 GHz, 130 W)
  • P = f C V²
  • Minimize switching capacitance (wiring area)
• Stability affects design convergence
  • Allow incremental refinement to accommodate local changes
• Interconnect delay dominates total delay
  • Wire-sizing is effective in reducing interconnect delay

Motivation
• Zero-skew constraints are non-convex
• No known algorithm solves the zero-skew wire-sizing problem optimally in polynomial runtime
• Hence, a good clock tree wire-sizing algorithm should:
  • Minimize delay and power
  • Guarantee optimality and runtime
  • Have good stability

Contribution
• First ε-optimal algorithm for the clock min-delay/power zero-skew wire-sizing optimization problem
• Provides a complete (sampled) solution set of delay/power/area trade-off information for design planning
• Efficient pseudo-polynomial runtime (a 6170-branch clock tree in 6 minutes, within 1% of optimal)
• Runtime vs. optimality trade-off
• Incremental clock re-balancing to speed up design convergence

Literature overview
• "Reliable non-zero skew clock tree using wire width optimization", Pillage et al. [DAC '93]
  • Iteratively optimizes skew and delay using adjoint sensitivity analysis
  • Aimed at reliable clock trees under process variation
• Deferred Merging Embedding (DME) algorithm, Kahng et al. [TCAD '92]
  • Bottom-up merging-segment construction, top-down embedding
• Integrated Deferred Merging Embedding (IDME) algorithm, Wong et al. [ISPD '00]
  • Handles simultaneous routing, buffer insertion, and wire-sizing
  • Merging segment set (MSS): a set of line samples of a merging region
  • No optimality guarantee
  • The size of the MSS grows exponentially
• "Process variation aware clock tree routing", Lu et al. [ISPD '03]
  • Based on DME/BST


Problem formulation
• min-ZSWS (Zero-Skew Wire-Sizing) problem
• Given a clock routing, choose the wire widths w to minimize the clock delay or wiring area, subject to the zero-skew constraints $\text{Delay}(P_i) = \text{Delay}(P_j)$, where $P_i$, $P_j$ are paths from a node v to leaf nodes i and j
• Zero-skew constraints are non-convex
• No known algorithm solves the problem optimally in polynomial runtime
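
A minimal sketch of what the zero-skew constraint checks, assuming the Elmore delay model with π-model wires whose resistance is r0·l/w and capacitance is c0·l·w (r0 read as a per-square resistance and c0 as an area capacitance, using the values from the experimental-setup slide). The slides do not state ClockTune's delay model, so Elmore is an assumption here, and `Node`, `sink_delays`, and `skew` are hypothetical names.

```python
from dataclasses import dataclass, field

R0 = 0.03   # assumed per-square wire resistance (ohm), from the setup slide
C0 = 2e-16  # assumed area capacitance (F/um^2), from the setup slide


@dataclass
class Node:
    name: str
    length: float = 0.0    # length (um) of the wire from the parent to this node
    width: float = 1.0     # wire width (um): the sizing variable
    sink_cap: float = 0.0  # load capacitance (F) if this node is a leaf
    children: list = field(default_factory=list)


def wire_rc(n):
    """Resistance and capacitance of the wire that feeds node n."""
    return R0 * n.length / n.width, C0 * n.length * n.width


def downstream_cap(n):
    """Total capacitance below node n: its own load plus every child wire and subtree."""
    total = n.sink_cap
    for ch in n.children:
        total += wire_rc(ch)[1] + downstream_cap(ch)
    return total


def sink_delays(n, acc=0.0, out=None):
    """Elmore delay from node n to each leaf below it (pi-model wires)."""
    if out is None:
        out = {}
    if not n.children:
        out[n.name] = acc
    for ch in n.children:
        r, c = wire_rc(ch)
        sink_delays(ch, acc + r * (c / 2 + downstream_cap(ch)), out)
    return out


def skew(root):
    """Largest minus smallest leaf delay; zero skew means all path delays are equal."""
    d = sink_delays(root)
    return max(d.values()) - min(d.values())


if __name__ == "__main__":
    a = Node("a", length=100.0, width=1.0, sink_cap=1e-14)
    b = Node("b", length=100.0, width=2.0, sink_cap=1e-14)
    print(skew(Node("root", children=[a, b])))  # nonzero: the two widths differ
```

With equal lengths and loads, the skew is zero only when both wires get the same width; widening one branch alone speeds it up and breaks the constraint, which is why the widths cannot be chosen independently.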

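DC region approach
• Clock delay and wiring capacitance are the top concerns
• Define $f: \mathbb{R}^N \to \mathbb{R}^2$ such that $f_Y(w) = \text{Delay}(T_v(w))$ and $f_X(w) = \text{Capacitance}(T_v(w))$, where $T_v(w)$ is the subtree rooted at v sized with widths w
• DC region(v): the projection of the feasible region under f
• Choose a delay-capacitance (d-c) pair from the DC region in $\mathbb{R}^2$
[Figure: the feasible region projected onto the DC region]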
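
To make the projection concrete, the sketch below enumerates every width pair for a two-sink subtree on a small width grid (wm = 1 µm to wM = 4 µm, as on the setup slide), keeps only the sizings that meet zero skew within a tolerance `tol`, and maps each one to its (capacitance, delay) point; `pareto` then keeps the non-dominated points. It assumes the same Elmore/π-model wires as above, brute-force enumeration is for illustration only (it is exponential in the number of wires), and all function names are hypothetical.

```python
import itertools

R0, C0 = 0.03, 2e-16  # assumed wire parameters (ohm per square, F/um^2)


def edge(length, width, load):
    """Elmore delay and total capacitance of one sized wire driving `load`."""
    r, c = R0 * length / width, C0 * length * width
    return r * (c / 2 + load), c + load


def dc_region(la, lb, load_a, load_b, widths, tol):
    """Map every (near-)zero-skew width pair (wa, wb) to a (capacitance, delay) point."""
    points = []
    for wa, wb in itertools.product(widths, repeat=2):
        da, ca = edge(la, wa, load_a)
        db, cb = edge(lb, wb, load_b)
        if abs(da - db) <= tol:  # infeasible (skewed) sizings are dropped
            points.append((ca + cb, max(da, db), (wa, wb)))
    return points


def pareto(points):
    """Keep the non-dominated (capacitance, delay) points."""
    front, best_delay = [], float("inf")
    for cap, delay, w in sorted(points):  # ascending capacitance, then delay
        if delay < best_delay:
            front.append((cap, delay, w))
            best_delay = delay
    return front


if __name__ == "__main__":
    pts = dc_region(100.0, 100.0, 1e-14, 1e-14, widths=[1, 2, 3, 4], tol=1e-18)
    for cap, delay, w in pareto(pts):
        print(f"C = {cap:.2e} F, delay = {delay:.2e} s, widths = {w}")
```

For this symmetric example only the equal-width pairs survive, and each wider pair trades more capacitance for less delay, which is the kind of d-c curve the DC region summarizes.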

ClockTune algorithm overview
• Phase 1: bottom-up construction of the DC region for every node
• Phase 2: top-down embedding after choosing the delay/power trade-off
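
The two phases can be sketched as a dynamic program over sampled DC regions. This is a simplified reconstruction under stated assumptions, not the paper's exact procedure: each node keeps a table mapping a delay sample (an integer bin of width `step`) to the minimum subtree capacitance that reaches it, children are merged only in bins they share (which enforces zero skew up to the delay resolution), and back-pointers let Phase 2 recover the wire widths for the chosen delay sample. The Elmore/π-model wires, `step`, and all names are assumptions.

```python
R0, C0 = 0.03, 2e-16  # assumed wire parameters (ohm per square, F/um^2)


class Node:
    def __init__(self, name, load=0.0, children=()):
        self.name, self.load = name, load
        self.children = list(children)  # (child Node, edge length in um) pairs
        self.table = {}                 # delay bin -> (min capacitance, back-pointer)


def extend(child, length, widths, step):
    """Push a child's sampled DC region up through its sized parent edge."""
    out = {}
    for bin_, (cap, _) in child.table.items():
        for w in widths:
            r, c = R0 * length / w, C0 * length * w
            delay = bin_ * step + r * (c / 2 + cap)
            nb = round(delay / step)  # re-sample onto the delay grid
            if nb not in out or cap + c < out[nb][0]:
                out[nb] = (cap + c, (bin_, w))
    return out


def bottom_up(v, widths, step):
    """Phase 1: delay sample -> minimum subtree capacitance, for every node."""
    if not v.children:
        v.table = {0: (v.load, None)}
        return
    for ch, _ in v.children:
        bottom_up(ch, widths, step)
    ext = [extend(ch, length, widths, step) for ch, length in v.children]
    shared = set(ext[0]).intersection(*ext[1:])  # same bin on every branch: zero skew
    v.table = {b: (v.load + sum(e[b][0] for e in ext), tuple(e[b][1] for e in ext))
               for b in shared}


def top_down(v, bin_, sizing):
    """Phase 2: follow back-pointers from the chosen delay sample at v."""
    _, back = v.table[bin_]
    if back is None:
        return sizing
    for (ch, _), (child_bin, w) in zip(v.children, back):
        sizing[ch.name] = w
        top_down(ch, child_bin, sizing)
    return sizing


if __name__ == "__main__":
    a, b = Node("a", load=1e-14), Node("b", load=1e-14)
    root = Node("root", children=[(a, 100.0), (b, 100.0)])
    bottom_up(root, widths=[1, 2, 3, 4], step=1e-15)
    print(top_down(root, min(root.table), {}))  # minimum-delay sample
    print(top_down(root, max(root.table), {}))  # largest delay sample
```

In this sketch, re-sampling a child's already-rounded delay (`bin_ * step`) instead of its exact delay is where propagated error enters; a smaller `step` reduces it at the cost of larger tables.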

Optimality analysis
• Embeddings that do not fall on the delay samples are omitted
• Sources of error:
  • Propagated error
  • Delay sampling error
  • Wire width sampling error (detailed in the paper)

Optimality analysis
• The error is bounded in terms of:
  • d: delay sampling resolution
  • w: wire width sampling resolution
  • k and related constants, which depend on l, r0, c0, wm, wM …
• Generally speaking, the error is reduced by about half when the resolution is doubled
[Figure: error vs. resolution]
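
The halving behaviour is easy to see for the delay-sampling term alone: snapping a delay to a uniform grid of p intervals has a worst-case error of half a grid step, so doubling p halves it. The sketch below measures this numerically; it illustrates only this one error source, not the paper's full bound (which also covers propagated and wire-width sampling error), and `max_sampling_error` is a hypothetical helper.

```python
def max_sampling_error(delay_range, p, trials=10_000):
    """Worst error when delays in [0, delay_range] snap to a p-interval uniform grid."""
    step = delay_range / p
    worst = 0.0
    for i in range(trials + 1):
        d = delay_range * i / trials
        snapped = round(d / step) * step  # nearest delay sample
        worst = max(worst, abs(d - snapped))
    return worst


if __name__ == "__main__":
    for p in (64, 128, 256):
        print(p, max_sampling_error(1.0, p))  # roughly halves each time p doubles
```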

Optimality/runtime trade-off
• Controlling the sampling resolution trades optimality against runtime and memory

Complexity analysis
• Runtime
  • Bottom-up phase: O(n p max(p, q))
  • Top-down phase: O(n p)
  • Overall: O(n p max(p, q))
• Memory: O(n p)
where n is the number of nodes in the clock tree, p is the number of delay samples taken at each node, and q is the number of wire width samples taken at each level-2 node
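
As a rough, back-of-the-envelope reading of these bounds (constant factors ignored, and assuming n ≈ 6170 for the 6170-branch tree quoted on the contribution slide, i.e. roughly one node per branch), with p = q = 256:

$$
\begin{aligned}
n\,p\,\max(p,q) &\approx 6170 \times 256 \times 256 = 404{,}357{,}120 \approx 4.0 \times 10^{8} \text{ elementary steps},\\
n\,p &\approx 6170 \times 256 = 1{,}579{,}520 \text{ stored delay samples}.
\end{aligned}
$$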

Outline
• Background
  • Motivation and contribution
  • Related work
  • Problem formulation
• ClockTune algorithm
  • Design space projection
  • Algorithm overview
  • Optimality and complexity analysis
• Experimental results
  • Runtime, memory usage, and optimality
  • Power/Delay trade-off
  • Incremental refinement

Experimental setup
• ClockTune is implemented in C++ and run on a 533 MHz Pentium III PC with 128 MB of memory
• Benchmarks r1 to r5 from Tsay et al. [ICCAD '91]
• Initial routing generated by the BB+DME algorithm with minimum wire width w = 1 µm
• ClockTune uses wm = 1 µm, wM = 4 µm
• p: number of delay samples taken at every node
• q: number of wire width samples taken at every level-2 node
• r0 = 0.03, c0 = 2×10⁻¹⁶ F/µm²

Runtime and memory usage
• Runtime and memory usage are linear in problem size when p and q are fixed
• Within 1% of optimal when p = q = 256 (runtime < 6 minutes, memory ≈ 64 MB)

Optimality results
• Error below 1% with p = q = 256
• Error is reduced by about half when the resolution is doubled

Power/Delay trade-off (benchmark r5)
[Figure: delay (5 to 150 ns) vs. capacitance (0.2 to 1.1 nF), spanning the minimum-power and minimum-delay solutions]
• 15:1 delay:power trade-off

Incremental refinement
• The DC region captures the design space
• This enables incremental refinement
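
Because the DC regions are built bottom-up, a local change (for example a modified sink load) can only alter the DC regions of the changed node and its ancestors; sibling subtrees keep their tables. A minimal sketch of that bookkeeping, assuming a parent-pointer map and a caller-supplied `rebuild` callback that reruns the Phase 1 merge for one node (both hypothetical):

```python
def ancestors(parent, node):
    """The node and every ancestor up to the root: the only DC regions a change can affect."""
    path = [node]
    while parent.get(node) is not None:
        node = parent[node]
        path.append(node)
    return path


def refresh(parent, changed, rebuild):
    """Rerun the per-node bottom-up merge only along the changed node's root path."""
    path = ancestors(parent, changed)
    for v in path:  # leaf-to-root order: each node is rebuilt after its changed child
        rebuild(v)
    return len(path)


if __name__ == "__main__":
    n = 2 ** 11 - 1  # balanced binary tree, heap-indexed, 1024 leaves
    parent = {i: (i - 1) // 2 if i else None for i in range(n)}
    touched = refresh(parent, n - 1, rebuild=lambda v: None)
    print(f"rebuilt {touched} of {n} nodes")
```

For the balanced 1024-leaf example, a change at one leaf rebuilds 11 of the 2047 nodes rather than the whole tree.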

Conclusion & Future Work
• Provides a zero-skew clock tree wire-sizing algorithm that:
  • Minimizes delay and area ε-optimally
  • Guarantees pseudo-polynomial runtime and memory usage
  • Provides delay/power trade-off information to designers
  • Speeds up design convergence by allowing clock tree re-balancing with minimal changes
• Future work:
  • Better delay model
  • Buffer insertion/sizing capability

Thank you!
