
ε-Optimal Minimum-Delay/Area Zero-Skew Clock Tree Wire-Sizing in Pseudo-Polynomial Time

Jeng-Liang Tsai, Tsung-Hao Chen, Charlie Chung-Ping Chen (National Taiwan University). University of Wisconsin-Madison, http://vlsi.ece.wisc.edu.


Presentation Transcript


  1. ε-Optimal Minimum-Delay/Area Zero-Skew Clock Tree Wire-Sizing in Pseudo-Polynomial Time • Jeng-Liang Tsai, Tsung-Hao Chen, Charlie Chung-Ping Chen (National Taiwan University) • University of Wisconsin-Madison, http://vlsi.ece.wisc.edu

  2. Outline • Background • Motivation and contribution • Literature overview • ClockTune algorithm • Problem formulation • ClockTune algorithm overview • Optimality and complexity analysis • Experimental results • Runtime, memory usage, and optimality • Power/Delay trade-off • Incremental refinement

  3. Motivation • Clock skew → cycle-time penalty • Start with a zero-skew clock tree • Minimizing clock delay reduces system-level skew (Kuh, et al. [DAC ’90]) • The clock tree is power-hungry (30% of the power in Intel McKinley, 0.18 µm/1 GHz/130 W) • $P = fCV^2$ • Minimize switching capacitance (wiring area) • Stability affects design convergence • Allow incremental refinement to accommodate local changes • Interconnect delay dominates total delay • Wire-sizing is effective in reducing interconnect delay

  4. Motivation • Non-convex zero-skew constraints • No known algorithm solves the zero-skew wire-sizing problem optimally in polynomial runtime • Hence, a good clock tree wire-sizing algorithm should • Minimize delay and power • Guarantee optimality and runtime • Have good stability

  5. Contribution • First ε-optimal algorithm for the clock min-delay/power zero-skew wire-sizing optimization problem • Provides the complete (sampled) solution set of delay/power/area trade-offs for design planning • Efficient pseudo-polynomial runtime (a 6170-branch clock tree in 6 minutes, within 1% of optimal) • Runtime vs. optimality trade-off • Incremental clock re-balancing to speed up design convergence

  6. Literature Overview • “Reliable non-zero skew clock tree using wire width optimization”, Pillage, et al. [DAC ’93] • Iteratively optimizes skew and delay using adjoint sensitivity analysis • Aimed at reliable clock trees under process variation • Deferred Merging Embedding (DME) algorithm, Kahng, et al. [TCAD ’92] • Bottom-up merging segment construction, top-down embedding • Integrated Deferred Merging Embedding (IDME) algorithm, Wong, et al. [ISPD ’00] • Handles simultaneous routing, buffer insertion, and wire-sizing • Merging segment set (MSS): a set of line samples of a merging region • No optimality guarantee • The size of the MSS grows exponentially • “Process variation aware clock tree routing”, Lu, et al. [ISPD ’03] • Based on DME/BST

  7. Outline • Background • Motivation and contribution • Literature overview • ClockTune algorithm • Problem formulation • ClockTune algorithm overview • Optimality and complexity analysis • Experimental results • Runtime, memory usage, and optimality • Power/Delay trade-off • Incremental refinement

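  8. Problem formulation • min-ZSWS (Zero-Skew Wire-Sizing) problem • Given a clock routing, minimize the delay (or wiring capacitance) of the tree $T_v(w)$ over the wire widths $w$, s.t. $\mathrm{Delay}(P_i) = \mathrm{Delay}(P_j)$ for all leaf pairs $i, j$ and $w_m \le w \le w_M$, where $P_i$, $P_j$ are the paths from $v$ to leaf nodes $i$ and $j$ • The zero-skew constraints are non-convex • No known algorithm solves the problem optimally in polynomial runtime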
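A minimal sketch of what the zero-skew constraint checks, assuming the Elmore delay model with wire resistance $r_0 l / w$ and wire capacitance $c_0 l w$ (a common width-dependent model; the slide does not say which model ClockTune uses). `Node`, `leaf_delays`, and `skew` are hypothetical names, and the units attached to `R0` and `C0` (taken from the experimental setup) are assumptions.

```python
from dataclasses import dataclass, field

R0 = 0.03    # wire resistance parameter (assumed: ohms per square)
C0 = 2e-16   # wire capacitance parameter (assumed: farads per square micron)


@dataclass
class Node:
    name: str
    length: float = 0.0   # length (um) of the wire from the parent to this node
    width: float = 1.0    # width (um) of that wire: the sizing variable w
    load: float = 0.0     # sink capacitance (F) if this node is a leaf
    children: list = field(default_factory=list)


def wire_rc(n):
    """Resistance r0*l/w and capacitance c0*l*w of the wire feeding node n."""
    return R0 * n.length / n.width, C0 * n.length * n.width


def downstream_cap(n):
    """Total capacitance below node n, excluding the wire that feeds n."""
    total = n.load
    for c in n.children:
        total += wire_rc(c)[1] + downstream_cap(c)
    return total


def leaf_delays(n, t=0.0, out=None):
    """Elmore delay from the root to every leaf: each wire adds R * (C_wire/2 + C_downstream)."""
    out = {} if out is None else out
    if not n.children:
        out[n.name] = t
    for c in n.children:
        r, cw = wire_rc(c)
        leaf_delays(c, t + r * (cw / 2 + downstream_cap(c)), out)
    return out


def skew(root):
    """Largest delay difference between any two root-to-leaf paths (0 means zero skew)."""
    d = leaf_delays(root).values()
    return max(d) - min(d)


a = Node("a", length=100.0, width=1.0, load=1e-14)
b = Node("b", length=100.0, width=2.0, load=1e-14)   # wider wire: faster branch
root = Node("root", children=[a, b])
print(leaf_delays(root), skew(root))   # nonzero skew violates the min-ZSWS constraint
```

In min-ZSWS the widths are the variables; any width vector with nonzero `skew` lies outside the feasible region.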

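  9. DC region approach • Clock delay and wiring capacitance are the top concerns • Define $f : \mathbb{R}^N \to \mathbb{R}^2$ such that • $f_Y(w) = \mathrm{Delay}(T_v(w))$, $f_X(w) = \mathrm{Capacitance}(T_v(w))$ • DC region($v$): the projection of the feasible region onto $\mathbb{R}^2$ under $f$ • Choose a delay-capacitance (d-c) pair from the DC region in $\mathbb{R}^2$ • [Figure: the feasible region in $\mathbb{R}^N$ mapped onto the DC region]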
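To make the projection $f$ concrete, the sketch below enumerates sampled width vectors of a hypothetical two-wire chain (one sink, so the zero-skew constraint holds trivially), maps each to a (capacitance, delay) pair, and keeps the non-dominated pairs. Only this lower-left boundary of the DC region matters when minimizing delay and capacitance, since any pair off the boundary is matched or beaten on both axes by a boundary pair. The chain, its lengths, the Elmore model, and the names `f` and `pareto_frontier` are assumptions for illustration.

```python
from itertools import product

R0, C0 = 0.03, 2e-16                  # assumed wire parameters, as in the experimental setup
LEN1, LEN2, CL = 200.0, 100.0, 1e-14  # hypothetical wire lengths (um) and sink load (F)
WIDTHS = [1.0, 2.0, 3.0, 4.0]         # sampled widths between w_m = 1 um and w_M = 4 um


def f(w1, w2):
    """Map the width vector (w1, w2) of a two-wire chain to (capacitance, Elmore delay)."""
    r1, c1 = R0 * LEN1 / w1, C0 * LEN1 * w1
    r2, c2 = R0 * LEN2 / w2, C0 * LEN2 * w2
    delay = r1 * (c1 / 2 + c2 + CL) + r2 * (c2 / 2 + CL)
    return c1 + c2 + CL, delay


def pareto_frontier(points):
    """Keep the (capacitance, delay) pairs that no other pair beats on both axes."""
    frontier, best_delay = [], float("inf")
    for cap, delay in sorted(points):
        if delay < best_delay:   # faster than every point with lower or equal capacitance
            frontier.append((cap, delay))
            best_delay = delay
    return frontier


points = [f(w1, w2) for w1, w2 in product(WIDTHS, repeat=2)]
frontier = pareto_frontier(points)
print(len(points), len(frontier))
```

Some width vectors (e.g. a narrow upstream wire driving a wide downstream wire) are dominated and drop out of the frontier.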

  10. ClockTune algorithm overview • Phase 1: bottom-up construction of the DC region of every node • Phase 2: choose a delay/power trade-off point, then embed top-down
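The two phases can be illustrated with a much-simplified, hypothetical sketch. Each node keeps a table mapping a sampled delay (an integer multiple of `DELTA`) to the minimum capacitance achieving it; a parent combines its children's tables only at matching delay samples, which enforces zero skew; the top-down pass reads the stored width choices back out. It assumes the Elmore model, rounding to the nearest delay sample, and the names `Node`, `edge_table`, `bottom_up`, `top_down`, `WIDTHS`, and `DELTA`. It is not ClockTune's actual implementation, and its per-node cost (roughly one update per stored sample per width sample) is not claimed to match the paper's complexity analysis.

```python
from dataclasses import dataclass, field

R0, C0 = 0.03, 2e-16            # assumed wire parameters (from the experimental setup)
WIDTHS = [1.0, 2.0, 3.0, 4.0]   # hypothetical wire-width samples (q = 4)
DELTA = 1e-14                   # hypothetical delay sampling resolution (s)


@dataclass
class Node:
    name: str
    length: float = 0.0   # length (um) of the wire from the parent
    load: float = 0.0     # sink capacitance (F); used only at leaves
    children: list = field(default_factory=list)
    table: dict = field(default_factory=dict)   # delay bucket -> (capacitance, choice)


def edge_table(child):
    """Push a child's sampled DC pairs up through its wire, once per width sample."""
    out = {}
    for bucket, (cap, _) in child.table.items():
        for w in WIDTHS:
            r, cw = R0 * child.length / w, C0 * child.length * w
            delay = bucket * DELTA + r * (cw / 2 + cap)   # Elmore delay through the wire
            b = round(delay / DELTA)                      # snap to the delay grid
            if b not in out or cap + cw < out[b][0]:      # keep min capacitance per bucket
                out[b] = (cap + cw, (w, bucket))
    return out


def bottom_up(v):
    """Phase 1: build each node's sampled DC table, children first."""
    if not v.children:
        v.table = {0: (v.load, None)}
        return
    for c in v.children:
        bottom_up(c)
    tables = [edge_table(c) for c in v.children]
    common = set(tables[0]).intersection(*tables[1:])   # zero skew: same delay bucket
    v.table = {b: (sum(t[b][0] for t in tables), [t[b][1] for t in tables])
               for b in common}


def top_down(v, bucket, widths):
    """Phase 2: walk down from a chosen delay bucket, recording each wire's width."""
    if v.children:
        for c, (w, child_bucket) in zip(v.children, v.table[bucket][1]):
            widths[c.name] = w
            top_down(c, child_bucket, widths)
    return widths


root = Node("root", children=[Node("a", length=100.0, load=1e-14),
                              Node("b", length=100.0, load=1e-14)])
bottom_up(root)
fastest = min(root.table)                                   # minimum-delay bucket
cheapest = min(root.table, key=lambda b: root.table[b][0])  # minimum-capacitance bucket
print(top_down(root, fastest, {}), top_down(root, cheapest, {}))
```

Choosing the root bucket is where the delay/power trade-off is made; only that one choice is needed to recover every wire width.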

  11. Optimality analysis • Embeddings that do not fall on the delay samples are omitted • Error sources • Propagated error • Delay sampling error • Wire width sampling error (detailed in the paper)

  12. Optimality analysis • The error is bounded • d: delay sampling resolution • w: wire width sampling resolution • k (and a second constant): related to $l, r_0, c_0, w_m, w_M$, … • In general, doubling the sampling resolution roughly halves the error • [Figure: error vs. resolution]
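One way to see why the error scales with resolution, under a toy model assumed here: if every merge snaps a delay to the nearest sample, each snap moves it by at most half the sampling step Δ, so along a path of h wires the accumulated delay error is at most h·Δ/2, which is linear in Δ. Halving Δ (doubling the resolution) halves this worst case. The paper's actual bound and constants are not reproduced; `snap` and `path_error` are hypothetical names.

```python
import random


def snap(d, delta):
    """Round a delay to the nearest sample on a grid of resolution delta."""
    return round(d / delta) * delta


def path_error(wire_delays, delta):
    """Toy model: add wire delays up a root-to-leaf path, snapping after every step."""
    exact = sampled = 0.0
    for d in wire_delays:
        exact += d
        sampled = snap(sampled + d, delta)   # each snap moves the value by <= delta / 2
    return abs(sampled - exact)


random.seed(0)
delays = [random.uniform(0.0, 1.0) for _ in range(50)]   # hypothetical wire delays
for delta in (0.1, 0.05, 0.025):
    bound = len(delays) * delta / 2                       # worst case, linear in delta
    print(delta, path_error(delays, delta), bound)
```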

  13. Optimality/runtime trade-off • Controlling the sampling resolution trades optimality against runtime and memory

  14. Complexity analysis • Runtime • Bottom-up phase takes $O(np\max(p,q))$ • Top-down phase takes $O(np)$ • Overall: $O(np\max(p,q))$ • Memory • $O(np)$, where $n$: number of nodes of the clock tree, $p$: number of delay samples taken at each node, $q$: number of wire width samples taken at each level-2 node

  15. Outline • Background • Motivation and contribution • Related works • Problem formulation • ClockTune algorithm • Design space projection • Algorithm overview • Optimality and complexity analysis • Experimental results • Runtime, memory usage, and optimality • Power/Delay trade-off • Incremental refinement

  16. Experimental setup • ClockTune is implemented in C++ and run on a 533 MHz Pentium III PC with 128 MB of memory • Benchmarks r1–r5 from Tsay et al. [ICCAD ’91] • Initial routing generated by the BB+DME algorithm with minimum wire width w = 1 µm • ClockTune uses $w_m = 1$ µm, $w_M = 4$ µm • p: number of delay samples taken at every node • q: number of wire width samples taken at every level-2 node • $r_0 = 0.03$, $c_0 = 2\times10^{-16}/\mu\mathrm{m}^2$

  17. Runtime and memory usage • Runtime and memory usage are linear in problem size when p and q are fixed • Within 1% of optimal when p = q = 256 (runtime < 6 minutes, memory ~64 MB)

  18. Optimality results • Optimality • Error below 1% with p = q = 256 • Doubling the resolution roughly halves the error

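  19. Power/Delay trade-off • [Figure: delay vs. capacitance trade-off curve for benchmark r5, from the minimum-power end to the minimum-delay end; delay axis 5–150 ns, capacitance axis 0.2–1.1 nF] • 15:1 delay:power trade-off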
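Once the root's DC region is known, picking a point on this curve is a simple query. A hypothetical sketch, assuming the root frontier is available as a list of (capacitance, delay) pairs: take the lowest-capacitance point (lowest power, since $P = fCV^2$) that meets a delay budget, or the fastest point under a capacitance budget. The function names and the frontier values are illustrative only.

```python
def min_cap_within(frontier, max_delay):
    """Lowest-capacitance (capacitance, delay) pair that meets a delay budget, or None."""
    feasible = [p for p in frontier if p[1] <= max_delay]
    return min(feasible) if feasible else None


def min_delay_within(frontier, max_cap):
    """Fastest (capacitance, delay) pair that fits a capacitance (power) budget, or None."""
    feasible = [p for p in frontier if p[0] <= max_cap]
    return min(feasible, key=lambda p: p[1]) if feasible else None


frontier = [(1.0, 9.0), (2.0, 4.0), (4.0, 1.0)]   # hypothetical frontier, arbitrary units
print(min_cap_within(frontier, 5.0))    # (2.0, 4.0)
print(min_delay_within(frontier, 3.0))  # (2.0, 4.0)
```

The chosen pair then fixes the root delay sample from which the top-down embedding starts.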

  20. Incremental refinement • DC region captures the design space • Enables incremental refinement
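The slide does not spell out the refinement procedure. One plausible reading, assumed here, is that a local change (say, a moved sink) invalidates only the DC regions of the changed node and its ancestors, which can be rebuilt bottom-up before re-embedding. The hypothetical helper below computes that set from a parent map, deepest nodes first.

```python
def nodes_to_rebuild(parent, changed):
    """Changed nodes plus all their ancestors, ordered deepest first (bottom-up)."""
    def depth(n):
        d = 0
        while n in parent:
            n, d = parent[n], d + 1
        return d

    dirty = set()
    for n in changed:
        while n is not None and n not in dirty:   # stop early: ancestors already marked
            dirty.add(n)
            n = parent.get(n)
    return sorted(dirty, key=depth, reverse=True)


parent = {"a": "root", "b": "root", "a1": "a", "a2": "a"}   # hypothetical tree
print(nodes_to_rebuild(parent, ["a1"]))   # ['a1', 'a', 'root']
```

Untouched subtrees (here `a2` and `b`) keep their DC regions, so their embeddings change only if the new root choice requires it.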

  21. Conclusion & Future Work • Provides a zero-skew clock tree wire-sizing algorithm that • Minimizes delay and area ε-optimally • Guarantees pseudo-polynomial runtime and memory usage • Provides delay/power trade-off information to designers • Speeds up design convergence by allowing clock tree re-balancing with minimal changes • Future work • Better delay model • Buffer insertion/sizing capability

  22. Thank you!
