
Algorithms for Formal Circuit Optimization on a Static Timing Basis

Algorithms for Formal Circuit Optimization on a Static Timing Basis. Chandu Visweswariah IBM Thomas J. Watson Research Center Yorktown Heights, NY. Acknowledgments. Partners in crime Andy Conn – Ruud Haring David Ling – Phil Strenski Chai Wah Wu – Ee Cho


Presentation Transcript


  1. Algorithms for Formal Circuit Optimization on a Static Timing Basis • Chandu Visweswariah • IBM Thomas J. Watson Research Center, Yorktown Heights, NY

  2. Acknowledgments • Partners in crime • Andy Conn – Ruud Haring • David Ling – Phil Strenski • Chai Wah Wu – Ee Cho • Walt Molzen – Mike Henderson • Katya Scheinberg – Abe Elfadel • Peter O’Brien – Greg Northrop • Pat Williams • several summer students along the way • all the users of our tuners in IBM!

  3. Outline • The benefits of tuning/optimization • Taxonomy • heuristic methods in brief (e.g., TILOS) • Algorithms for formal methods • formulation • incremental sensitivity computation • nonlinear optimization • pruning • optimality conditions • Lagrangian Relaxation (LR) • Methodology implications

  4. Benefits of tuning/optimization • Better circuits • area, delay, power, noise • Enhanced designer productivity • designers’ focus shifts to a higher level like comparing different topologies • Better understanding of tradeoffs • better use of silicon technology • Effective tuning/optimization is an enabler of design for manufacturability • improved parametric yields

  5. Taxonomy • [Figure: taxonomy of tuners, with labels Dynamic, Static, Heuristic, Formal] • Today’s tutorial will focus mostly on formal static tuning

  6. Dynamic vs. static tuning • Dynamic tuning is to transient simulation as static tuning is to static timing analysis • examples: DELIGHT.SPICE, JiffyTune • user provides input patterns • user lists paths to be considered • user poses carefully thought-out optimization problem • [Figure: tuner loop in which the nonlinear optimizer sends transistor/wire sizes to the simulator and receives function/gradient values]

  7. Static tuning • Uses static timing analysis as a basis • All paths through the logic implicitly considered • User does not specify paths or input patterns; easier to use • Inherits weaknesses of static timing analysis • pessimism • false paths • More difficult to include power/noise considerations during tuning
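The claim that all paths are considered implicitly can be illustrated with a minimal sketch of block-based arrival-time propagation. This is not the timer used in the talk; the function name `arrival_times`, the dictionary-based graph, and the example delays are assumptions made for illustration. Each node's arrival time is the maximum over its fanins, so no path is ever enumerated.

```python
from collections import defaultdict, deque


def arrival_times(edges, primary_inputs):
    """Propagate latest arrival times through a timing graph (a DAG).

    edges: dict mapping (i, j) -> delay of the timing arc from i to j.
    primary_inputs: dict mapping node -> asserted arrival time.
    Returns a dict mapping every node to its latest arrival time.
    """
    fanout = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set(primary_inputs)
    for (i, j), delay in edges.items():
        fanout[i].append((j, delay))
        indegree[j] += 1
        nodes.update((i, j))

    at = {n: primary_inputs.get(n, float("-inf")) for n in nodes}
    ready = deque(n for n in nodes if indegree[n] == 0)
    while ready:  # topological order: a node is processed after all its fanins
        i = ready.popleft()
        for j, delay in fanout[i]:
            at[j] = max(at[j], at[i] + delay)
            indegree[j] -= 1
            if indegree[j] == 0:
                ready.append(j)
    return at


# Hypothetical four-arc example (arbitrary time units).
example_edges = {("a", "x"): 2.0, ("b", "x"): 1.0,
                 ("x", "out"): 3.0, ("b", "out"): 5.0}
example_at = arrival_times(example_edges, {"a": 0.0, "b": 0.5})
```

The work is linear in the number of timing arcs, whereas the number of paths can grow exponentially with circuit depth.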

  8. Formal vs. heuristic static tuning • The formal approach attempts to solve the problem “exactly” • better quality of results • long run times • sophisticated mathematical algorithms • significant development effort • The heuristic method repeatedly upsizes the most sensitive transistor(s) on the most critical path(s) • greedy algorithm • generates an area vs. speed tradeoff curve • can handle larger circuits

  9. Heuristic tuning • Example: TILOS (Fishburn and Dunlop, ’85) • Proved that transistor sizing is a convex problem under an Elmore delay assumption • Unfortunately • Elmore delays are too inaccurate • slew effects are not taken into account • meaningful tuning requires additional constraints like slew limits, input loading, β ratio constraints which may not be convex • all of these destroy the convexity of the problem • We give up convexity for accuracy

  10. TILOS algorithm • Start all transistors at minimum size • Determine the critical path • Find the sensitivity of the critical path delay to the width of each transistor on the path • Increase width of transistor with highest negative sensitivity by a fixed step size • Repeat • Generates a speed vs. area tradeoff curve • Can consider more than one transistor and/or more than one path at a time
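The loop above can be sketched as follows. This is a toy illustration, not the TILOS implementation: the delay model `path_delay` (an Elmore-style inverter chain in arbitrary units), the parameter names, and the default values are all assumptions, and sensitivities are taken by finite differences with the same step that is later applied.

```python
def path_delay(widths, load=8.0, parasitic=1.0, drive=1.0):
    """Toy Elmore-style delay of a chain of stages (hypothetical model).

    A fixed driver of strength `drive` charges the first stage's input;
    each stage then charges the next stage's input (or the final load).
    """
    total = widths[0] / drive
    for k, w in enumerate(widths):
        next_cap = widths[k + 1] if k + 1 < len(widths) else load
        total += parasitic + next_cap / w
    return total


def tilos(n_stages, target, w_min=1.0, step=0.1, max_iters=10000):
    """Greedy sizing: repeatedly upsize the most sensitive transistor."""
    widths = [w_min] * n_stages          # start at minimum size
    for _ in range(max_iters):
        base = path_delay(widths)
        if base <= target:
            break                        # timing target met
        best_k, best_sens = None, 0.0
        for k in range(n_stages):        # finite-difference sensitivities
            bumped = list(widths)
            bumped[k] += step
            sens = (path_delay(bumped) - base) / step
            if sens < best_sens:
                best_k, best_sens = k, sens
        if best_k is None:
            break                        # no upsizing reduces the delay
        widths[best_k] += step           # fixed step on the best candidate
    return widths, path_delay(widths)


sized_widths, sized_delay = tilos(3, target=12.0)
```

Recording the total width and delay after each step would trace out the speed vs. area tradeoff curve mentioned on the slide.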

  11. Comments on heuristic tuning • The answer can be sub-optimal • Can exploit incremental timing algorithms to determine sensitivities by finite differences • Can use more realistic delay models • Difficult to incorporate additional constraints (loading, slew, β ratio) • [Figure labels: Most critical; Next-most critical]

  12. Algorithms for formal static tuning • Goals • obtain an optimal answer (local minimum if problem is not convex) • fully automated solution • take all paths into account • delay calculation using time-domain simulation • flexibility to express wide range of constraints • transistor-level tuner • Components • fast simulation and incremental sensitivity • nonlinear optimization package

  13. Static tuning formulation

  14. Digression: minimax optimization • This problem is not smooth! • Re-map to: • This trick converts a minimax problem into a standard twice continuously differentiable nonlinear optimization problem
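The equations on this slide did not survive in the transcript. The following is the standard textbook form of the remapping, given here as an assumption about what the slide showed; the symbols f_i, x, and m are illustrative, and z is the auxiliary variable that the later slides refer to.

```latex
% Original (non-smooth) problem: minimize the largest of m smooth functions
\min_{x} \; \max_{1 \le i \le m} f_i(x)

% Remapped problem: one auxiliary variable z bounds every f_i from above
\begin{aligned}
\min_{x,\,z} \quad & z \\
\text{subject to} \quad & z \ge f_i(x), \qquad i = 1, \dots, m
\end{aligned}
```

At the solution at least one constraint is tight, so z equals the largest f_i, while the objective and every constraint are as smooth as the f_i themselves.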

  15. Digression: minimax optimization

  16. Remapped problem formulation

  17. Reformulated problem • ATs are variables of the problem • At the solution • original problem is solved because by definition z is minimized and the constraints are feasible • ATs on critical path are correct because these constraints are tight • off-critical ATs may be “wrong” • dij’s are functions of transistor widths • Amenable to addition of any additional general (nonlinear) constraints • slew, input loading, β ratio
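The formulation slides themselves carry no equations in this transcript. A simplified reconstruction consistent with the bullets above might read as follows; it is an assumption, not the slide's text. It ignores rising/falling transitions and slew variables, and the width bounds are added for illustration.

```latex
\begin{aligned}
\min_{w,\,AT,\,z} \quad & z \\
\text{subject to} \quad
  & AT_j \ge AT_i + d_{ij}(w) && \text{for every timing arc } (i, j) \\
  & z \ge AT_k                && \text{for every primary output } k \\
  & w_{\min} \le w \le w_{\max}
\end{aligned}
```

Here w is the vector of transistor widths, AT_i is the arrival time at node i, and d_ij(w) is the delay of the arc from i to j. Arrival times at primary inputs are fixed assertions rather than variables.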

  18. Springs and planks analogy

  19. Springs and planks analogy

  20. Constraint generation • Each delay, slew depends on a single input slew • [Figure: timing arc from node i to node j]

  21. Statement of the problem

  22. Simplified view of assertions • [Figure: combinational macro being tuned, between a launching latch and a capturing latch; labels: primary input arrival time, primary output required arrival time, delay through macro z]

  23. Components of a tuner • Read netlist; create timing graph • Formulate optimization problem • Feed problem to nonlinear optimizer • Solve optimization problem, call simulator for delays/slews and gradients thereof • fast simulation and incremental sensitivity computation • Obtain converged solution • Snap-to-grid; back-annotate; re-time

  24. Components of a tuner • Transistor-level static timing analyzer • timing graph, sensitizations, simulation of CCCs, slack calculations • Fast time-domain circuit simulator • Fast time-domain incremental sensitivity • adjoint method • direct method • State-of-the-art nonlinear optimizer • should be able to handle general nonlinear inequalities and objective functions • should be able to handle large problems

  25. Components: fast simulator • Many fast transistor-level time-domain simulators in the literature • orders of magnitude speedup over SPICE • event-driven algorithms • simplified device models (tables for i-v characteristics, simplified parasitic models) • 5% typical, 20% worst-case errors on a stage-delay basis • typically invoked via an API by the timer with many “tricks” to ensure efficiency • simulator must handle simulation of CCCs at multiple process corners (best-case, nominal, worst-case)

  26. Gradients are indispensable

  27. Accurate gradients indispensable • [Figure labels: Go West, young man!; Mount Elmore; Mount Reality]

  28. Gradient computation • Direct method • directly differentiate branch characteristics • any number of functions, one parameter • easily extended to the time-domain • Adjoint method • best approached via Tellegen’s theorem • one function, any number of parameters • requires backward-in-time simulation of the adjoint circuit and convolution of waveforms • Either case • take advantage of event-driven paradigm!
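The difference in cost between the two methods can be seen on a much simpler problem than the time-domain case the slide addresses. The sketch below is an illustration under stated assumptions: a hypothetical two-node DC resistive network with nodal equations G(g) v = b, three conductances g1, g2, g3 as parameters, and pure-Python 2x2 linear algebra. The direct method needs one solve per parameter and yields sensitivities of every node voltage. The adjoint method needs one extra solve per output function and yields its sensitivity to every parameter.

```python
def solve2(a, rhs):
    """Solve the 2x2 linear system a x = rhs by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(rhs[0] * a[1][1] - a[0][1] * rhs[1]) / det,
            (a[0][0] * rhs[1] - a[1][0] * rhs[0]) / det]


def matvec(a, x):
    return [a[0][0] * x[0] + a[0][1] * x[1],
            a[1][0] * x[0] + a[1][1] * x[1]]


def transpose(a):
    return [[a[0][0], a[1][0]], [a[0][1], a[1][1]]]


def conductance(g):
    """Nodal matrix: g1 joins nodes 1-2, g2 ties node 2 and g3 node 1 to ground."""
    g1, g2, g3 = g
    return [[g1 + g3, -g1], [-g1, g1 + g2]]


# dG/dg_k for each parameter (constant matrices for this network).
DG = [[[1.0, -1.0], [-1.0, 1.0]],   # g1
      [[0.0, 0.0], [0.0, 1.0]],     # g2
      [[1.0, 0.0], [0.0, 0.0]]]     # g3


def direct_sensitivities(g, b):
    """One solve per parameter: result[k][n] = d v_n / d g_k."""
    G = conductance(g)
    v = solve2(G, b)
    return [solve2(G, [-r for r in matvec(dG, v)]) for dG in DG]


def adjoint_sensitivities(g, b, c):
    """One adjoint solve for f = c . v: result[k] = d f / d g_k."""
    G = conductance(g)
    v = solve2(G, b)
    lam = solve2(transpose(G), c)   # adjoint system G^T lam = c
    return [-sum(l * r for l, r in zip(lam, matvec(dG, v))) for dG in DG]
```

Both follow from differentiating G v = b, which gives G dv = -(dG) v. The direct method solves this for dv, and the adjoint method moves the inverse onto the output selector c instead.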

  29. Nonlinear optimization • Much progress in nonlinear optimization in the last two decades • A few state-of-the-art packages available • Can solve large problems (50,000 constraints, 50,000 variables) • Can accommodate general nonlinear objective functions and inequality constraints, and simple bounds on variables • Must exploit partial separability (structure) of the problem to render solution practical

  30. Customization of the optimizer • Do not use the optimizer as a black box! • Customization to circuit tuning yields tremendous benefits • Remember: • any simulation-based scheme involves function and gradient data that are numerically noisy • circuit tuning is simulation-intensive; try to reduce the number of iterations • tune the optimizer to be aggressive • gradient computation is expensive • consider adjoint Lagrangian methods • consider reducing the number of variables

  31. Customization of the optimizer • Examples of optimizer customization: • choose tolerances and stopping criteria based on level of noise • methods to reduce numerical noise • failure recovery when the optimizer is too aggressive • “2-step updates” for accelerated convergence • initialization of variables and multipliers • posing of a well-scaled problem; sensible choices of units • reduction of dimensionality of the problem; e.g., treating fanout capacitances as “internal variables” of the optimization

  32. Outline • The benefits of tuning/optimization • Taxonomy • Heuristic methods in brief (e.g., TILOS) • Algorithms for formal methods • formulation • incremental sensitivity computation • nonlinear optimization • pruning • optimality conditions • Lagrangian Relaxation (LR) • Methodology implications

  33. Springs and planks analogy

  34. Springs and planks analogy • Degeneracy!

  35. Problems with formulation • Size (problem with 2,388 gates has 24,768 variables and 19,175 inequality constraints) • Degeneracy • many equally good solutions since off-critical arrival times and slews can take one of several equally correct values • some active (tight) constraints have no impact on the final solution (i.e., their Lagrange multiplier = 0) • Redundancy • many constraints can be removed without changing the solution

  36. Definitions (at the solution) • A “normal” constraint is active and has a unique, non-zero multiplier • A degenerate constraint is active and has a zero or non-unique multiplier • A redundant constraint can be removed without changing the solution; it has a unique, zero multiplier, but may or may not be active • Note: all active redundant constraints are degenerate; not all degenerate constraints are redundant

  37. Pruning: an example • 3 timing variables instead of 9 • 2 nonlinear constraints instead of (6 nonlinear + 2 linear) constraints • [Figure: circuit with nodes 1–4]

  38. Pruning: an example • [Figure: timing graph over nodes 1–4 with rising/falling nodes 1r–4r and 1f–4f and a Sink; pruned graph with labels 1r, “2f, 3r, 4f”, Sink, “2r, 3f, 4r”, 1f]

  39. Block-based & path-based timing • [Figure: timing graph with nodes 1–6, shown in block-based and path-based forms]

  40. Block-based & path-based timing • In timing graph, if a node has n fanins and m fanouts, eliminating it yields mn constraints instead of (m+n) • Criterion: if mn ≤ (m+n)+1, prune! • Can take slack variable into account • [Figure: timing graph with nodes 1–6, shown with and without node 4]
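The criterion can be written as a small helper. This is a sketch under one stated assumption: the comparison symbol is missing from the transcript and is read here as "less than or equal", which matches the gain values 2, 1, 0 used by the three-pass scheme later in the talk. The function names are hypothetical. Eliminating a node removes (m+n) constraints and one arrival-time variable and adds mn constraints, so the net gain is (m+n+1) - mn.

```python
def pruning_gain(m, n):
    """Net reduction in problem size from eliminating a node.

    m: number of fanouts, n: number of fanins.
    Removes (m + n) constraints and 1 arrival-time variable; adds m * n.
    """
    return (m + n + 1) - m * n


def should_prune(m, n, min_gain=0):
    """Prune when the gain meets the threshold (0 means mn <= m + n + 1)."""
    return pruning_gain(m, n) >= min_gain
```

With either m or n equal to 1 the gain is always 2, the largest value possible for a node with at least one fanin and one fanout.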

  41. Pruning strategy • During pruning, number of fanins of any un-pruned node monotonically increases • During pruning, number of fanouts of any un-pruned node monotonically increases • Hence, if a node is not pruned in the first pass, it will never be pruned, since the chances of it getting pruned monotonically worsen • Therefore, a one-pass algorithm can be used for a given pruning criterion

  42. Three-pass greedy pruning • First pass gain=2, then gain=1, then gain=0 • Not optimal, but yields excellent results
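A minimal sketch of the three passes on a plain directed graph follows. It is an illustration rather than the tuner's code: the edge representation (source, sink, path of original nodes), the `protected` set standing in for source/sink-side nodes, and the example graph are assumptions. It also ignores rising/falling transitions and slews. Gain is computed as (m+n+1) - mn for a node with n incoming and m outgoing edges.

```python
def prune_pass(edges, protected, min_gain):
    """Visit each node once; eliminate it if its gain is at least min_gain."""
    nodes = {e[0] for e in edges} | {e[1] for e in edges}
    for node in sorted(nodes - protected):
        ins = [e for e in edges if e[1] == node]
        outs = [e for e in edges if e[0] == node]
        n, m = len(ins), len(outs)
        if n == 0 or m == 0:
            continue
        if (m + n + 1) - m * n < min_gain:
            continue
        # Replace the node by one edge per (fanin, fanout) pair; each new
        # edge remembers the chain of original nodes it stands for.
        merged = [(i[0], o[1], i[2] + o[2][1:]) for i in ins for o in outs]
        edges = [e for e in edges if e[0] != node and e[1] != node] + merged
    return edges


def three_pass_prune(arcs, protected):
    """Greedy pruning with thresholds gain=2, then gain=1, then gain=0."""
    edges = [(i, j, (i, j)) for i, j in arcs]
    for min_gain in (2, 1, 0):
        edges = prune_pass(edges, set(protected), min_gain)
    return edges


# Hypothetical example: two inputs converge on x, then a chain x -> y -> out.
pruned = three_pass_prune(
    [("a", "x"), ("b", "x"), ("x", "y"), ("y", "out")],
    protected={"a", "b", "out"},
)
```

In this example both internal nodes have a single fanout, so they are removed in the first pass and only two path-like edges remain.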

  43. Pruning observations • If either m or n is 1, pruning is good! • Even if a node is pruned, its rising/falling slews continue to be variables • Pruning can be done purely topologically • Duplicating nonlinear elements (the dij’s) does not increase simulation run time • Duplicating nonlinear elements does not adversely impact the optimization • the gradient/Hessian of each nonlinear element is only computed and stored once • Three-pass greedy pruning provides tremendous pruning benefits efficiently

  44. Detailed pruning example • [Figure: timing graph with nodes 1–16]

  45. Detailed pruning example • Score card: edges = 26, nodes = 16 (+2) • [Figure: timing graph with nodes 1–16 plus Source and Sink]

  46. Detailed pruning example • Score card: edges = 26 → 20, nodes = 16 → 10 • [Figure: timing graph at this pruning step]

  47. Detailed pruning example • Score card: edges = 20 → 17, nodes = 10 → 7 • [Figure: timing graph at this pruning step]

  48. Detailed pruning example • Score card: edges = 17 → 16, nodes = 7 → 6 • [Figure: timing graph at this pruning step, with merged labels “1,7”, “2,7”, “3,7”]

  49. Detailed pruning example • Score card: edges = 16 → 15, nodes = 6 → 5 • [Figure: timing graph at this pruning step, with merged label “11,14”]

  50. Detailed pruning example • Score card: edges = 15 → 14, nodes = 5 → 4 • [Figure: timing graph at this pruning step, with merged label “13,16”]
