Algorithms for Formal Circuit Optimization on a Static Timing Basis Chandu Visweswariah IBM Thomas J. Watson Research Center Yorktown Heights, NY
Acknowledgments • Partners in crime • Andy Conn – Ruud Haring • David Ling – Phil Strenski • Chai Wah Wu – Ee Cho • Walt Molzen – Mike Henderson • Katya Scheinberg – Abe Elfadel • Peter O’Brien – Greg Northrop • Pat Williams • several summer students along the way • all the users of our tuners in IBM!
Outline • The benefits of tuning/optimization • Taxonomy • heuristic methods in brief (e.g., TILOS) • Algorithms for formal methods • formulation • incremental sensitivity computation • nonlinear optimization • pruning • optimality conditions • Lagrangian Relaxation (LR) • Methodology implications
Benefits of tuning/optimization • Better circuits • area, delay, power, noise • Enhanced designer productivity • designers’ focus shifts to a higher level, such as comparing different topologies • Better understanding of tradeoffs • better use of silicon technology • Effective tuning/optimization is an enabler of design for manufacturability • improved parametric yields
Taxonomy • [Figure: taxonomy tree of tuners, split dynamic vs. static and heuristic vs. formal] • Today’s tutorial will focus mostly on formal static tuning
Dynamic vs. static tuning • [Figure: a tuner couples a nonlinear optimizer with a simulator; the optimizer passes transistor/wire sizes to the simulator and receives function/gradient values back] • Dynamic tuning is to transient simulation as static tuning is to static timing analysis • examples: DELIGHT.SPICE, JiffyTune • user provides input patterns • user lists paths to be considered • user poses carefully thought-out optimization problem
Static tuning • Uses static timing analysis as a basis • All paths through the logic implicitly considered • User does not specify paths or input patterns; easier to use • Inherits weaknesses of static timing analysis • pessimism • false paths • More difficult to include power/noise considerations during tuning
Formal vs. heuristic static tuning • The formal approach attempts to solve the problem “exactly” • better quality of results • long run times • sophisticated mathematical algorithms • significant development effort • The heuristic method repeatedly upsizes the most sensitive transistor(s) on the most critical path(s) • greedy algorithm • generates an area vs. speed tradeoff curve • can handle larger circuits
Heuristic tuning • Example: TILOS (Fishburn and Dunlop, ’85) • Proved that transistor sizing is a convex problem under an Elmore delay assumption • Unfortunately • Elmore delays are too inaccurate • slew effects are not taken into account • meaningful tuning requires additional constraints like slew limits, input loading, ratio constraints which may not be convex • all of these destroy the convexity of the problem • We give up convexity for accuracy
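As a reminder of why the Fishburn-Dunlop result holds (the standard geometric-programming argument, not spelled out on the slide): under Elmore, each stage delay is a posynomial in the widths, and posynomials become convex under a log change of variables.

```latex
% Stage delay under Elmore: driver resistance ~ 1/w_i, load affine in the widths
d_{\mathrm{stage}}(w) = \frac{a_i}{w_i}\Big(b_i + \sum_{j\in\mathrm{fanout}(i)} c_j\, w_j\Big)
% Path delay is a sum of such posynomial terms. Substituting w_k = e^{x_k}
% turns every monomial into exp(affine in x), so the delay is convex in x.
```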
TILOS algorithm • Start all transistors at minimum size • Determine the critical path • Find the sensitivity of the critical path delay to the width of each transistor on the path • Increase the width of the transistor with the most negative sensitivity by a fixed step size • Repeat • Generates a speed vs. area tradeoff curve • Can consider more than one transistor and/or more than one path at a time (a sketch of the basic loop follows below)
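A minimal sketch of this greedy loop in Python, assuming a hypothetical incremental timer object with `critical_path` and `path_delay` methods (none of these names come from TILOS itself):

```python
def tilos_size(circuit, timer, w_min, w_step, target_delay):
    """Greedy TILOS-style sizing: repeatedly widen the transistor whose
    width has the most negative critical-path delay sensitivity."""
    widths = {t: w_min for t in circuit.transistors}      # start at minimum size
    tradeoff = []                                         # (area, delay) curve
    while True:
        delay, path = timer.critical_path(circuit, widths)
        tradeoff.append((sum(widths.values()), delay))
        if delay <= target_delay:
            break
        best, best_sens = None, 0.0
        for t in path:                                    # sensitivities by finite differences
            widths[t] += w_step
            sens = (timer.path_delay(circuit, widths, path) - delay) / w_step
            widths[t] -= w_step
            if sens < best_sens:                          # keep the most negative
                best, best_sens = t, sens
        if best is None:                                  # widening no longer helps
            break
        widths[best] += w_step
    return widths, tradeoff
```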
Comments on heuristic tuning • [Figure: the most critical and next-most critical paths] • The answer can be sub-optimal • Can exploit incremental timing algorithms to determine sensitivities by finite differences • Can use more realistic delay models • Difficult to incorporate additional constraints (loading, slew, ratio)
Algorithms for formal static tuning • Goals • obtain an optimal answer (local minimum if problem is not convex) • fully automated solution • take all paths into account • delay calculation using time-domain simulation • flexibility to express wide range of constraints • transistor-level tuner • Components • fast simulation and incremental sensitivity • nonlinear optimization package
Digression: minimax optimization • The problem $\min_x \max_i f_i(x)$ is not smooth! • Re-map to: $\min_{x,z}\ z$ subject to $f_i(x) \le z$ for all $i$ • This trick converts a minimax problem into a standard twice continuously differentiable nonlinear optimization problem
Reformulated problem • ATs are variables of the problem • At the solution • original problem is solved because by definition z is minimized and the constraints are feasible • ATs on critical path are correct because these constraints are tight • off-critical ATs may be “wrong” • dij’s are functions of transistor widths • Amenable to addition of any additional general (nonlinear) constraints • slew, input loading, ratio
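Written out (a standard statement of this reformulation, with $a_i$ the arrival-time variables, $w$ the transistor widths, and $E$ the set of timing arcs):

```latex
\begin{aligned}
\min_{w,\,a,\,z} \quad & z\\
\text{s.t.} \quad & a_j \ \ge\ a_i + d_{ij}(w) \qquad \forall\,(i,j)\in E\\
                  & z \ \ge\ a_o \qquad\qquad\quad\ \forall\ \text{primary outputs } o\\
                  & \text{plus slew, input-loading, ratio, and bound constraints on } w.
\end{aligned}
```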
Constraint generation • Each delay and slew depends on a single input slew • [Figure: timing arc from node i to node j]
Simplified view of assertions • [Figure: launching latch, combinational macro being tuned, capturing latch; primary input arrival time at the macro inputs, primary output required arrival time at its outputs, delay through macro = z]
Components of a tuner • [Flow:] Read netlist; create timing graph • Formulate optimization problem • Feed problem to nonlinear optimizer • Solve optimization problem, calling the simulator for delays/slews and gradients thereof (fast simulation and incremental sensitivity computation) • Obtain converged solution • Snap-to-grid; back-annotate; re-time
Components of a tuner • Transistor-level static timing analyzer • timing graph, sensitizations, simulation of CCCs, slack calculations • Fast time-domain circuit simulator • Fast time-domain incremental sensitivity • adjoint method • direct method • State-of-the-art nonlinear optimizer • should be able to handle general nonlinear inequalities and objective functions • should be able to handle large problems
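For concreteness, a toy end-to-end driver in Python: it poses the tuning problem to an off-the-shelf optimizer (scipy's SLSQP, standing in for the large-scale custom package the slides describe) and treats the fast simulator as a black box `sim` with hypothetical `arrival_times` / `arrival_gradients` methods. Arrival times are folded into the simulator here rather than carried as explicit variables, purely for brevity.

```python
import numpy as np
from scipy.optimize import minimize

def tune(sim, w0, w_min, w_max):
    """Minimize the worst output arrival time z over widths w; x = [w..., z]."""
    n = len(w0)
    x0 = np.append(w0, sim.arrival_times(w0).max())    # start with z feasible

    def objective(x):                                  # minimize z
        return x[-1]

    def obj_grad(x):
        g = np.zeros_like(x)
        g[-1] = 1.0
        return g

    def slack(x):                                      # z - AT_o(w) >= 0 per output
        return x[-1] - sim.arrival_times(x[:n])

    def slack_jac(x):                                  # [-dAT/dw | 1] from the simulator
        grads = sim.arrival_gradients(x[:n])           # shape: (outputs, n)
        return np.hstack([-grads, np.ones((grads.shape[0], 1))])

    res = minimize(objective, x0, jac=obj_grad, method="SLSQP",
                   constraints=[{"type": "ineq", "fun": slack, "jac": slack_jac}],
                   bounds=[(w_min, w_max)] * n + [(None, None)])
    return res.x[:n], res.x[-1]
```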
Components: fast simulator • Many fast transistor-level time-domain simulators in the literature • orders of magnitude speedup over SPICE • event-driven algorithms • simplified device models (tables for i-v characteristics, simplified parasitic models) • 5% typical, 20% worst-case errors on a stage-delay basis • typically invoked via an API by the timer with many “tricks” to ensure efficiency • simulator must handle simulation of CCCs at multiple process corners (best-case, nominal, worst-case)
Accurate gradients indispensable • [Cartoon: “Go West, young man!” — an optimizer guided by Elmore-model gradients heads for Mount Elmore rather than Mount Reality]
Gradient computation • Direct method • directly differentiate branch characteristics • any number of functions, one parameter • easily extended to the time-domain • Adjoint method • best approached via Tellegen’s theorem • one function, any number of parameters • requires backward-in-time simulation of the adjoint circuit and convolution of waveforms • Either case • take advantage of event-driven paradigm!
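To give the flavor of the direct method on something small (an Elmore inverter chain rather than the time-domain simulator the slides assume; all names are illustrative):

```python
def elmore_delay_and_grad(widths, r0, c_gate, c_wire):
    """Delay of an inverter chain under the Elmore model, plus d(delay)/d(w_k)
    by direct differentiation of each stage's R*C product."""
    n = len(widths)
    delay, grad = 0.0, [0.0] * n
    for i in range(n):
        r = r0 / widths[i]                             # driver resistance ~ 1/w_i
        c = c_wire + (c_gate * widths[i + 1] if i + 1 < n else c_gate)
        delay += r * c
        grad[i] += -(r0 / widths[i] ** 2) * c          # resistance term of stage i
        if i + 1 < n:
            grad[i + 1] += r * c_gate                  # stage i's load grows with w_{i+1}
    return delay, grad
```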
Nonlinear optimization • Much progress in nonlinear optimization in the last two decades • A few state-of-the-art packages available • Can solve large problems (50,000 constraints, 50,000 variables) • Can accommodate general nonlinear objective functions and inequality constraints, and simple bounds on variables • Must exploit partial separability (structure) of the problem to render solution practical
Customization of the optimizer • Do not use the optimizer as a black box! • Customization to circuit tuning yields tremendous benefits • Remember: • any simulation-based scheme involves function and gradient data that are numerically noisy • circuit tuning is simulation-intensive; try to reduce the number of iterations • tune the optimizer to be aggressive • gradient computation is expensive • consider adjoint Lagrangian methods • consider reducing the number of variables
Customization of the optimizer • Examples of optimizer customization: • choose tolerances and stopping criteria based on level of noise • methods to reduce numerical noise • failure recovery when the optimizer is too aggressive • “2-step updates” for accelerated convergence • initialization of variables and multipliers • posing of a well-scaled problem; sensible choices of units • reduction of dimensionality of the problem; e.g., treating fanout capacitances as “internal variables” of the optimization
Outline • The benefits of tuning/optimization • Taxonomy • Heuristic methods in brief (e.g., TILOS) • Algorithms for formal methods • formulation • incremental sensitivity computation • nonlinear optimization • pruning • optimality conditions • Lagrangian Relaxation (LR) • Methodology implications
Springs and planks analogy • [Figure: springs-and-planks mechanical analogy illustrating degeneracy of off-critical arrival times]
Problems with formulation • Size (a problem with 2,388 gates has 24,768 variables and 19,175 inequality constraints) • Degeneracy • many equally good solutions, since off-critical arrival times and slews can take one of several equally correct values • some active (tight) constraints have no impact on the final solution (i.e., their Lagrange multipliers are 0) • Redundancy • many constraints can be removed without changing the solution
Definitions (at the solution) • A “normal” constraint is active and has a unique, non-zero multiplier • A degenerate constraint is active and has a zero or non-unique multiplier • A redundant constraint can be removed without changing the solution; it has a unique, zero multiplier, but may or may not be active • Note: all active degenerate constraints are redundant; not all degenerate constraints are redundant
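These definitions are easiest to read against the first-order (KKT) conditions for $\min f(x)$ s.t. $g_i(x) \le 0$ (standard optimization theory, not specific to these slides):

```latex
\nabla f(x^\star) + \sum_i \lambda_i \nabla g_i(x^\star) = 0, \qquad
\lambda_i \ge 0, \qquad \lambda_i\, g_i(x^\star) = 0 \ \ \forall i.
% A constraint with lambda_i = 0 exerts no "pull" on the solution, which is
% exactly the redundant/degenerate behavior defined above.
```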
Pruning: an example • [Figure: four-gate circuit, gates 1–4] • 3 timing variables instead of 9 • 2 nonlinear constraints instead of (6 nonlinear + 2 linear) constraints
Pruning: an example • [Figure: before pruning, rising/falling timing nodes 1r–4r and 1f–4f of gates 1–4 feed a sink; after pruning, only 1r and 1f remain as variables, with merged arcs (2f, 3r, 4f) and (2r, 3f, 4r) into the sink]
Block-based & path-based timing • [Figure: example timing graph (nodes 1–6) shown in block-based and path-based form]
Block-based & path-based timing • In the timing graph, if a node has n fanins and m fanouts, eliminating it creates mn constraints in place of (m+n) (spelled out below) • Criterion: if mn ≤ (m+n)+1, prune! (the +1 credits the eliminated arrival-time variable) • Can take the slack variable into account • [Figure: node elimination on the example graph, nodes 1–6]
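Where the mn count comes from: eliminating the arrival-time variable a of a node with n fanins and m fanouts splices each fanin constraint into each fanout constraint.

```latex
% Before elimination: (m + n) constraints through the node's arrival time a
a \ \ge\ a_i + d_{ia} \quad (n\ \text{fanins}), \qquad
a_j \ \ge\ a + d_{aj} \quad (m\ \text{fanouts})
% After eliminating a: one constraint per (fanin, fanout) pair, mn in total
a_j \ \ge\ a_i + d_{ia} + d_{aj} \qquad \forall\, i, j
```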
Pruning strategy • During pruning, the number of fanins of any un-pruned node monotonically increases • During pruning, the number of fanouts of any un-pruned node monotonically increases • Hence, if a node is not pruned on a first pass, it will never be pruned: its chances of being pruned only worsen • Therefore, a one-pass algorithm can be used for a given pruning criterion
Three-pass greedy pruning • First pass prunes nodes with gain = 2, then gain = 1, then gain = 0, where gain = (m+n)+1 − mn, the number of constraints saved plus the variable freed • Not optimal, but yields excellent results (a sketch follows below)
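A compact sketch of this greedy pruning in Python, on a graph held as fanin/fanout adjacency sets (the representation and names are my own, not from the tuner):

```python
def greedy_prune(fanins, fanouts):
    """Three-pass greedy pruning. gain = (m + n) + 1 - m*n counts the
    constraints saved plus the arrival-time variable freed; since
    gain = 2 - (m-1)*(n-1), 2 is the largest gain possible."""
    for target_gain in (2, 1, 0):
        # One pass per class suffices: fanin/fanout counts of surviving
        # nodes only grow, so a node's gain never improves later.
        for v in list(fanins):
            if v not in fanins:                        # defensively skip removed nodes
                continue
            m, n = len(fanouts[v]), len(fanins[v])
            if m == 0 or n == 0:                       # source or sink: keep
                continue
            if (m + n) + 1 - m * n == target_gain:
                for u in fanins[v]:                    # splice every in/out pair
                    for x in fanouts[v]:
                        fanouts[u].add(x)
                        fanins[x].add(u)
                for u in fanins[v]:
                    fanouts[u].discard(v)
                for x in fanouts[v]:
                    fanins[x].discard(v)
                del fanins[v], fanouts[v]
    return fanins, fanouts
```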
Pruning observations • If either m or n is 1, pruning is good! • Even if a node is pruned, its rising/falling slews continue to be variables • Pruning can be done purely topologically • Duplicating nonlinear elements (the dijs) does not increase simulation run time • Duplicating nonlinear elements does not adversely impact the optimization • the gradient/Hessian of each nonlinearelement is only computed and stored once • Three-pass greedy pruning provides tremendous pruning benefits efficiently
Detailed pruning example • [Figure sequence: a timing graph of nodes 1–16, plus source and sink, is pruned step by step; score card at each step:] • initial graph: 26 edges, 16 nodes (+ source and sink) • prune input-side nodes 1–6: edges 26 → 20, nodes 16 → 10 • prune output-side nodes 14–16: edges 20 → 17, nodes 10 → 7 • prune node 7 into merged arcs 1,7 / 2,7 / 3,7: edges 17 → 16, nodes 7 → 6 • one more node merged into arc 11,14: edges 16 → 15, nodes 6 → 5 • one more node merged into arc 13,16: edges 15 → 14, nodes 5 → 4