
The Optimization of High-Performance Digital Circuits

The Optimization of High-Performance Digital Circuits. Andrew Conn (with Michael Henderson and Chandu Visweswariah) IBM Thomas J. Watson Research Center Yorktown Heights, NY. Outline. Circuit optimization. Transistor and wire sizes. Nonlinear optimizer. Function and gradient values.

Presentation Transcript


  1. The Optimization of High-Performance Digital Circuits Andrew Conn (with Michael Henderson and Chandu Visweswariah) IBM Thomas J. Watson Research Center Yorktown Heights, NY

  2. Outline

  3. Circuit optimization

  4. Dynamic vs. static optimization • Dynamic: the nonlinear optimizer passes transistor and wire sizes to a simulator, which returns function and gradient values • Static: the nonlinear optimizer passes transistor and wire sizes to a static timing analyzer, which returns function and gradient values

  5. Dynamic vs. static optimization

  6. Custom? High-performance?

  7. EinsTuner: formal static optimizer • Nonlinear optimizer: LANCELOT • Static transistor-level timer: EinsTLT • Embedded time-domain simulator: SPECS • LANCELOT supplies transistor and wire sizes; the timer returns function and gradient values

  8. Components of EinsTuner (figure labels its blocks 1 to 4) • Read netlist; create timing graph (EinsTLT) • Formulate pruned optimization problem • Feed problem to nonlinear optimizer (LANCELOT) • Solve optimization problem, calling the simulator for delays/slews and gradients thereof • Fast simulation and incremental sensitivity computation (SPECS) • Obtain converged solution • Snap-to-grid; back-annotate; re-time

  9. Static optimization formulation

  10. Digression: minimax optimization

  11. Remapped problem
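
The body of this slide is not preserved in the transcript. As a hedged reminder of the textbook remapping of a minimax problem (the functions $f_i$ and the index range are illustrative; the auxiliary variable is called $z$ here because a later slide refers to simple bounds on $z$):

```latex
\min_{x}\ \max_{1 \le i \le k} f_i(x)
\quad\Longleftrightarrow\quad
\min_{x,\,z}\ z
\quad \text{subject to} \quad
z \ge f_i(x), \qquad i = 1, \dots, k .
```

The remapped form replaces a non-smooth objective with a smooth one plus $k$ inequality constraints, which is the shape a general nonlinear optimizer expects.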

  12. Springs and planks analogy

  13. Springs and planks analogy

  14. Algorithm animation: inv3 • One such frame per iteration • Legend: Red = critical, Green = non-critical, Curvature = sensitivity, Thickness = transistor size • Figure labels: Delay, Logic stages, Gate, Wire, PIs by criticality

  15. Algorithm demonstration: 1b ALU

  16. Constraint generation i j

  17. Statement of the problem

  18. SPECS: fast simulation • Two orders of magnitude faster than SPICE • 5% typical stage delay and slew accuracy; 20% worst-case • Event-driven algorithm • Simplified device models • Specialized integration methods • Invoked via a programming interface • Accurate gradients indispensable

  19. LANCELOT

  20. LANCELOT algorithms • Uses augmented Lagrangian for nonlinear constraints: $\Phi(x,\lambda) = f(x) + \sum_i \left[\lambda_i c_i(x) + c_i(x)^2/(2\mu)\right]$ • Simple bounds handled explicitly • Adds slacks to inequalities • Trust region method
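
A minimal sketch of the augmented Lagrangian idea, assuming the standard form Phi(x, lam) = f(x) + sum_i [lam_i * c_i(x) + c_i(x)^2 / (2 * mu)] with equality constraints c_i(x) = 0. The function names, the penalty parameter `mu`, and the first-order multiplier update are illustrative assumptions; none of this is LANCELOT's actual interface.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
Func = Callable[[Vector], float]


def augmented_lagrangian(f: Func, constraints: Sequence[Func],
                         x: Vector, lam: Sequence[float], mu: float) -> float:
    """Evaluate f(x) + sum_i [lam_i*c_i(x) + c_i(x)**2 / (2*mu)]."""
    total = f(x)
    for lam_i, c_i in zip(lam, constraints):
        ci = c_i(x)
        total += lam_i * ci + ci * ci / (2.0 * mu)
    return total


def update_multipliers(constraints: Sequence[Func], x: Vector,
                       lam: Sequence[float], mu: float) -> List[float]:
    """Standard first-order update (an assumption here): lam_i + c_i(x)/mu."""
    return [lam_i + c_i(x) / mu for lam_i, c_i in zip(lam, constraints)]


def with_slack(g: Func) -> Func:
    """Turn g(x) <= 0 into the equality g(x) + s = 0.

    The slack s is stored as the last component of the extended vector;
    s >= 0 is then a simple bound, handled outside this function.
    """
    return lambda xs: g(xs[:-1]) + xs[-1]
```

An inner solver would minimize `augmented_lagrangian` over `x` subject only to the simple bounds, then call `update_multipliers` and repeat.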

  21. LANCELOT algorithms continued • Trust-region • Simple bounds

  22. Customization of LANCELOT • Cannot just use as a black box • Non-standard options may be preferable • e.g., solve the BQP subproblem accurately • Magic Steps • Noise considerations • Structured Secant Updates • Adjoint computations • Preprocessing (Pruning) • Failure recovery in conjunction with SPECS

  23. LANCELOT • State-of-the-art large-scale nonlinear optimization package • Group partial separability is heavily exploited in our formulation • Two-step updates applied to linear variables • Specialized criteria for initializations, updates, adjoint computations, stopping and dealing with numerical noise

  24. Aids to convergence • Initialization of multipliers and variables • Scaling, choice of units • Choice of simple bounds on arrival times, z • Reduction of numerical noise • Reduction of dimensionality • Treating fanout capacitances as “internal variables” of the optimization • Tuning of LANCELOT to be aggressive • Accurate solution of BQP

  25. Demonstration of degeneracy

  26. Degeneracy! Demonstration of degeneracy

  27. Why do we need pruning?

  28. Pruning of the timing graph • The timing graph can be manipulated • to reduce the number of arrival time variables • to reduce the number of timing constraints • most of all, to reduce degeneracy • No loss in generality or accuracy • Bottom line: average reductions of 18.3x in arrival time (AT) variables, 33% in variables, 43% in timing constraints, 22% in constraints, and 1.7x to 4.1x in run time on large problems

  29. Pruning strategy • During pruning, number of fanins of any un-pruned node monotonically increases • During pruning, number of fanouts of any un-pruned node monotonically increases • Therefore, if a node is not pruned in the first pass, it will never be pruned • Therefore, a one-pass algorithm can be used for a given pruning criterion
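
The one-pass idea above can be sketched as follows. This is a simplified illustration, not EinsTuner's implementation: the graph is a plain dictionary of fanout sets (so parallel edges merge), the graph is assumed acyclic, and `criterion` and `protected` are hypothetical parameters supplied by the caller.

```python
from typing import Callable, Dict, Iterable, List, Set

Graph = Dict[str, Set[str]]  # node -> set of fanout nodes


def fanins(graph: Graph, node: str) -> Set[str]:
    """Return every node that has an edge into `node`."""
    return {u for u, outs in graph.items() if node in outs}


def eliminate(graph: Graph, node: str) -> None:
    """Remove `node`, connecting each of its fanins to each of its fanouts."""
    ins = fanins(graph, node)
    outs = graph.pop(node)
    for u in ins:
        graph[u].discard(node)
        graph[u] |= outs


def prune_one_pass(graph: Graph,
                   criterion: Callable[[int, int], bool],
                   protected: Iterable[str] = ()) -> List[str]:
    """Visit each node once; eliminate it if criterion(n_fanins, m_fanouts) holds."""
    keep = set(protected)
    pruned: List[str] = []
    for node in list(graph):  # snapshot, since the graph is mutated in place
        if node in keep:
            continue
        n, m = len(fanins(graph, node)), len(graph[node])
        if criterion(n, m):
            eliminate(graph, node)
            pruned.append(node)
    return pruned
```

Because a node skipped on the first visit is never revisited, the sketch relies on the monotonicity argument of the slide: the fanin and fanout counts of surviving nodes only grow, so a node that fails the criterion once keeps failing it.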

  30. Pruning strategy • The order of pruning provably produces different (possibly sub-optimal) results • Greedy 3-pass pruning produces a “very good” (but perhaps non-optimal) result • We have not been able to demonstrate a better result than greedy 3-pass pruning • However, the quest for a provably optimal solution continues...

  31. Pruning: an example (figure: nodes 1 to 4)

  32. Block-based vs. path-based timing (figure: nodes 1 to 6, shown in block-based and path-based form)

  33. Block-based & path-based timing (figure: nodes 1 to 6) • In the timing graph, if a node has n fanins and m fanouts, eliminating it yields 2mn constraints instead of 2(m+n) • Criterion: if $2mn \le 2(m+n)+2$, prune!
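
A small sketch of the constraint count and the pruning test, reading the criterion as 2mn ≤ 2(m+n)+2. The comparison is taken to be "less than or equal", which is an assumption, and the function names are illustrative.

```python
def constraints_with_node(n: int, m: int) -> int:
    """Timing constraints attached to a node with n fanins and m fanouts."""
    return 2 * (m + n)


def constraints_after_elimination(n: int, m: int) -> int:
    """Timing constraints once the node is removed and fanins meet fanouts directly."""
    return 2 * m * n


def should_prune(n: int, m: int) -> bool:
    """Pruning test, assuming the criterion is 2mn <= 2(m+n) + 2."""
    return constraints_after_elimination(n, m) <= constraints_with_node(n, m) + 2
```

Under this reading, a node with one fanin is always pruned, a 2-by-3 node sits exactly on the boundary (12 versus 12), and a 3-by-3 node is kept (18 versus 14).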

  34. Detailed pruning example (figure: timing graph with nodes 1 to 16)

  35. Detailed pruning example • Score card: Edges = 26, Nodes = 16 (+2) • Figure: nodes 1 to 16 between Source and Sink

  36. Detailed pruning example • Score card: Edges = 26 → 20, Nodes = 16 → 10 • Figure labels: 1, 2, 3, 4, 5, 6 • Remaining nodes: 7 to 16

  37. Detailed pruning example • Score card: Edges = 20 → 17, Nodes = 10 → 7 • Figure labels: 14, 14, 15, 16 • Remaining nodes: 7, 8, 9, 10, 11, 12, 13

  38. Detailed pruning example • Score card: Edges = 17 → 16, Nodes = 7 → 6 • Figure labels: 1,7; 2,7; 3,7 • Remaining nodes: 8, 9, 10, 11, 12, 13

  39. Detailed pruning example • Score card: Edges = 16 → 15, Nodes = 6 → 5 • Figure label: 11,14 • Remaining nodes: 8, 9, 10, 12, 13

  40. Detailed pruning example • Score card: Edges = 15 → 14, Nodes = 5 → 4 • Figure label: 13,16 • Remaining nodes: 8, 9, 10, 12

  41. Detailed pruning example • Score card: Edges = 14 → 13, Nodes = 4 → 3 • Figure labels: 10; 10,13,16 • Remaining nodes: 8, 9, 12

  42. Detailed pruning example • Score card: Edges = 13 → 13, Nodes = 3 → 2 • Figure labels: 12,14; 12,15; 10,12,14; 10,12,15 • Remaining nodes: 8, 9 • Overall: Edges 26 to 13 (2x), Nodes 16 to 2 (8x)

  43. Pruning vs. no pruning

  44. Adjoint Lagrangian mode • gradient computation is the bottleneck • if the problem has m measurements and n tunable transistor/wire sizes: • traditional direct method: n sensitivity simulations • traditional adjoint method: m adjoint simulations • adjoint Lagrangian method computes all gradients in a single adjoint simulation!
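
One way to see why a single adjoint simulation can suffice, sketched under the assumption that the optimizer works with an augmented Lagrangian of the form $\Phi(x,\lambda) = f(x) + \sum_i [\lambda_i c_i(x) + c_i(x)^2/(2\mu)]$ over $m$ measurements $c_i$ (the notation is illustrative):

```latex
\nabla_x \Phi(x,\lambda)
  = \nabla f(x) + \sum_{i=1}^{m} w_i \, \nabla c_i(x),
\qquad
w_i = \lambda_i + \frac{c_i(x)}{\mu} .
```

At a given iterate the weights $w_i$ are known numbers, so the right-hand side is the gradient of one scalar quantity, $f(x) + \sum_i w_i c_i(x)$. An adjoint analysis yields the gradient of a single scalar with respect to all $n$ sizes at once, which is why the cost does not grow with $m$.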

  45. Adjoint Lagrangian mode • useful for large circuits • implication: additional timing/noise constraints at no extra cost! • is predicated on close software integration between the optimizer and the simulator • gradient computation is 8% of total run time on average

  46. Noise considerations • noise is important during tuning • semi-infinite problem: $v(x,t) \le NM_L$ for all $t \in [t_1, t_2]$ • Figure: waveform v versus t with noise margin $NM_L$, window $[t_1, t_2]$, and shaded area = c(x)

  47. Noise considerations • Trick: remap infinite number of constraints to a single integral constraint c(x) = 0 • In adjoint Lagrangian mode, any number of noise constraints almost for free! • General (constraints, objectives, minimax) • Tradeoff analysis for dynamic library cells • Figure: waveform v versus t with noise margin $NM_L$, window $[t_1, t_2]$, and shaded area = c(x)
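
A minimal sketch of the integral remapping, assuming c(x) is the area by which the waveform exceeds the noise margin inside the window, estimated with the trapezoidal rule on sampled points. The function name, the sampling, and the exact integrand are assumptions for illustration; the slides only state that the area equals c(x).

```python
from typing import Sequence


def noise_area(times: Sequence[float], volts: Sequence[float], nm_l: float) -> float:
    """Approximate the integral of max(0, v(t) - nm_l) over the sampled window.

    `times` must be increasing and cover only the window [t1, t2].
    The result is zero when no sample exceeds the noise margin nm_l.
    """
    excess = [max(0.0, v - nm_l) for v in volts]
    area = 0.0
    for k in range(1, len(times)):
        # Trapezoid between consecutive samples of the clipped excess.
        area += 0.5 * (excess[k - 1] + excess[k]) * (times[k] - times[k - 1])
    return area
```

Requiring `noise_area(...) == 0` replaces one constraint per time point with a single scalar constraint, which is the remapping described on the slide.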

  48. Noise considerations

  49. Initialization of λs (figure: graph edges labeled with initial values 1/6, 1/6, 1/2, 1/6, 1/2, 1/4, 1/4)

  50. Some numerical results - Dynamic
