Power Optimal Dual-Vdd Buffered Tree Considering Buffer Stations and Blockages

King Ho Tam and Lei He, Electrical Engineering Department, University of California, Los Angeles. Sponsors: NSF CAREER, UC MICRO (Fujitsu, Intel and Mindspeed), and IBM Faculty Partner Award.


Presentation Transcript


  1. Power Optimal Dual-Vdd Buffered Tree Considering Buffer Stations and Blockages. King Ho Tam and Lei He, Electrical Engineering Department, University of California, Los Angeles. Sponsors: NSF CAREER, UC MICRO (Fujitsu, Intel and Mindspeed), and IBM Faculty Partner Award.

  2. Motivation • Increasing interconnect power • 35% of cells are buffers at 65nm technology [Saxena, TCAD 04] • Previous work • Power-optimal single Vdd buffer insertion [Lillis, JSSC 96] • Delay-optimal buffered tree generation [Cong, DAC 00; Alpert, TCAD 02] • No existing algorithm considers dual Vdd for buffer insertion or buffered tree generation

  3. Major Contributions • First in-depth study of dual Vdd buffer insertion and buffered tree generation • Large power saving over single Vdd buffering • Efficient algorithms for power optimality • 17x faster than [Lillis, JSSC 96] when single Vdd is considered

  4. Outline • Dual Vdd buffer insertion and sizing (DVB) • Problem formulation • Sampling for speedup • Experimental results • Dual Vdd buffered tree generation (D-Tree) • Problem formulation • Improved augmented orthogonal search tree • Experimental results

  5. Delay, Slew and Power Modeling • Elmore delay for wires and buffers • Bakoglu's slew metric (ln 9 ∙ Elmore delay) • Power = energy per switch • Wire: switching energy of the wire capacitance • Lumped buffer dynamic/short-circuit power • Can be easily extended to leakage power • Low Vdd (VL) reduces leakage • Converting energy to power requires an assumed clock rate and switching activity
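The delay, slew and power model above can be sketched in a few functions. This is a minimal sketch under stated assumptions: the transcript drops the actual formulas, so the textbook forms are used here (Elmore delay of a uniform wire, r·l·(c·l/2 + C_load); a linear buffer model, intrinsic delay + R_out·C_load; energy per switch C·Vdd²). All function names are hypothetical, and the numbers in the usage lines are unitless placeholders, not 65nm data.

```python
import math


def wire_elmore(r_per_len, c_per_len, length, c_load):
    """Elmore delay of a uniform RC wire driving c_load: R * (C/2 + C_load)."""
    r_wire = r_per_len * length
    c_wire = c_per_len * length
    return r_wire * (c_wire / 2.0 + c_load)


def buffer_delay(d_intrinsic, r_out, c_load):
    """Assumed linear buffer model: intrinsic delay plus R_out * C_load."""
    return d_intrinsic + r_out * c_load


def bakoglu_slew(elmore_delay):
    """Bakoglu's slew metric: ln(9) times the Elmore delay."""
    return math.log(9) * elmore_delay


def switching_energy(c_switched, vdd):
    """Energy per switching event, assumed C * Vdd^2 (some texts use C * Vdd^2 / 2)."""
    return c_switched * vdd ** 2


def average_power(energy_per_switch, activity, f_clk):
    """Energy becomes power only with an assumed switching activity and clock rate."""
    return activity * f_clk * energy_per_switch


if __name__ == "__main__":
    # Placeholder values: one wire segment followed by one buffer stage.
    total = wire_elmore(1.0, 2.0, 3.0, 4.0) + buffer_delay(1.0, 2.0, 3.0)
    print(total, bakoglu_slew(total))
```

Because both delay terms are linear in the load capacitance, a bottom-up dynamic program only needs to carry the downstream capacitance and required arrival time at each node.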

  6. Introducing Dual Vdd Buffering (Figure panels: reduced noise margin; leakage, for VL driving VH) • Achieves power saving since power ∝ Vdd² • Suffers no loss of delay optimality • VL => VH requires a level converter (LC) • Restores the voltage level and reduces leakage • Ext-CVS for logic [Srivastava, ISLPED 04] • LC delay and power overhead amortized
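The Vdd² dependence can be made concrete with a one-line ratio. This is a sketch only: the slides give no values for VL or VH, so the 1.0 V and 0.8 V levels below are hypothetical, and the ratio ignores level-converter overhead and the slower switching of a VL buffer.

```python
def relative_dynamic_power(v_low, v_high):
    """Dynamic power of a VL buffer relative to an identical VH buffer,
    for the same load, activity and clock rate (power proportional to Vdd^2)."""
    return (v_low / v_high) ** 2


# Hypothetical supply levels; not taken from the presentation.
VH_VOLTS = 1.0
VL_VOLTS = 0.8
ratio = relative_dynamic_power(VL_VOLTS, VH_VOLTS)
print(f"VL buffer dynamic power is {ratio:.2f} of the VH value")  # ratio is 0.64 here
```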

  7. Key Observation in Dual Vdd Buffering • Disallowing VL => VH does not affect optimality • Optimality illustrated empirically (@ 65nm): • (a) has an LC, and a VH buffer drives Cl; power (a) > (b) • Delay (b) > (a) only if Cl > 0.5pF (~9mm of wire)

  8. DVB Formulation • Dual Vdd Buffer Insertion (DVB) • Given an interconnect tree • Find buffer placement, Vdd assignment and sizes of buffers • VH buffers may drive VL buffers within the tree, but not vice versa • Level converters at VH sinks driven by VL buffers • Minimize power subject to • Required arrival time (RAT) constraint at the source • Slew rate constraint at buffer inputs and sinks
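The Vdd rules of the formulation (no VL => VH inside the tree, level converters only at VH sinks driven by VL) can be checked on a finished solution. This is a minimal sketch: the tree encoding, the node kinds and the name check_vdd_assignment are hypothetical, and the source driver is assumed to be at VH.

```python
VH, VL = "VH", "VL"


def check_vdd_assignment(children, kind, vdd, root, source_vdd=VH):
    """Check a dual-Vdd assignment on a buffered tree.

    children: node -> list of child nodes
    kind: node -> "source", "steiner", "buffer" or "sink"
    vdd: Vdd level of every buffer and sink
    Returns (legal, lc_sinks): legal is False if a VL buffer drives a VH buffer
    (VL => VH inside the tree); lc_sinks are VH sinks driven by VL, which need
    a level converter.
    """
    legal = True
    lc_sinks = []
    stack = [(root, source_vdd)]  # (node, Vdd of the nearest upstream driver)
    while stack:
        node, drive = stack.pop()
        if kind[node] == "buffer":
            if drive == VL and vdd[node] == VH:
                legal = False
            drive = vdd[node]  # this buffer drives everything below it
        elif kind[node] == "sink" and drive == VL and vdd[node] == VH:
            lc_sinks.append(node)
        for child in children.get(node, []):
            stack.append((child, drive))
    return legal, lc_sinks


# Hypothetical example: s -> b1 (VH) -> n1 -> {b2 (VL) -> {t1, t2}, t3}
children = {"s": ["b1"], "b1": ["n1"], "n1": ["b2", "t3"], "b2": ["t1", "t2"]}
kind = {"s": "source", "b1": "buffer", "n1": "steiner", "b2": "buffer",
        "t1": "sink", "t2": "sink", "t3": "sink"}
vdd = {"b1": VH, "b2": VL, "t1": VH, "t2": VL, "t3": VH}
print(check_vdd_assignment(children, kind, vdd, "s"))  # (True, ['t1'])
```

Here the VH buffer b1 legally drives the VL buffer b2, and only sink t1 (VH, driven by VL) needs a level converter.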

  9. DVB Algorithm • Based on [Lillis, JSSC 96] • Dynamic programming with partial solution (option) pruning • Options must now record downstream Vdd levels for buffering • Prevents VL => VH, which also removes unnecessary search in the solution space • Still quite slow for large nets • Challenge • Considering power causes super-linear growth in the number of options (w.r.t. tree size) • Dual Vdd buffers => 2x options at each node
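One step of this dynamic program (trying buffers at a candidate node, then pruning) can be sketched as follows. This is a simplified sketch, not the authors' implementation: the needs_vh flag is our hypothetical encoding of "record downstream Vdd levels", buffer power is a fixed per-Vdd cost, and wire segments, level converters and slew checks are omitted. Option, Buffer, dominates, prune and insert_buffers are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    cap: float      # downstream capacitance seen at this node
    q: float        # required arrival time (RAT) at this node
    power: float    # accumulated downstream power
    needs_vh: bool  # True if a VH buffer is among the nearest downstream buffers


@dataclass
class Buffer:
    c_in: float     # input capacitance
    d_intr: float   # intrinsic delay
    r_out: dict     # per-Vdd output resistance, e.g. {"VH": 4.0, "VL": 5.0}
    power: dict     # per-Vdd power cost, e.g. {"VH": 4.0, "VL": 2.56}


def dominates(a, b):
    """a makes b redundant: no worse in cap, RAT and power, and no stricter Vdd need."""
    return (a.cap <= b.cap and a.q >= b.q and a.power <= b.power
            and (b.needs_vh or not a.needs_vh))


def prune(options):
    """Keep only non-dominated options (quadratic, for clarity)."""
    kept = []
    for o in options:
        if any(dominates(k, o) for k in kept):
            continue
        kept = [k for k in kept if not dominates(o, k)]
        kept.append(o)
    return kept


def insert_buffers(options, library):
    """Extend the options at one candidate node with every buffer at both Vdd levels."""
    candidates = list(options)  # leaving the node unbuffered is always allowed
    for o in options:
        for buf in library:
            for level in ("VH", "VL"):
                if level == "VL" and o.needs_vh:
                    continue  # would create VL => VH, so never generate it
                candidates.append(Option(
                    cap=buf.c_in,
                    q=o.q - (buf.d_intr + buf.r_out[level] * o.cap),
                    power=o.power + buf.power[level],
                    needs_vh=(level == "VH"),
                ))
    return prune(candidates)


# Hypothetical one-buffer library and a single downstream option.
lib = [Buffer(c_in=1.0, d_intr=20.0, r_out={"VH": 4.0, "VL": 5.0},
              power={"VH": 4.0, "VL": 2.56})]
opts = insert_buffers([Option(cap=10.0, q=1000.0, power=0.0, needs_vh=False)], lib)
print(len(opts))  # 3: unbuffered, VH-buffered and VL-buffered all survive
```

Each extra Vdd level multiplies the candidates generated per node, which is the 2x growth the slide mentions; the Vdd flag also lets illegal VL => VH options be skipped before they are ever created.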

  10. Speed-up Technique • Approximate the option set by power-delay sampling • Sample separately for each distinct capacitance value • Pick options uniformly from the entire RAT-power trade-off curve
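A sketch of the per-capacitance sampling step follows. It assumes that the 20x20 "sampling grid" listed in the DVB experimental settings is laid over the RAT-power plane of each capacitance group, keeping at most one option per cell; that reading is our assumption, not something the slides confirm. sample_options and its (cap, q, power) tuple layout are hypothetical.

```python
from collections import defaultdict


def sample_options(options, grid=20):
    """Keep at most one option per cell of a grid x grid RAT-power grid,
    separately for each distinct capacitance value.

    options: iterable of (cap, q, power) tuples.
    """
    by_cap = defaultdict(list)
    for opt in options:
        by_cap[opt[0]].append(opt)

    def cell(value, lo, hi):
        # Map a value in [lo, hi] onto a cell index in 0..grid-1.
        if hi == lo:
            return 0
        return min(int((value - lo) / (hi - lo) * grid), grid - 1)

    sampled = []
    for group in by_cap.values():
        q_lo, q_hi = min(o[1] for o in group), max(o[1] for o in group)
        p_lo, p_hi = min(o[2] for o in group), max(o[2] for o in group)
        best = {}
        for cap, q, p in group:
            key = (cell(q, q_lo, q_hi), cell(p, p_lo, p_hi))
            kept = best.get(key)
            # Inside one cell prefer the larger RAT, then the smaller power.
            if kept is None or (q, -p) > (kept[1], -kept[2]):
                best[key] = (cap, q, p)
        sampled.extend(best.values())
    return sampled


# 1000 options on one trade-off curve collapse to one option per diagonal cell.
pts = [(1.0, float(k), float(k)) for k in range(1000)]
print(len(sample_options(pts)))  # 20
```

The number of surviving options per capacitance value is bounded by the grid size rather than by the tree size, which is what keeps the option count from growing super-linearly, at the cost of the small optimality loss reported for SVB.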

  11. Experimental Settings for DVB • Testcases: randomly generated Steiner trees • 20 to 800 terminals in a 1cm x 1cm routing area • Buffer sizes: 16x, 32x, 64x • Sampling grid set to 20x20 • Comparison • Exact power-optimal algorithm (PB) [Lillis, JSSC 96] • Our algorithm with single (SVB) and dual (DVB) Vdd buffers

  12. Sampling Preserves Optimality • Sampling has little impact on optimality • SVB follows PB closely • Same optimal delay, with only 1.7% more power than PB

  13. Dual Vdd Reduces Power • Dual Vdd shifts power-delay curve to the left

  14. Experimental Results for DVB • DVB saves 23% power over SVB • More power saving in larger nets • Power saving grows with delay slack • e.g. relaxing the delay by 5% increases the saving to 26%

  15. Runtime • SVB scales much better than PB on larger testcases • Achieved 17x speedup over PB [Lillis, JSSC 96] • DVB takes ~2.5x more runtime than SVB

  16. Outline • Dual-Vdd Buffer insertion and sizing (DVB) • Problem formulation • “Sampling” speed-up technique • Experimental results • Dual-Vdd buffered tree generation (D-Tree) • Problem formulation • Improved augmented orthogonal search tree • Experimental results

  17. D-Tree Formulation • Dual Vdd Buffered Tree (D-Tree) • Given locations of terminals, buffer stations and blockages • Find a rectilinear Steiner tree (RST) and buffer placement/size/Vdd assignment • VH buffers may drive VL buffers, but not vice versa • Level converters at VH sinks driven by VL buffers • Minimize power subject to • Required arrival time (RAT) constraint at the source • Slew rate constraint at buffer inputs and sinks • D-Tree is NP-hard • Finding a minimum RST alone is NP-complete

  18. Buffered Tree Construction • Delay optimization only [Cong, DAC 00]: • Build a Hanan graph with buffer insertion nodes at the locations of buffer stations • Path search on the grid by option propagation
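The graph-building step can be sketched as below. This is a minimal sketch under stated assumptions: grid lines pass through every terminal and buffer-station coordinate, grid points strictly inside a blockage rectangle are dropped, and only buffer-station points may host buffers. It does not filter edges that pass over a blockage between two grid points, and the option-propagating path search is not shown. build_hanan_graph and its data layout are hypothetical.

```python
def build_hanan_graph(terminals, stations, blockages):
    """Hanan-style routing grid from terminal and buffer-station coordinates.

    terminals, stations: lists of (x, y); blockages: list of (x1, y1, x2, y2).
    Returns (nodes, edges, buffer_nodes).
    """
    pts = list(terminals) + list(stations)
    xs = sorted({x for x, _ in pts})
    ys = sorted({y for _, y in pts})

    def blocked(x, y):
        return any(x1 < x < x2 and y1 < y < y2 for x1, y1, x2, y2 in blockages)

    nodes = {(x, y) for x in xs for y in ys if not blocked(x, y)}
    edges = []
    for i, x in enumerate(xs):
        for j, y in enumerate(ys):
            if (x, y) not in nodes:
                continue
            # Connect to the next surviving grid point to the right and above.
            if i + 1 < len(xs) and (xs[i + 1], y) in nodes:
                edges.append(((x, y), (xs[i + 1], y)))
            if j + 1 < len(ys) and (x, ys[j + 1]) in nodes:
                edges.append(((x, y), (x, ys[j + 1])))
    buffer_nodes = set(stations) & nodes  # buffers only at buffer stations
    return nodes, edges, buffer_nodes


nodes, edges, bufs = build_hanan_graph([(0, 0), (4, 4)], [(2, 2)], [])
print(len(nodes), len(edges), bufs)  # 9 12 {(2, 2)}
```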

  19. D-Tree Algorithm Overview • Challenges • Number of options grows exponentially • An artifact of D-Tree's NP-hardness • Considering power worsens option growth • Solution: sampling + efficient prune tree

  20. Prune Tree in [Lillis, JSSC 96] (Figure: prune tree for P = 100 holding options with (c, q) = (10, 500), (8, 400), (15, 550), (12, 520), (7, 380)) • Options inserted in sorted capacitance order • Never need to clear options out of the tree • If each new option is checked against the tree • Redundant options are automatically kept out of the tree • e.g. Φnew = (c = 20, p = 100, q = 600) • Not applicable to the D-Tree problem • Order of new options is not known a priori
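Why in-order insertion never needs removals can be shown with a small stand-in. This is a simplified sketch, not the prune tree of [Lillis, JSSC 96]: a dict of best RAT per power value replaces the balanced trees, and options are assumed to arrive sorted by (c, p, -q). Under that order every stored option has c' <= c, so a new option is redundant exactly when a stored option has p' <= p and q' >= q, and no later option can make an earlier one redundant. InOrderPruneStore is a hypothetical name.

```python
class InOrderPruneStore:
    """Simplified stand-in for an in-order prune structure (not a balanced tree).

    Options (c, p, q) must arrive sorted by (c, p, -q); then stored options
    never become redundant and never need to be removed.
    """

    def __init__(self):
        self.best_q = {}    # power value -> largest q stored for it
        self.kept = []
        self.last_key = None

    def try_insert(self, c, p, q):
        key = (c, p, -q)
        if self.last_key is not None and key < self.last_key:
            raise ValueError("options must arrive sorted by (c, p, -q)")
        self.last_key = key
        # All stored options have c' <= c, so only power and RAT decide.
        if any(pw <= p and bq >= q for pw, bq in self.best_q.items()):
            return False    # dominated: keep it out of the store
        self.best_q[p] = max(self.best_q.get(p, q), q)
        self.kept.append((c, p, q))
        return True


# The P = 100 options from the slide, inserted in capacitance order.
store = InOrderPruneStore()
for c, q in [(7, 380), (8, 400), (10, 500), (12, 520), (15, 550)]:
    store.try_insert(c, 100, q)
print(store.try_insert(20, 100, 600))  # True: no stored option has q >= 600
```

If options can arrive in any capacitance order, as in D-Tree, a later option with smaller c can dominate stored ones, and the "insert only" argument above no longer holds.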

  21. Our Improvement on Prune Tree • Indexing by capacitance results in fewer trees • # capacitance values < # power values • Efficient "tree cleaning" • Enables out-of-order option insertion • Guarantees no redundancy in the tree

  22. Tree Cleaning • Adds an option Φnew in O(|c|·log|T|) time • Check whether Φnew is dominated by any option in the data structure • If not, remove the options in the tree dominated by Φnew in two downward tree traversals • e.g. Φnew = (c = 10, p = 70, q = 410, …)
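The check-then-clean insertion can be sketched with one sorted "staircase" per capacitance value (powers ascending, RAT strictly ascending). This is a sketch under stated assumptions: Python lists with the standard bisect module stand in for the balanced trees, so deletions here are linear and the O(|c|·log|T|) bound is not reproduced; options carry only (c, p, q); CapIndexedStore and the three pre-stored options are hypothetical.

```python
import bisect


class CapIndexedStore:
    """Out-of-order option insertion with cleaning, one staircase per cap value."""

    def __init__(self):
        self.stairs = {}  # cap -> (powers ascending, qs strictly ascending)

    def is_dominated(self, c, p, q):
        """True if some stored option has c' <= c, p' <= p and q' >= q."""
        for cap, (ps, qs) in self.stairs.items():
            if cap <= c:
                k = bisect.bisect_right(ps, p)  # ps[:k] have power <= p
                if k and qs[k - 1] >= q:        # qs[k-1] is their largest q
                    return True
        return False

    def insert(self, c, p, q):
        if self.is_dominated(c, p, q):
            return False
        # Cleaning: drop stored options with c' >= c, p' >= p and q' <= q.
        for cap, (ps, qs) in self.stairs.items():
            if cap >= c:
                lo = bisect.bisect_left(ps, p)       # first entry with p' >= p
                hi = bisect.bisect_right(qs, q, lo)  # q ascending: q' <= q is a run
                del ps[lo:hi]
                del qs[lo:hi]
        ps, qs = self.stairs.setdefault(c, ([], []))
        pos = bisect.bisect_left(ps, p)
        ps.insert(pos, p)
        qs.insert(pos, q)
        return True

    def options(self):
        return sorted((c, p, q) for c, (ps, qs) in self.stairs.items()
                      for p, q in zip(ps, qs))


store = CapIndexedStore()
for opt in [(8, 60, 400), (12, 75, 405), (12, 80, 420)]:
    store.insert(*opt)
store.insert(10, 70, 410)  # the slide's Φnew; it cleans out (12, 75, 405)
print(store.options())  # [(8, 60, 400), (10, 70, 410), (12, 80, 420)]
```

Because each staircase keeps RAT increasing with power, both the dominance check and the cleaning step reduce to binary searches per capacitance value, which is where the |c| factor in the slide's bound comes from.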

  23. Experimental Settings for D-Tree • Random testcases • All based on a random floorplan of 1cm x 1cm • Blockages ~30%, buffer stations ~1mm apart • Comparison • Delay-optimal tree (RMP) [Cong, DAC 00] • Ours with single (S-Tree) and dual (D-Tree) Vdd buffers

  24. Experimental Results for D-Tree • Significant power saving over RMP • S-Tree: 7%, D-Tree: 18% • Larger saving for large testcases (e.g. T4) • Handles nets of up to 6 sinks (T5 takes 23 mins) • Capability similar to delay-optimal approaches [Cong, DAC 00; Chen, ASP-DAC 02]

  25. Conclusion • Formulated dual Vdd buffer insertion/tree generation without level converters inside the tree • Proposed 2 speedup techniques • "Sampling" with negligible loss of optimality • "Improved prune tree" for solution pruning • Applied to single-Vdd buffer insertion, 17x faster than existing work • Large power saving over single Vdd buffering • 23% in buffer insertion: dual Vdd vs single Vdd • 18% in buffered tree: dual Vdd vs delay optimal

  26. Future Work • Speed up tree construction • Slack allocation for more power reduction • Path-based buffer insertion [Sze, DAC 05] • Allocates slack along one interconnect path • Considers single Vdd buffers only • Chip-level FPGA dual Vdd assignment [Lin, DAC 05] • Fixed buffer locations, assigns Vdd levels • Considers multiple critical paths • Solved as a linear programming problem
