Power Optimal Dual-Vdd Buffered Tree Considering Buffer Stations and Blockages

Power Optimal Dual-Vdd Buffered Tree Considering Buffer Stations and Blockages • King Ho Tam and Lei He • Electrical Engineering Department, University of California, Los Angeles • Sponsors: NSF CAREER, UC MICRO (Fujitsu, Intel and Mindspeed), and IBM Faculty Partner Award.

Motivation • Increasing interconnect power • 35% of cells are buffers at 65nm technology [Saxena, TCAD 04] • Previous work • Power-optimal single Vdd buffer insertion [Lillis, JSSC 96] • Delay-optimal buffered tree generation [Cong, DAC 00; Alpert, TCAD 02] • No existing algorithms consider dual Vdd for buffer insertion or buffered tree generation

Major Contributions • First in-depth study of dual Vdd buffer insertion and buffered tree generation • Large power saving over single Vdd buffering • Efficient algorithms for power optimality • 17x faster than [Lillis, JSSC 96] when single Vdd is considered

Outline • Dual Vdd buffer insertion and sizing (DVB) • Problem formulation • Sampling for speedup • Experimental results • Dual Vdd buffered tree generation (D-Tree) • Problem formulation • Improved augmented orthogonal search tree • Experimental results

Delay, Slew and Power Modeling • Elmore delay for wires and buffers • Bakoglu’s slew metric (ln 9 · Elmore delay) • Power = energy per switch • Wire power • Lumped buffer dynamic/short-circuit power • Can be easily extended to leakage power • Low Vdd (VL) reduces leakage • Need to assume a clock rate and switching activity
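The wire and buffer expressions on this slide did not survive in the transcript. As a minimal sketch, the textbook forms of the models the slide names can be written as below. The exact expressions, the 0.5 factor in the wire energy, and all parameter names (`r`, `c`, `d_intrinsic`, `r_out`) are assumptions, not taken from the slide.

```python
import math

def wire_delay(r, c, length, c_load):
    """Elmore delay of a uniform wire (assumed form): r*l*(c*l/2 + C_load).

    r and c are resistance and capacitance per unit length.
    """
    return r * length * (c * length / 2.0 + c_load)

def buffer_delay(d_intrinsic, r_out, c_load):
    """Linear buffer delay (assumed form): intrinsic delay + R_out * C_load."""
    return d_intrinsic + r_out * c_load

def slew_from_elmore(elmore):
    """Bakoglu's metric as stated on the slide: slew = ln(9) * Elmore delay."""
    return math.log(9.0) * elmore

def wire_energy(c, length, vdd):
    """Wire energy per switch (assumed form): 0.5 * C_wire * Vdd^2."""
    return 0.5 * c * length * vdd ** 2
```

Because the energy term is quadratic in Vdd, halving the supply in this sketch cuts the wire energy to one quarter, which is the effect dual-Vdd buffering exploits.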

Introducing Dual Vdd Buffering • Achieves power saving since power ∝ Vdd² • Suffers no loss of delay optimality • VL => VH requires a level converter (LC) • Restores voltage level and reduces leakage • Ext-CVS for logic [Srivastava, ISLPED 04] • LC delay and power overhead amortized • (Figure labels: VH, VL, reduced noise margin, leakage)

Key Observation in Dual Vdd Buffering • Disallowing VL => VH will not affect optimality • Optimality empirically illustrated (@ 65nm): • (a) has LC and VH drives Cl; power (a) > (b) • Delay (b) > (a) only if Cl > 0.5pF (~9mm wire)

DVB Formulation • Dual Vdd Buffer Insertion (DVB) • Given an interconnect tree • Find buffer placement, Vdd assignment for buffers, and sizes of buffers • VH buffers driving VL buffers within the tree • Level converters at VH sinks driven by VL buffers • Minimize power subject to • Required arrival time (RAT) at the source • Slew rate constraint at buffer inputs and sinks

DVB Algorithm • Based on [Lillis, JSSC 96] • Dynamic programming with partial solution (option) pruning • Options must now record downstream Vdd levels for buffering • This prevents VL => VH and removes unnecessary search of the solution space • Still quite slow for large nets • Challenge • Considering power causes super-linear growth in the number of options (w.r.t. tree size) • Dual Vdd buffers => 2x options at each node
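The role of the recorded downstream Vdd level can be illustrated with a small sketch of the buffer-insertion step. The `Option` and `Buffer` fields, the linear buffer delay, and the use of a single boolean flag for "a VH buffer exists downstream" are assumptions made for illustration; the slide only states that options record downstream Vdd levels so that VL => VH is never generated.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Option:
    c: float       # downstream capacitance seen at this node
    p: float       # accumulated downstream power (energy per switch)
    q: float       # required arrival time at this node
    has_vh: bool   # hypothetical flag: True if a VH buffer exists downstream

@dataclass(frozen=True)
class Buffer:
    c_in: float    # input capacitance
    r_out: float   # output resistance
    d_int: float   # intrinsic delay
    energy: float  # lumped energy per switch
    is_vh: bool    # True for a VH buffer, False for a VL buffer

def insert_buffer(opt: Option, buf: Buffer) -> Optional[Option]:
    """Return the option obtained by driving `opt` with `buf`.

    Returns None when a VL buffer would drive a VH buffer downstream,
    so that part of the solution space is never explored.
    """
    if not buf.is_vh and opt.has_vh:
        return None
    delay = buf.d_int + buf.r_out * opt.c
    return Option(c=buf.c_in,
                  p=opt.p + buf.energy,
                  q=opt.q - delay,
                  has_vh=opt.has_vh or buf.is_vh)
```

With two supply levels, each candidate node tries both a VH and a VL version of every buffer size, which is where the 2x option growth on the slide comes from; the check above discards the VL candidates that would need a level converter.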

Speed-up Technique • Approximate by power-delay sampling • Sampling under each distinct cap value • Uniformly pick options from the entire RAT-power trade-off curve
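One way to read the sampling step is as a uniform grid laid over the power and RAT ranges of the options that share a capacitance value, keeping one representative per occupied cell. The sketch below follows that reading. The function name `sample_options`, the per-cell selection rule (lowest power, ties broken by larger RAT), and the default grid size are assumptions; the slide does not specify how the representative is chosen.

```python
def sample_options(options, grid=20):
    """Thin a list of (p, q) options that share one capacitance value.

    The power range and the RAT range are each split into `grid` uniform
    bins, and at most one option is kept per occupied (power, RAT) cell,
    so the result never has more than grid * grid options.
    """
    if len(options) <= 1:
        return list(options)
    p_lo = min(p for p, _ in options)
    p_hi = max(p for p, _ in options)
    q_lo = min(q for _, q in options)
    q_hi = max(q for _, q in options)

    def cell(v, lo, hi):
        # Map v in [lo, hi] to a bin index in 0 .. grid-1.
        if hi == lo:
            return 0
        return min(grid - 1, int((v - lo) / (hi - lo) * grid))

    picked = {}
    for p, q in options:
        key = (cell(p, p_lo, p_hi), cell(q, q_lo, q_hi))
        best = picked.get(key)
        # Assumed rule: keep the lowest-power option in each cell,
        # preferring the larger RAT when powers are equal.
        if best is None or (p, -q) < (best[0], -best[1]):
            picked[key] = (p, q)
    return sorted(picked.values())
```

Capping the option count per capacitance value is what bounds the otherwise super-linear growth, at the cost of a small loss in optimality.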

Experimental Settings for DVB • Testcase: randomly generated Steiner trees • 20 to 800 terminals in 1cm x 1cm routing area • Buffer sizes: 16x, 32x, 64x • Sampling grid set to 20x20 • Comparison • Exact power-optimal algorithm (PB) [Lillis, JSSC 96] • Our algorithm with single (SVB) and dual (DVB) Vdd buffers

Sampling Preserves Optimality • Sampling has little impact on optimality • SVB follows PB closely • Still delay-optimal, with 1.7% more power than PB

Dual Vdd Reduces Power • Dual Vdd shifts power-delay curve to the left

Experimental Results for DVB • DVB saves 23% power over SVB • More power saving in larger nets • Power saving becomes larger w/delay slack • e.g. relax delay 5%, saving becomes 26%

Runtime • SVB scales a lot better for larger testcases • Achieved 17x speedup over PB [Lillis, JSSC 96] • DVB takes ~2.5x more runtime than SVB

Outline • Dual-Vdd Buffer insertion and sizing (DVB) • Problem formulation • “Sampling” speed-up technique • Experimental results • Dual-Vdd buffered tree generation (D-Tree) • Problem formulation • Improved augmented orthogonal search tree • Experimental results

D-Tree Formulation • Dual Vdd Buffered Tree (D-Tree) • Given locations of terminals, buffer stations and blockages • Find a rectilinear Steiner tree (RST) and buffer placement/size/Vdd assignment • VH buffers driving VL buffers only • Level converters at VH sinks driven by VL buffers • Minimize power subject to • Required arrival time (RAT) at the source • Slew rate constraint at buffer inputs and sinks • D-Tree is NP-Hard • Finding a minimum RST alone is NP-Complete

Buffered Tree Construction • Delay optimization only [Cong, DAC 00] by • Build Hanan Graph w/buffer insertion nodes according to locations of buffer stations • Path search on the grid by option propagation
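The grid on which the path search runs can be sketched as a plain Hanan grid: every intersection of the horizontal and vertical lines through the given points. In the sketch below, passing buffer-station locations as extra points is an assumption about how the insertion nodes are added, and blockages are not modeled at all; the function name `hanan_grid` is also hypothetical.

```python
from itertools import product

def hanan_grid(points):
    """Build the Hanan grid induced by a list of (x, y) points.

    Returns (nodes, edges), where nodes are all intersections of the
    distinct x and y coordinates, and edges join neighbouring nodes
    along each grid line.
    """
    xs = sorted({x for x, _ in points})
    ys = sorted({y for _, y in points})
    nodes = list(product(xs, ys))
    edges = []
    # Horizontal edges between consecutive x coordinates on each row.
    for y in ys:
        for x0, x1 in zip(xs, xs[1:]):
            edges.append(((x0, y), (x1, y)))
    # Vertical edges between consecutive y coordinates on each column.
    for x in xs:
        for y0, y1 in zip(ys, ys[1:]):
            edges.append(((x, y0), (x, y1)))
    return nodes, edges
```

Option propagation then walks these edges from the sinks toward the source, extending each option by the wire segment it crosses and trying a buffer wherever a node coincides with a buffer station.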

D-Tree Algorithm Overview • Challenges • Growth of options is exponential • An artifact of D-Tree’s NP-hardness • Considering power worsens option growth • Solution: sampling + efficient prune tree

Prune Tree in [Lillis, JSSC 96] • Options inserted in sorted capacitance order • Never need to clear options out of the tree • If each new option is checked against the tree • Automatically avoids redundant options in the tree • e.g. Фnew = (c = 20, p = 100, q = 600) • Not applicable to D-Tree problem • Order of new options is not known a priori • (Figure: example prune tree for P=100 with nodes (c=20, q=600), (c=10, q=500), (c=8, q=400), (c=15, q=550), (c=12, q=520), (c=7, q=380))

Our Improvement on Prune Tree • Indexing w/capacitance results in fewer trees • # capacitance values < # power values • Efficient “tree cleaning” • Enables out-of-order option insertion • Guarantees no redundancy in tree

Tree Cleaning • To add an option Фnew in O(|c|·log(|T|)) time • Check whether Фnew is dominated by any option in the data structure • If not, remove options in the tree dominated by Фnew in two downward tree traversals • e.g. Фnew = (c = 10, p = 70, q = 410, …)
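The two steps of tree cleaning (reject a dominated newcomer, otherwise evict what the newcomer dominates) can be sketched with a simple stand-in structure. The class below indexes options by capacitance, as on the previous slide, but replaces the augmented search tree with linear scans, so it does not achieve the O(|c|·log(|T|)) bound; the class and method names are hypothetical.

```python
class OptionStore:
    """Keeps a set of mutually non-dominated (c, p, q) options.

    An option a dominates b when a.c <= b.c, a.p <= b.p and a.q >= b.q.
    Options may be added in any order.
    """

    def __init__(self):
        self.by_cap = {}  # capacitance -> list of (p, q) pairs

    def add(self, c, p, q):
        # Step 1: reject the new option if a stored option dominates it.
        for c2, pairs in self.by_cap.items():
            if c2 <= c and any(p2 <= p and q2 >= q for p2, q2 in pairs):
                return False
        # Step 2 ("cleaning"): evict stored options the new one dominates.
        for c2 in list(self.by_cap):
            if c2 >= c:
                kept = [(p2, q2) for p2, q2 in self.by_cap[c2]
                        if not (p <= p2 and q >= q2)]
                if kept:
                    self.by_cap[c2] = kept
                else:
                    del self.by_cap[c2]
        self.by_cap.setdefault(c, []).append((p, q))
        return True

    def options(self):
        """Return all stored options as sorted (c, p, q) tuples."""
        return sorted((c, p, q)
                      for c, pairs in self.by_cap.items()
                      for p, q in pairs)
```

Because step 2 runs on every successful insertion, the store stays free of redundant options even when capacitances arrive out of order, which is the property the D-Tree path search needs.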

Experimental Settings for D-Tree • Random testcases • All based on a random floorplan of 1cm x 1cm • Blockages ~30%, buffer stations ~1mm apart • Comparison • Delay-optimal tree (RMP) [Cong, DAC 00] • Ours with single (S-Tree) and dual (D-Tree) Vdd buffers

Experimental Results for D-Tree • Significant power saving over RMP • S-Tree: 7%, D-Tree: 18% • Larger saving for large testcases (e.g. T4) • Handles up to 6-sink nets (T5 takes 23 mins) • Similar capability compared with delay-optimal approaches [Cong, DAC 00; Chen, ASP-DAC 02]

Conclusion • Formulated dual Vdd buffer insertion/tree generation without level converters • Proposed 2 speedup techniques • “Sampling” w/negligible loss of optimality • “Improved prune tree” for solution pruning • Applied to single-Vdd buffer insertion, 17x faster than existing work • Large power saving over single Vdd buffering • 23% in buffer insertion: dual Vdd vs single Vdd • 18% in buffered tree: dual Vdd vs delay optimal

Future Work • Speed up tree construction • Slack allocation for more power reduction • Path-based buffer insertion [Sze, DAC 05] • Allocates slack along one interconnect path • Considers single Vdd buffers only • Chip-level FPGA dual Vdd assignment [Lin, DAC 05] • Fixed buffer locations, assigns Vdd levels • Considers multiple critical paths • Solved as a linear programming problem
