Power Optimal Dual-Vdd Buffered Tree Considering Buffer Stations and Blockages King Ho Tam and Lei He Electrical Engineering Department University of California, Los Angeles Sponsors: NSF CAREER, UC MICRO (Fujitsu, Intel and Mindspeed), and IBM Faculty Partner Award.
Motivation • Increasing interconnect power • 35% cells are buffers at 65nm technology [Saxena, TCAD 04] • Previous work • Power-optimal single Vdd buffer insertion[Lillis, JSSC 96] • Delay-optimal buffered tree generation[Cong, DAC 00; Alpert, TCAD 02] • No existing algorithms consider dual-Vdd for buffer insertion or buffered tree generation
Major Contributions • First in-depth study of dual Vdd buffer insertion and buffered tree generation • Large power saving over single Vdd buffering • Efficient algorithms for power optimality • 17x faster than [Lillis, JSSC 96] when single Vdd is considered
Outline • Dual Vdd buffer insertion and sizing (DVB) • Problem formulation • Sampling for speedup • Experimental results • Dual Vdd buffered tree generation (D-Tree) • Problem formulation • Improved augmented orthogonal search tree • Experimental results
Delay, Slew and Power Modeling • Elmore delay • Wire: [equation not recovered], buffer: [equation not recovered] • Bakoglu’s slew metric (ln 9 ∙ Elmore delay) • Power = energy per switch • Wire: [equation not recovered] • Lumped buffer dynamic/short-circuit power • Can be easily extended to leakage power • Low Vdd (VL) reduces leakage • Need to assume clock rate and switching activity
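The delay and energy expressions on this slide did not survive extraction. The sketch below uses the standard textbook forms of the Elmore wire and buffer delay, the ln 9 slew scaling named on the slide, and a 0.5·C·Vdd² switching energy. These forms, and all parameter names (`r_per_um`, `c_per_um`, `d_intrinsic`, `r_out`), are assumptions for illustration and may differ from the authors' exact model.

```python
import math

def wire_elmore_delay(r_per_um, c_per_um, length_um, c_load):
    """Elmore delay of a uniform RC wire driving a lumped load c_load.

    Assumed standard form: R_wire * (C_wire / 2 + C_load).
    """
    r_wire = r_per_um * length_um
    c_wire = c_per_um * length_um
    return r_wire * (c_wire / 2.0 + c_load)

def buffer_elmore_delay(d_intrinsic, r_out, c_load):
    """Assumed linear buffer model: intrinsic delay + output resistance * load."""
    return d_intrinsic + r_out * c_load

def bakoglu_slew(elmore_delay):
    """Bakoglu's slew metric as stated on the slide: ln 9 times the Elmore delay."""
    return math.log(9.0) * elmore_delay

def wire_switch_energy(c_per_um, length_um, vdd):
    """Assumed energy per switch of a wire: 0.5 * C_wire * Vdd^2."""
    return 0.5 * c_per_um * length_um * vdd ** 2
```

Because the energy term is quadratic in Vdd under this assumption, halving the supply of a wire segment would cut its switching energy to one quarter.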
Introducing Dual Vdd Buffering [Figure: VH and VL levels, annotated "Reduced noise margin" and "Leakage"] • Achieves power saving since power ∝ Vdd² • Suffers no loss of delay optimality • VL => VH requires a level converter (LC) • Restores voltage level and reduces leakage • Ext-CVS for logic [Srivastava, ISLPED 04] • LC delay and power overhead amortized
Key Observation in Dual Vdd Buffering • Disallowing VL => VH will not affect optimality • Optimality empirically illustrated (@ 65nm): • (a) has LC and VH drives Cl, power (a) > (b) • Delay (b) > (a) only if Cl > 0.5pF (~ 9mm wire) [Figure: configurations (a) and (b), labeled VH and VL]
DVB Formulation • Dual Vdd Buffer Insertion (DVB) • Given an interconnect tree • Find buffer placement, Vdd assignment for buffers, sizes of buffers • VH buffers driving VL buffers within the tree • Level converters at VH sinks driven by VL buffers • Minimize power subject to • Required arrival time (RAT) at the source • Slew rate constraint at buffer inputs and sinks
DVB Algorithm • Based on [Lillis, JSSC 96] • Dynamic programming with partial solution (option) pruning • Options must now record downstream Vdd levels for buffering • This prevents VL => VH and so avoids searching unnecessary parts of the solution space • Still quite slow for large nets • Challenge • Considering power causes super-linear growth in the number of options (w.r.t. tree size) • Dual Vdd buffers => 2x options at each node
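The option pruning described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Option` fields, the `vh_downstream` flag used to record the downstream Vdd level, and the conservative rule that only options with the same flag may dominate each other are all assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Option:
    c: float             # downstream capacitance seen at this node
    p: float             # power of the partial solution
    q: float             # required arrival time (RAT) at this node
    vh_downstream: bool  # assumed flag: a VH buffer is driven from this node

def dominates(a, b):
    """a dominates b if it is no worse in c, p and q (same Vdd flag, by assumption)."""
    return (a.vh_downstream == b.vh_downstream
            and a.c <= b.c and a.p <= b.p and a.q >= b.q)

def prune(options):
    """Keep only non-dominated options."""
    kept = []
    for cand in options:
        if any(dominates(k, cand) for k in kept):
            continue  # cand is redundant
        kept = [k for k in kept if not dominates(cand, k)]
        kept.append(cand)
    return kept

def may_insert(buffer_is_vh, option):
    """Disallow VL => VH: a VL buffer may not drive a VH buffer downstream."""
    return buffer_is_vh or not option.vh_downstream
```

Filtering candidate buffers with `may_insert` before creating new options is one way the VL => VH restriction could shrink the search space at each node.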
Speed-up Technique • Approximate by power-delay sampling • Sampling under each distinct cap value • Uniformly pick options from the entire RAT-power trade-off curve
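One possible reading of the sampling step, sketched under stated assumptions: options are grouped by capacitance value, a uniform grid is laid over the power and RAT range of each group, and at most one option is kept per grid cell. The choice of representative per cell (lowest power, then highest RAT) and the `(c, p, q)` tuple layout are assumptions, not taken from the slides.

```python
from collections import defaultdict

def _cell(value, lo, hi, grid):
    """Map value in [lo, hi] to a grid index in 0..grid-1."""
    if hi == lo:
        return 0
    return min(int((value - lo) / (hi - lo) * grid), grid - 1)

def sample_options(options, grid=20):
    """Thin a list of (c, p, q) options to at most grid*grid per capacitance value."""
    by_cap = defaultdict(list)
    for c, p, q in options:
        by_cap[c].append((c, p, q))

    sampled = []
    for group in by_cap.values():
        p_lo, p_hi = min(o[1] for o in group), max(o[1] for o in group)
        q_lo, q_hi = min(o[2] for o in group), max(o[2] for o in group)
        cells = {}
        for opt in group:
            key = (_cell(opt[1], p_lo, p_hi, grid), _cell(opt[2], q_lo, q_hi, grid))
            best = cells.get(key)
            # Assumed tie-break: prefer lower power, then higher RAT.
            if best is None or (opt[1], -opt[2]) < (best[1], -best[2]):
                cells[key] = opt
        sampled.extend(cells.values())
    return sampled
```

With the 20x20 grid used in the experiments, this caps the number of options per capacitance value at 400 regardless of net size.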
Experimental Settings for DVB • Testcase: randomly generated Steiner trees • 20 to 800 terminals in 1cm x 1cm routing area • Buffer sizes: 16x, 32x, 64x • Sampling grid set to 20x20 • Comparison • Exact power-optimal algorithm (PB) [Lillis, JSSC 96] • Our algorithm with single (SVB) and dual (DVB) Vdd buffers
Sampling Preserves Optimality • Sampling has little impact on optimality • SVB follows PB closely • Still optimal delay, 1.7% larger power over PB
Dual Vdd Reduces Power • Dual Vdd shifts power-delay curve to the left
Experimental Results for DVB • DVB saves 23% power over SVB • More power saving in larger nets • Power saving becomes larger w/delay slack • e.g. relax delay 5%, saving becomes 26%
Runtime • SVB scales a lot better for larger testcases • Achieved 17x speedup over PB [Lillis, JSSC 96] • DVB takes ~2.5x more runtime than SVB
Outline • Dual-Vdd Buffer insertion and sizing (DVB) • Problem formulation • “Sampling” speed-up technique • Experimental results • Dual-Vdd buffered tree generation (D-Tree) • Problem formulation • Improved augmented orthogonal search tree • Experimental results
D-Tree Formulation • Dual Vdd Buffered Tree (D-Tree) • Given locations of terminals, buffer stations and blockages • Find a rectilinear Steiner tree (RST), buffer placement/size/Vdd assignment • VH buffers driving VL buffers only • Level converters at VH sinks driven by VL buffers • Minimize power subject to • Required arrival time (RAT) at the source • Slew rate constraint at buffer inputs and sinks • D-Tree is NP-Hard • Finding minimum RST alone is NP-Complete
Buffered Tree Construction • Delay optimization only [Cong, DAC 00] by • Build Hanan Graph w/buffer insertion nodes according to locations of buffer stations • Path search on the grid by option propagation
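A minimal sketch of building a Hanan-style grid graph with buffer insertion nodes. It assumes that the grid lines are drawn through both terminals and buffer stations and that blockages are handled elsewhere; the function name and return layout are hypothetical and not taken from [Cong, DAC 00].

```python
def hanan_graph(terminals, stations):
    """Build a grid graph from the x and y coordinates of all given points.

    terminals, stations: lists of (x, y) tuples.
    Returns (nodes, edges, buffer_nodes), where buffer_nodes are the grid
    nodes that coincide with a buffer station.
    """
    points = list(terminals) + list(stations)
    xs = sorted({x for x, _ in points})
    ys = sorted({y for _, y in points})

    nodes = [(x, y) for x in xs for y in ys]
    edges = []
    for i, x in enumerate(xs):
        for j, y in enumerate(ys):
            if i + 1 < len(xs):          # edge to the right-hand neighbour
                edges.append(((x, y), (xs[i + 1], y)))
            if j + 1 < len(ys):          # edge to the neighbour above
                edges.append(((x, y), (x, ys[j + 1])))
    return nodes, edges, set(stations)
```

Path search by option propagation would then walk these edges, generating buffered options only at nodes in `buffer_nodes`.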
D-Tree Algorithm Overview • Challenges • Growth of options is exponential • An artifact of D-Tree’s NP-hardness • Considering power worsens option growth • Solution: sampling + efficient prune tree
Prune Tree in [Lillis, JSSC 96] [Figure: example prune tree with labels P=100; (c=20, q=600); (c=10, q=500); (c=8, q=400); (c=15, q=550); (c=12, q=520); (c=7, q=380)] • Options inserted in sorted capacitance order • Never need to clear options out from the tree • If new option is checked against the tree • Automatically avoids redundant options in tree • e.g. Фnew = (c = 20, p = 100, q = 600) • Not applicable to D-Tree problem • Order of new options is not known a priori
Our Improvement on Prune Tree • Indexing w/capacitance results in fewer trees • # capacitance value < # power value • Efficient “tree cleaning” • Enables out-of-order option insertion • Guarantee no redundancy in tree
Tree Cleaning • To add an option Фnew in O(|c|·log(|T|)) time • Check whether Фnew is dominated by any option in the data-structure • If not, remove options in the tree dominated by Фnew in two downward tree traversals • e.g. Фnew = (c = 10, p = 70, q = 410, …)
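The check-then-clean insertion above can be illustrated with a simplified stand-in. The sketch keeps one sorted Pareto list of (p, q) pairs per capacitance value instead of the augmented search trees on the slides, so it shows the out-of-order insertion logic but not the stated time bound. The class name `OptionStore` and the sample values other than Фnew are hypothetical.

```python
import bisect

class OptionStore:
    """Non-dominated (c, p, q) options, indexed by capacitance value."""

    def __init__(self):
        # c -> list of (p, q) sorted by p; q is strictly increasing along the list
        self.buckets = {}

    def _dominated(self, c, p, q):
        """True if a stored option has c' <= c, p' <= p and q' >= q."""
        for c2, front in self.buckets.items():
            if c2 > c:
                continue
            # Last entry with p' <= p carries the largest q among such entries.
            i = bisect.bisect_right(front, (p, float("inf"))) - 1
            if i >= 0 and front[i][1] >= q:
                return True
        return False

    def insert(self, c, p, q):
        """Insert in any order; returns False if the new option is redundant."""
        if self._dominated(c, p, q):
            return False
        # Cleaning: drop stored options with c' >= c, p' >= p and q' <= q.
        for c2, front in self.buckets.items():
            if c2 < c:
                continue
            lo = bisect.bisect_left(front, (p, float("-inf")))
            hi = lo
            while hi < len(front) and front[hi][1] <= q:
                hi += 1
            del front[lo:hi]
        bisect.insort(self.buckets.setdefault(c, []), (p, q))
        return True
```

Inserting Фnew = (c = 10, p = 70, q = 410) into a store that already holds a hypothetical (c = 12, p = 80, q = 400) removes the older option, since it is worse in all three dimensions.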
Experimental Settings for D-Tree • Random testcases • All based on a random floorplan of 1cm x 1cm • Blockages ~ 30%, buffer stations ~1mm apart • Comparison • Delay-optimal tree (RMP) [Cong, DAC 00] • Ours with single (S-Tree) and dual (D-Tree) Vdd buffers
Experimental Results for D-Tree • Significant power saving over RMP • S-Tree: 7%, D-Tree: 18% • Larger saving for large testcases (e.g. T4) • Handles up to 6-sink nets (T5 takes 23 mins) • Similar capability compared with delay-optimal approaches [Cong, DAC 00; Chen, ASP-DAC 02]
Conclusion • Formulated dual Vdd buffer insertion/tree generation without level converters • Proposed 2 speedup techniques • “Sampling” w/negligible loss of optimality • “Improved prune tree” for solution pruning • Applied to single-Vdd buffer insertion, 17x faster than existing work • Large power saving over single Vdd buffering • 23% in buffer insertion: dual Vdd vs single Vdd • 18% in buffered tree: dual Vdd vs delay optimal
Future Work • Speed up tree construction • Slack allocation for more power reduction • Path-based buffer insertion [Sze, DAC 05] • Allocates slack along one interconnect path • Considers single Vdd buffers only • Chip level FPGA dual Vdd assignment [Lin, DAC 05] • Fixed buffer locations, assign Vdd levels • Considers multiple critical paths • Solved as a linear programming problem