ISPD 2014. Timing-Driven, Over-the-Block Rectilinear Steiner Tree Construction with Pre-Buffering and Slew Constraints. Yilin Zhang and David Z. Pan, ECE, Univ. of Texas at Austin
Outline • Background & Motivation • TOB-RSMT • Problem Formulation • TOB-RSMT Algorithms • Experimental Results • Conclusion
History of VLSI RSMTs • Wirelength-driven: BOI, BI1S, RV-based RST, FLUTE and GeoSteiner • Obstacle-avoiding RSMT (OA-RSMT) • [Chow+, VLSI14] [Liu+, DAC12] [Li+, ICCAD08] • Over-the-block RSMT (OB-RSMT), proposed since 2012 • [Huang+, ICCAD12] [Zhang+, ICCAD12] • Minimum delay routing tree (MDRT): BA-Tree, etc. • RAT-driven RSMT: C-Tree, etc.
Limitations of Previous Timing-Driven RSTs • Nodes are clustered during bottom-up construction • e.g., BA-Tree and C-Tree • Clustering distance metric: • Spatial distance and slack • Accurate slack is hard to find: • Some segments are not fixed yet • Segments are not buffered yet
Limitations in Dealing with Blocks • Completely neglecting blocks causes slew problems • No over-the-block buffers are allowed • Obstacle avoiding • More congestion outside the blocks • Detours mean more WL and worse timing
Post-buffering Topology Tuning is Necessary • Buffering plays a big role in delay reduction • Shielding effect; linear delay on long wires • But buffering is always performed after wiring • Changing the topology after buffering is fruitful! • [Figure labels: DSA decreased; Db2; DSB unchanged]
Our Contributions • Use pre-buffering to find practical slack for each node in the graph • Use over-the-block routing resources to improve WL, buffering cost and timing • Apply post-buffering tuning to improve timing on critical paths with little extra cost
Problem Formulation • N = {s0, s1, s2, ..., sn}: source s0 and n sinks • B = {b1, b2, ..., bm}: non-overlapping rectilinear blocks in two-dimensional space R • Buffered T(V, E) connects all the pins in N to optimize WNS with the lowest buffering cost • V is the set of nodes • E is the set of horizontal and vertical edges • Slew at every point in T must stay within the constraint • Slew mode buffering [Hu+, TCAD07] • No buffers are allowed over the blocks
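The inputs and two of the constraints in this formulation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: blocks are simplified to axis-aligned rectangles (the talk allows general rectilinear blocks), a buffer on a block boundary is assumed to be legal, and slack is taken as required time minus arrival time. The names `Rect`, `buffer_location_is_legal` and `worst_negative_slack` are hypothetical and do not come from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle standing in for a rectilinear block (assumption)."""
    x_lo: float
    y_lo: float
    x_hi: float
    y_hi: float

    def strictly_contains(self, x, y):
        # True only for points in the open interior of the rectangle.
        return self.x_lo < x < self.x_hi and self.y_lo < y < self.y_hi


def buffer_location_is_legal(x, y, blocks):
    """No buffer may sit over a block; the boundary is assumed to be allowed."""
    return not any(b.strictly_contains(x, y) for b in blocks)


def worst_negative_slack(required, arrival):
    """Worst slack over all sinks, with slack = required time - arrival time."""
    return min(r - a for r, a in zip(required, arrival))


blocks = [Rect(0, 0, 10, 10)]
print(buffer_location_is_legal(5, 5, blocks))        # interior of the block
print(buffer_location_is_legal(10, 5, blocks))       # on the block boundary
print(worst_negative_slack([100, 120], [110, 90]))   # slacks are -10 and 30
```

Wires may still cross the block interior in this picture; only buffer locations are restricted, which is what makes the slew constraint the binding one over blocks.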
Timing Models • Elmore delay • Slew • PERI model + Bakoglu’s metric • (4% error [Kashyap+, ISPD03] [Bakoglu+, 90])
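A minimal sketch of these timing models, assuming a lumped RC tree whose nodes are numbered so that every parent has a smaller index than its children. Elmore delay sums, along the path to a node, each edge resistance times the capacitance downstream of that edge. Bakoglu's metric takes the step-input slew as ln 9 times the Elmore delay, and PERI combines it with the input slew in root-sum-square form. The function names and the array-based tree encoding are illustrative choices, not taken from the talk, and the driver resistance at the root is left out for brevity.

```python
import math

LN9 = math.log(9.0)  # 10%-90% factor used in Bakoglu's metric


def elmore_delays(parent, res, cap):
    """Elmore delay from the root (node 0) to every node of an RC tree.

    parent[i] is the parent of node i (parent[0] is None), res[i] is the
    resistance of the edge parent[i] -> i, and cap[i] is the capacitance
    lumped at node i. Nodes must be numbered so that parent[i] < i.
    """
    n = len(parent)
    downstream = list(cap)
    for i in range(n - 1, 0, -1):      # accumulate subtree capacitance bottom-up
        downstream[parent[i]] += downstream[i]
    delay = [0.0] * n
    for i in range(1, n):              # parents are visited before children
        delay[i] = delay[parent[i]] + res[i] * downstream[i]
    return delay


def step_slew(elmore_delay):
    """Bakoglu's metric: step-input slew is ln(9) times the Elmore delay."""
    return LN9 * elmore_delay


def ramp_slew(input_slew, elmore_delay):
    """PERI: root-sum-square of the input slew and the step-input slew."""
    return math.hypot(input_slew, step_slew(elmore_delay))


# Node 0 drives node 1, which branches to nodes 2 and 3.
print(elmore_delays([None, 0, 1, 1], [0, 2, 3, 1], [0, 1, 2, 4]))
```

With resistance in ohms and capacitance in fF the delays come out in femtoseconds, so the result needs scaling before it is compared with a slew constraint given in ps.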
Overall Algorithm • Input: N & B • Initial timing-driven RST with pre-buffering • Find all over-the-block slew violations and fix them • Buffering • Tune the topology according to buffering information • Buffering • Return buffered T
Initial Tree Generation with Pre-Buffering • Iterative method • Runs until it converges or oscillates between several states • Feeds the real (buffered) delay back to each node to find its slack (criticality) • The critical sinks identified before topology construction are then the truly critical ones • Practical slack on each node
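The stopping rule of such an iteration (stop on convergence, or when the process starts cycling between states) can be sketched generically. The sketch below is not the authors' procedure: `build_tree` and `buffer_and_time` are hypothetical callbacks standing in for topology construction and for buffering plus timing analysis, and a state is simply the tuple of per-sink slacks, compared exactly.

```python
def iterate_until_stable(initial_slacks, build_tree, buffer_and_time, max_iters=50):
    """Rebuild the tree with fed-back slacks until the slack vector repeats.

    build_tree(slacks) -> tree and buffer_and_time(tree) -> new slacks are
    hypothetical callbacks. Returns (tree, slacks, status), where status is
    "converged", "oscillating" or "max_iters".
    """
    slacks = tuple(initial_slacks)
    seen = {slacks}                       # every slack vector visited so far
    tree = None
    for _ in range(max_iters):
        tree = build_tree(slacks)
        new_slacks = tuple(buffer_and_time(tree))
        if new_slacks == slacks:          # fixed point: feedback changes nothing
            return tree, slacks, "converged"
        if new_slacks in seen:            # revisiting an older state: a cycle
            return tree, new_slacks, "oscillating"
        seen.add(new_slacks)
        slacks = new_slacks
    return tree, slacks, "max_iters"


# Toy callbacks: the "tree" is just the slack tuple, and timing saturates at 3.
result = iterate_until_stable((0,), lambda s: s,
                              lambda t: tuple(min(x + 1, 3) for x in t))
print(result)
```

In a real flow the slacks are floating-point values, so rounding them or comparing with a tolerance before the membership test is probably necessary for the cycle check to trigger.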
Initial Tree with Pre-Buffering Flow [Lin+, TCAD11]
Initial Tree with Pre-Buffering Example • Now, D is inserted far from the source with less WL • A simple model without buffering suggests D is critical • However, with buffering, D is not critical
Buffering-Aware Over-the-Block TD-RST • TD-RST needs over-the-block routes • Better WL, buffer resources and timing • Replace obstacle-avoiding detours with shorter over-the-block connections • [Figure labels: 150ps, 100ps, 110ps, 120ps]
Differences from WL-driven BOB-RSMT • Move non-critical paths to save slew • Protect critical paths for timing • [Figure panels: Original; WL-driven; WL + slack]
Slew Constraints in Buffering-Aware TD-RST • The hard problem with over-the-block routing is slew • Each topology confines a set of inside trees • Use hypothetical buffers to check whether buffering is possible
Optimization Primitives • Three optimization primitives [Zhang+, ICCAD12] • Parallel sliding • Perpendicular sliding • EP merging
Formulation of Buffering-Aware TD-RST • The formulation considers slack and WL together • Objective terms: increase of TNS and increase of WL • WijCdEPit: delay increase for every sink downstream of EPit
Buffer-location-based Tuning Benefits • Tuning the topology after buffering pays off! • Buffering resources are costly • Improving timing without adding buffers is tempting • With only a small WL increase • We propose a way to post-tune the topology based on buffer location information
Saturated/Un-saturated Buffers • Some buffers are “Saturated” and some are “Un-saturated” • Saturated: the slew reaches the maximum • Un-saturated: the slew does not reach the maximum
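The distinction can be written as a one-line check on the remaining slew budget. In this sketch `slew_cur` is whatever slew value is checked against the constraint for a given buffer, and `slew_max` is the slew constraint; both function names and the tolerance are illustrative assumptions.

```python
def slew_margin(slew_cur, slew_max):
    """Remaining slew budget: slew_max - slew_cur (zero when saturated)."""
    return slew_max - slew_cur


def is_saturated(slew_cur, slew_max, tol=1e-9):
    """Treat a buffer as saturated once its slew has reached the maximum."""
    return slew_margin(slew_cur, slew_max) <= tol


# Example values in ps, with a 70 ps slew constraint.
print(is_saturated(70.0, 70.0), is_saturated(55.0, 70.0), slew_margin(55.0, 70.0))
```

Only un-saturated buffers have a positive margin, which is the budget that post-buffering tuning can spend on extra wire.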
Buffer-location-based Tuning Study • Un-saturated buffer == opportunity • [Figure labels: WL increases; delay to A improves]
Buffer-location-based Tuning Condition • Δslew = slewmax – slewcur • Lmax is the maximum allowed relocation distance • If the buffer input cap is neglected, Lmax = • If the buffer input cap is considered, Lmax =
Buffer-location-based Tuning Flow • Input: buffered T • Sort all sinks according to slack • For each negative-slack sink n: • n at source? Y: continue; N: n = n.parent • Buffering satisfies the Lmax constraint? • Tuning • Return buffered T
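One possible reading of this flow, sketched in Python: sinks are visited from most to least critical, and for each negative-slack sink the walk climbs parent by parent until it reaches the source, flagging every node where the Lmax condition holds. The `Node` class, the `satisfies_lmax` callback and the choice to report every qualifying node (rather than stop at the first one) are assumptions made for illustration; the branch structure of the original flowchart is not fully recoverable from the slide text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    parent: Optional["Node"] = None   # None marks the source
    slack: float = 0.0


def tuning_candidates(sinks, satisfies_lmax):
    """Yield (sink, node) pairs at which tuning would be attempted.

    satisfies_lmax(node) -> bool is a hypothetical stand-in for the check
    that buffering at this node respects the Lmax constraint.
    """
    for sink in sorted(sinks, key=lambda s: s.slack):   # most critical first
        if sink.slack >= 0:
            break                                       # only negative-slack sinks
        n = sink
        while n.parent is not None:                     # stop once n is the source
            n = n.parent
            if satisfies_lmax(n):
                yield sink, n


source = Node("s")
a = Node("a", parent=source)
sinks = [Node("t1", a, -5.0), Node("t2", source, -10.0), Node("t3", a, 3.0)]
pairs = [(s.name, n.name) for s, n in tuning_candidates(sinks, lambda n: n.name == "a")]
print(pairs)
```

Writing the walk as a generator keeps the traversal separate from the tuning step, so a caller can stop after the first successful move or re-run timing between moves.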
Experimental Setup • Implemented in C++ • Intel Core 3.0GHz Linux machine with 32GB memory • Gurobi Optimizer 5.10 for mathematical optimization • Benchmarks: RC01-RC12 [Feng+, ISPD06] • Two buffer sizes: 450 ohms with 3.8 fF, and 850 ohms with 1.9 fF • Interconnect RC from ITRS; slew constraint of 70ps
Experimental Setup • SD-OARST is the baseline [Lin+, TCAD11] • TOB-RST-1 is OA-RST with pre-buffering • TOB-RST-2 is over-the-block with pre-buffering • TOB-RST is over-the-block with pre-buffering and post-buffering tuning
Experimental Results • TOB-RST-1 vs. SD-OARST: • Similar WL (buffering cost) • Pre-buffering benefits the slack • TOB-RST-2 vs. TOB-RST-1: • WNS improves by 179ps on average • Buffering cost and WL reduced by 6% and 5% • TOB-RST vs. TOB-RST-2: • WNS improves by 70ps on average, with less than 1% more WL
Conclusion • Timing-driven over-the-block rectilinear Steiner minimum tree • Use pre-buffering to find practical slack for each node • Use over-the-block routing resources to improve WL, buffering cost and timing • Apply post-buffering tuning to improve timing on critical paths with little extra cost • Significantly improves WNS for all benchmarks, with 2% less WL and 4% less buffering cost than SD-OARST
Acknowledgment • This work is supported in part by Oracle • Thanks to Dr. Salim Chowdhury, Dr. Rajendran Panda and Dr. Akshay Sharma from Oracle • Thank you! Questions?